I have defined a test Job:
apiVersion: batch/v1
kind: Job
metadata:
  name: testjob
spec:
  activeDeadlineSeconds: 100
  backoffLimit: 3
  template:
    spec:
      containers:
      - name: testjob
        image: bitnami/kubectl:1.20
        imagePullPolicy: IfNotPresent
        command:
        - /bin/sh
        - -c
        - echo "Test" && exit 1
      restartPolicy: Never
All the pods fail as expected, but the Job's DURATION counter never stops.
$ kubectl get pods,jobs
NAME                READY   STATUS   RESTARTS   AGE
pod/testjob-s2cbf   0/1     Error    0          3m15s
pod/testjob-nhfgn   0/1     Error    0          3m14s
pod/testjob-8jw74   0/1     Error    0          3m4s
pod/testjob-jh7hl   0/1     Error    0          2m24s

NAME                COMPLETIONS   DURATION   AGE
job.batch/testjob   0/1           3m15s      3m15s
$ kubectl describe job testjob
Name:                     testjob
Namespace:                default
Selector:                 controller-uid=8a1f31c7-8d9d-4b4d-a687-e8e297509a71
Labels:                   controller-uid=8a1f31c7-8d9d-4b4d-a687-e8e297509a71
                          job-name=testjob
Annotations:              <none>
Parallelism:              1
Completions:              1
Start Time:               Wed, 17 Mar 2021 18:13:56 +0000
Active Deadline Seconds:  100s
Pods Statuses:            0 Running / 0 Succeeded / 4 Failed
Pod Template:
  Labels:  controller-uid=8a1f31c7-8d9d-4b4d-a687-e8e297509a71
           job-name=testjob
  Containers:
   testjob:
    Image:      bitnami/kubectl:1.20
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/sh
      -c
      echo "Test" && exit 1
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Events:
  Type     Reason                Age    From            Message
  ----     ------                ----   ----            -------
  Normal   SuccessfulCreate      4m11s  job-controller  Created pod: testjob-s2cbf
  Normal   SuccessfulCreate      4m10s  job-controller  Created pod: testjob-nhfgn
  Normal   SuccessfulCreate      4m     job-controller  Created pod: testjob-8jw74
  Normal   SuccessfulCreate      3m20s  job-controller  Created pod: testjob-jh7hl
  Warning  BackoffLimitExceeded  2m     job-controller  Job has reached the specified backoff limit
However, if one of the pods completes successfully (status: Completed), the duration counter stops as expected.
What is wrong here?
Answer 1
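If a Job completes successfully (type=Complete), its .status.completionTime is set to a specific date. If the Job has Failed (type=Failed), its .status.completionTime is not set at all, so DURATION keeps increasing (to be honest, I'm not sure whether this is a bug).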
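The behaviour described above can be sketched in a few lines of Python. This is only an illustration of the assumed logic (measure against the current time when no completion time is recorded), not kubectl's actual code; the function name job_duration and the "now" values are made up for the example, while the start time is the one shown in the describe output above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def job_duration(start_time: Optional[datetime],
                 completion_time: Optional[datetime],
                 now: datetime) -> Optional[timedelta]:
    """Approximate how a DURATION value could be derived (assumed logic)."""
    if start_time is None:
        # The Job has not started yet, so there is nothing to report.
        return None
    if completion_time is None:
        # No completionTime recorded (still running, or Failed):
        # the value is measured against "now" and therefore keeps growing.
        return now - start_time
    # completionTime is set (type=Complete): the value is fixed.
    return completion_time - start_time


start = datetime(2021, 3, 17, 18, 13, 56, tzinfo=timezone.utc)
later = start + timedelta(minutes=3, seconds=15)   # hypothetical "now"
much_later = start + timedelta(minutes=10)         # hypothetical later "now"

# Failed Job: completion_time is None, so the result depends on "now".
failed_early = job_duration(start, None, later)
failed_late = job_duration(start, None, much_later)

# Completed Job: the result no longer depends on "now".
finished = start + timedelta(seconds=1)
done_early = job_duration(start, finished, later)
done_late = job_duration(start, finished, much_later)

print("failed:", failed_early, "->", failed_late)
print("complete:", done_early, "->", done_late)
```

The failed case grows between the two calls while the completed case stays the same, which matches what the two Jobs show in the DURATION column.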
I created a simple example to illustrate how this works.
I have two Jobs: testjob (type=Failed) and testjob-2 (type=Complete):
$ kubectl get jobs
NAME        COMPLETIONS   DURATION   AGE
testjob     0/1           3m15s      3m15s
testjob-2   1/1           1s         2m49s
We can display more information using the -o custom-columns= option.
Note: as you can see, .status.completionTime is not set for the failed Job.
$ kubectl get jobs testjob testjob-2 -o custom-columns=NAME:.metadata.name,TYPE:.status.conditions[].type,REASON:.status.conditions[].reason,COMPLETIONTIME:.status.completionTime
NAME        TYPE       REASON                 COMPLETIONTIME
testjob     Failed     BackoffLimitExceeded   <none>
testjob-2   Complete   <none>                 2021-03-23T15:51:33Z
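If a fixed end time is needed for a failed Job as well, one possible workaround is to fall back to the lastTransitionTime of the Failed condition when .status.completionTime is missing. The sketch below assumes the Job has been loaded as a dictionary, for example from the output of kubectl get job testjob -o json. The helper names and the lastTransitionTime value of the failed Job are hypothetical; the completion time of testjob-2 is taken from the output above.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_k8s_time(value: str) -> datetime:
    # Timestamps in the Job status look like 2021-03-23T15:51:33Z (UTC).
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def job_end_time(job: dict) -> Optional[datetime]:
    """Return a best-effort end time for a Job (hypothetical helper)."""
    status = job.get("status", {})
    completion = status.get("completionTime")
    if completion:
        # Set only when the Job completed successfully.
        return parse_k8s_time(completion)
    for condition in status.get("conditions", []):
        # Fall back to the moment the Job was marked as Failed.
        if condition.get("type") == "Failed" and condition.get("status") == "True":
            transition = condition.get("lastTransitionTime")
            if transition:
                return parse_k8s_time(transition)
    # Still running, or no usable timestamp available.
    return None


# Sample data shaped like `kubectl get job -o json` output.
failed_job = {
    "status": {
        "startTime": "2021-03-17T18:13:56Z",
        "conditions": [{
            "type": "Failed",
            "status": "True",
            "reason": "BackoffLimitExceeded",
            "lastTransitionTime": "2021-03-17T18:16:07Z",  # hypothetical value
        }],
    }
}
complete_job = {
    "status": {
        "completionTime": "2021-03-23T15:51:33Z",
        "conditions": [{"type": "Complete", "status": "True"}],
    }
}

failed_end = job_end_time(failed_job)
complete_end = job_end_time(complete_job)
print("testjob ended at:", failed_end)
print("testjob-2 ended at:", complete_end)
```

Subtracting the parsed .status.startTime from the returned value would then give a duration that no longer changes for a failed Job.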
In addition, you can find useful information on GitHub: the API documentation for Job status.