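I am deploying an application using ArgoCD. The deployment manifests include a Job that performs some one-time initialization for the application. The Job resource looks like this: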
apiVersion: batch/v1
kind: Job
metadata:
  labels:
    app.kubernetes.io/instance: house
    app.kubernetes.io/name: step-certificates
  name: create-acme-provisioner
  namespace: step-certificates
spec:
  backoffLimit: 100
  template:
    metadata:
      labels:
        app.kubernetes.io/instance: house
        app.kubernetes.io/name: step-certificates
    spec:
      containers:
      - command:
        - /bin/bash
        - -c
        - |
          while ! step ca health; do
            echo "waiting for ca"
            sleep 1
          done
          if ! step ca provisioner list | grep -q '"name": "acme"'; then
            step ca provisioner add acme --type ACME \
              --admin-subject step \
              --password-file /home/step/secrets/passwords/password \
              --admin-provisioner "Admin JWK"
          fi
        image: cr.step.sm/smallstep/step-ca:0.22.1
        name: create-acme-provisioner
        volumeMounts:
        - mountPath: /home/step/certs
          name: certs
          readOnly: true
        - mountPath: /home/step/config
          name: config
          readOnly: true
        - mountPath: /home/step/secrets
          name: secrets
          readOnly: true
        - mountPath: /home/step/secrets/passwords
          name: ca-password
          readOnly: true
      restartPolicy: Never
      securityContext:
        fsGroup: 1000
        runAsGroup: 1000
        runAsNonRoot: true
        runAsUser: 1000
      volumes:
      - configMap:
          name: step-certificates-certs
        name: certs
      - configMap:
          name: step-certificates-config
        name: config
      - name: secrets
        secret:
          secretName: step-certificates-secrets
      - name: ca-password
        secret:
          secretName: step-certificates-ca-password
  ttlSecondsAfterFinished: 60
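It works as expected: it fails a few times while the main application is starting up, but then it runs to completion, and everything looks fine: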
$ kubectl get pods
NAME                            READY   STATUS      RESTARTS   AGE
create-acme-provisioner-7zhp2   0/1     Completed   0          12s
step-certificates-0             2/2     Running     0          54m
$ kubectl get jobs
NAME                      COMPLETIONS   DURATION   AGE
create-acme-provisioner   1/1           3s         20s
The problem is that ArgoCD re-syncs the Job resource every minute, so the Job runs again... and again... and so on. The logs from the argocd-application-controller pod look like this:
time="2022-09-30T16:20:42Z" level=info msg="Initialized new operation: {&SyncOperation{Revision:114442fcfb789190cfb9e7353a636369e7113c01,Prune:true,DryRun:false,SyncStrategy:nil,Resources:[]SyncOperationResource{SyncOperationResource{Group:batch,Kind:Job,Name:create-acme-provisioner,Namespace:,},},Source:nil,Manifests:[],SyncOptions:[CreateNamespace=true],} { true} [] {-1 &Backoff{Duration:30s,Factor:*2,MaxDuration:10m,}}}" application=step-certificates-infra
time="2022-09-30T16:20:42Z" level=info msg="Tasks (dry-run)" application=step-certificates-infra syncId=00259-Dpgma tasks="[Sync/0 resource batch/Job:step-certificates/create-acme-provisioner nil->obj (,,)]"
time="2022-09-30T16:20:42Z" level=info msg="Applying resource Job/create-acme-provisioner in cluster: https://10.96.0.1:443, namespace: step-certificates"
time="2022-09-30T16:20:42Z" level=info msg="Applying resource Job/create-acme-provisioner in cluster: https://10.96.0.1:443, namespace: step-certificates"
time="2022-09-30T16:20:42Z" level=info msg="Adding resource result, status: 'Synced', phase: 'Running', message: 'job.batch/create-acme-provisioner created'" application=step-certificates-infra kind=Job name=create-acme-provisioner namespace=step-certificates phase=Sync syncId=00259-Dpgma
time="2022-09-30T16:21:45Z" level=info msg="Initialized new operation: {&SyncOperation{Revision:114442fcfb789190cfb9e7353a636369e7113c01,Prune:true,DryRun:false,SyncStrategy:nil,Resources:[]SyncOperationResource{SyncOperationResource{Group:batch,Kind:Job,Name:create-acme-provisioner,Namespace:,},},Source:nil,Manifests:[],SyncOptions:[CreateNamespace=true],} { true} [] {-1 &Backoff{Duration:30s,Factor:*2,MaxDuration:10m,}}}" application=step-certificates-infra
time="2022-09-30T16:21:45Z" level=info msg="Tasks (dry-run)" application=step-certificates-infra syncId=00260-KsLXq tasks="[Sync/0 resource batch/Job:step-certificates/create-acme-provisioner nil->obj (,,)]"
time="2022-09-30T16:21:45Z" level=info msg="Applying resource Job/create-acme-provisioner in cluster: https://10.96.0.1:443, namespace: step-certificates"
time="2022-09-30T16:21:45Z" level=info msg="Applying resource Job/create-acme-provisioner in cluster: https://10.96.0.1:443, namespace: step-certificates"
time="2022-09-30T16:21:45Z" level=info msg="Adding resource result, status: 'Synced', phase: 'Running', message: 'job.batch/create-acme-provisioner created'" application=step-certificates-infra kind=Job name=create-acme-provisioner namespace=step-certificates phase=Sync syncId=00260-KsLXq
time="2022-09-30T16:22:49Z" level=info msg="Initialized new operation: {&SyncOperation{Revision:114442fcfb789190cfb9e7353a636369e7113c01,Prune:true,DryRun:false,SyncStrategy:nil,Resources:[]SyncOperationResource{SyncOperationResource{Group:batch,Kind:Job,Name:create-acme-provisioner,Namespace:,},},Source:nil,Manifests:[],SyncOptions:[CreateNamespace=true],} { true} [] {-1 &Backoff{Duration:30s,Factor:*2,MaxDuration:10m,}}}" application=step-certificates-infra
time="2022-09-30T16:22:49Z" level=info msg="Tasks (dry-run)" application=step-certificates-infra syncId=00261-itFqU tasks="[Sync/0 resource batch/Job:step-certificates/create-acme-provisioner nil->obj (,,)]"
time="2022-09-30T16:22:49Z" level=info msg="Applying resource Job/create-acme-provisioner in cluster: https://10.96.0.1:443, namespace: step-certificates"
time="2022-09-30T16:22:49Z" level=info msg="Applying resource Job/create-acme-provisioner in cluster: https://10.96.0.1:443, namespace: step-certificates"
time="2022-09-30T16:22:49Z" level=info msg="Adding resource result, status: 'Synced', phase: 'Running', message: 'job.batch/create-acme-provisioner created'" application=step-certificates-infra kind=Job name=create-acme-provisioner namespace=step-certificates phase=Sync syncId=00261-itFqU
Why is ArgoCD re-syncing this resource, and how do I get it to stop?
Answer 1
I figured out what was going on.
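The Job is configured with ttlSecondsAfterFinished, which is described in the Kubernetes documentation. I misread the documentation and thought this setting would clean up the Pods created by the Job, but it actually causes the Job itself to be deleted.
Because the Job is managed by ArgoCD, when it is deleted as a result of the ttlSecondsAfterFinished setting, ArgoCD sees that a managed resource is missing and re-creates it. That matches the logs in the question: every sync reports "job.batch/create-acme-provisioner created", roughly 60 seconds after the previous run finished.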
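One way to act on this diagnosis is sketched below. It assumes that leaving the completed Job (and its Pod) in the cluster is acceptable, so the TTL can simply be dropped. This is an abridged fragment rather than a full manifest; the pod template is meant to stay exactly as in the question.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: create-acme-provisioner
  namespace: step-certificates
spec:
  backoffLimit: 100
  # ttlSecondsAfterFinished: 60
  # Removed: this setting makes Kubernetes delete the finished Job itself
  # (not only its Pods), which is what prompts ArgoCD to re-create it.
  template:
    # ... pod template unchanged from the question ...
```

With the Job left in place after it completes, ArgoCD should find the resource present on its next comparison and have nothing to re-create.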
As @SYN suggested in the comments, an alternative solution is to configure the Job as an ArgoCD PostSync hook with a hook-delete-policy:
apiVersion: batch/v1
kind: Job
metadata:
  name: create-acme-provisioner
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
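When ArgoCD successfully syncs the application, it creates this Job, and when the Job succeeds, ArgoCD deletes it.
This means the Job runs once on every sync, but that's fine. It no longer runs every 60 seconds.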
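For reference, here is a rough sketch of how the hook annotations might be combined with the Job from the question. It is abridged, and it rests on one assumption the answer does not state: that ttlSecondsAfterFinished is dropped because the hook delete policy already handles cleanup. The volume mounts, volumes, and security context are taken to be the same as in the original manifest.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: create-acme-provisioner
  namespace: step-certificates
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  backoffLimit: 100
  # ttlSecondsAfterFinished omitted (assumption): ArgoCD deletes the Job
  # once it succeeds, per the hook-delete-policy above.
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: create-acme-provisioner
        image: cr.step.sm/smallstep/step-ca:0.22.1
        command:
        - /bin/bash
        - -c
        - |
          while ! step ca health; do
            echo "waiting for ca"
            sleep 1
          done
          # Only add the provisioner if it is not already listed, so
          # re-running the Job on every sync is harmless.
          if ! step ca provisioner list | grep -q '"name": "acme"'; then
            step ca provisioner add acme --type ACME \
              --admin-subject step \
              --password-file /home/step/secrets/passwords/password \
              --admin-provisioner "Admin JWK"
          fi
        # volumeMounts as in the question
      # securityContext and volumes as in the question
```

Running once per sync is acceptable here because the script checks for the "acme" provisioner before adding it, so repeat runs leave the CA unchanged.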