为什么 argocd 不断重新同步我的作业?

为什么 argocd 不断重新同步我的作业?

我正在部署一个应用程序使用 ArgoCD。部署清单包括工作为应用程序执行一些一次性初始化。Job 资源如下所示:

apiVersion: batch/v1
kind: Job
metadata:
  labels:
    app.kubernetes.io/instance: house
    app.kubernetes.io/name: step-certificates
  name: create-acme-provisioner
  namespace: step-certificates
spec:
  backoffLimit: 100
  template:
    metadata:
      labels:
        app.kubernetes.io/instance: house
        app.kubernetes.io/name: step-certificates
    spec:
      containers:
      - command:
        - /bin/bash
        - -c
        - |
          while ! step ca health; do
            echo "waiting for ca"
            sleep 1
          done

          if ! step ca provisioner list | grep -q '"name": "acme"'; then
            step ca provisioner add acme --type ACME \
              --admin-subject step \
              --password-file /home/step/secrets/passwords/password \
              --admin-provisioner "Admin JWK"
          fi
        image: cr.step.sm/smallstep/step-ca:0.22.1
        name: create-acme-provisioner
        volumeMounts:
        - mountPath: /home/step/certs
          name: certs
          readOnly: true
        - mountPath: /home/step/config
          name: config
          readOnly: true
        - mountPath: /home/step/secrets
          name: secrets
          readOnly: true
        - mountPath: /home/step/secrets/passwords
          name: ca-password
          readOnly: true
      restartPolicy: Never
      securityContext:
        fsGroup: 1000
        runAsGroup: 1000
        runAsNonRoot: true
        runAsUser: 1000
      volumes:
      - configMap:
          name: step-certificates-certs
        name: certs
      - configMap:
          name: step-certificates-config
        name: config
      - name: secrets
        secret:
          secretName: step-certificates-secrets
      - name: ca-password
        secret:
          secretName: step-certificates-ca-password
  ttlSecondsAfterFinished: 60

它按预期工作 - 在主应用程序启动时它会失败几次,但随后它运行,并且一切看起来都很好:

$ kubectl get pods
NAME                            READY   STATUS      RESTARTS   AGE
create-acme-provisioner-7zhp2   0/1     Completed   0          12s
step-certificates-0             2/2     Running     0          54m
$ kubectl get jobs
NAME                      COMPLETIONS   DURATION   AGE
create-acme-provisioner   1/1           3s         20s

问题在于 ArgoCD 每分钟都会重新同步作业资源,因此作业会再次运行……一次又一次……等等。来自 argocd-application-controller pod 的日志如下所示:

time="2022-09-30T16:20:42Z" level=info msg="Initialized new operation: {&SyncOperation{Revision:114442fcfb789190cfb9e7353a636369e7113c01,Prune:true,DryRun:false,SyncStrategy:nil,Resources:[]SyncOperationResource{SyncOperationResource{Group:batch,Kind:Job,Name:create-acme-provisioner,Namespace:,},},Source:nil,Manifests:[],SyncOptions:[CreateNamespace=true],} { true} [] {-1 &Backoff{Duration:30s,Factor:*2,MaxDuration:10m,}}}" application=step-certificates-infra
time="2022-09-30T16:20:42Z" level=info msg="Tasks (dry-run)" application=step-certificates-infra syncId=00259-Dpgma tasks="[Sync/0 resource batch/Job:step-certificates/create-acme-provisioner nil->obj (,,)]"
time="2022-09-30T16:20:42Z" level=info msg="Applying resource Job/create-acme-provisioner in cluster: https://10.96.0.1:443, namespace: step-certificates"
time="2022-09-30T16:20:42Z" level=info msg="Applying resource Job/create-acme-provisioner in cluster: https://10.96.0.1:443, namespace: step-certificates"
time="2022-09-30T16:20:42Z" level=info msg="Adding resource result, status: 'Synced', phase: 'Running', message: 'job.batch/create-acme-provisioner created'" application=step-certificates-infra kind=Job name=create-acme-provisioner namespace=step-certificates phase=Sync syncId=00259-Dpgma
time="2022-09-30T16:21:45Z" level=info msg="Initialized new operation: {&SyncOperation{Revision:114442fcfb789190cfb9e7353a636369e7113c01,Prune:true,DryRun:false,SyncStrategy:nil,Resources:[]SyncOperationResource{SyncOperationResource{Group:batch,Kind:Job,Name:create-acme-provisioner,Namespace:,},},Source:nil,Manifests:[],SyncOptions:[CreateNamespace=true],} { true} [] {-1 &Backoff{Duration:30s,Factor:*2,MaxDuration:10m,}}}" application=step-certificates-infra
time="2022-09-30T16:21:45Z" level=info msg="Tasks (dry-run)" application=step-certificates-infra syncId=00260-KsLXq tasks="[Sync/0 resource batch/Job:step-certificates/create-acme-provisioner nil->obj (,,)]"
time="2022-09-30T16:21:45Z" level=info msg="Applying resource Job/create-acme-provisioner in cluster: https://10.96.0.1:443, namespace: step-certificates"
time="2022-09-30T16:21:45Z" level=info msg="Applying resource Job/create-acme-provisioner in cluster: https://10.96.0.1:443, namespace: step-certificates"
time="2022-09-30T16:21:45Z" level=info msg="Adding resource result, status: 'Synced', phase: 'Running', message: 'job.batch/create-acme-provisioner created'" application=step-certificates-infra kind=Job name=create-acme-provisioner namespace=step-certificates phase=Sync syncId=00260-KsLXq
time="2022-09-30T16:22:49Z" level=info msg="Initialized new operation: {&SyncOperation{Revision:114442fcfb789190cfb9e7353a636369e7113c01,Prune:true,DryRun:false,SyncStrategy:nil,Resources:[]SyncOperationResource{SyncOperationResource{Group:batch,Kind:Job,Name:create-acme-provisioner,Namespace:,},},Source:nil,Manifests:[],SyncOptions:[CreateNamespace=true],} { true} [] {-1 &Backoff{Duration:30s,Factor:*2,MaxDuration:10m,}}}" application=step-certificates-infra
time="2022-09-30T16:22:49Z" level=info msg="Tasks (dry-run)" application=step-certificates-infra syncId=00261-itFqU tasks="[Sync/0 resource batch/Job:step-certificates/create-acme-provisioner nil->obj (,,)]"
time="2022-09-30T16:22:49Z" level=info msg="Applying resource Job/create-acme-provisioner in cluster: https://10.96.0.1:443, namespace: step-certificates"
time="2022-09-30T16:22:49Z" level=info msg="Applying resource Job/create-acme-provisioner in cluster: https://10.96.0.1:443, namespace: step-certificates"
time="2022-09-30T16:22:49Z" level=info msg="Adding resource result, status: 'Synced', phase: 'Running', message: 'job.batch/create-acme-provisioner created'" application=step-certificates-infra kind=Job name=create-acme-provisioner namespace=step-certificates phase=Sync syncId=00261-itFqU

为什么 ArgoCD 要重新同步此资源,我该如何让它停止?

答案1

我知道发生了什么事。

该作业配置为ttlSecondsAfterFinished,已记录这里。我误读了文档,以为这会清理该作业创建的 Pod,但实际上它会导致作业本身被删除。

由于该作业由 ArgoCD 管理,因此当由于ttlSecondsAfterFinished设置而删除它时,ArgoCD 会提示重新创建它。

正如 @SYN 在评论中所建议的那样,另一种解决方案是将作业配置为具有以下内容的 ArgoCD PostSync 钩子hook-delete-policy

apiVersion: batch/v1
kind: Job
metadata:
  name: create-acme-provisioner
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:

当 ArgoCD 成功同步应用程序时,它将创建此作业,当作业成功时,ArgoCD 将删除它。

这意味着该作业在每次同步时运行一次,但这没问题。它不再每 60 秒运行一次。

相关内容