How do I determine the in-progress/preparing state of a Kubernetes deployment before it is "Ready"?


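As part of my CI testing logic, I have a script that creates a Kubernetes deployment file for each of a set of dedicated nodes, deletes any previous deployments on them, and then starts the new ones. (The config is attached at the bottom, since it probably isn't important.) Once I have run my tests against them, they are shut down, ready for the next test run. The nodes run only my deployments, so I don't bother declaring how much CPU/memory/anything they need, and I don't have any readiness scripts, because the containers take care of that themselves; I only need to talk to the status-monitoring service once it has an IP address.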

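Usually they are ready and working within a minute or so. My script monitors the output of the following command until nothing reports "False". Sometimes, though, they don't start within the time I allow. I don't want to wait indefinitely if something has gone wrong, because I need to harvest the addresses and feed them to the downstream processes that set up my tests with the completed deployments. But if Kubernetes can't show me meaningful progress, or a diagnosis of why things are slow, all I can do is abort the incomplete deployment.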

kubectl get pods -l pod-agent=$AGENT_NAME \
      -o 'jsonpath={range .items[*]}{..status.conditions[?(@.type=="Ready")].status}:{.status.podIP}:{.status.phase}:{.metadata.name} '

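I guessed that one of the containers might not have been used on that host before, and that copying it to each host might be taking so long that the overall deployment exceeded my script's timeout, so I added this (ignore the "| cat |"; it is a workaround for an IntelliJ terminal bug):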

kubectl describe pod $REPLY | cat | sed -n '/Events:/,$p; /emulator.*:/,/Ready:/p'

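to tell me what each pod is doing every time the first command returns "False". But the results I get look inconsistent: although the Events section claims the containers have been pulled and started, the structured output of the same command shows the containers as "ContainerCreating":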

     1  False::Pending:kubulator-mysh-automation11-dlan-666b96d788-6gfl7
  emulator-5554:
    Container ID:   
    Image:          dockerio.dlan/auto/android-avd-10a29v8-emu29_0_11_kuber-snapshot-skin_name-540x1060-hw_lcd_density-240
    Image ID:       
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
  emulator-5556:
    Container ID:   
    Image:          dockerio.dlan/auto/android-avd-10a29v8-emu29_0_11_kuber-snapshot-skin_name-540x1060-hw_lcd_density-240
    Image ID:       
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False

...more of the same, then

Events:
  Type    Reason     Age   From                        Message
  ----    ------     ----  ----                        -------
  Normal  Scheduled  23s   default-scheduler           Successfully assigned auto/kubulator-mysh-automation11-dlan-666b96d788-6gfl7 to automation11.dlan
  Normal  Pulling    16s   kubelet, automation11.dlan  Pulling image "dockerio.dlan/auto/ticket-machine"
  Normal  Pulled     16s   kubelet, automation11.dlan  Successfully pulled image "dockerio.dlan/auto/ticket-machine"
  Normal  Created    16s   kubelet, automation11.dlan  Created container ticket-machine
  Normal  Started    16s   kubelet, automation11.dlan  Started container ticket-machine
  Normal  Pulling    16s   kubelet, automation11.dlan  Pulling image "dockerio.dlan/qa/cgi-bin-remote"
  Normal  Created    15s   kubelet, automation11.dlan  Created container cgi-adb-remote
  Normal  Pulled     15s   kubelet, automation11.dlan  Successfully pulled image "dockerio.dlan/qa/cgi-bin-remote"
  Normal  Started    15s   kubelet, automation11.dlan  Started container cgi-adb-remote
  Normal  Pulling    15s   kubelet, automation11.dlan  Pulling image "dockerio.dlan/auto/android-avd-10a29v8-emu29_0_11_kuber-snapshot-skin_name-540x1060-hw_lcd_density-240"
  Normal  Pulled     15s   kubelet, automation11.dlan  Successfully pulled image "dockerio.dlan/auto/android-avd-10a29v8-emu29_0_11_kuber-snapshot-skin_name-540x1060-hw_lcd_density-240"
  Normal  Created    15s   kubelet, automation11.dlan  Created container emulator-5554
  Normal  Started    15s   kubelet, automation11.dlan  Started container emulator-5554
  Normal  Pulled     15s   kubelet, automation11.dlan  Successfully pulled image "dockerio.dlan/auto/android-avd-10a29v8-emu29_0_11_kuber-snapshot-skin_name-540x1060-hw_lcd_density-240"
  Normal  Pulling    15s   kubelet, automation11.dlan  Pulling image "dockerio.dlan/auto/android-avd-10a29v8-emu29_0_11_kuber-snapshot-skin_name-540x1060-hw_lcd_density-240"
  Normal  Created    14s   kubelet, automation11.dlan  Created container emulator-5556
  Normal  Started    14s   kubelet, automation11.dlan  Started container emulator-5556
  Normal  Pulling    14s   kubelet, automation11.dlan  Pulling image "dockerio.dlan/auto/android-avd-10a29v8-emu29_0_11_kuber-snapshot-skin_name-540x1060-hw_lcd_density-240"

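So the events claim the containers have started, but the structured data contradicts them. I would treat the events as authoritative, but they are truncated at the first 26 entries, which is rather odd given that the server has no event rate-limiting configuration set.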

At the end I have added the full description of one container that the events claim has been "Started", but I see no clues in that full output.

Once the deployment comes up, i.e. the first line shows "True", all the containers suddenly show as "Running".

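So my basic question is: how can I determine the actual state of my deployment, which is apparently what the events represent, so that I can understand why and where it gets stuck when it fails, given that "describe pod" is evidently unreliable and/or incomplete?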

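Is there anything besides "kubectl get pods" that I can use to find the true state of affairs? (Preferably nothing as involved as ssh-ing to the server and sniffing its raw logs.)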

Thanks.

kubectl version
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.3", GitCommit:"b3cbbae08ec52a7fc73d334838e18d17e8512749", GitTreeState:"clean", BuildDate:"2019-11-13T11:23:11Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:05:50Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}


My deployment file:

apiVersion: v1
kind: Service
metadata:
  name: kubulator-mysh-automation11-dlan
  labels:
    run: kubulator-mysh-automation11-dlan
    pod-agent: mysh
spec:
  type: ClusterIP
  clusterIP: None
  ports:
    - name: http
      protocol: TCP
      port: 8088
      targetPort: 8088
    - name: adb-remote
      protocol: TCP
      port: 8080
      targetPort: 8080
    - name: adb
      protocol: TCP
      port: 9100
      targetPort: 9100
  selector:
    run: kubulator-mysh-automation11-dlan
    kubernetes.io/hostname: automation11.dlan
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kubulator-mysh-automation11-dlan
  labels:
    pod-agent: mysh
spec:
  selector:
    matchLabels:
      run: kubulator-mysh-automation11-dlan
      pod-agent: mysh
  replicas: 1
  template:
    metadata:
      labels:
        run: kubulator-mysh-automation11-dlan
        pod-agent: mysh
    spec:
      nodeSelector:
        kubernetes.io/hostname: automation11.dlan
      volumes:
        - name: dev-kvm
          hostPath:
            path: /dev/kvm
            type: CharDevice
        - name: logs
          emptyDir: {}
      containers:
        - name: ticket-machine
          image: dockerio.dlan/auto/ticket-machine
          args: ['--', '--count', '20']  # --adb /local/adb-....
          imagePullPolicy: Always
          volumeMounts:
            - mountPath: /logs
              name: logs
          ports:
            - containerPort: 8088
          env:
            - name: ANDROID_ADB_SERVER_PORT
              value: "9100"
            - name: ANDROID_ADB_SERVER
              value: host
        - name: cgi-adb-remote
          image: dockerio.dlan/qa/cgi-bin-remote
          args: ['/root/git/CgiAdbRemote/CgiAdbRemote.pl', '-foreground', '-port=8080', "-adb=/root/adb-8aug-usbbus-maxemu-v39"]
          imagePullPolicy: Always
          ports:
            - containerPort: 8080
          env:
            - name: ADB_SERVER_SOCKET
              value: "tcp:localhost:9100"
            - name: ANDROID_ADB_SERVER
              value: host
        - name: emulator-5554
          image: dockerio.dlan/auto/android-avd-10a29v8-emu29_0_11_kuber-snapshot-skin_name-540x1060-hw_lcd_density-240
          imagePullPolicy: Always
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: /logs
              name: logs
            - mountPath: /dev/kvm
              name: dev-kvm
          env:
            - name: ANDROID_ADB_VERSION
              value: v39
            - name: ANDROID_ADB_SERVER_PORT
              value: '9100'
            - name: EMULATOR_PORT
              value: '5554'
            - name: EMULATOR_MAX_SECS
              value: '2400'
            - name: ANDROID_ADB_SERVER
              value: host
            - name: EMU_WINDOW
              value: '2'
        - name: emulator-5556
          image: dockerio.dlan/auto/android-avd-10a29v8-emu29_0_11_kuber-snapshot-skin_name-540x1060-hw_lcd_density-240
... etc - several more of these emulator containers.

The full "describe" output for a container that the events declare as "Started" is as follows:

  emulator-5554:
    Container ID:   
    Image:          dockerio.dlan/auto/android-avd-10a29v8-emu29_0_11_kuber-snapshot-skin_name-540x1060-hw_lcd_density-240
    Image ID:       
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:
      ANDROID_ADB_VERSION:      v39
      ANDROID_ADB_SERVER_PORT:  9100
      EMULATOR_PORT:            5554
      EMULATOR_MAX_SECS:        2400
      ANDROID_ADB_SERVER:       host
      EMU_WINDOW:               2
    Mounts:
      /dev/kvm from dev-kvm (rw)
      /logs from logs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-2jrv5 (ro)

Answer 1

You can use kubectl wait to pause test execution until the pods are in the Ready state.
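As a rough sketch of how this might look in a CI script: the function below blocks on kubectl wait for every pod matching a label selector and, if the wait fails or times out, prints the namespace's events sorted by time so there is at least something to diagnose. The function name wait_for_pods, its arguments, and the default timeout are hypothetical and chosen only for illustration; the pod-agent label is taken from the question.

```shell
# Sketch only: wait_for_pods and its defaults are illustrative names,
# not part of kubectl or of the original script.
wait_for_pods() {
  local selector="$1"          # label selector, e.g. pod-agent=mysh
  local timeout="${2:-120s}"   # how long kubectl wait may block

  # Blocks until every matching pod reports the Ready condition,
  # or exits non-zero once the timeout expires.
  if kubectl wait --for=condition=Ready pod -l "$selector" --timeout="$timeout"; then
    return 0
  fi

  # Not ready in time: dump the recorded events, oldest first, to stderr.
  kubectl get events --sort-by=.lastTimestamp >&2
  return 1
}
```

It could then be called as wait_for_pods "pod-agent=$AGENT_NAME" 90s, with the script aborting the deployment when the function returns non-zero.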

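Keep in mind that if you are not using readiness probes for your application, a pod being in the Ready state does not mean your application is actually ready to receive traffic, which can make your tests flaky.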
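Without a readiness probe, one possible workaround is for the test script itself to poll the application before using it. The sketch below retries an HTTP request with curl until it succeeds or a retry budget runs out. The function name wait_for_http, the retry counts, and the idea that the service answers plain HTTP at its root path are all assumptions made for illustration; only port 8088 comes from the Service definition in the question.

```shell
# Sketch only: wait_for_http, the retry numbers, and the URL layout
# are hypothetical and would need adapting to the real service.
wait_for_http() {
  local url="$1"             # e.g. http://<pod-ip>:8088/
  local attempts="${2:-30}"  # maximum number of tries
  local delay="${3:-2}"      # seconds to sleep between tries
  local i=1

  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP errors as failure, -sS: quiet but still show errors.
    if curl -fsS --max-time 2 -o /dev/null "$url"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

A call such as wait_for_http "http://$POD_IP:8088/" 30 2 would then gate the tests on the application responding rather than on the pod's Ready condition alone.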
