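As part of my CI test logic, I have a script that creates a Kubernetes deployment file for each of a set of dedicated nodes, deletes any previous deployments on them, and then starts the new ones. (The config is attached at the bottom, as it probably doesn't matter.) Once I have run tests against them, they are shut down, ready for the next test run. The nodes run only my deployments, so I don't bother declaring how much CPU/memory/whatever they need, and I don't have any readiness scripts, since the containers handle that themselves; I only need to talk to the status-monitoring service once it has an IP address.
Usually they are ready and working within about a minute (my script monitors the output of the following command until nothing reports "false"), but sometimes they don't start within the time I allow. I don't want to wait indefinitely if something has gone wrong, because I need to collect the addresses and feed them to downstream processes that set up my tests with the completed deployments. But if Kubernetes can't show me meaningful progress, or diagnostics for why things are slow, all I can do is abort the incomplete deployments.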
kubectl get pods -l pod-agent=$AGENT_NAME \
-o 'jsonpath={range .items[*]}{..status.conditions[?(@.type=="Ready")].status}:{.status.podIP}:{.status.phase}:{.metadata.name} '
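For illustration, the polling loop described above might look roughly like the sketch below. It is not the actual script: the function names (pod_status_lines, all_ready, wait_for_pods) and the default poll interval are assumptions made for the example, and the jsonpath is a variant of the command above that ends each pod's record with a newline.

```shell
# Hypothetical helpers; names and defaults are assumptions for this sketch.

pod_status_lines() {
  # Print one line per pod: <Ready>:<podIP>:<phase>:<name>
  kubectl get pods -l "pod-agent=$1" \
    -o 'jsonpath={range .items[*]}{.status.conditions[?(@.type=="Ready")].status}:{.status.podIP}:{.status.phase}:{.metadata.name}{"\n"}{end}'
}

all_ready() {
  # Succeed only if stdin has at least one non-empty line
  # and the first field of every such line is "True".
  awk -F: 'NF { n++; if ($1 != "True") bad++ } END { exit !(n > 0 && bad == 0) }'
}

wait_for_pods() {
  # Usage: wait_for_pods <agent-label> <timeout-seconds> [poll-interval-seconds]
  local agent="$1" timeout="$2" interval="${3:-5}"
  local deadline lines
  deadline=$(( $(date +%s) + timeout ))
  while :; do
    lines=$(pod_status_lines "$agent")
    if printf '%s\n' "$lines" | all_ready; then
      printf '%s\n' "$lines"        # final status lines, including the pod IPs
      return 0
    fi
    if [ "$(date +%s)" -ge "$deadline" ]; then
      printf '%s\n' "$lines" >&2    # last observed state, for diagnosis
      return 1
    fi
    sleep "$interval"
  done
}
```

A call such as wait_for_pods "$AGENT_NAME" 90 would then either print the final status lines on success or return non-zero after roughly 90 seconds.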
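I speculated that one of the containers had not been used on that host before, and that copying it to each host was taking so long that the overall deployment exceeded my script's timeout, so I added this (ignore the | cat |; it is a workaround for an IntelliJ terminal bug)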
kubectl describe pod $REPLY | cat | sed -n '/Events:/,$p; /emulator.*:/,/Ready:/p'
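to tell me what each pod is doing each time the first command returns "false". But the results look inconsistent: while the "Events" section claims the containers have been pulled and started, the structured output from the same command shows the containers as "ContainerCreating":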
1 False::Pending:kubulator-mysh-automation11-dlan-666b96d788-6gfl7
emulator-5554:
Container ID:
Image: dockerio.dlan/auto/android-avd-10a29v8-emu29_0_11_kuber-snapshot-skin_name-540x1060-hw_lcd_density-240
Image ID:
Port: <none>
Host Port: <none>
State: Waiting
Reason: ContainerCreating
Ready: False
emulator-5556:
Container ID:
Image: dockerio.dlan/auto/android-avd-10a29v8-emu29_0_11_kuber-snapshot-skin_name-540x1060-hw_lcd_density-240
Image ID:
Port: <none>
Host Port: <none>
State: Waiting
Reason: ContainerCreating
Ready: False
..more of the same, then
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 23s default-scheduler Successfully assigned auto/kubulator-mysh-automation11-dlan-666b96d788-6gfl7 to automation11.dlan
Normal Pulling 16s kubelet, automation11.dlan Pulling image "dockerio.dlan/auto/ticket-machine"
Normal Pulled 16s kubelet, automation11.dlan Successfully pulled image "dockerio.dlan/auto/ticket-machine"
Normal Created 16s kubelet, automation11.dlan Created container ticket-machine
Normal Started 16s kubelet, automation11.dlan Started container ticket-machine
Normal Pulling 16s kubelet, automation11.dlan Pulling image "dockerio.dlan/qa/cgi-bin-remote"
Normal Created 15s kubelet, automation11.dlan Created container cgi-adb-remote
Normal Pulled 15s kubelet, automation11.dlan Successfully pulled image "dockerio.dlan/qa/cgi-bin-remote"
Normal Started 15s kubelet, automation11.dlan Started container cgi-adb-remote
Normal Pulling 15s kubelet, automation11.dlan Pulling image "dockerio.dlan/auto/android-avd-10a29v8-emu29_0_11_kuber-snapshot-skin_name-540x1060-hw_lcd_density-240"
Normal Pulled 15s kubelet, automation11.dlan Successfully pulled image "dockerio.dlan/auto/android-avd-10a29v8-emu29_0_11_kuber-snapshot-skin_name-540x1060-hw_lcd_density-240"
Normal Created 15s kubelet, automation11.dlan Created container emulator-5554
Normal Started 15s kubelet, automation11.dlan Started container emulator-5554
Normal Pulled 15s kubelet, automation11.dlan Successfully pulled image "dockerio.dlan/auto/android-avd-10a29v8-emu29_0_11_kuber-snapshot-skin_name-540x1060-hw_lcd_density-240"
Normal Pulling 15s kubelet, automation11.dlan Pulling image "dockerio.dlan/auto/android-avd-10a29v8-emu29_0_11_kuber-snapshot-skin_name-540x1060-hw_lcd_density-240"
Normal Created 14s kubelet, automation11.dlan Created container emulator-5556
Normal Started 14s kubelet, automation11.dlan Started container emulator-5556
Normal Pulling 14s kubelet, automation11.dlan Pulling image "dockerio.dlan/auto/android-avd-10a29v8-emu29_0_11_kuber-snapshot-skin_name-540x1060-hw_lcd_density-240"
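So the events claim the containers have started, but the structured data contradicts that. I would treat the events as authoritative, but they are truncated at the first 26 entries, which is rather odd given that the server has no event rate-limit configuration set.
At the end I have added the full description of one container that the events claim is "Started", but I see no clues in the full output.
Once the deployment gets going, i.e. once the first line shows "true", all the containers suddenly show as "Running".
So my basic question is: how can I determine the actual state of my deployment (apparently what the "Events" represent), in order to understand why and where it gets stuck when it fails, given that describe pod is apparently unreliable and/or incomplete?
Is there something other than "kubectl get pods" that I can use to find the real state? (Preferably nothing as involved as ssh-ing to the server and sniffing its raw logs.)
Thanks.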
kubectl version
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.3", GitCommit:"b3cbbae08ec52a7fc73d334838e18d17e8512749", GitTreeState:"clean", BuildDate:"2019-11-13T11:23:11Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:05:50Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
My deployment file:
apiVersion: v1
kind: Service
metadata:
  name: kubulator-mysh-automation11-dlan
  labels:
    run: kubulator-mysh-automation11-dlan
    pod-agent: mysh
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: http
    protocol: TCP
    port: 8088
    targetPort: 8088
  - name: adb-remote
    protocol: TCP
    port: 8080
    targetPort: 8080
  - name: adb
    protocol: TCP
    port: 9100
    targetPort: 9100
  selector:
    run: kubulator-mysh-automation11-dlan
    kubernetes.io/hostname: automation11.dlan
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kubulator-mysh-automation11-dlan
  labels:
    pod-agent: mysh
spec:
  selector:
    matchLabels:
      run: kubulator-mysh-automation11-dlan
      pod-agent: mysh
  replicas: 1
  template:
    metadata:
      labels:
        run: kubulator-mysh-automation11-dlan
        pod-agent: mysh
    spec:
      nodeSelector:
        kubernetes.io/hostname: automation11.dlan
      volumes:
      - name: dev-kvm
        hostPath:
          path: /dev/kvm
          type: CharDevice
      - name: logs
        emptyDir: {}
      containers:
      - name: ticket-machine
        image: dockerio.dlan/auto/ticket-machine
        args: ['--', '--count', '20'] # --adb /local/adb-....
        imagePullPolicy: Always
        volumeMounts:
        - mountPath: /logs
          name: logs
        ports:
        - containerPort: 8088
        env:
        - name: ANDROID_ADB_SERVER_PORT
          value: "9100"
        - name: ANDROID_ADB_SERVER
          value: host
      - name: cgi-adb-remote
        image: dockerio.dlan/qa/cgi-bin-remote
        args: ['/root/git/CgiAdbRemote/CgiAdbRemote.pl', '-foreground', '-port=8080', "-adb=/root/adb-8aug-usbbus-maxemu-v39"]
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
        env:
        - name: ADB_SERVER_SOCKET
          value: "tcp:localhost:9100"
        - name: ANDROID_ADB_SERVER
          value: host
      - name: emulator-5554
        image: dockerio.dlan/auto/android-avd-10a29v8-emu29_0_11_kuber-snapshot-skin_name-540x1060-hw_lcd_density-240
        imagePullPolicy: Always
        securityContext:
          privileged: true
        volumeMounts:
        - mountPath: /logs
          name: logs
        - mountPath: /dev/kvm
          name: dev-kvm
        env:
        - name: ANDROID_ADB_VERSION
          value: v39
        - name: ANDROID_ADB_SERVER_PORT
          value: '9100'
        - name: EMULATOR_PORT
          value: '5554'
        - name: EMULATOR_MAX_SECS
          value: '2400'
        - name: ANDROID_ADB_SERVER
          value: host
        - name: EMU_WINDOW
          value: '2'
      - name: emulator-5556
        image: dockerio.dlan/auto/android-avd-10a29v8-emu29_0_11_kuber-snapshot-skin_name-540x1060-hw_lcd_density-240
... etc - several more of these emulator containers.
The full "describe" output for a container that the events claim is "Started" follows:
emulator-5554:
Container ID:
Image: dockerio.dlan/auto/android-avd-10a29v8-emu29_0_11_kuber-snapshot-skin_name-540x1060-hw_lcd_density-240
Image ID:
Port: <none>
Host Port: <none>
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment:
ANDROID_ADB_VERSION: v39
ANDROID_ADB_SERVER_PORT: 9100
EMULATOR_PORT: 5554
EMULATOR_MAX_SECS: 2400
ANDROID_ADB_SERVER: host
EMU_WINDOW: 2
Mounts:
/dev/kvm from dev-kvm (rw)
/logs from logs (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-2jrv5 (ro)
Answer 1
You can use kubectl wait to pause test execution until the pod is in the Ready state.
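As a rough sketch of how that could be wired into a CI script, the function below blocks on kubectl wait and, if the wait fails, prints the pod list and the namespace's events so there is something to diagnose. The function name wait_ready_or_diagnose is hypothetical, and the selector and timeout in the usage note are only examples based on the labels in the question.

```shell
# Hypothetical wrapper around `kubectl wait`; the name is an assumption.
wait_ready_or_diagnose() {
  # Usage: wait_ready_or_diagnose <label-selector> <timeout, e.g. 120s>
  local selector="$1" timeout="$2"
  if kubectl wait --for=condition=Ready pod -l "$selector" --timeout="$timeout"; then
    return 0
  fi
  # The wait failed (timeout, or possibly no matching pods yet):
  # show what the API server currently reports.
  kubectl get pods -l "$selector" -o wide >&2
  kubectl get events --sort-by=.lastTimestamp >&2
  return 1
}
```

For example, wait_ready_or_diagnose "pod-agent=$AGENT_NAME" 120s would return 0 once every matching pod reports Ready, or dump the diagnostics and return 1 otherwise. Depending on the kubectl version, the wait may fail immediately if the pods have not been created yet, so it may need to run only after the Deployment has produced its pods.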
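Keep in mind that if you are not using readiness probes for your application, a pod being in the Ready state does not mean your application is actually ready to receive traffic, which can make your tests flaky.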
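If adding a readiness probe is not an option, one possible stopgap is for the test script to check the application itself after the pod reports Ready. The sketch below assumes the application answers plain HTTP: port 8088 is taken from the "http" port in the question's Service, while the path "/" and the function names app_responds and wait_for_app are assumptions for the example.

```shell
# Hypothetical application-level check; the port default comes from the
# question's Service, the path and function names are assumptions.
app_responds() {
  # Usage: app_responds <pod-ip> [port] [path]
  local ip="$1" port="${2:-8088}" path="${3:-/}"
  # --fail makes curl exit non-zero on HTTP errors (status >= 400).
  curl --fail --silent --show-error --max-time 2 --output /dev/null \
    "http://$ip:$port$path"
}

wait_for_app() {
  # Usage: wait_for_app <pod-ip> <attempts> [delay-seconds]
  local ip="$1" attempts="$2" delay="${3:-2}" i=0
  while [ "$i" -lt "$attempts" ]; do
    if app_responds "$ip"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

The same check expressed as a readinessProbe in the pod spec would let Kubernetes report Ready only once the application responds, which is the more robust arrangement the caveat above points to.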