After upgrading to the latest docker (18.09.0) and kubernetes (1.12.2), my Kubernetes nodes break when deploying security updates that restart containerd.
I have this in /etc/docker/daemon.json:
{
    "storage-driver": "overlay2",
    "live-restore": true
}
In the past this was enough to let docker restart without restarting the pods.
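As a quick sanity check, whether live-restore is actually active can be read back from the daemon (the exact output wording may vary between docker versions):
docker info | grep -i 'live restore'
# expected, roughly: Live Restore Enabled: true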
Kubelet is started as follows:
/usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=cgroupfs --cni-bin-dir=/opt/cni/bin --cni-conf-dir=/etc/cni/net.d --network-plugin=cni --fail-swap-on=false --feature-gates=PodPriority=true
Now, restarting containerd keeps the old pods around, but also recreates them under the new containerd process.
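The behaviour can be reproduced by hand, roughly like this (assuming containerd runs as the systemd unit of that name, which is what the distro packages set up; the snapshot files are only there for comparison):
ps fax > /tmp/ps-before.txt               # snapshot of the process tree
systemctl restart containerd              # what the security update effectively does
ps fax > /tmp/ps-after.txt
diff /tmp/ps-before.txt /tmp/ps-after.txt # old shims still present, plus a fresh set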
Initial situation before the restart:
/usr/bin/containerd
/usr/bin/dockerd -H unix://
\_ containerd --config /var/run/docker/containerd/containerd.toml --log-level info
\_ containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/1a36f40f3c3531d13b8bc493049a1900662822e01e2c911f8
| \_ /usr/bin/dumb-init /bin/bash /entrypoint.sh /nginx-ingress-controller --default-backend-service=infra/old-nginx-php --election-id=ingress-controller-leader
| \_ /bin/bash /entrypoint.sh /nginx-ingress-controller --default-backend-service=infra/old-nginx-php --election-id=ingress-controller-leader --ingress-class
| \_ /nginx-ingress-controller --default-backend-service=infra/old-nginx-php --election-id=ingress-controller-leader --ingress-class=nginx-php --configma
| \_ nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
| \_ nginx: worker process
| \_ nginx: worker process
\_ containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/c9a82204115c50788d132aa6c11735d90412dacb48a219d31
| \_ /usr/local/bin/kube-proxy --config=/var/lib/kube-proxy/config.conf
\_ containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/3004e3fa5f7e2b45865c6cc33abb884d9140af16f2594a11d
| \_ /sbin/runsvdir -P /etc/service/enabled
| \_ runsv bird
| | \_ bird -R -s /var/run/calico/bird.ctl -d -c /etc/calico/confd/config/bird.cfg
| \_ runsv bird6
| | \_ bird6 -R -s /var/run/calico/bird6.ctl -d -c /etc/calico/confd/config/bird6.cfg
| \_ runsv confd
| | \_ calico-node -confd
| \_ runsv felix
| \_ calico-node -felix
\_ containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/1f3c48e28c7fde2f67c40d5641abfa9a29e3dfcbc436321f6
| \_ /bin/sh /install-cni.sh
| \_ sleep 10
\_ containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/8371571ce29be4959951cf8ad70e57aa1f4a146f5ca43435b
\_ /coredns -conf /etc/coredns/Corefile
After the containerd/docker restart, those old containers are nowhere to be found, and they are all recreated under the new containerd process. This leaves duplicate processes for every pod!
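(The duplication is easy to confirm by counting one of the workloads, e.g. the ingress controller; the pattern is only an example:)
ps fax | grep -c '[n]ginx-ingress-controller'
# roughly double the expected count after the restart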
It looks like containerd has completely forgotten about the old containers: killall containerd-shim does not kill those old pods, it just re-parents their child processes under init:
/usr/bin/dumb-init /bin/bash /entrypoint.sh /nginx-ingress-controller --default-backend-service=infra/old-nginx-php --election-id=ingress-controller-leader --ingress-cl
\_ /bin/bash /entrypoint.sh /nginx-ingress-controller --default-backend-service=infra/old-nginx-php --election-id=ingress-controller-leader --ingress-class=nginx-php -
\_ /nginx-ingress-controller --default-backend-service=infra/old-nginx-php --election-id=ingress-controller-leader --ingress-class=nginx-php --configmap=infra/phpi
\_ nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
\_ nginx: worker process
\_ nginx: worker process
/usr/local/bin/kube-proxy --config=/var/lib/kube-proxy/config.conf
/sbin/runsvdir -P /etc/service/enabled
\_ bird -R -s /var/run/calico/bird.ctl -d -c /etc/calico/confd/config/bird.cfg
\_ bird6 -R -s /var/run/calico/bird6.ctl -d -c /etc/calico/confd/config/bird6.cfg
/bin/sh /install-cni.sh
\_ sleep 10
Obviously, with the old calico and nginx still around, the host ports stay in use, so the new pods cannot start and the node becomes completely unusable. Manually killing all the old processes, or rebooting, seems to be the only option.
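The manual cleanup amounts to finding the workload processes that got re-parented to init (PPID 1) and killing the old copies; a rough sketch, with example patterns that need to be adapted and the PIDs double-checked per node:
ps -eo pid,ppid,args | awk '$2 == 1' | grep -E 'nginx-ingress-controller|kube-proxy|calico|coredns|install-cni'
# then kill the PIDs of the old copies once verified, e.g.
# kill <pid> ...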
Is there some new setting needed so that kubelet finds those old containerd instances? Or is this happening because there is both a global containerd and a separate version started by docker?
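One way to answer that last question for a given node is to look at the dockerd flags: the packaged 18.09 unit file typically passes --containerd, and without that flag dockerd spawns its own private containerd (the ExecStart line below is an assumption about a typical install, not taken from this node):
systemctl cat docker.service | grep ExecStart
# e.g. ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
# if --containerd is absent, dockerd starts its own containerd child,
# which matches the second containerd in the process tree above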
Answer 1
I ran into the same problem yesterday; after the containerd restart I also could not exec into running pods. The problem is in docker itself.
Once containerd is restarted, the docker daemon still tries to process its event stream over the old socket handle. The error handling that follows, when the client cannot reconnect to containerd, then causes CPU spikes on the machine.
The only way to recover from this situation is to restart docker (systemctl restart docker).
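Concretely, recovery plus a quick check that the node is healthy again might look like this (the pod name is a placeholder):
systemctl restart docker
journalctl -u docker -n 50            # the reconnect errors in the daemon log should stop
kubectl exec -it <some-pod> -- true   # exec into running pods works again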
The issue has been fixed by the following ticket:
https://github.com/moby/moby/pull/36173
Hope this helps.