Kubernetes node hits DiskPressure and ImageGC fails

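Lately I've been running into problems with my local docker-desktop (Windows) Kubernetes cluster.

Every now and then, seemingly at random, the cluster runs into DiskPressure and can no longer schedule any Pods (they all stay in the Pending state).

So I checked what was wrong with the node and found that it was stuck in the DiskPressure condition.

In the output of kubectl describe nodes I found the following event (ImageGCFailed):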

kubelet, docker-desktop     wanted to free 5180592947 bytes, but freed 0 bytes space with errors in image deletion: [rpc error: code = Unknown desc = Error response from daemon: conflict: unable to remove repository reference "moomedprofilepictureservice:dev" (must force) - container d8ef807bb674 is using its referenced image e2a36258ddf3, rpc error: code = Unknown desc = Error response from daemon: conflict: unable to remove repository reference "rancher/rancher:latest" (must force) - container 06af804517fc is using its referenced image 4251f6ed7d4e, rpc error: code = Unknown desc = Error response from daemon: conflict: unable to remove repository reference "prom/prometheus:latest" (must force) - container b08daf935e5d is using its referenced image 6fa696e177e3, rpc error: code = Unknown desc = Error response from daemon: conflict: unable to remove repository reference "moomedmonitoring:dev" (must force) - container bcda6e3e0d79 is using its referenced image 63c070c7b160, rpc error: code = Unknown desc = Error response from daemon: conflict: unable to remove repository reference "grafana/grafana:latest" (must force) - container 141c6909f9c3 is using its referenced image 651ff2dc930f, rpc error: code = Unknown desc = Error response from daemon: conflict: unable to remove repository reference "moomedsavingservice:dev" (must force) - container 13350d549f44 is using its referenced image 4649805f5c2f, rpc error: code = Unknown desc = Error response from daemon: conflict: unable to remove repository reference "moomedfrontend:dev" (must force) - container e917511c30db is using its referenced image 0dc1d2af3433, rpc error: code = Unknown desc = Error response from daemon: conflict: unable to remove repository reference "rabbitmq:3.8.6-management" (must force) - container 7252761ee146 is using its referenced image 64a1f920fb0d, rpc error: code = Unknown desc = Error response from daemon: conflict: unable to remove repository reference "moomedaccountvalidationservice:dev" 
(must force) - container 09ea0357c333 is using its referenced image 0329c6ba62a1, rpc error: code = Unknown desc = Error response from daemon: conflict: unable to remove repository reference "moomedsessionservice:dev" (must force) - container d2b33cb31611 is using its referenced image 21d801ad9175, rpc error: code = Unknown desc = Error response from daemon: conflict: unable to remove repository reference "moomedaccountservice:dev" (must force) - container 23c16e0a05ff is using its referenced image 6b3ba9041cca, rpc error: code = Unknown desc = Error response from daemon: conflict: unable to remove repository reference "moomedsearchservice:dev" (must force) - container b5f55d1e7246 is using its referenced image e4d40671cbc6, rpc error: code = Unknown desc = Error response from daemon: conflict: unable to remove repository reference "redis:latest" (must force) - container 960762cb6661 is using its referenced image 74d107221092, rpc error: code = Unknown desc = Error response from daemon: conflict: unable to remove repository reference "moomedchatservice:dev" (must force) - container ea893d0a4bc7 is using its referenced image cabc2a451580, rpc error: code = Unknown desc = Error response from daemon: conflict: unable to remove repository reference "moomedfinanceservice:dev" (must force) - container effa172e3f0a is using its referenced image f092e21dbab3]

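So essentially it is trying to free up some space by garbage-collecting images, but there are a few things I really want to know:

  1. Most of the images referenced here are not even used by Kubernetes. All the images my cluster uses are tagged :testing, while the :dev tags are only used by my local docker-compose setup (which runs at the same time). They are the same images, just with different tags, but why does my cluster bother trying to clean up something it shouldn't even control?
  2. Why is my cluster in DiskPressure all the time? I double-checked, and I gave the docker-desktop instance a whopping 88GB of storage, which is definitely not full yet. This is the capacity of my node: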

[image: node capacity]

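So I'm a bit lost on what to do right now. The problem seems to fix itself when I expand or shrink the filesystem usage of docker-desktop, and I don't really understand how the node gets into this state, but it keeps coming back, so something must be wrong.

What should I do?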

Answer 1

The Kubelet has a garbage collector whose purpose is to remove unnecessary k8s objects in order to free up resources.

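If an object does not belong to any owner, it is considered orphaned. Kubernetes has a pattern for this, known as ownership in Kubernetes.

Whenever a node experiences disk pressure, the Kubelet daemon will desperately try to reclaim disk space by deleting (supposedly) unused images. Reading the source code shows that the Kubelet sorts the images to delete by the time since they were last used to create a Pod. The error you are getting, Error response from daemon: conflict: unable to remove repository reference, indicates that a container is still using the referenced image. Please check the containers and images on the server.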
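To see at a glance which containers are blocking the deletion, the IDs can be pulled out of the ImageGCFailed message itself. The following is only a minimal sketch: the function name extract_blocking_containers is made up for illustration, and it assumes the message uses exactly the wording shown in the question ("container <id> is using its referenced image <id>" with 12-character hex IDs).

```shell
# Hypothetical helper: reads an ImageGCFailed message on stdin and prints
# one "<container_id> <image_id>" pair per line.
# Assumes the message wording shown in the question.
extract_blocking_containers() {
  grep -oE 'container [0-9a-f]{12} is using its referenced image [0-9a-f]{12}' \
    | awk '{ print $2, $8 }'   # field 2 = container ID, field 8 = image ID
}
```

For example, piping the text `container d8ef807bb674 is using its referenced image e2a36258ddf3` into the function prints `d8ef807bb674 e2a36258ddf3`. The container IDs can then be compared against the output of docker ps -a.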

List all containers (running and stopped):

$ docker ps -a

List the images:

$ docker images

Then stop the container with:

$ docker stop <container_ID>

Next, remove the container with:

$ docker rm <container_ID>

Finally, remove the image with:

$ docker rmi <image_ID>

Or force the image removal with:

$ docker rmi -f <image_ID>

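Also run:

$ docker system prune

This removes unused data such as stopped containers, dangling images, unused networks and build cache. It should free up gigabytes of space and clear the DiskPressure taint. You can then recreate the containers.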
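The stop / remove / remove-image sequence above can be wrapped in a small script. This is only a sketch under assumptions: the names run, cleanup_image and the DRY_RUN variable are invented for illustration, and by default the script merely prints the docker commands so they can be reviewed before anything is deleted.

```shell
#!/bin/sh
# Dry-run by default: print the commands instead of executing them.
# Set DRY_RUN=0 to actually run them (hypothetical convention for this sketch).
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Stop and remove one container, then remove the image it was using.
cleanup_image() {
  container_id="$1"
  image_id="$2"
  run docker stop "$container_id"
  run docker rm "$container_id"
  run docker rmi "$image_id"
}
```

Calling cleanup_image d8ef807bb674 e2a36258ddf3 in dry-run mode prints the three docker commands for the first container and image named in the question's log. Keep in mind that containers started by docker-compose will have to be recreated afterwards.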

See also: repository-conflict-reference, docker-prune.
