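I have a question about our on-premises Kubernetes installation (Kubelet version 1.24.4).
We install Kubernetes with Kubespray.
I know there are several related questions and answers on Stack Overflow about clearing the KubeletHasDiskPressure flag in Kubernetes, for example [1], [2], [3], and [4].
In our case, however, we deliberately use master nodes with very limited disk space, so we need to adjust the default DiskPressure thresholds in Kubernetes.
I have tried a few approaches: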
1- Tried adding the following to the end of /etc/kubernetes/kubeadm-config.yaml:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
clusterDNS:
- 10.233.0.10
evictionHard:
  memory.available: "500Mi"
  nodefs.available: "500Mi"
  imagefs.available: "1Gi"
  nodefs.inodesFree: "500Mi"
evictionMinimumReclaim:
  memory.available: "0Mi"
  nodefs.available: "500Mi"
  imagefs.available: "1Gi"
  nodefs.inodesFree: "500Mi"
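I then rebooted the master node, but that did not solve the problem.
2- Tried removing the taint from the node with: kubectl taint nodes node1 node.kubernetes.io/disk-pressure-
Here is the list of all the pods I have:
Here is the output of kubectl describe node node1: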
(base) m@node1:~$ kubectl describe node node1
Name:               node1
Roles:              control-plane
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=node1
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=
                    node.kubernetes.io/exclude-from-external-load-balancers=
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/cri-dockerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: 10.0.0.47/24
                    projectcalico.org/IPv4VXLANTunnelAddr: 10.233.102.128
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Wed, 31 Aug 2022 00:55:59 +0200
Taints:             node.kubernetes.io/disk-pressure:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  node1
  AcquireTime:     <unset>
  RenewTime:       Tue, 06 Sep 2022 22:52:14 +0200
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Tue, 06 Sep 2022 13:15:51 +0200   Tue, 06 Sep 2022 13:15:51 +0200   CalicoIsUp                   Calico is running on this node
  MemoryPressure       False   Tue, 06 Sep 2022 22:52:14 +0200   Tue, 06 Sep 2022 13:01:58 +0200   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         True    Tue, 06 Sep 2022 22:52:14 +0200   Tue, 06 Sep 2022 13:16:29 +0200   KubeletHasDiskPressure       kubelet has disk pressure
  PIDPressure          False   Tue, 06 Sep 2022 22:52:14 +0200   Tue, 06 Sep 2022 13:01:58 +0200   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Tue, 06 Sep 2022 22:52:14 +0200   Tue, 06 Sep 2022 13:11:18 +0200   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  InternalIP:  10.0.0.47
  Hostname:    node1
Capacity:
  cpu:                8
  ephemeral-storage:  102101944Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             24523172Ki
  pods:               110
Allocatable:
  cpu:                7800m
  ephemeral-storage:  94097151435
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             23896484Ki
  pods:               110
System Info:
  Machine ID:                 9e0b0071a62e4393a09de330b66c7062
  System UUID:                cb61b600-9ee4-11e7-88a4-c6d8d9353300
  Boot ID:                    9021e1e7-2a1d-4b4b-a7c0-d1f195626660
  Kernel Version:             5.15.0-47-generic
  OS Image:                   Ubuntu 22.04.1 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://20.10.17
  Kubelet Version:            v1.24.4
  Kube-Proxy Version:         v1.24.4
PodCIDR:                      10.233.64.0/24
PodCIDRs:                     10.233.64.0/24
Non-terminated Pods:          (6 in total)
  Namespace    Name                           CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------    ----                           ------------  ----------  ---------------  -------------  ---
  kube-system  calico-node-qfbms              150m (1%)     300m (3%)   64M (0%)         500M (2%)      6d21h
  kube-system  kube-apiserver-node1           250m (3%)     0 (0%)      0 (0%)           0 (0%)         6d21h
  kube-system  kube-controller-manager-node1  200m (2%)     0 (0%)      0 (0%)           0 (0%)         6d21h
  kube-system  kube-proxy-h8wzs               0 (0%)        0 (0%)      0 (0%)           0 (0%)         4d
  kube-system  kube-scheduler-node1           100m (1%)     0 (0%)      0 (0%)           0 (0%)         6d21h
  kube-system  nodelocaldns-gjgwp             100m (1%)     0 (0%)      70Mi (0%)        200Mi (0%)     6d21h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests        Limits
  --------           --------        ------
  cpu                800m (10%)      300m (3%)
  memory             137400320 (0%)  709715200 (2%)
  ephemeral-storage  0 (0%)          0 (0%)
  hugepages-1Gi      0 (0%)          0 (0%)
  hugepages-2Mi      0 (0%)          0 (0%)
Events:
  Type     Reason                Age                  From     Message
  ----     ------                ----                 ----     -------
  Warning  ImageGCFailed         25m (x101 over 8h)   kubelet  (combined from similar events): wanted to free 11413992243 bytes, but freed 0 bytes space with errors in image deletion: rpc error: code = Unknown desc = Error response from daemon: conflict: unable to remove repository reference "registry.k8s.io/pause:3.6" (must force) - container b419b95ee297 is using its referenced image 6270bb605e12
  Warning  EvictionThresholdMet  63s (x2423 over 9h)  kubelet  Attempting to reclaim ephemeral-storage
(base) m@node1:~$
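As a rough cross-check of the numbers in this output, the ImageGCFailed event can be inverted to estimate how much free space the kubelet saw. The sketch below rests on assumptions that the post does not confirm: that image garbage collection runs with its default low threshold of 80%, that the amount to free is computed as capacity * (100 - low) / 100 - available, and that images live on the same filesystem reported as ephemeral-storage. The function and constant names are illustrative only.

```python
# Back-of-the-envelope estimate from the ImageGCFailed event.
# ASSUMPTIONS (not confirmed by the post): default image GC low threshold
# of 80%, and the image filesystem is the one reported as ephemeral-storage.

CAPACITY_BYTES = 102101944 * 1024   # "ephemeral-storage: 102101944Ki" from the node description
WANTED_TO_FREE = 11413992243        # bytes, from the ImageGCFailed event message
GC_LOW_THRESHOLD_PERCENT = 80       # assumed default, not read from this cluster


def estimate_available(capacity: int, wanted_to_free: int, low_percent: int) -> int:
    """Invert amount_to_free = capacity * (100 - low) / 100 - available."""
    target_free = capacity * (100 - low_percent) // 100
    return target_free - wanted_to_free


available = estimate_available(CAPACITY_BYTES, WANTED_TO_FREE, GC_LOW_THRESHOLD_PERCENT)
free_percent = available * 100 / CAPACITY_BYTES
print(f"estimated available: {available} bytes ({free_percent:.1f}% of capacity)")
```

Under those assumptions the node had roughly 9% of the filesystem free, which would be below the kubelet's default hard eviction threshold of nodefs.available<10% and consistent with DiskPressure being reported.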
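Do you know how to clear this flag on node1? I don't know whether it is relevant, but all of this started after I applied the nginx controller deployment/daemonset.
Any ideas on how to solve this problem would be greatly appreciated.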
Answer 1
Adding the following to /etc/kubernetes/kubelet.env solved the problem:
--eviction-hard=nodefs.available<1%,imagefs.available<1%,nodefs.inodesFree<1%
I believe the KubeletConfiguration approach should still work, but I am not sure why it had no effect. I have reported this in the Kubernetes git repository to follow up.
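To get a feel for what the 1% thresholds mean on this node, here is a minimal sketch that parses an eviction-hard string and converts the nodefs.available percentage into an absolute size. It assumes the 102101944Ki ephemeral-storage capacity from the question is the nodefs capacity, and it compares against the kubelet's documented Linux default hard eviction thresholds. The helper names are made up for illustration and are not part of any Kubernetes tooling.

```python
# ASSUMPTION: the ephemeral-storage capacity from the question
# ("102101944Ki") is treated as the nodefs capacity.
CAPACITY_KI = 102101944


def parse_eviction_hard(flag_value: str) -> dict:
    """Split 'signal<quantity,signal<quantity' into {signal: quantity}."""
    thresholds = {}
    for item in flag_value.split(","):
        signal, sep, quantity = item.partition("<")
        if not sep:
            raise ValueError(f"expected 'signal<quantity', got {item!r}")
        thresholds[signal.strip()] = quantity.strip()
    return thresholds


def percent_of_capacity_gib(quantity: str, capacity_ki: int) -> float:
    """Convert a percentage threshold such as '10%' into GiB of the given capacity."""
    if not quantity.endswith("%"):
        raise ValueError(f"not a percentage threshold: {quantity!r}")
    return capacity_ki * float(quantity[:-1]) / 100 / (1024 * 1024)


# Value taken from the flag in this answer.
from_answer = parse_eviction_hard(
    "nodefs.available<1%,imagefs.available<1%,nodefs.inodesFree<1%"
)
# Documented kubelet defaults on Linux, for comparison.
linux_defaults = parse_eviction_hard(
    "memory.available<100Mi,nodefs.available<10%,imagefs.available<15%,nodefs.inodesFree<5%"
)

for label, thresholds in (("default", linux_defaults), ("answer", from_answer)):
    gib = percent_of_capacity_gib(thresholds["nodefs.available"], CAPACITY_KI)
    print(f"{label}: nodefs.available<{thresholds['nodefs.available']} is about {gib:.2f} GiB")
```

On this capacity the default 10% threshold corresponds to roughly 9.7 GiB of free space, while 1% corresponds to just under 1 GiB, which is why lowering the threshold lets a small, nearly full disk drop out of DiskPressure.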