I have a k8s cluster using microk8s, and the cluster has started to fail: it becomes unresponsive / serves stale state when scheduling pods, managing nodes, and so on.
I have checked and restarted all of the manager nodes, and removed all of the worker nodes to reduce noise and hopefully evict the problem node (since I was seeing long response times / timeouts in /var/logs/syslogs).
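(In case it matters, the worker removal followed the standard microk8s flow, roughly along these lines; the node name is just a placeholder:)

# drain the worker from a control-plane node (placeholder node name)
microk8s kubectl drain worker1 --ignore-daemonsets --delete-emptydir-data
# on the worker itself, leave the cluster
microk8s leave
# back on a control-plane node, drop the departed node's entry
microk8s remove-node worker1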
When they come back up, if I look at kubectl get events I can see the kubelet service restarting over and over:
14m Normal Starting node/node1 Starting kubelet.
14m Warning InvalidDiskCapacity node/node1 invalid capacity 0 on image filesystem
14m Normal Starting node/node3 Starting kubelet.
14m Normal NodeHasSufficientMemory node/node1 Node node1 status is now: NodeHasSufficientMemory
14m Normal NodeHasNoDiskPressure node/node1 Node node1 status is now: NodeHasNoDiskPressure
14m Normal Starting node/node3
14m Normal NodeHasSufficientPID node/node1 Node node1 status is now: NodeHasSufficientPID
14m Warning InvalidDiskCapacity node/node3 invalid capacity 0 on image filesystem
14m Normal NodeHasSufficientMemory node/node3 Node node3 status is now: NodeHasSufficientMemory
14m Normal NodeAllocatableEnforced node/node1 Updated Node Allocatable limit across pods
14m Normal NodeHasNoDiskPressure node/node3 Node node3 status is now: NodeHasNoDiskPressure
14m Normal NodeHasSufficientPID node/node3 Node node3 status is now: NodeHasSufficientPID
14m Normal NodeAllocatableEnforced node/node3 Updated Node Allocatable limit across pods
11m Normal Starting node/node1
11m Normal Starting node/node1 Starting kubelet.
11m Warning InvalidDiskCapacity node/node1 invalid capacity 0 on image filesystem
11m Normal NodeHasSufficientMemory node/node1 Node node1 status is now: NodeHasSufficientMemory
11m Normal NodeHasNoDiskPressure node/node1 Node node1 status is now: NodeHasNoDiskPressure
11m Normal NodeHasSufficientPID node/node1 Node node1 status is now: NodeHasSufficientPID
11m Normal NodeAllocatableEnforced node/node1 Updated Node Allocatable limit across pods
10m Normal Starting node/node3 Starting kubelet.
10m Warning InvalidDiskCapacity node/node3 invalid capacity 0 on image filesystem
10m Normal NodeAllocatableEnforced node/node3 Updated Node Allocatable limit across pods
10m Normal NodeHasSufficientMemory node/node3 Node node3 status is now: NodeHasSufficientMemory
10m Normal NodeHasNoDiskPressure node/node3 Node node3 status is now: NodeHasNoDiskPressure
10m Normal NodeHasSufficientPID node/node3 Node node3 status is now: NodeHasSufficientPID
8m1s Normal Starting node/node1
7m57s Normal Starting node/node1 Starting kubelet.
7m57s Warning InvalidDiskCapacity node/node1 invalid capacity 0 on image filesystem
7m57s Normal NodeHasSufficientMemory node/node1 Node node1 status is now: NodeHasSufficientMemory
7m9s Normal Starting node/node3 Starting kubelet.
7m57s Normal NodeHasNoDiskPressure node/node1 Node node1 status is now: NodeHasNoDiskPressure
7m8s Normal Starting node/node3
7m57s Normal NodeHasSufficientPID node/node1 Node node1 status is now: NodeHasSufficientPID
7m9s Warning InvalidDiskCapacity node/node3 invalid capacity 0 on image filesystem
7m8s Normal NodeHasSufficientMemory node/node3 Node node3 status is now: NodeHasSufficientMemory
7m57s Normal NodeAllocatableEnforced node/node1 Updated Node Allocatable limit across pods
7m8s Normal NodeHasNoDiskPressure node/node3 Node node3 status is now: NodeHasNoDiskPressure
7m8s Normal NodeHasSufficientPID node/node3 Node node3 status is now: NodeHasSufficientPID
7m8s Normal NodeAllocatableEnforced node/node3 Updated Node Allocatable limit across pods
I don't know where to find logs that would explain the reason for these restarts. Googling the invalid capacity 0 warning, people seem to say it can be ignored. It is also a warning rather than an error, and it is not the final log line before a restart, so I don't think it is blocking startup. I'm not sure, though.
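In case it is relevant, I can also check whether the image filesystem that the warning refers to actually reports a capacity (the mountpoint is taken from the kubelite log further down; the exact path may differ per install):

df -h /var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs
df -h /var/snap/microk8s/common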
I'm looking for more logs to understand in more detail why this service is failing on the nodes. I have been watching journalctl -f -u snap.microk8s.daemon-kubelite, since the docs say the kubelet logs are merged in there.
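So far I have been trying to narrow it down with something like the following (the grep pattern is just my guess at what a fatal exit would look like, and the time window is arbitrary):

# last exit status / restart count of the kubelite service
systemctl status snap.microk8s.daemon-kubelite
# error- and fatal-level klog lines, plus systemd exit messages, from the recent journal
journalctl -u snap.microk8s.daemon-kubelite --since "30 min ago" --no-pager \
  | grep -E ' E[0-9]{4} | F[0-9]{4} |panic|Main process exited'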
I can't include the full logs because they exceed the character limit here on Server Fault, and there doesn't seem to be a way to upload files. I may be able to provide snippets, or search for specific things.
Here are a few things that stand out, although I don't know whether any of them would cause kubelet to fail to start:
...
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.025847 459922 server.go:1251] "Started kubelet"
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.025894 459922 server.go:177] "Starting to listen read-only" address="0.0.0.0" port=10255
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.025926 459922 server.go:150] "Starting to listen" address="0.0.0.0" port=10250
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.026588 459922 server.go:410] "Adding debug handlers to kubelet server"
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.027415 459922 fs_resource_analyzer.go:67] "Starting FS ResourceAnalyzer"
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.027521 459922 volume_manager.go:294] "Starting Kubelet Volume Manager"
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.027975 459922 desired_state_of_world_populator.go:151] "Desired state populator starts to run"
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: E0818 12:25:46.028881 459922 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="unable to find data in memory cache" mountpoint="/var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs"
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: E0818 12:25:46.029007 459922 kubelet.go:1351] "Image garbage collection failed once. Stats initialization may not have completed yet" err="invalid capacity 0 on image filesystem"
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.061667 459922 kubelet_network_linux.go:57] "Initialized protocol iptables rules." protocol=IPv4
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.072440 459922 kubelet_network_linux.go:57] "Initialized protocol iptables rules." protocol=IPv6
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.072452 459922 status_manager.go:161] "Starting to sync pod status with apiserver"
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.072461 459922 kubelet.go:2031] "Starting kubelet main sync loop"
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: E0818 12:25:46.072492 459922 kubelet.go:2055] "Skipping pod synchronization" err="[container runtime status check may not have completed yet, PLEG is not healthy: pleg has yet to be successful]"
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.122991 459922 cpu_manager.go:213] "Starting CPU manager" policy="none"
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.123003 459922 cpu_manager.go:214] "Reconciling" reconcilePeriod="10s"
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.123016 459922 state_mem.go:36] "Initialized new in-memory state store"
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.123144 459922 state_mem.go:88] "Updated default CPUSet" cpuSet=""
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.123156 459922 state_mem.go:96] "Updated CPUSet assignments" assignments=map[]
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.123162 459922 policy_none.go:49] "None policy: Start"
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.124388 459922 memory_manager.go:168] "Starting memorymanager" policy="None"
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.124403 459922 state_mem.go:35] "Initializing new in-memory state store"
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.124496 459922 state_mem.go:75] "Updated machine memory state"
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.125518 459922 manager.go:611] "Failed to read data from checkpoint" checkpoint="kubelet_internal_checkpoint" err="checkpoint is not found"
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.125689 459922 plugin_manager.go:114] "Starting Kubelet Plugin Manager"
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.126270 459922 csi_plugin.go:99] kubernetes.io/csi: Trying to validate a new CSI Driver with name: cstor.csi.openebs.io endpoint: /var/snap/microk8s/common/var/lib/kubelet/plugins/cstor.csi.openebs.io/csi.sock versions: 1.0.0
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.126286 459922 csi_plugin.go:112] kubernetes.io/csi: Register new plugin with name: cstor.csi.openebs.io at endpoint: /var/snap/microk8s/common/var/lib/kubelet/plugins/cstor.csi.openebs.io/csi.sock
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.128371 459922 kubelet_node_status.go:70] "Attempting to register node" node="node3"
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.159655 459922 serving.go:348] Generated self-signed cert in-memory
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.173125 459922 pod_container_deletor.go:79] "Container not found in pod's containers" containerID="7d5dfde734a4022cf57816fb8fd2bfd9b30dc4c44fefb262d866854e9905fedd"
...
Any help finding the errors that actually matter here would be much appreciated.
Thanks.