Cannot join node to Kubernetes cluster; kubelet error suspected of blocking kubeadm join

I created a new cluster on 10.0.0.100, and after some tweaking all of the pods came up and are running:

NAME                                    READY   STATUS    RESTARTS       AGE   IP           NODE            NOMINATED NODE   READINESS GATES
coredns-6d4b75cb6d-fwjnr                1/1     Running   0              47s   10.244.0.4   ts-k8s-master   <none>           <none>
coredns-6d4b75cb6d-l6hs2                1/1     Running   0              41s   10.244.0.5   ts-k8s-master   <none>           <none>
etcd-ts-k8s-master                      1/1     Running   76 (17h ago)   22h   10.0.0.100   ts-k8s-master   <none>           <none>
kube-apiserver-ts-k8s-master            1/1     Running   70 (17h ago)   18h   10.0.0.100   ts-k8s-master   <none>           <none>
kube-controller-manager-ts-k8s-master   1/1     Running   79 (15h ago)   22h   10.0.0.100   ts-k8s-master   <none>           <none>
kube-proxy-zmzdr                        1/1     Running   1 (37m ago)    21h   10.0.0.100   ts-k8s-master   <none>           <none>
kube-scheduler-ts-k8s-master            1/1     Running   81 (17h ago)   22h   10.0.0.100   ts-k8s-master   <none>           <none>

So I'm now ready to join nodes to the cluster (IPs 10.0.0.101, 10.0.0.102, etc.), but I get the following:

sudo kubeadm join 10.0.0.100:6443 --token l18xdm.eemusxu5rqf22gmx --discovery-token-ca-cert-hash sha256:e6451ec2e9ef26ddb1f2675e6dd7332e3d239db278516b567c7d9a33e6403ec9
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.

Unfortunately, an error has occurred:
        timed out waiting for the condition

This error is likely caused by:
        - The kubelet is not running
        - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
        - 'systemctl status kubelet'
        - 'journalctl -xeu kubelet'
error execution phase kubelet-start: timed out waiting for the condition

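So I checked the kubelet, since it is the component that tells the cluster a node is trying to join and performs the TLS bootstrap. It looks like something is wrong with the kubelet on the node: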

systemctl status kubelet
* kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/kubelet.service.d
             `-10-kubeadm.conf
     Active: activating (auto-restart) (Result: exit-code) since Thu 2023-07-27 10:00:16 UTC; 757ms ago
       Docs: https://kubernetes.io/docs/home/
    Process: 191475 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/FAILURE)
   Main PID: 191475 (code=exited, status=1/FAILURE)
        CPU: 202ms
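The status output only shows that the kubelet exits with status 1 and is restarted; the actual reason should be in its log. Before kubeadm writes /var/lib/kubelet/config.yaml a restart loop is expected, but here the `[kubelet-start]` lines show the config was written, so the log should name the real failure. A minimal sketch, assuming the kubelet logs in the usual klog format (error and fatal lines start with `E` or `F` followed by the date as MMDD); `kubelet_last_errors` is a hypothetical helper name:

```shell
# kubelet_last_errors (hypothetical helper): read kubelet log text on stdin and
# print the last N lines that look like klog error (E) or fatal (F) entries,
# e.g. "... kubelet[191475]: E0727 10:00:16.123456 191475 run.go:74] ...".
kubelet_last_errors() {
  local n="${1:-5}"   # number of matching lines to keep, default 5
  grep -E '(^|[[:space:]])[EF][0-9]{4} ' | tail -n "$n"
}
```

On the node, right after a failed join, something like `journalctl -u kubelet --no-pager -n 200 | kubelet_last_errors 5` should surface the message that explains the `status=1/FAILURE`.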

For comparison, the kubelet on the control plane looks fine and shows the following:

● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: active (running) since Wed 2023-07-26 15:55:15 BST; 24h ago
       Docs: https://kubernetes.io/docs/home/
   Main PID: 64452 (kubelet)
      Tasks: 19 (limit: 2081)
     Memory: 129.2M
        CPU: 2h 6min 33.994s
     CGroup: /system.slice/kubelet.service
             └─64452 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime=remote --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock --pod-infra-container-image=registry.k8s.io/pause:3.7

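I have updated the firewalls on both the control plane and the node, opening port 6443 on the master and port 10248 on the node (not sure whether the latter is needed).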
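On the firewall question: the node's kubelet config shown further down sets `healthzBindAddress: 127.0.0.1` and `healthzPort: 10248`, so the refused health check in the kubeadm output is a loopback call on the node itself, and port 10248 does not need to be reachable from other machines. What the node does need is to reach the API server at 10.0.0.100:6443; the join already got past "[preflight] Reading configuration from the cluster...", which suggests that path works. A minimal reachability sketch, assuming bash (for its `/dev/tcp` redirection) and coreutils `timeout` are installed; `port_open` is a hypothetical helper name:

```shell
# port_open HOST PORT [SECS] (hypothetical helper): succeed if a TCP connection
# to HOST:PORT can be opened within SECS seconds (default 3), fail otherwise.
port_open() {
  local host="$1" port="$2" secs="${3:-3}"
  # bash's /dev/tcp pseudo-path opens a TCP socket when redirected to;
  # timeout ends the attempt if a firewall silently drops the connection.
  timeout "$secs" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}
```

For example, on the node: `port_open 10.0.0.100 6443 && echo reachable || echo blocked`.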

I believe I have set up cgroups correctly, and containerd is running:

* containerd.service - containerd container runtime
     Loaded: loaded (/lib/systemd/system/containerd.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2023-07-26 13:23:16 UTC; 20h ago
       Docs: https://containerd.io
   Main PID: 133048 (containerd)
      Tasks: 10
     Memory: 19.8M
        CPU: 4min 9.603s
     CGroup: /system.slice/containerd.service
             `-133048 /usr/bin/containerd
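Containerd running is not quite the same as the cgroup drivers matching. The node's kubelet config below sets `cgroupDriver: systemd`, and the Kubernetes docs state that the kubelet and container runtime cgroup drivers must match or the kubelet will fail; for containerd that means `SystemdCgroup = true` for the runc runtime (in /etc/containerd/config.toml by default). A rough consistency check, assuming both files are plain text and that a simple line match is good enough (it does not really parse YAML or TOML); all function names are hypothetical:

```shell
# kubelet_cgroup_driver FILE: print the top-level cgroupDriver value from a
# KubeletConfiguration YAML file, or "unset" if the key is missing.
kubelet_cgroup_driver() {
  awk -F': *' '$1 == "cgroupDriver" { v = $2 } END { print (v == "" ? "unset" : v) }' "$1"
}

# containerd_systemd_cgroup FILE: print the last uncommented SystemdCgroup value
# in a containerd TOML config ("true"/"false"), or "unset" if it never appears.
containerd_systemd_cgroup() {
  awk -F'=' '/^[[:space:]]*SystemdCgroup[[:space:]]*=/ { v = $2; gsub(/[[:space:]]/, "", v) }
             END { print (v == "" ? "unset" : v) }' "$1"
}

# cgroup_drivers_match KUBELET_YAML CONTAINERD_TOML: return 0 when both sides
# agree (systemd with SystemdCgroup = true, cgroupfs with it false/unset).
cgroup_drivers_match() {
  local k c
  k=$(kubelet_cgroup_driver "$1")
  c=$(containerd_systemd_cgroup "$2")
  case "$k:$c" in
    systemd:true) return 0 ;;
    cgroupfs:false | cgroupfs:unset) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage on the node: `cgroup_drivers_match /var/lib/kubelet/config.yaml /etc/containerd/config.toml && echo match || echo MISMATCH`. After changing config.toml, containerd needs a `sudo systemctl restart containerd` to pick it up.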

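There doesn't seem to be anything obviously misconfigured in the node's kubelet config YAML (at least not to me), but everything points to a problem with the node's kubelet: it can't complete the bootstrap, so the node can't join the cluster: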

apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 0s
    cacheUnauthorizedTTL: 0s
cgroupDriver: systemd
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
cpuManagerReconcilePeriod: 0s
evictionPressureTransitionPeriod: 0s
fileCheckFrequency: 0s
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
logging:
  flushFrequency: 0
  options:
    json:
      infoBufferSize: "0"
  verbosity: 0
memorySwap: {}
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
resolvConf: /run/systemd/resolve/resolv.conf
rotateCertificates: true
runtimeRequestTimeout: 0s
shutdownGracePeriod: 0s
shutdownGracePeriodCriticalPods: 0s
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
volumeStatsAggPeriod: 0s
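One setting this config does not mention is `failSwapOn`. When it is absent the kubelet uses its default of `true` and refuses to start while swap is enabled, which would also show up as an immediate exit with status 1. Whether that is the cause here is only a guess; the hypothetical helper below just reports whether /proc/swaps lists any active swap device:

```shell
# swap_active [FILE] (hypothetical helper): succeed if the swaps table
# (default /proc/swaps) lists at least one non-empty line below its header,
# i.e. swap is currently enabled.
swap_active() {
  local swaps="${1:-/proc/swaps}"
  [ "$(awk 'NR > 1 && NF > 0' "$swaps" | wc -l)" -gt 0 ]
}
```

If `swap_active && echo "swap is on"` prints anything on the node, `sudo swapoff -a` disables swap for the current boot; the swap entry in /etc/fstab also has to be removed or commented out to keep it off after a reboot.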

I'm a bit stuck on what to try next. Any help is much appreciated.
