kubeadm 1.25 init fails on Debian 11 with containerd -> connection refused

I am trying to initialize a Kubernetes master node on a Debian GNU/Linux 11 (bullseye) system using kubeadm version 1.25.4-00.

I followed the official guide on kubernetes.io. I installed containerd and set SystemdCgroup = true in /etc/containerd/config.toml:

  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
      runtime_type = "io.containerd.runc.v2"
      runtime_engine = ""
      runtime_root = ""
      privileged_without_host_devices = false
      base_runtime_spec = ""
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
        SystemdCgroup = true
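
A quick way to confirm that the setting is really present and uncommented in the file is a small check like the one below. This is only a sketch: the function name has_systemd_cgroup is hypothetical, and it checks only the text of the file, not what containerd has actually loaded.

```shell
# Sketch: succeed if the given containerd config file contains an
# uncommented "SystemdCgroup = true" line (whitespace is tolerated).
# The function name is hypothetical, not part of any containerd tooling.
has_systemd_cgroup() {
    grep -Eq '^[[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*true[[:space:]]*$' "$1"
}
```

It could be called as has_systemd_cgroup /etc/containerd/config.toml && echo "systemd cgroup driver enabled". Edits to config.toml only take effect after sudo systemctl restart containerd.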

containerd appears to be running fine:

$ sudo systemctl status containerd
● containerd.service - containerd container runtime
     Loaded: loaded (/lib/systemd/system/containerd.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2022-11-21 08:12:35 UTC; 1min 7s ago
       Docs: https://containerd.io
   Main PID: 7897 (containerd)
      Tasks: 8
     Memory: 10.5M
        CPU: 470ms
     CGroup: /system.slice/containerd.service
             └─7897 /usr/bin/containerd

Nov 21 08:12:35 master-1 containerd[7897]: time="2022-11-21T08:12:35.900148031Z" level=info msg=serving... address=/run/containerd/containerd.sock.ttrpc
Nov 21 08:12:35 master-1 containerd[7897]: time="2022-11-21T08:12:35.900245191Z" level=info msg=serving... address=/run/containerd/containerd.sock
Nov 21 08:12:35 master-1 containerd[7897]: time="2022-11-21T08:12:35.900338622Z" level=info msg="containerd successfully booted in 0.046780s"
Nov 21 08:12:35 master-1 systemd[1]: Started containerd container runtime.
Nov 21 08:12:35 master-1 containerd[7897]: time="2022-11-21T08:12:35.909836633Z" level=info msg="Start subscribing containerd event"
Nov 21 08:12:35 master-1 containerd[7897]: time="2022-11-21T08:12:35.909931756Z" level=info msg="Start recovering state"
Nov 21 08:12:35 master-1 containerd[7897]: time="2022-11-21T08:12:35.910044670Z" level=info msg="Start event monitor"
Nov 21 08:12:35 master-1 containerd[7897]: time="2022-11-21T08:12:35.910056885Z" level=info msg="Start snapshots syncer"
Nov 21 08:12:35 master-1 containerd[7897]: time="2022-11-21T08:12:35.910069145Z" level=info msg="Start cni network conf syncer"
Nov 21 08:12:35 master-1 containerd[7897]: time="2022-11-21T08:12:35.910079607Z" level=info msg="Start streaming server"
....

When I run kubeadm init, it hangs and times out after 4 minutes:

$ sudo kubeadm init --pod-network-cidr=10.244.0.0/16 -v=9

There does not seem to be a firewall issue, and kubeadm appears to detect containerd and the cgroup driver correctly:

I1121 08:16:46.935270    8096 initconfiguration.go:117] detected and using CRI socket: unix:///var/run/containerd/containerd.sock
I1121 08:16:46.935936    8096 interface.go:432] Looking for default routes with IPv4 addresses
I1121 08:16:46.936037    8096 interface.go:437] Default route transits interface "eth0"
I1121 08:16:46.936268    8096 interface.go:209] Interface eth0 is up
I1121 08:16:46.936427    8096 interface.go:257] Interface "eth0" has 3 addresses :[x.x.y.y/32 .........::1/64 ......../64].
I1121 08:16:46.936525    8096 interface.go:224] Checking addr  x.x.y.y/32.
I1121 08:16:46.936596    8096 interface.go:231] IP found x.x.y.y
I1121 08:16:46.936616    8096 interface.go:263] Found valid IPv4 address x.x.y.y for interface "eth0".
I1121 08:16:46.936710    8096 interface.go:443] Found active IP x.x.y.y 
I1121 08:16:46.936803    8096 kubelet.go:218] the value of KubeletConfiguration.cgroupDriver is empty; setting it to "systemd"
I1121 08:16:46.948350    8096 version.go:186] fetching Kubernetes version from URL: https://dl.k8s.io/release/stable-1.txt
I1121 08:16:47.327247    8096 version.go:255] remote version is much newer: v1.25.4; falling back to: stable-1.25
I1121 08:16:47.327368    8096 version.go:186] fetching Kubernetes version from URL: https://dl.k8s.io/release/stable-1.25.txt
[init] Using Kubernetes version: v1.25.4
[preflight] Running pre-flight checks
I1121 08:16:47.716620    8096 checks.go:570] validating Kubernetes and kubeadm version
I1121 08:16:47.716770    8096 checks.go:170] validating if the firewall is enabled and active
I1121 08:16:47.731470    8096 checks.go:205] validating availability of port 6443
I1121 08:16:47.732017    8096 checks.go:205] validating availability of port 10259
....

While kubeadm waits for the kubelet to start, the following messages appear. They repeat until the timeout after 4 minutes:

[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
I1121 08:17:12.320743    8096 round_trippers.go:466] curl -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.25.4 (linux/amd64) kubernetes/fdc7750" 'https://x.x.y.y:6443/healthz?timeout=10s'
I1121 08:17:12.321047    8096 round_trippers.go:508] HTTP Trace: Dial to tcp:x.x.y.y:6443 failed: dial tcp x.x.y.y:6443: connect: connection refused
I1121 08:17:12.321112    8096 round_trippers.go:553] GET https://x.x.y.y:6443/healthz?timeout=10s  in 0 milliseconds
I1121 08:17:12.321157    8096 round_trippers.go:570] HTTP Statistics: DNSLookup 0 ms Dial 0 ms TLSHandshake 0 ms Duration 0 ms
I1121 08:17:12.321209    8096 round_trippers.go:577] Response Headers:
I1121 08:17:12.821526    8096 round_trippers.go:466] curl -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.25.4 (linux/amd64) kubernetes/fdc7750" 'https://x.x.y.y:6443/healthz?timeout=10s'
I1121 08:17:12.821882    8096 round_trippers.go:508] HTTP Trace: Dial to tcp:x.x.y.y:6443 failed: dial tcp x.x.y.y:6443: connect: connection refused
.....
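
"connection refused" on port 6443 means nothing is listening there, so the kube-apiserver container is probably not running. A minimal sketch for checking this directly in the container runtime follows. It assumes crictl is installed and uses the CRI socket that kubeadm reported above; the function names are hypothetical.

```shell
# CRI socket as reported by kubeadm in the log above.
CRI_SOCKET="unix:///var/run/containerd/containerd.sock"

# Keep only Kubernetes containers and hide the pause (sandbox) containers.
# Reads `crictl ps -a` output on stdin. The function name is hypothetical.
filter_control_plane() {
    grep kube | grep -v pause
}

# List control-plane containers in any state, including exited ones.
list_control_plane_containers() {
    sudo crictl --runtime-endpoint "$CRI_SOCKET" ps -a | filter_control_plane
}

# Succeed if some process is listening on the API server port 6443.
apiserver_port_open() {
    ss -ltn | grep -q ':6443 '
}
```

If list_control_plane_containers shows an exited kube-apiserver container, sudo crictl --runtime-endpoint "$CRI_SOCKET" logs CONTAINERID prints its output. If it shows no containers at all, the pods were probably never created, and the containerd logs (sudo journalctl -u containerd) are the next place to look.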

Checking the kubelet status shows:

$ sudo systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: active (running) since Mon 2022-11-21 08:17:12 UTC; 4min 30s ago
       Docs: https://kubernetes.io/docs/home/
   Main PID: 8228 (kubelet)
      Tasks: 14 (limit: 4556)
     Memory: 52.0M
        CPU: 6.246s
     CGroup: /system.slice/kubelet.service
             └─8228 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-ru>

Nov 21 08:21:42 master-1 kubelet[8228]: E1121 08:21:42.526642    8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:21:42 master-1 kubelet[8228]: E1121 08:21:42.626872    8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:21:42 master-1 kubelet[8228]: E1121 08:21:42.727919    8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:21:42 master-1 kubelet[8228]: E1121 08:21:42.829055    8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:21:42 master-1 kubelet[8228]: E1121 08:21:42.930002    8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:21:42 master-1 kubelet[8228]: E1121 08:21:42.959961    8228 eviction_manager.go:254] "Eviction manager: failed to get summary stats" err="failed to get node info: node \"master->
Nov 21 08:21:43 master-1 kubelet[8228]: E1121 08:21:43.029432    8228 kubelet.go:2349] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady >
Nov 21 08:21:43 master-1 kubelet[8228]: E1121 08:21:43.030749    8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:21:43 master-1 kubelet[8228]: E1121 08:21:43.130874    8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:21:43 master-1 kubelet[8228]: E1121 08:21:43.231537    8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"

Checking journalctl shows:

$ sudo journalctl -xeu kubelet
Nov 21 08:22:37 master-1 kubelet[8228]: E1121 08:22:37.585238    8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:22:37 master-1 kubelet[8228]: E1121 08:22:37.685464    8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:22:37 master-1 kubelet[8228]: E1121 08:22:37.786279    8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:22:37 master-1 kubelet[8228]: E1121 08:22:37.887211    8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:22:37 master-1 kubelet[8228]: E1121 08:22:37.987526    8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:22:38 master-1 kubelet[8228]: E1121 08:22:38.045350    8228 kubelet.go:2349] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady >
Nov 21 08:22:38 master-1 kubelet[8228]: E1121 08:22:38.088201    8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
....
Nov 21 08:22:40 master-1 kubelet[8228]: E1121 08:22:40.500610    8228 controller.go:144] failed to ensure lease exists, will retry in 7s, error: Get "https://x.x.y.y:6443/apis/coordin>
Nov 21 08:22:40 master-1 kubelet[8228]: E1121 08:22:40.512026    8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:22:40 master-1 kubelet[8228]: E1121 08:22:40.613041    8228 kubelet.go:2424] "Error getting node" err="node \"master-1\" not found"
Nov 21 08:22:40 master-1 kubelet[8228]: I1121 08:22:40.700243    8228 kubelet_node_status.go:70] "Attempting to register node" node="master-1"
Nov 21 08:22:40 master-1 kubelet[8228]: E1121 08:22:40.701021    8228 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://x.x.y.y:6443/api/v1/node>
...
.....
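
The lines above end in ">" because the pager cuts them off at the terminal width, and the repeated "Error getting node" entries bury everything else. One possible way to get more out of the same log is to disable the pager and filter out the noise. The helper name below is hypothetical.

```shell
# Drop the repetitive "Error getting node" lines so other errors stand out.
# Reads log lines on stdin. The function name is hypothetical.
drop_node_not_found() {
    grep -v '"Error getting node"'
}

# Usage (prints full, untruncated lines):
#   sudo journalctl -u kubelet --no-pager | drop_node_not_found
```

The lines that remain should be the ones that differ from the "node not found" follow-up errors, for example sandbox or container creation failures reported by the runtime.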

How can I find the root cause of this problem? The log files do not give me any useful hints.

Note: If I install CRI-O instead of containerd, kubeadm works fine.

Answer 1

I ran into the same issue with kubeadm v1.25.4 and containerd v1.4.13.

containerd also looks fine and the kubelet service is active, but the API server is down, along with all the control-plane pods.

kubectl get pods --all-namespaces
The connection to the server localhost:8080 was refused - did you specify the right host or port?

My syslog file contains additional entries:

Nov 25 09:39:08 master-1 kubelet[2809]: E1125 09:39:08.517592    2809 kubelet.go:2448] "Error getting node" err="node \"master-1\" not found"
Nov 25 09:39:08 master-1 kubelet[2809]: E1125 09:39:08.618103    2809 kubelet.go:2448] "Error getting node" err="node \"master-1\" not found"
Nov 25 09:39:08 master-1 kubelet[2809]: E1125 09:39:08.718895    2809 kubelet.go:2448] "Error getting node" err="node \"master-1\" not found"
Nov 25 09:39:08 master-1 containerd[450]: time="2022-11-25T09:39:08.774397538Z" level=info msg="RunPodsandbox for &PodSandboxMetadata{Name:kube-scheduler-master-1,Uid:c8fdb264532b280b4098380e628d113d,Namespace:kube-system,Attempt:0,}"
Nov 25 09:39:08 master-1 containerd[450]: time="2022-11-25T09:39:08.774397563Z" level=info msg="RunPodsandbox for &PodSandboxMetadata{Name:kube-apiserver-master-1,Uid:e8e76556f3e67024151f36c60b85b622,Namespace:kube-system,Attempt:0,}"
Nov 25 09:39:08 master-1 containerd[450]: time="2022-11-25T09:39:08.800116714Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:kube-scheduler-master-1,Uid:c8fdb264532b280b4098380e628d113d,Namespace:kube-system,Attempt:0,} failed, error" error="rpc error: code = InvalidArgument desc = failed to create containerd container: create container failed validation: container.Runtime.Name must be set: invalid argument"
Nov 25 09:39:08 master-1 kubelet[2809]: E1125 09:39:08.800548    2809 remote_runtime.go:233] "RunPodSandbox from runtime service failed" err="rpc error: code = InvalidArgument desc = failed to create containerd container: create container failed validation: container.Runtime.Name must be set: invalid argument"
Nov 25 09:39:08 master-1 kubelet[2809]: E1125 09:39:08.800620    2809 kuberuntime_sandbox.go:71] "Failed to create sandbox for pod" err="rpc error: code = InvalidArgument desc = failed to create containerd container: create container failed validation: container.Runtime.Name must be set: invalid argument" pod="kube-system/kube-scheduler-master-1"
Nov 25 09:39:08 master-1 kubelet[2809]: E1125 09:39:08.800653    2809 kuberuntime_manager.go:772] "CreatePodSandbox for pod failed" err="rpc error: code = InvalidArgument desc = failed to create containerd container: create container failed validation: container.Runtime.Name must be set: invalid argument" pod="kube-system/kube-scheduler-master-1"
Nov 25 09:39:08 master-1 kubelet[2809]: E1125 09:39:08.800729    2809 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"kube-scheduler-master-1_kube-system(c8fdb264532b280b4098380e628d113d)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"kube-scheduler-master-1_kube-system(c8fdb264532b280b4098380e628d113d)\\\": rpc error: code = InvalidArgument desc = failed to create containerd container: create container failed validation: container.Runtime.Name must be set: invalid argument\"" pod="kube-system/kube-scheduler-master-1" podUID=c8fdb264532b280b4098380e628d113d
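
The "container.Runtime.Name must be set" message suggests that containerd did not resolve a runtime for the sandbox. A reasonable first check, though not a confirmed fix, is to look at the runtime settings containerd has actually loaded. The sketch below assumes the containerd config dump subcommand is available in the installed version; the function name is hypothetical.

```shell
# Print the runtime-related settings from a containerd config on stdin.
# The function name is hypothetical; the keys are containerd CRI config keys.
show_runtime_settings() {
    grep -E 'default_runtime_name|runtime_type|SystemdCgroup'
}

# Usage (shows the effective configuration, not just the file contents):
#   sudo containerd config dump | show_runtime_settings
```

If default_runtime_name or the matching runtime_type is empty or missing in the output, that would be consistent with the error above.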

I am following this thread in case anyone has a solution or a lead.