All nodes NotReady after RKE2 install; waiting on canal and coredns

I installed rke2-server on the control-plane machine and rke2-agent on the worker machines (machine-0 is the control plane; machine-1 and machine-2 are workers). These are physical machines and I am installing manually, following https://github.com/clemenko/rke_install_blog. The installation completed without errors, but kubectl get nodes shows every node (even the control-plane node) as NotReady, and many canal and coredns pods are stuck in Pending. How can I get them out of Pending? I did free about 20 GB of memory on the control node, but the pod statuses did not change. (Is there a way to restart the pods?)
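In case it is relevant, this is how I have been trying to nudge the Pending pods along. Deleting a managed pod so its controller recreates it is my understanding of the usual way to "restart" one; these are standard kubectl commands, nothing RKE2-specific:

```shell
# Delete a Pending pod; its owning Job/ReplicaSet recreates it immediately.
kubectl -n kube-system delete pod helm-install-rke2-canal-c45mv

# Clear out the Evicted helper pods (they show up as phase=Failed).
kubectl -n kube-system delete pods --field-selector=status.phase=Failed

# See which taints are currently blocking scheduling on each node.
kubectl describe nodes | grep -A3 'Taints:'
```

Recreating the pods did not change anything for me so far, which makes me think the scheduling blocker is on the node side.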

NAME              STATUS     ROLES                       AGE     VERSION           INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION           CONTAINER-RUNTIME
machine-0       NotReady   control-plane,etcd,master   4h8m    v1.26.12+rke2r1   192.169.0.10   <none>        Ubuntu 22.04.3 LTS   5.15.0-91-generic        containerd://1.7.11-k3s2
machine-1       NotReady   <none>                      3h56m   v1.26.12+rke2r1   192.169.0.11   <none>        Ubuntu 22.04.3 LTS   5.15.45-051545-generic   containerd://1.7.11-k3s2
machine-2       NotReady   <none>                      3h53m   v1.26.12+rke2r1   192.169.1.3    <none>        Ubuntu 22.04.3 LTS   6.2.0-25-generic         containerd://1.7.11-k3s2
root@machine-0:/home/normal#

kubectl get all --all-namespaces

root@machine-0:/home/normal# kubectl get all --all-namespaces
NAMESPACE       NAME                                                      READY   STATUS    RESTARTS       AGE
cattle-system   pod/rancher-64db9f48c-mc7vg                               0/1     Pending   0              12m
cert-manager    pod/cert-manager-55cf8685cb-5zmn9                         0/1     Pending   0              26m
cert-manager    pod/cert-manager-cainjector-fbd548cb8-9lzqt               0/1     Pending   0              26m
cert-manager    pod/cert-manager-startupapicheck-7cjzs                    0/1     Pending   0              26m
cert-manager    pod/cert-manager-webhook-655b4d58fb-rcws6                 0/1     Pending   0              26m
kube-system     pod/cloud-controller-manager-machine-0                  1/1     Running   3 (170m ago)   4h10m
kube-system     pod/etcd-machine-0                                      1/1     Running   1              4h9m
kube-system     pod/helm-install-rke2-canal-c45mv                         0/1     Pending   0              4h10m
kube-system     pod/helm-install-rke2-canal-kmhh6                         0/1     Evicted   0              4h10m
kube-system     pod/helm-install-rke2-coredns-n8g9t                       0/1     Pending   0              4h10m
kube-system     pod/helm-install-rke2-coredns-tcdmd                       0/1     Evicted   0              4h10m
kube-system     pod/helm-install-rke2-coredns-wrnh5                       0/1     Evicted   0              4h10m
kube-system     pod/helm-install-rke2-metrics-server-ms788                0/1     Pending   0              4h10m
kube-system     pod/helm-install-rke2-snapshot-controller-crd-rqjkj       0/1     Pending   0              4h10m
kube-system     pod/helm-install-rke2-snapshot-controller-j78sj           0/1     Pending   0              4h10m
kube-system     pod/helm-install-rke2-snapshot-validation-webhook-t5t24   0/1     Pending   0              4h10m
kube-system     pod/kube-apiserver-machine-0                            1/1     Running   1              4h10m
kube-system     pod/kube-controller-manager-machine-0                   1/1     Running   2 (170m ago)   4h10m
kube-system     pod/kube-proxy-machine-0                                1/1     Running   0              169m
kube-system     pod/kube-proxy-machine-1                                1/1     Running   0              3h58m
kube-system     pod/kube-proxy-machine-2                            1/1     Running   0              3h55m
kube-system     pod/kube-scheduler-machine-0                            1/1     Running   1 (170m ago)   4h10m

NAMESPACE       NAME                           TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
cattle-system   service/rancher                ClusterIP   10.43.138.79   <none>        80/TCP,443/TCP   12m
cert-manager    service/cert-manager           ClusterIP   10.43.219.64   <none>        9402/TCP         26m
cert-manager    service/cert-manager-webhook   ClusterIP   10.43.15.146   <none>        443/TCP          26m
default         service/kubernetes             ClusterIP   10.43.0.1      <none>        443/TCP          4h10m

NAMESPACE     NAME                         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
kube-system   daemonset.apps/kube-vip-ds   1         0         0       0            0           <none>          4h10m

NAMESPACE       NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
cattle-system   deployment.apps/rancher                   0/1     1            0           12m
cert-manager    deployment.apps/cert-manager              0/1     1            0           26m
cert-manager    deployment.apps/cert-manager-cainjector   0/1     1            0           26m
cert-manager    deployment.apps/cert-manager-webhook      0/1     1            0           26m

NAMESPACE       NAME                                                DESIRED   CURRENT   READY   AGE
cattle-system   replicaset.apps/rancher-64db9f48c                   1         1         0       12m
cert-manager    replicaset.apps/cert-manager-55cf8685cb             1         1         0       26m
cert-manager    replicaset.apps/cert-manager-cainjector-fbd548cb8   1         1         0       26m
cert-manager    replicaset.apps/cert-manager-webhook-655b4d58fb     1         1         0       26m

NAMESPACE      NAME                                                      COMPLETIONS   DURATION   AGE
cert-manager   job.batch/cert-manager-startupapicheck                    0/1           26m        26m
kube-system    job.batch/helm-install-rke2-canal                         0/1           4h10m      4h10m
kube-system    job.batch/helm-install-rke2-coredns                       0/1           4h10m      4h10m
kube-system    job.batch/helm-install-rke2-metrics-server                0/1           4h10m      4h10m
kube-system    job.batch/helm-install-rke2-snapshot-controller           0/1           4h10m      4h10m
kube-system    job.batch/helm-install-rke2-snapshot-controller-crd       0/1           4h10m      4h10m
kube-system    job.batch/helm-install-rke2-snapshot-validation-webhook   0/1           4h10m      4h10m

kubectl events output

root@machine-0:/home/normal# kubectl get events --all-namespaces
NAMESPACE       LAST SEEN   TYPE      REASON                OBJECT                                                    MESSAGE
cattle-system   4m27s       Warning   FailedScheduling      pod/rancher-64db9f48c-mc7vg                               0/3 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/disk-pressure: }, 2 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling..
cattle-system   24m         Normal    SuccessfulCreate      replicaset/rancher-64db9f48c                              Created pod: rancher-64db9f48c-mc7vg
cattle-system   24m         Normal    ScalingReplicaSet     deployment/rancher                                        Scaled up replica set rancher-64db9f48c to 1
cert-manager    3m57s       Warning   FailedScheduling      pod/cert-manager-55cf8685cb-5zmn9                         0/3 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/disk-pressure: }, 2 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling..
cert-manager    39m         Normal    SuccessfulCreate      replicaset/cert-manager-55cf8685cb                        Created pod: cert-manager-55cf8685cb-5zmn9
cert-manager    3m57s       Warning   FailedScheduling      pod/cert-manager-cainjector-fbd548cb8-9lzqt               0/3 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/disk-pressure: }, 2 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling..
cert-manager    39m         Normal    SuccessfulCreate      replicaset/cert-manager-cainjector-fbd548cb8              Created pod: cert-manager-cainjector-fbd548cb8-9lzqt
cert-manager    39m         Normal    ScalingReplicaSet     deployment/cert-manager-cainjector                        Scaled up replica set cert-manager-cainjector-fbd548cb8 to 1
cert-manager    3m57s       Warning   FailedScheduling      pod/cert-manager-startupapicheck-7cjzs                    0/3 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/disk-pressure: }, 2 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling..
cert-manager    39m         Normal    SuccessfulCreate      job/cert-manager-startupapicheck                          Created pod: cert-manager-startupapicheck-7cjzs
cert-manager    3m57s       Warning   FailedScheduling      pod/cert-manager-webhook-655b4d58fb-rcws6                 0/3 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/disk-pressure: }, 2 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling..
cert-manager    39m         Normal    SuccessfulCreate      replicaset/cert-manager-webhook-655b4d58fb                Created pod: cert-manager-webhook-655b4d58fb-rcws6
cert-manager    39m         Normal    ScalingReplicaSet     deployment/cert-manager-webhook                           Scaled up replica set cert-manager-webhook-655b4d58fb to 1
cert-manager    39m         Normal    ScalingReplicaSet     deployment/cert-manager                                   Scaled up replica set cert-manager-55cf8685cb to 1
default         2m53s       Warning   FreeDiskSpaceFailed   node/machine-0                                          (combined from similar events): Failed to garbage collect required amount of images. Attempted to free 165305339904 bytes, but only found 0 bytes eligible to free.
kube-system     117s        Warning   FailedScheduling      pod/helm-install-rke2-canal-c45mv                         0/3 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/disk-pressure: }, 2 node(s) didn't match Pod's node affinity/selector. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling..
kube-system     117s        Warning   FailedScheduling      pod/helm-install-rke2-coredns-n8g9t                       0/3 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/disk-pressure: }, 2 node(s) didn't match Pod's node affinity/selector. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling..
kube-system     2m27s       Warning   FailedScheduling      pod/helm-install-rke2-metrics-server-ms788                0/3 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/disk-pressure: }, 2 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling..
kube-system     117s        Warning   FailedScheduling      pod/helm-install-rke2-snapshot-controller-crd-rqjkj       0/3 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/disk-pressure: }, 2 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling..
kube-system     117s        Warning   FailedScheduling      pod/helm-install-rke2-snapshot-controller-j78sj           0/3 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/disk-pressure: }, 2 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling..
kube-system     117s        Warning   FailedScheduling      pod/helm-install-rke2-snapshot-validation-webhook-t5t24   0/3 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/disk-pressure: }, 2 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling..
kube-system     10m         Warning   FailedCreate          daemonset/kube-vip-ds                                     Error creating: pods "kube-vip-ds-" is forbidden: error looking up service account kube-system/kube-vip: serviceaccount "kube-vip" not found
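Side note on that last event: the kube-vip DaemonSet is failing simply because its service account does not exist. In the guide I followed, kube-vip's RBAC manifest is applied as a separate step, so presumably I skipped it; the kube-vip docs publish the manifest at the URL below (assuming that URL is still current):

```shell
# Creates the kube-vip service account plus its ClusterRole/Binding.
kubectl apply -f https://kube-vip.io/manifests/rbac.yaml
```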

Control-plane node describe output

root@machine-0:/home/normal# kubectl describe node machine-0
Name:               machine-0
Roles:              control-plane,etcd,master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=rke2
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=machine-0
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=true
                    node-role.kubernetes.io/etcd=true
                    node-role.kubernetes.io/master=true
                    node.kubernetes.io/instance-type=rke2
Annotations:        alpha.kubernetes.io/provided-node-ip: 192.169.0.10
                    etcd.rke2.cattle.io/local-snapshots-timestamp: 2024-01-24T19:07:31Z
                    etcd.rke2.cattle.io/node-address: 192.169.0.10
                    etcd.rke2.cattle.io/node-name: machine-0-0a9550ae
                    node.alpha.kubernetes.io/ttl: 0
                    rke2.io/encryption-config-hash: start-4e1ae4c38a662bdf4101d2ff62004a662978e39159b4bee9cd399cc50fc7c74b
                    rke2.io/hostname: machine-0
                    rke2.io/internal-ip: 192.169.0.10
                    rke2.io/node-args:
                      ["server","--tls-san","192.169.3.50","--tls-san","192.169.0.11","--write-kubeconfig-mode","420","--disable","rke2-ingress-nginx"]
                    rke2.io/node-config-hash: WJDPFSWIXMXMLY4CUXWVURZC7VSCEYFGRCD2ZU326HX7AW2YXJTA====
                    rke2.io/node-env: {}
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Wed, 24 Jan 2024 17:46:10 +0000
Taints:             node.kubernetes.io/disk-pressure:NoSchedule
                    node.kubernetes.io/not-ready:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  machine-0
  AcquireTime:     <unset>
  RenewTime:       Wed, 24 Jan 2024 22:10:45 +0000
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Wed, 24 Jan 2024 22:10:11 +0000   Wed, 24 Jan 2024 18:51:02 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     True    Wed, 24 Jan 2024 22:10:11 +0000   Wed, 24 Jan 2024 19:06:28 +0000   KubeletHasDiskPressure       kubelet has disk pressure
  PIDPressure      False   Wed, 24 Jan 2024 22:10:11 +0000   Wed, 24 Jan 2024 18:51:02 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            False   Wed, 24 Jan 2024 22:10:11 +0000   Wed, 24 Jan 2024 18:51:02 +0000   KubeletNotReady              container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
  EtcdIsVoter      True    Wed, 24 Jan 2024 17:46:13 +0000   Wed, 24 Jan 2024 17:46:13 +0000   MemberNotLearner             Node is a voting member of the etcd cluster
Addresses:
  InternalIP:  192.169.0.10
  Hostname:    machine-0
Capacity:
  cpu:                96
  ephemeral-storage:  919162120Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             528186908Ki
  pods:               110
Allocatable:
  cpu:                96
  ephemeral-storage:  894160909635
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             528186908Ki
  pods:               110
System Info:
  Machine ID:                 937916b85ff34816bd088d35e639c67f
  System UUID:                26360400-dbe5-11eb-8000-3ceceff3813a
  Boot ID:                    f16b8922-f6e9-4037-bd56-ce123feae027
  Kernel Version:             5.15.0-91-generic
  OS Image:                   Ubuntu 22.04.3 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.7.11-k3s2
  Kubelet Version:            v1.26.12+rke2r1
  Kube-Proxy Version:         v1.26.12+rke2r1
PodCIDR:                      10.42.0.0/24
PodCIDRs:                     10.42.0.0/24
ProviderID:                   rke2://machine-0
Non-terminated Pods:          (6 in total)
  Namespace                   Name                                    CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                    ------------  ----------  ---------------  -------------  ---
  kube-system                 cloud-controller-manager-machine-0    100m (0%)     0 (0%)      128Mi (0%)       0 (0%)         4h24m
  kube-system                 etcd-machine-0                        200m (0%)     0 (0%)      512Mi (0%)       0 (0%)         4h24m
  kube-system                 kube-apiserver-machine-0              250m (0%)     0 (0%)      1Gi (0%)         0 (0%)         4h24m
  kube-system                 kube-controller-manager-machine-0     200m (0%)     0 (0%)      256Mi (0%)       0 (0%)         4h24m
  kube-system                 kube-proxy-machine-0                  250m (0%)     0 (0%)      128Mi (0%)       0 (0%)         3h3m
  kube-system                 kube-scheduler-machine-0              100m (0%)     0 (0%)      128Mi (0%)       0 (0%)         4h24m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests     Limits
  --------           --------     ------
  cpu                1100m (1%)   0 (0%)
  memory             2176Mi (0%)  0 (0%)
  ephemeral-storage  0 (0%)       0 (0%)
  hugepages-1Gi      0 (0%)       0 (0%)
  hugepages-2Mi      0 (0%)       0 (0%)
Events:
  Type     Reason               Age                    From     Message
  ----     ------               ----                   ----     -------
  Warning  FreeDiskSpaceFailed  4m33s (x27 over 134m)  kubelet  (combined from similar events): Failed to garbage collect required amount of images. Attempted to free 165305339904 bytes, but only found 0 bytes eligible to free.
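Since the taint is disk-pressure and the kubelet's image GC reports nothing eligible to free, I am planning to prune unused images by hand. My understanding is that RKE2 bundles crictl and writes a config file pointing at its own containerd socket (the paths below are what I see on my install; treat them as assumptions):

```shell
# Use the crictl that ships with RKE2, pointed at RKE2's containerd.
export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml
/var/lib/rancher/rke2/bin/crictl rmi --prune   # delete images not used by any container
```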

For some reason, df -h is full of these overlay and shm mounts:

Filesystem                         Size  Used Avail Use% Mounted on
tmpfs                               51G   25M   51G   1% /run
/dev/mapper/ubuntu--vg-ubuntu--lv  877G  805G   36G  96% /
tmpfs                              252G   24K  252G   1% /dev/shm
tmpfs                              5.0M     0  5.0M   0% /run/lock
/dev/sda2                          2.0G  306M  1.5G  17% /boot
/dev/sda1                          1.1G  6.1M  1.1G   1% /boot/efi
shm                                 64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/86d1e80c0ca12b1b307204e61507ff1b8a632192fe537000bc201724051fa132/shm
shm                                 64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/b2a52985151b1344962a653977184abff5a25d565217cc332fcda646fafd087b/shm
shm                                 64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/fa42998d60f6a0764f6f67c8f94316dc1b05549fa1df559afdf0899eb9e6721f/shm
overlay                            877G  805G   36G  96% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/86d1e80c0ca12b1b307204e61507ff1b8a632192fe537000bc201724051fa132/rootfs
overlay                            877G  805G   36G  96% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/fa42998d60f6a0764f6f67c8f94316dc1b05549fa1df559afdf0899eb9e6721f/rootfs
overlay                            877G  805G   36G  96% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/b2a52985151b1344962a653977184abff5a25d565217cc332fcda646fafd087b/rootfs
overlay                            877G  805G   36G  96% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/106276b255dc3e8c2b607a9cf8884f59da0b744268f6215bd72854e74a108057/rootfs
shm                                 64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/72f6e448e221991030687587ec2e1b0fc35cc1046e94071f8ae302a11eaf4e79/shm
overlay                            877G  805G   36G  96% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/72f6e448e221991030687587ec2e1b0fc35cc1046e94071f8ae302a11eaf4e79/rootfs
overlay                            877G  805G   36G  96% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/ae3f0a4e0d9680c30a0814db9055d13e374db44f601756e0a63684496622ab2a/rootfs
overlay                            877G  805G   36G  96% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/6ddfec478b12b5fbb045934d244bb652e290b2371532c6d5374a7a89f9a68879/rootfs
shm                                 64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/4c686224be121bfa63afa2b5b9d7d7ea6223ec1dd3d864ab8cd67ffb70c65d1c/shm
overlay                            877G  805G   36G  96% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/4c686224be121bfa63afa2b5b9d7d7ea6223ec1dd3d864ab8cd67ffb70c65d1c/rootfs
overlay                            877G  805G   36G  96% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/626435c560b90557e8c3488b5e746951d5a7a9a9336fdfff0f6f4d90dd8b8883/rootfs
overlay                            877G  805G   36G  96% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/d83aef92824ae10737a1994fcebb57aa5cca7e7d42cc9ca0fbd85431ac2c154f/rootfs
shm                                 64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/3cc4fedaa79032bf5fa18c9e939a73aecb352d6ed54e70167ffda8ae69a8854e/shm
overlay                            877G  805G   36G  96% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/3cc4fedaa79032bf5fa18c9e939a73aecb352d6ed54e70167ffda8ae69a8854e/rootfs
overlay                            877G  805G   36G  96% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/21221658a71644d3fda4c89a0564aba0c1c2ca57e59dd884478ef2d8c762244d/rootfs
tmpfs                               51G  4.0K   51G   1% /run/user/1001
overlay                            877G  805G   36G  96% /mnt/docker/overlay2/478bd2a43988031366edb415fb01b880c0cbadfa079ec17b5ac7ab9de3b93c07/merged
tmpfs                               51G  4.0K   51G   1% /run/user/1000
192.169.0.1:/mnt/storage/models    200T  3.9T  186T   3% /models
overlay                            877G  805G   36G  96% /mnt/docker/overlay2/b4ff783dad30e510f8dfafcdb50c2c6995182a3cf7815e2d92b19de40ec54505/merged
overlay                            877G  805G   36G  96% /mnt/docker/overlay2/82695fb56dcef0928078b78e06bcee26090fddffe381b7d7bf7d24328f4ec5b0/merged
