I created a new cluster on 10.0.0.100 and, after some adjustments, all pods started and are running:
NAME                                    READY   STATUS    RESTARTS       AGE   IP           NODE            NOMINATED NODE   READINESS GATES
coredns-6d4b75cb6d-fwjnr                1/1     Running   0              47s   10.244.0.4   ts-k8s-master   <none>           <none>
coredns-6d4b75cb6d-l6hs2                1/1     Running   0              41s   10.244.0.5   ts-k8s-master   <none>           <none>
etcd-ts-k8s-master                      1/1     Running   76 (17h ago)   22h   10.0.0.100   ts-k8s-master   <none>           <none>
kube-apiserver-ts-k8s-master            1/1     Running   70 (17h ago)   18h   10.0.0.100   ts-k8s-master   <none>           <none>
kube-controller-manager-ts-k8s-master   1/1     Running   79 (15h ago)   22h   10.0.0.100   ts-k8s-master   <none>           <none>
kube-proxy-zmzdr                        1/1     Running   1 (37m ago)    21h   10.0.0.100   ts-k8s-master   <none>           <none>
kube-scheduler-ts-k8s-master            1/1     Running   81 (17h ago)   22h   10.0.0.100   ts-k8s-master   <none>           <none>
So I am now ready to join nodes to the cluster (IPs 10.0.0.101, 10.0.0.102, and so on), but I get the following:
sudo kubeadm join 10.0.0.100:6443 --token l18xdm.eemusxu5rqf22gmx --discovery-token-ca-cert-hash sha256:e6451ec2e9ef26ddb1f2675e6dd7332e3d239db278516b567c7d9a33e6403ec9
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
error execution phase kubelet-start: timed out waiting for the condition
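So I looked into the kubelet and found that it is indeed the component that tells the cluster a node is trying to join and that performs the bootstrap. It looks like the kubelet on the node is failing: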
systemctl status kubelet
* kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
`-10-kubeadm.conf
Active: activating (auto-restart) (Result: exit-code) since Thu 2023-07-27 10:00:16 UTC; 757ms ago
Docs: https://kubernetes.io/docs/home/
Process: 191475 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/FAILURE)
Main PID: 191475 (code=exited, status=1/FAILURE)
CPU: 202ms
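The status output above only shows that the process exited with status 1; the actual reason is normally recorded in the kubelet journal, which is also what the kubeadm hint ('journalctl -xeu kubelet') points at. Below is a minimal sketch for narrowing that journal down to the lines that look like errors. The helper name `kubelet_errors` and the filter pattern are assumptions made for illustration, not part of any Kubernetes tooling, and the pattern may miss or over-match lines on a given node.

```shell
# Hypothetical helper: keep only kubelet log lines that look like errors.
# klog prefixes error and fatal lines with "E" or "F" plus month and day
# (for example "E0727"); startup failures often also carry "err=" or "failed".
kubelet_errors() {
  grep -E '(^| )[EF][0-9]{4} |err=|[Ff]ailed'
}

# Usage on the node (needs systemd and permission to read the journal):
#   journalctl -u kubelet --no-pager -n 200 | kubelet_errors
```

The last matching line before each restart is usually the most informative one to include in the question.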
For comparison, the kubelet on the control plane looks fine and produces the following:
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Wed 2023-07-26 15:55:15 BST; 24h ago
Docs: https://kubernetes.io/docs/home/
Main PID: 64452 (kubelet)
Tasks: 19 (limit: 2081)
Memory: 129.2M
CPU: 2h 6min 33.994s
CGroup: /system.slice/kubelet.service
└─64452 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime=remote --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock --pod-infra-container-image=registry.k8s.io/pause:3.7
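I have updated the firewalls on the control plane and on the nodes: port 6443 is open on the master and port 10248 on the nodes (I am not sure whether the latter is needed).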
I believe I have set up cgroups correctly, and containerd is running:
* containerd.service - containerd container runtime
Loaded: loaded (/lib/systemd/system/containerd.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2023-07-26 13:23:16 UTC; 20h ago
Docs: https://containerd.io
Main PID: 133048 (containerd)
Tasks: 10
Memory: 19.8M
CPU: 4min 9.603s
CGroup: /system.slice/containerd.service
`-133048 /usr/bin/containerd
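A running containerd does not by itself show which cgroup driver it uses. With the kubelet set to `cgroupDriver: systemd`, the runc runtime in containerd is expected to have `SystemdCgroup = true`. The sketch below reads that value from a containerd config file. The helper name `containerd_systemd_cgroup` is hypothetical, and the default path /etc/containerd/config.toml is an assumption about this node, not something taken from the output above.

```shell
# Hypothetical helper: print the SystemdCgroup value found in a containerd
# config file, or "unset" when the key (or the file) is missing.
containerd_systemd_cgroup() {
  conf=${1:-/etc/containerd/config.toml}   # assumed default location
  value=$(sed -n 's/^[[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*\([a-z]*\).*/\1/p' "$conf" 2>/dev/null | head -n 1)
  echo "${value:-unset}"
}

# Usage on the node:
#   containerd_systemd_cgroup
```

If this prints "false" or "unset" on the worker while the kubelet uses the systemd driver, that mismatch would be worth ruling out, although the kubelet journal remains the authoritative source for the exit reason.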
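The node's kubelet configuration YAML does not seem to contain any obvious misconfiguration (at least to me), but the signs point to a problem with the kubelet on the node, which prevents the bootstrap and therefore stops the node from joining the cluster: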
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 0s
    cacheUnauthorizedTTL: 0s
cgroupDriver: systemd
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
cpuManagerReconcilePeriod: 0s
evictionPressureTransitionPeriod: 0s
fileCheckFrequency: 0s
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
logging:
  flushFrequency: 0
  options:
    json:
      infoBufferSize: "0"
  verbosity: 0
memorySwap: {}
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
resolvConf: /run/systemd/resolve/resolv.conf
rotateCertificates: true
runtimeRequestTimeout: 0s
shutdownGracePeriod: 0s
shutdownGracePeriodCriticalPods: 0s
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
volumeStatsAggPeriod: 0s
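The health check that kubeadm reports as failing targets the address configured in this file (`healthzBindAddress` and `healthzPort`). Because that address is the loopback interface here, a firewall rule for port 10248 should not affect it, and "connection refused" is consistent with no kubelet process listening at all. The sketch below derives the endpoint from a kubelet config file so the same probe can be repeated by hand. The helper name `kubelet_healthz_endpoint` is hypothetical, and the fallback values assume the kubelet defaults of 127.0.0.1 and 10248.

```shell
# Hypothetical helper: print the kubelet healthz endpoint as "address:port",
# read from a kubelet config file (top-level keys only).
kubelet_healthz_endpoint() {
  conf=${1:-/var/lib/kubelet/config.yaml}   # path written by kubeadm join
  addr=$(sed -n 's/^healthzBindAddress:[[:space:]]*//p' "$conf" 2>/dev/null | head -n 1)
  port=$(sed -n 's/^healthzPort:[[:space:]]*//p' "$conf" 2>/dev/null | head -n 1)
  # Fall back to the assumed kubelet defaults when a key is absent.
  echo "${addr:-127.0.0.1}:${port:-10248}"
}

# Usage on the node, mirroring the probe shown in the kubeadm output:
#   curl -sSL "http://$(kubelet_healthz_endpoint)/healthz"
```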
I am a bit lost as to what to do next. Any help is much appreciated.