To demonstrate what I'm doing, I put together a repo containing my Vagrant + cloud-init configuration.
https://github.com/johnmanko/vagrant-k8s-cluster
Basically, Kubernetes keeps crashing and I don't know why. Half the time I can't even connect to check on it. top doesn't show any excessive CPU/memory usage.
vagrant@kubemaster:~$ kubectl get pods -A
The connection to the server 192.168.56.21:6443 was refused - did you specify the right host or port?
vagrant@kubemaster:~$ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-5dd5756b68-lfktv 0/1 Completed 7 28m
kube-system coredns-5dd5756b68-vxzcn 1/1 Running 5 (5m6s ago) 28m
kube-system etcd-kubemaster 1/1 Running 8 (4m2s ago) 28m
kube-system kube-apiserver-kubemaster 1/1 Running 9 (88s ago) 29m
kube-system kube-controller-manager-kubemaster 0/1 CrashLoopBackOff 11 (4m32s ago) 29m
kube-system kube-proxy-s2vkv 1/1 Running 13 (5m56s ago) 28m
kube-system kube-scheduler-kubemaster 0/1 CrashLoopBackOff 12 (3m8s ago) 28m
kube-system weave-net-rsd7x 0/2 CrashLoopBackOff 11 (46s ago) 17m
vagrant@kubemaster:~$ kubectl logs kube-scheduler-kubemaster -p
Error from server (NotFound): pods "kube-scheduler-kubemaster" not found
vagrant@kubemaster:~$ kubectl logs kube-scheduler-kubemaster -p -n kube-system
The connection to the server 192.168.56.21:6443 was refused - did you specify the right host or port?
vagrant@kubemaster:~$ kubectl get pods -A
The connection to the server 192.168.56.21:6443 was refused - did you specify the right host or port?
When I type fast enough, I can grab the logs:
vagrant@kubemaster:~$ kubectl logs kube-scheduler-kubemaster -p -n kube-system
I1211 16:56:51.716017 1 serving.go:348] Generated self-signed cert in-memory
W1211 16:56:51.969869 1 authentication.go:368] Error looking up in-cluster authentication configuration: Get "https://192.168.56.21:6443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication": dial tcp 192.168.56.21:6443: connect: connection refused
W1211 16:56:51.969893 1 authentication.go:369] Continuing without authentication configuration. This may treat all requests as anonymous.
W1211 16:56:51.969901 1 authentication.go:370] To require authentication configuration lookup to succeed, set --authentication-tolerate-lookup-failure=false
I1211 16:56:51.971903 1 server.go:154] "Starting Kubernetes Scheduler" version="v1.28.4"
I1211 16:56:51.971922 1 server.go:156] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
I1211 16:56:51.972965 1 secure_serving.go:213] Serving securely on 127.0.0.1:10259
I1211 16:56:51.973076 1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I1211 16:56:51.973135 1 shared_informer.go:311] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I1211 16:56:51.973153 1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
W1211 16:56:51.974053 1 reflector.go:535] pkg/server/dynamiccertificates/configmap_cafile_content.go:206: failed to list *v1.ConfigMap: Get "https://192.168.56.21:6443/api/v1/namespaces/kube-system/configmaps?fieldSelector=metadata.name%3Dextension-apiserver-authentication&limit=500&resourceVersion=0": dial tcp 192.168.56.21:6443: connect: connection refused
E1211 16:56:51.974113 1 reflector.go:147] pkg/server/dynamiccertificates/configmap_cafile_content.go:206: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: Get "https://192.168.56.21:6443/api/v1/namespaces/kube-system/configmaps?fieldSelector=metadata.name%3Dextension-apiserver-authentication&limit=500&resourceVersion=0": dial tcp 192.168.56.21:6443: connect: connection refused
W1211 16:56:51.974190 1 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: Get "https://192.168.56.21:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp 192.168.56.21:6443: connect: connection refused
E1211 16:56:51.974218 1 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://192.168.56.21:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp 192.168.56.21:6443: connect: connection refused
W1211 16:56:51.974283 1 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.CSINode: Get "https://192.168.56.21:6443/apis/storage.k8s.io/v1/csinodes?limit=500&resourceVersion=0": dial tcp 192.168.56.21:6443: connect: connection refused
E1211 16:56:51.974306 1 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.CSINode: failed to list *v1.CSINode: Get "https://192.168.56.21:6443/apis/storage.k8s.io/v1/csinodes?limit=500&resourceVersion=0": dial tcp 192.168.56.21:6443: connect: connection refused
W1211 16:56:51.974366 1 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.CSIStorageCapacity: Get "https://192.168.56.21:6443/apis/storage.k8s.io/v1/csistoragecapacities?limit=500&resourceVersion=0": dial tcp 192.168.56.21:6443: connect: connection refused
E1211 16:56:51.974395 1 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.CSIStorageCapacity: failed to list *v1.CSIStorageCapacity: Get "https://192.168.56.21:6443/apis/storage.k8s.io/v1/csistoragecapacities?limit=500&resourceVersion=0": dial tcp 192.168.56.21:6443: connect: connection refused
W1211 16:56:51.974450 1 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: Get "https://192.168.56.21:6443/api/v1/nodes?limit=500&resourceVersion=0": dial tcp 192.168.56.21:6443: connect: connection refused
E1211 16:56:51.974478 1 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://192.168.56.21:6443/api/v1/nodes?limit=500&resourceVersion=0": dial tcp 192.168.56.21:6443: connect: connection refused
W1211 16:56:51.974537 1 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.StorageClass: Get "https://192.168.56.21:6443/apis/storage.k8s.io/v1/storageclasses?limit=500&resourceVersion=0": dial tcp 192.168.56.21:6443: connect: connection refused
E1211 16:56:51.974561 1 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.StorageClass: failed to list *v1.StorageClass: Get "https://192.168.56.21:6443/apis/storage.k8s.io/v1/storageclasses?limit=500&resourceVersion=0": dial tcp 192.168.56.21:6443: connect: connection refused
W1211 16:56:51.974617 1 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Pod: Get "https://192.168.56.21:6443/api/v1/pods?fieldSelector=status.phase%21%3DSucceeded%2Cstatus.phase%21%3DFailed&limit=500&resourceVersion=0": dial tcp 192.168.56.21:6443: connect: connection refused
E1211 16:56:51.974640 1 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Pod: failed to list *v1.Pod: Get "https://192.168.56.21:6443/api/v1/pods?fieldSelector=status.phase%21%3DSucceeded%2Cstatus.phase%21%3DFailed&limit=500&resourceVersion=0": dial tcp 192.168.56.21:6443: connect: connection refused
W1211 16:56:51.974703 1 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.PodDisruptionBudget: Get "https://192.168.56.21:6443/apis/policy/v1/poddisruptionbudgets?limit=500&resourceVersion=0": dial tcp 192.168.56.21:6443: connect: connection refused
E1211 16:56:56.570686 1 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.StatefulSet: failed to list *v1.StatefulSet: Get "https://192.168.56.21:6443/apis/apps/v1/statefulsets?limit=500&resourceVersion=0": dial tcp 192.168.56.21:6443: connect: connection refused
W1211 16:57:02.094444 1 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.ReplicationController: Get "https://192.168.56.21:6443/api/v1/replicationcontrollers?limit=500&resourceVersion=0": dial tcp 192.168.56.21:6443: connect: connection refused
E1211 16:57:02.094639 1 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.ReplicationController: failed to list *v1.ReplicationController: Get "https://192.168.56.21:6443/api/v1/replicationcontrollers?limit=500&resourceVersion=0": dial tcp 192.168.56.21:6443: connect: connection refused
W1211 16:57:02.326461 1 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.StatefulSet: Get "https://192.168.56.21:6443/apis/apps/v1/statefulsets?limit=500&resourceVersion=0": dial tcp 192.168.56.21:6443: connect: connection refused
E1211 16:57:02.326682 1 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.StatefulSet: failed to list *v1.StatefulSet: Get "https://192.168.56.21:6443/apis/apps/v1/statefulsets?limit=500&resourceVersion=0": dial tcp 192.168.56.21:6443: connect: connection refused
E1211 16:57:05.294891 1 server.go:214] "waiting for handlers to sync" err="context canceled"
I1211 16:57:05.295817 1 leaderelection.go:250] attempting to acquire leader lease kube-system/kube-scheduler...
I1211 16:57:05.295861 1 server.go:238] "Requested to terminate, exiting"
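Since the logs are long and repetitive, here is a minimal Python sketch that tallies which host:port the "connection refused" dial failures point at. The names tally_refused and SAMPLE are hypothetical, made up for this illustration and not part of the repo. SAMPLE holds one line from the scheduler log above and one from the kube-apiserver log shown later in this post. Going by the logs, 192.168.56.21:6443 is the API server address kubectl also fails to reach, and 2379 is etcd's default client port.

```python
import re
from collections import Counter

# Matches the target of a failed dial in klog/gRPC output, e.g.
# 'dial tcp 192.168.56.21:6443: connect: connection refused'.
REFUSED = re.compile(
    r"dial tcp (\d{1,3}(?:\.\d{1,3}){3}:\d+): connect: connection refused"
)


def tally_refused(log_text: str) -> Counter:
    """Count 'connection refused' dial failures per host:port."""
    return Counter(REFUSED.findall(log_text))


# Sample input: two lines copied from the logs in this post.
SAMPLE = '''\
W1211 16:56:51.974190 1 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: Get "https://192.168.56.21:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp 192.168.56.21:6443: connect: connection refused
2023-12-12T05:28:46.455303494Z stderr F W1212 05:28:46.455224 1 logging.go:59] [core] [Channel #3 SubChannel #4] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused"
'''

if __name__ == "__main__":
    # Print each refused target with the number of times it appears.
    for target, count in tally_refused(SAMPLE).most_common():
        print(f"{target}\t{count}")
```

Run against the full logs, this should make the pattern easier to see: the scheduler is refused by the API server, and the API server in turn is refused by etcd on localhost.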
From /var/log/container/kube-apiserver-xxx.log:
2023-12-12T05:28:46.455303494Z stderr F W1212 05:28:46.455224 1 logging.go:59] [core] [Channel #3 SubChannel #4] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused"
2023-12-12T05:28:46.455514253Z stderr F W1212 05:28:46.455457 1 logging.go:59] [core] [Channel #94 SubChannel #95] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused"
2023-12-12T05:28:46.455632271Z stderr F W1212 05:28:46.455576 1 logging.go:59] [core] [Channel #31 SubChannel #32] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused"
2023-12-12T05:28:46.455798916Z stderr F W1212 05:28:46.455744 1 logging.go:59] [core] [Channel #67 SubChannel #68] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused"
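I've torn down the cluster and recreated it several times, going through the steps one at a time to find what causes the problem, but it starts failing almost immediately after kubeadm init.
The repo is the exact process I use to create the cluster. Do you have any insight into how to track down the cause of the constant crashing?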