On my master node, this used to work fine. Today, the kubectl get nodes command fails with:
The connection to the server 192.168.134.129:6443 was refused - did you specify the right host or port?
Here is what I have tried so far:
ps -aux | grep api
Output:
root 3529 16.0 4.0 820896 71120 ? Ssl 00:59 0:00 kube-apiserver --advertise-address=192.168.134.129 --allow-privileged=true --authorization-mode=Node,RBAC --client-ca-file=/etc/kubernetes/pki/ca.crt --enable-admission-plugins=NodeRestriction --enable-bootstrap-token-auth=true --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key --etcd-servers=https://127.0.0.1:2379 --insecure-port=0 --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key --requestheader-allowed-names=front-proxy-client --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --secure-port=6443 --service-account-issuer=https://kubernetes.default.svc.cluster.local --service-account-key-file=/etc/kubernetes/pki/sa.pub --service-account-signing-key-file=/etc/kubernetes/pki/sa.key --service-cluster-ip-range=10.96.0.0/12 --tls-cert-file=/etc/kubernetes/pki/apiserver.crt --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
env | grep -i kub
The output is empty.
systemctl status docker.service
Output:
docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2021-02-10 00:58:58 UTC; 2min 2s ago
TriggeredBy: ● docker.socket
Docs: https://docs.docker.com
Main PID: 882 (dockerd)
Tasks: 18
Memory: 134.5M
CGroup: /system.slice/docker.service
└─882 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
Feb 10 00:58:58 server1 dockerd[882]: time="2021-02-10T00:58:58.308455221Z" level=info msg="Daemon has completed initialization"
Feb 10 00:58:58 server1 dockerd[882]: time="2021-02-10T00:58:58.354601077Z" level=info msg="API listen on /run/docker.sock"
Feb 10 00:58:58 server1 systemd[1]: Started Docker Application Container Engine.
Feb 10 00:59:04 server1 dockerd[882]: time="2021-02-10T00:59:04.388230017Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelet>
Feb 10 00:59:24 server1 dockerd[882]: time="2021-02-10T00:59:24.266151129Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelet>
Feb 10 00:59:26 server1 dockerd[882]: time="2021-02-10T00:59:26.018774870Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelet>
Feb 10 00:59:55 server1 dockerd[882]: time="2021-02-10T00:59:55.914896185Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelet>
Feb 10 01:00:01 server1 dockerd[882]: time="2021-02-10T01:00:01.214287560Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelet>
Feb 10 01:00:37 server1 dockerd[882]: time="2021-02-10T01:00:37.987987183Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelet>
Feb 10 01:00:42 server1 dockerd[882]: time="2021-02-10T01:00:42.227305876Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelet>
systemctl status kubelet.service
Output:
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Wed 2021-02-10 00:58:52 UTC; 25min ago
Docs: https://kubernetes.io/docs/home/
Main PID: 854 (kubelet)
Tasks: 14 (limit: 1953)
Memory: 120.2M
CGroup: /system.slice/kubelet.service
└─854 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --network>
Feb 10 01:24:21 server1 kubelet[854]: E0210 01:24:21.554100 854 kubelet.go:2243] node "server1" not found
Feb 10 01:24:21 server1 kubelet[854]: E0210 01:24:21.655541 854 kubelet.go:2243] node "server1" not found
Feb 10 01:24:21 server1 kubelet[854]: E0210 01:24:21.756748 854 kubelet.go:2243] node "server1" not found
Feb 10 01:24:21 server1 kubelet[854]: E0210 01:24:21.857632 854 kubelet.go:2243] node "server1" not found
Feb 10 01:24:21 server1 kubelet[854]: E0210 01:24:21.958539 854 kubelet.go:2243] node "server1" not found
Feb 10 01:24:22 server1 kubelet[854]: E0210 01:24:22.059576 854 kubelet.go:2243] node "server1" not found
Feb 10 01:24:22 server1 kubelet[854]: E0210 01:24:22.160644 854 kubelet.go:2243] node "server1" not found
Feb 10 01:24:22 server1 kubelet[854]: E0210 01:24:22.261714 854 kubelet.go:2243] node "server1" not found
Feb 10 01:24:22 server1 kubelet[854]: E0210 01:24:22.362736 854 kubelet.go:2243] node "server1" not found
Feb 10 01:24:22 server1 kubelet[854]: E0210 01:24:22.463924 854 kubelet.go:2243] node "server1" not found
netstat -pnlt | grep 6443
Output:
tcp6 12 0 :::6443 :::* LISTEN 6196/kube-apiserver
Update
docker logs ${kube_api_sever_docker_container_id}
Output:
Flag --insecure-port has been deprecated, This flag has no effect now and will be removed in v1.24.
I0213 00:46:27.738611 1 server.go:632] external host was not specified, using 192.168.134.129
I0213 00:46:27.739309 1 server.go:182] Version: v1.20.2
I0213 00:46:28.410136 1 shared_informer.go:240] Waiting for caches to sync for node_authorizer
I0213 00:46:28.411492 1 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
I0213 00:46:28.411554 1 plugins.go:161] Loaded 10 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
I0213 00:46:28.413015 1 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
I0213 00:46:28.413077 1 plugins.go:161] Loaded 10 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
I0213 00:46:28.415165 1 client.go:360] parsed scheme: "endpoint"
I0213 00:46:28.415213 1 endpoint.go:68] ccResolverWrapper: sending new addresses to cc: [{https://127.0.0.1:2379 <nil> 0 <nil>}]
W0213 00:46:28.415674 1 clientconn.go:1223] grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:2379 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
I0213 00:46:29.410414 1 client.go:360] parsed scheme: "endpoint"
I0213 00:46:29.410532 1 endpoint.go:68] ccResolverWrapper: sending new addresses to cc: [{https://127.0.0.1:2379 <nil> 0 <nil>}]
W0213 00:46:29.411469 1 clientconn.go:1223] grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:2379 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0213 00:46:29.416635 1 clientconn.go:1223] grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:2379 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0213 00:46:30.412469 1 clientconn.go:1223] grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:2379 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0213 00:46:31.360814 1 clientconn.go:1223] grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:2379 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0213 00:46:31.758564 1 clientconn.go:1223] grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:2379 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0213 00:46:33.460810 1 clientconn.go:1223] grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:2379 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0213 00:46:34.675812 1 clientconn.go:1223] grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:2379 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0213 00:46:37.405884 1 clientconn.go:1223] grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:2379 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0213 00:46:38.764105 1 clientconn.go:1223] grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:2379 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0213 00:46:42.751449 1 clientconn.go:1223] grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:2379 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0213 00:46:44.902545 1 clientconn.go:1223] grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:2379 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
Error: context deadline exceeded
Answer 1
"Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused"
Yes, when something goes wrong in Kubernetes, it is always etcd.
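A quick way to confirm that etcd itself is down, sketched here assuming a kubeadm-style setup where etcd runs as a Docker container on the same node (the name filter and the <etcd_container_id> placeholder are illustrative):
docker ps -a --filter name=etcd    # is the etcd container running, exited, or restart-looping?
docker logs <etcd_container_id>    # look for the reason it died: full disk, corrupted data dir, expired certs, ...
If the container is merely crash-looping, its own logs usually point at the root cause before any restore work is needed.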
You will need to start the disaster recovery process for etcd, because it holds roughly 80% of everything that makes up the cluster "itself"; the remaining 20% is the various PKI artifacts (for the control plane and for etcd itself).
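If you happen to have an etcd snapshot, a minimal sketch of that recovery with etcdctl looks like the following (backup.db and the restore --data-dir target are hypothetical names; the certificate paths are the kubeadm defaults, matching the --etcd-cafile flag visible in the apiserver output above):
# Taking a snapshot while etcd is still reachable (useful for future incidents):
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save backup.db
# Restoring the snapshot into a fresh data directory; the etcd static pod (or its
# volume mount) then has to be pointed at the restored directory:
ETCDCTL_API=3 etcdctl snapshot restore backup.db --data-dir /var/lib/etcd-restored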
If your control plane is HA, you may already have healthy etcd members on the other apiserver nodes, which will help the recovery enormously. If your setup has only a single apiserver instance, you will need to track down where etcd's /var/lib/etcd is actually stored (most likely it is volume-mounted from the same path on the host, or, less likely, it lives in some kind of PVC).
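On a kubeadm-provisioned node, one way to find that location is to read the etcd static pod manifest (the path below is the kubeadm default and may differ on your setup):
grep -- '--data-dir' /etc/kubernetes/manifests/etcd.yaml    # the directory etcd writes to inside the pod
grep -A 3 'hostPath' /etc/kubernetes/manifests/etcd.yaml    # where that directory is mounted from on the host
ls -l /var/lib/etcd/member                                  # sanity check that the data is actually there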
Answer 2
@ZhaoGang I wonder whether you ever solved this issue, because I am running into exactly the same problem, with identical output from all of the troubleshooting steps. Please let me know what you found.