kubectl get nodes results in "The connection to the server 192.168.134.129:6443 was refused - did you specify the right host or port?"

On my master node everything used to work fine. Today, kubectl get nodes fails with The connection to the server 192.168.134.129:6443 was refused - did you specify the right host or port?
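A quick way to confirm whether anything is answering on that address and port (a sketch using standard tools; /healthz is served to unauthenticated clients on default kubeadm clusters):

# Does anything answer on the apiserver's secure port? -k skips TLS verification.
curl -k https://192.168.134.129:6443/healthz

# Is any process listening on 6443 locally on the master?
sudo ss -tlnp | grep 6443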

I tried several things:

  1. ps -aux | grep api

Output:

root        3529 16.0  4.0 820896 71120 ?        Ssl  00:59   0:00 kube-apiserver --advertise-address=192.168.134.129 --allow-privileged=true --authorization-mode=Node,RBAC --client-ca-file=/etc/kubernetes/pki/ca.crt --enable-admission-plugins=NodeRestriction --enable-bootstrap-token-auth=true --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key --etcd-servers=https://127.0.0.1:2379 --insecure-port=0 --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key --requestheader-allowed-names=front-proxy-client --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --secure-port=6443 --service-account-issuer=https://kubernetes.default.svc.cluster.local --service-account-key-file=/etc/kubernetes/pki/sa.pub --service-account-signing-key-file=/etc/kubernetes/pki/sa.key --service-cluster-ip-range=10.96.0.0/12 --tls-cert-file=/etc/kubernetes/pki/apiserver.crt --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
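The flags above match a kubeadm setup, where the apiserver runs as a static pod. Assuming the kubeadm default paths, its manifest is the place to confirm how the apiserver is configured to reach etcd:

# kubeadm defines the control-plane components as static pods here:
ls /etc/kubernetes/manifests/

# Confirm the etcd endpoint the apiserver was told to use:
grep etcd-servers /etc/kubernetes/manifests/kube-apiserver.yaml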
  2. env | grep -i kub

The output was empty.
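With no KUBECONFIG set, kubectl falls back to ~/.kube/config. A quick sanity check, assuming the kubeadm-generated admin.conf exists, is to verify which server address the config points at and to try the admin kubeconfig directly:

# Which apiserver URL will kubectl use by default?
kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'

# Rule out a stale ~/.kube/config by using kubeadm's admin kubeconfig:
sudo kubectl --kubeconfig /etc/kubernetes/admin.conf get nodes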

  3. systemctl status docker.service

Output:

docker.service - Docker Application Container Engine
     Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2021-02-10 00:58:58 UTC; 2min 2s ago
TriggeredBy: ● docker.socket
       Docs: https://docs.docker.com
   Main PID: 882 (dockerd)
      Tasks: 18
     Memory: 134.5M
     CGroup: /system.slice/docker.service
             └─882 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock

Feb 10 00:58:58 server1 dockerd[882]: time="2021-02-10T00:58:58.308455221Z" level=info msg="Daemon has completed initialization"
Feb 10 00:58:58 server1 dockerd[882]: time="2021-02-10T00:58:58.354601077Z" level=info msg="API listen on /run/docker.sock"
Feb 10 00:58:58 server1 systemd[1]: Started Docker Application Container Engine.
Feb 10 00:59:04 server1 dockerd[882]: time="2021-02-10T00:59:04.388230017Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelet>
Feb 10 00:59:24 server1 dockerd[882]: time="2021-02-10T00:59:24.266151129Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelet>
Feb 10 00:59:26 server1 dockerd[882]: time="2021-02-10T00:59:26.018774870Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelet>
Feb 10 00:59:55 server1 dockerd[882]: time="2021-02-10T00:59:55.914896185Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelet>
Feb 10 01:00:01 server1 dockerd[882]: time="2021-02-10T01:00:01.214287560Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelet>
Feb 10 01:00:37 server1 dockerd[882]: time="2021-02-10T01:00:37.987987183Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelet>
Feb 10 01:00:42 server1 dockerd[882]: time="2021-02-10T01:00:42.227305876Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelet>
  4. systemctl status kubelet.service

Output:

● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: active (running) since Wed 2021-02-10 00:58:52 UTC; 25min ago
       Docs: https://kubernetes.io/docs/home/
   Main PID: 854 (kubelet)
      Tasks: 14 (limit: 1953)
     Memory: 120.2M
     CGroup: /system.slice/kubelet.service
             └─854 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --network>

Feb 10 01:24:21 server1 kubelet[854]: E0210 01:24:21.554100     854 kubelet.go:2243] node "server1" not found
Feb 10 01:24:21 server1 kubelet[854]: E0210 01:24:21.655541     854 kubelet.go:2243] node "server1" not found
Feb 10 01:24:21 server1 kubelet[854]: E0210 01:24:21.756748     854 kubelet.go:2243] node "server1" not found
Feb 10 01:24:21 server1 kubelet[854]: E0210 01:24:21.857632     854 kubelet.go:2243] node "server1" not found
Feb 10 01:24:21 server1 kubelet[854]: E0210 01:24:21.958539     854 kubelet.go:2243] node "server1" not found
Feb 10 01:24:22 server1 kubelet[854]: E0210 01:24:22.059576     854 kubelet.go:2243] node "server1" not found
Feb 10 01:24:22 server1 kubelet[854]: E0210 01:24:22.160644     854 kubelet.go:2243] node "server1" not found
Feb 10 01:24:22 server1 kubelet[854]: E0210 01:24:22.261714     854 kubelet.go:2243] node "server1" not found
Feb 10 01:24:22 server1 kubelet[854]: E0210 01:24:22.362736     854 kubelet.go:2243] node "server1" not found
Feb 10 01:24:22 server1 kubelet[854]: E0210 01:24:22.463924     854 kubelet.go:2243] node "server1" not found
  5. netstat -pnlt | grep 6443

Output:

tcp6 12 0 :::6443 :::* LISTEN 6196/kube-apiserver
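Note that the PID listening on 6443 here (6196) differs from the one in the earlier ps output (3529), which suggests the apiserver is crashing and being restarted in a loop. On a Docker-based kubeadm node the churn can be seen from the container list (a sketch; container names on your node may differ):

# Repeatedly exited kube-apiserver containers indicate a crash loop:
docker ps -a | grep kube-apiserver

# Logs of the most recently created apiserver container:
docker logs $(docker ps -a --filter name=kube-apiserver -q | head -1)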


Update

Output of docker logs ${kube_api_sever_docker_container_id}:

Flag --insecure-port has been deprecated, This flag has no effect now and will be removed in v1.24.
I0213 00:46:27.738611       1 server.go:632] external host was not specified, using 192.168.134.129
I0213 00:46:27.739309       1 server.go:182] Version: v1.20.2
I0213 00:46:28.410136       1 shared_informer.go:240] Waiting for caches to sync for node_authorizer
I0213 00:46:28.411492       1 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
I0213 00:46:28.411554       1 plugins.go:161] Loaded 10 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
I0213 00:46:28.413015       1 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
I0213 00:46:28.413077       1 plugins.go:161] Loaded 10 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
I0213 00:46:28.415165       1 client.go:360] parsed scheme: "endpoint"
I0213 00:46:28.415213       1 endpoint.go:68] ccResolverWrapper: sending new addresses to cc: [{https://127.0.0.1:2379  <nil> 0 <nil>}]
W0213 00:46:28.415674       1 clientconn.go:1223] grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:2379  <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
I0213 00:46:29.410414       1 client.go:360] parsed scheme: "endpoint"
I0213 00:46:29.410532       1 endpoint.go:68] ccResolverWrapper: sending new addresses to cc: [{https://127.0.0.1:2379  <nil> 0 <nil>}]
W0213 00:46:29.411469       1 clientconn.go:1223] grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:2379  <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0213 00:46:29.416635       1 clientconn.go:1223] grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:2379  <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0213 00:46:30.412469       1 clientconn.go:1223] grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:2379  <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0213 00:46:31.360814       1 clientconn.go:1223] grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:2379  <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0213 00:46:31.758564       1 clientconn.go:1223] grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:2379  <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0213 00:46:33.460810       1 clientconn.go:1223] grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:2379  <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0213 00:46:34.675812       1 clientconn.go:1223] grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:2379  <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0213 00:46:37.405884       1 clientconn.go:1223] grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:2379  <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0213 00:46:38.764105       1 clientconn.go:1223] grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:2379  <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0213 00:46:42.751449       1 clientconn.go:1223] grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:2379  <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0213 00:46:44.902545       1 clientconn.go:1223] grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:2379  <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
Error: context deadline exceeded
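So the apiserver is dying because nothing answers on etcd's client port at 127.0.0.1:2379. The next component to inspect is etcd itself (again a sketch, assuming a Docker-based kubeadm node):

# Is the etcd static pod container running, or crash-looping too?
docker ps -a | grep etcd
docker logs $(docker ps -a --filter name=etcd -q | head -1)

# kubeadm's default etcd data directory; it should contain a member/ subdirectory:
ls -l /var/lib/etcd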

Answer 1

"Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused"

Yes, when anything goes wrong with Kubernetes, it's always etcd.

You will need to start the disaster recovery process for etcd, because etcd holds roughly 80% of everything that makes up the cluster "itself"; the remaining 20% is the various PKI artifacts (for the control plane, and for etcd itself).

If your control plane is HA, you may already have working etcd members on the other apiserver nodes, which will help the recovery process enormously. If your setup runs only a single apiserver instance, you will need to track down where etcd's /var/lib/etcd storage lives (it was most likely volume-mounted from the same path on the host, or, less likely, lives in some kind of PVC).
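For reference, a minimal sketch of the etcdctl snapshot save/restore flow, assuming kubeadm's default certificate paths and a single-member etcd; the snapshot filename and restore directory here are illustrative:

# Take a snapshot while etcd is still serving (run on the control-plane node):
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save /var/backups/etcd-snapshot.db

# Restore into a fresh data directory, then edit the etcd static pod manifest
# (/etc/kubernetes/manifests/etcd.yaml) to point its hostPath at the new directory:
ETCDCTL_API=3 etcdctl snapshot restore /var/backups/etcd-snapshot.db \
  --data-dir=/var/lib/etcd-restored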

Answer 2

@ZhaoGang I was wondering whether you ever solved this, because I'm running into exactly the same problem you describe, with identical output at every troubleshooting step. Please let me know what you found.
