Kubernetes cluster network problem

I have a Kubernetes cluster inside a VPN, made up of one master and three worker nodes, and all of them show Ready status. It was built with kubeadm and flannel. The VPN network range is 192.168.1.0/16.

$ kubectl get nodes -o wide

NAME        STATUS   ROLES    AGE    VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
k8-master   Ready    master   144d   v1.17.0   192.168.1.132   <none>        Ubuntu 18.04.3 LTS   4.15.0-72-generic   docker://18.9.7
k8-n1       Ready    <none>   144d   v1.17.0   192.168.1.133   <none>        Ubuntu 18.04.3 LTS   4.15.0-72-generic   docker://18.9.7
k8-n2       Ready    <none>   144d   v1.17.0   192.168.1.134   <none>        Ubuntu 18.04.3 LTS   4.15.0-72-generic   docker://18.9.7
k8-n3       Ready    <none>   144d   v1.17.0   192.168.1.135   <none>        Ubuntu 18.04.3 LTS   4.15.0-72-generic   docker://18.9.7

I can reach the nodes:

$ ping 192.168.1.133

PING 192.168.1.133 (192.168.1.133) 56(84) bytes of data.
64 bytes from 192.168.1.133: icmp_seq=1 ttl=64 time=0.219 ms
64 bytes from 192.168.1.133: icmp_seq=2 ttl=64 time=0.246 ms
64 bytes from 192.168.1.133: icmp_seq=3 ttl=64 time=0.199 ms
64 bytes from 192.168.1.133: icmp_seq=4 ttl=64 time=0.209 ms
^X^C
--- 192.168.1.133 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3071ms
rtt min/avg/max/mdev = 0.199/0.218/0.246/0.020 ms

$ ping 192.168.1.134

PING 192.168.1.134 (192.168.1.134) 56(84) bytes of data.
64 bytes from 192.168.1.134: icmp_seq=1 ttl=64 time=0.288 ms
64 bytes from 192.168.1.134: icmp_seq=2 ttl=64 time=0.272 ms
64 bytes from 192.168.1.134: icmp_seq=3 ttl=64 time=0.268 ms
^C
--- 192.168.1.134 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2032ms
rtt min/avg/max/mdev = 0.268/0.276/0.288/0.008 ms

$ ping 192.168.1.135

PING 192.168.1.135 (192.168.1.135) 56(84) bytes of data.
64 bytes from 192.168.1.135: icmp_seq=1 ttl=64 time=0.278 ms
64 bytes from 192.168.1.135: icmp_seq=2 ttl=64 time=0.221 ms
64 bytes from 192.168.1.135: icmp_seq=3 ttl=64 time=0.181 ms
^C
--- 192.168.1.135 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2030ms

But then I set up a 2-pod nginx deployment to test whether it works:

nginx-deployment-574b87c764-2gz8t   1/1     Running   0          25m     192.168.2.12   k8-n2   <none>           <none>
nginx-deployment-574b87c764-rst8x   1/1     Running   0          25m     192.168.1.17   k8-n1   <none>           <none>
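
Note that both pod IPs fall inside 192.168.0.0/16, the same range as the node IPs. A quick way to see which pod CIDR kubeadm assigned to each node (standard kubectl, shown here as a suggested check):

$ kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'

If a node's pod CIDR falls inside the node network (here 192.168.1.0/24), pod and node addresses will collide.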

$ kubectl get services

NAME               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
kubernetes         ClusterIP   10.96.0.1       <none>        443/TCP        3d17h
nginx-deployment   NodePort    10.96.211.211   <none>        80:31577/TCP   13s

But I cannot connect to it:

$ curl k8-n1:31577
curl: (7) Failed to connect to k8-n1 port 31577: Connection refused
$ curl k8-n2:31577
curl: (7) Failed to connect to k8-n2 port 31577: Connection refused
$ curl k8-n3:31577
curl: (7) Failed to connect to k8-n3 port 31577: Connection refused
$ curl 10.96.211.211:80
curl: (7) Failed to connect to 10.96.211.211 port 80: Connection refused
$ curl 192.168.1.17:80
curl: (7) Failed to connect to 192.168.1.17 port 80: No route to host
$ curl 192.168.1.17:31577
curl: (7) Failed to connect to 192.168.1.17 port 31577: No route to host
$ curl 192.168.1.133:31577
curl: (7) Failed to connect to 192.168.1.133 port 31577: Connection refused
$ curl 192.168.1.133:6443
curl: (7) Failed to connect to 192.168.1.133 port 6443: Connection refused
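
Since every NodePort connection is refused, one thing worth checking (a suggested diagnostic, assuming kube-proxy runs in its default iptables mode) is whether kube-proxy actually programmed the NodePort rules on the node:

# Look for an entry for the service's NodePort 31577
$ sudo iptables -t nat -L KUBE-NODEPORTS -n | grep 31577
# kube-proxy's own logs may explain missing rules
$ kubectl logs -n kube-system -l k8s-app=kube-proxy --tail=20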

What I changed:

sudo kubeadm init --pod-network-cidr=192.168.1.0/16 --apiserver-advertise-address=192.168.1.132

and I changed the flannel.yaml network to 192.168.1.0/16:

kubectl edit cm -n kube-system kube-flannel-cfg
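
For reference, the Network field in that ConfigMap's net-conf.json is the range flannel carves per-node pod subnets out of; the stock flannel manifest ships with a default along these lines:

net-conf.json: |
  {
    "Network": "10.244.0.0/16",
    "Backend": {
      "Type": "vxlan"
    }
  }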

The coredns pod description after a restart:

  Normal   Scheduled  109s                default-scheduler  Successfully assigned kube-system/coredns-6955765f44-vwqgm to k8-n1
  Normal   Pulled     106s                kubelet, k8-n1     Container image "k8s.gcr.io/coredns:1.6.5" already present on machine
  Normal   Created    105s                kubelet, k8-n1     Created container coredns
  Normal   Started    105s                kubelet, k8-n1     Started container coredns
  Warning  Unhealthy  3s (x11 over 103s)  kubelet, k8-n1     Readiness probe failed: Get http://192.168.1.19:8181/ready: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  1s (x5 over 41s)    kubelet, k8-n1     Liveness probe failed: Get http://192.168.1.19:8080/health: dial tcp 192.168.1.19:8080: connect: no route to host
  Normal   Killing    1s                  kubelet, k8-n1     Container coredns failed liveness probe, will be restarted

I would appreciate any help, and I'm happy to provide more information if needed.

Answer 1

While looking into the problem, I noticed that the OP initialized the cluster with the CIDR 192.168.1.0/16. That CIDR overlaps with the node IP addresses, which in turn caused the problems with the coredns pods.

Initializing the cluster with a new, non-overlapping CIDR resolved the issue.
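
A minimal sketch of the fix, assuming flannel's default 10.244.0.0/16 is unused in this environment (note that kubeadm reset is destructive and wipes the existing cluster state):

# On the master and every worker: tear down the old cluster
$ sudo kubeadm reset -f

# Re-initialize with a pod CIDR that does not overlap the
# 192.168.1.0/24 node network; 10.244.0.0/16 is flannel's default
$ sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=192.168.1.132

# Make sure the Network field in flannel.yaml matches the new CIDR,
# then apply it
$ kubectl apply -f flannel.yaml

The workers then rejoin with the kubeadm join command that kubeadm init prints.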
