[microstack][k8s][calico] Pods in different subnets in k8s cannot ping each other


I created two VMs in microstack to deploy k8s with the Calico CNI:
VM1 (extest-1): internal 192.168.122.204 / external 128.224.157.145
VM2 (extest-2): internal 192.168.122.72 / external 128.224.157.139

Calico configuration:

ubuntu@extest-1:~$ cat custom-resources.yaml 
# This section includes base Calico installation configuration.
# For more information, see: https://docs.tigera.io/calico/latest/reference/installation/api#operator.tigera.io/v1.Installation
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  # Configures Calico networking.
  calicoNetwork:
    # Note: The ipPools section cannot be modified post-install.
    ipPools:
    - blockSize: 26
      cidr: 172.22.0.0/16
      encapsulation: VXLANCrossSubnet
      natOutgoing: Enabled
      nodeSelector: all()

---

# This section configures the Calico API server.
# For more information, see: https://docs.tigera.io/calico/latest/reference/installation/api#operator.tigera.io/v1.APIServer
apiVersion: operator.tigera.io/v1
kind: APIServer
metadata:
  name: default
spec: {}
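
For readers following the addressing, here is a minimal Python sketch (not part of the original post) of how the pool above is carved up. It assumes only the `cidr` and `blockSize` values from custom-resources.yaml and uses the two pingtest pod addresses from the pod listing in this question; the helper name `block_for` is made up for illustration.

```python
import ipaddress

# Values taken from custom-resources.yaml in the question.
POOL = ipaddress.ip_network("172.22.0.0/16")
BLOCK_SIZE = 26


def block_for(ip):
    """Return the /26 block inside POOL that contains the given pod IP."""
    addr = ipaddress.ip_address(ip)
    if addr not in POOL:
        raise ValueError(f"{ip} is outside {POOL}")
    # strict=False masks the host bits, leaving the block's network address.
    return ipaddress.ip_network(f"{ip}/{BLOCK_SIZE}", strict=False)


# A /16 pool split into /26 blocks: 2**10 blocks of 2**6 addresses each.
block_count = 2 ** (BLOCK_SIZE - POOL.prefixlen)
addresses_per_block = 2 ** (32 - BLOCK_SIZE)

print(block_count, addresses_per_block)   # 1024 64
print(block_for("172.22.184.135"))        # 172.22.184.128/26 (pingtest pod on extest-1)
print(block_for("172.22.246.194"))        # 172.22.246.192/26 (pingtest pod on extest-2)
```

The two blocks printed here are the same ones that show up in the `ip route` output: the local block as a blackhole route on extest-1, and the remote block as a route via extest-2.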


ubuntu@extest-1:~$ ip route
default via 128.224.157.1 dev ens4 proto static 
default via 192.168.122.1 dev ens3 proto dhcp src 192.168.122.204 metric 100 
default via 192.168.122.1 dev ens3 proto dhcp metric 100 
128.224.157.0/24 dev ens4 proto kernel scope link src 128.224.157.145 
128.224.160.11 via 192.168.122.1 dev ens3 proto dhcp src 192.168.122.204 metric 100 
128.224.160.12 via 192.168.122.1 dev ens3 proto dhcp src 192.168.122.204 metric 100 
169.254.0.0/16 dev ens4 scope link metric 1000 
169.254.169.254 via 192.168.122.2 dev ens3 proto dhcp src 192.168.122.204 metric 100 
169.254.169.254 via 192.168.122.2 dev ens3 proto dhcp metric 100 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown 
blackhole 172.22.184.128/26 proto 80 
172.22.184.129 dev cali06f33a9668e scope link 
172.22.184.130 dev cali91df5b91d11 scope link 
172.22.184.131 dev cali3082f3602b7 scope link 
172.22.184.132 dev cali24ee372a81b scope link 
172.22.184.133 dev cali2a89713a9c5 scope link 
172.22.184.134 dev cali0393bfc615a scope link 
172.22.184.135 dev cali8c163d3f0a7 scope link 
172.22.246.192/26 via 128.224.157.139 dev ens4 proto 80 onlink 
192.168.122.0/24 dev ens3 proto kernel scope link src 192.168.122.204 metric 100 
192.168.122.1 dev ens3 proto dhcp scope link src 192.168.122.204 metric 100 
192.168.122.2 dev ens3 proto dhcp scope link src 192.168.122.204 metric 100
ubuntu@extest-1:~$ kubectl get pods -A -owide
NAMESPACE          NAME                                       READY   STATUS    RESTARTS   AGE     IP                NODE       NOMINATED NODE   READINESS GATES
calico-apiserver   calico-apiserver-6dc9d48f8b-j294c          1/1     Running   0          3h39m   172.22.184.133    extest-1   <none>           <none>
calico-apiserver   calico-apiserver-6dc9d48f8b-wqglr          1/1     Running   0          3h39m   172.22.184.134    extest-1   <none>           <none>
calico-system      calico-kube-controllers-74895d748f-kb5x8   1/1     Running   0          3h44m   172.22.184.132    extest-1   <none>           <none>
calico-system      calico-node-c76ps                          1/1     Running   0          3h36m   192.168.122.72    extest-2   <none>           <none>
calico-system      calico-node-qm6zf                          1/1     Running   0          3h44m   192.168.122.204   extest-1   <none>           <none>
calico-system      calico-typha-57bb44dfd5-pvmsd              1/1     Running   0          3h44m   192.168.122.204   extest-1   <none>           <none>
calico-system      csi-node-driver-k274s                      2/2     Running   0          3h36m   172.22.246.193    extest-2   <none>           <none>
calico-system      csi-node-driver-n6wv2                      2/2     Running   0          3h44m   172.22.184.131    extest-1   <none>           <none>
default            pingtest-7b5d44b647-dlf4w                  1/1     Running   0          142m    172.22.184.135    extest-1   <none>           <none>
default            pingtest-7b5d44b647-wgnht                  1/1     Running   0          142m    172.22.246.194    extest-2   <none>           <none>
kube-system        coredns-76f75df574-gfmvg                   1/1     Running   0          3h49m   172.22.184.129    extest-1   <none>           <none>
kube-system        coredns-76f75df574-pbmt2                   1/1     Running   0          3h49m   172.22.184.130    extest-1   <none>           <none>
kube-system        etcd-extest-1                              1/1     Running   0          3h49m   192.168.122.204   extest-1   <none>           <none>
kube-system        kube-apiserver-extest-1                    1/1     Running   0          3h49m   192.168.122.204   extest-1   <none>           <none>
kube-system        kube-controller-manager-extest-1           1/1     Running   0          3h49m   192.168.122.204   extest-1   <none>           <none>
kube-system        kube-proxy-8zqpm                           1/1     Running   0          3h49m   192.168.122.204   extest-1   <none>           <none>
kube-system        kube-proxy-dpb6s                           1/1     Running   0          3h36m   192.168.122.72    extest-2   <none>           <none>
kube-system        kube-scheduler-extest-1                    1/1     Running   0          3h49m   192.168.122.204   extest-1   <none>           <none>
tigera-operator    tigera-operator-55585899bf-84997           1/1     Running   0          3h47m   192.168.122.204   extest-1   <none>           <none>

Inside pingtest-7b5d44b647-dlf4w (172.22.184.135):

/ # ping 172.22.246.194
PING 172.22.246.194 (172.22.246.194): 56 data bytes
^C
--- 172.22.246.194 ping statistics ---
10 packets transmitted, 0 packets received, 100% packet loss
/ # nslookup www.google.com 10.96,0,10
nslookup: bad address '10.96,0,10'
/ # nslookup www.google.com 10.96.0.10
Server:     10.96.0.10
Address:    10.96.0.10:53

Non-authoritative answer:
Name:   www.google.com
Address: 199.96.62.21

Non-authoritative answer:
Name:   www.google.com
Address: 2a03:2880:f129:83:face:b00c:0:25de

The pod can reach services and other pods in the same subnet, but not pods in the other subnet (on the other node), such as pingtest-7b5d44b647-wgnht (172.22.246.194).

ubuntu@extest-1:~$ ifconfig
...
ens3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1442
        inet 192.168.122.204  netmask 255.255.255.0  broadcast 192.168.122.255
        inet6 fe80::f816:3eff:fe72:3ab6  prefixlen 64  scopeid 0x20<link>
        ether fa:16:3e:72:3a:b6  txqueuelen 1000  (Ethernet)
        RX packets 31201  bytes 20398464 (20.3 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 30892  bytes 6099260 (6.0 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens4: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 128.224.157.145  netmask 255.255.255.0  broadcast 128.224.157.255
        inet6 fe80::f816:3eff:feb0:23a0  prefixlen 64  scopeid 0x20<link>
        ether fa:16:3e:b0:23:a0  txqueuelen 1000  (Ethernet)
        RX packets 542425  bytes 1055748029 (1.0 GB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 224454  bytes 71843285 (71.8 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
...

Debugging with tcpdump shows that the packets reach ens4 on VM extest-1:

ubuntu@extest-1:~$ sudo tcpdump -i ens4 icmp
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on ens4, link-type EN10MB (Ethernet), snapshot length 262144 bytes
10:58:40.134615 IP 172.22.184.135 > 172.22.246.194: ICMP echo request, id 30, seq 27, length 64
10:58:41.134940 IP 172.22.184.135 > 172.22.246.194: ICMP echo request, id 30, seq 28, length 64
10:58:42.135462 IP 172.22.184.135 > 172.22.246.194: ICMP echo request, id 30, seq 29, length 64
10:58:43.135831 IP 172.22.184.135 > 172.22.246.194: ICMP echo request, id 30, seq 30, length 64

ubuntu@extest-1:~$ ip route get 172.22.246.194
172.22.246.194 via 128.224.157.139 dev ens4 src 128.224.157.145 uid 1001 
    cache 

ubuntu@extest-1:~$ ping 128.224.157.139
PING 128.224.157.139 (128.224.157.139) 56(84) bytes of data.
64 bytes from 128.224.157.139: icmp_seq=1 ttl=64 time=3.89 ms
64 bytes from 128.224.157.139: icmp_seq=2 ttl=64 time=2.09 ms
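
The route lookup and the capture above can be cross-checked against the pool's `VXLANCrossSubnet` setting. The sketch below is a simplified model of the documented CrossSubnet behaviour (encapsulate only when the peer node is on a different subnet), not Calico's actual code. The function `needs_encapsulation` is hypothetical, and the /24 prefix is taken from the netmask in the `ifconfig` output.

```python
import ipaddress


def needs_encapsulation(mode, local_interface, peer_ip):
    """Simplified model: would traffic to peer_ip be encapsulated?

    mode is one of "Always", "CrossSubnet", "Never".
    local_interface is the node address with prefix, e.g. "10.0.0.1/24".
    """
    local = ipaddress.ip_interface(local_interface)
    peer = ipaddress.ip_address(peer_ip)
    if mode == "Always":
        return True
    if mode == "Never":
        return False
    if mode == "CrossSubnet":
        # Encapsulate only when the peer is outside the local subnet.
        return peer not in local.network
    raise ValueError(f"unknown mode: {mode}")


# ens4 on extest-1 and the next hop from `ip route get` (extest-2).
print(needs_encapsulation("CrossSubnet", "128.224.157.145/24", "128.224.157.139"))  # False
```

Under this model both nodes sit in 128.224.157.0/24, so no encapsulation is expected. That would be consistent with the tcpdump output, where the ICMP requests appear on ens4 with the pod addresses and no outer header.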

tcpdump on extest-2 shows no packets arriving.
On the OpenStack side, I have also opened almost all security group rules, including BGP and protocol 4.

Has anyone run into this problem before?

Answer 1

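The problem is solved. The main cause was the Calico IPPool configuration. It seems the routes added during Calico's initial installation were wrong. I changed IPIPMode to Never and applied the change, then changed IPIPMode to Always and applied it again. After that, the problem was gone.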
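
The answer does not show the exact commands, so the following is only a sketch of what the two-step toggle might look like. It builds `kubectl patch` command lines without running them. The pool name `default-ipv4-ippool`, the resource name `ippools.projectcalico.org` (served by the Calico API server) and the helper `ipip_patch_command` are assumptions, not taken from the post. The pool in the question was created with VXLANCrossSubnet, and the answer does not say whether `vxlanMode` was changed as well, so that part is left out.

```python
import json
import shlex
from typing import List

# Assumption: name of the pool to patch; check `kubectl get ippools` first.
POOL_NAME = "default-ipv4-ippool"

VALID_MODES = {"Always", "CrossSubnet", "Never"}


def ipip_patch_command(mode: str, pool: str = POOL_NAME) -> List[str]:
    """Build (but do not run) a kubectl command that sets spec.ipipMode."""
    if mode not in VALID_MODES:
        raise ValueError(f"ipipMode must be one of {sorted(VALID_MODES)}")
    patch = json.dumps({"spec": {"ipipMode": mode}})
    return [
        "kubectl", "patch", "ippools.projectcalico.org", pool,
        "--type", "merge", "--patch", patch,
    ]


# The sequence described in the answer: Never first, then Always.
for mode in ("Never", "Always"):
    print(shlex.join(ipip_patch_command(mode)))
```

Printing the commands instead of executing them leaves room to review each one, and to confirm the pool name, before touching a live cluster.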
