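This is a new cluster built with Kubespray on bare metal. The problem is that calicoctl reports peer states other than Established, StatefulSet members cannot communicate with each other, and most Ingress requests take about 10 seconds to open the sample Nginx page.
All other components (etcd, pods, and so on) look healthy according to sudo kubectl get cs and sudo kubectl cluster-info dump.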
The calico-node pods on master-1 (192.168.250.111) and node-1 (192.168.250.112) report no errors in their logs.
The calico-node pods on master-2 (192.168.240.111) and node-2 (192.168.240.112) report this error in their logs:
bird: BGP: Unexpected connect from unknown address 192.168.240.240 (port 36597)
- This IP belongs to the VPN router (the gateway for these servers)
The calico-node pods on master-3 (192.168.230.111) and node-3 (192.168.230.112) report this error in their logs:
bird: BGP: Unexpected connect from unknown address 192.168.230.230 (port 35029)
- This IP belongs to the VPN router (the gateway for these servers)
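Since every rejected connection comes from a gateway address rather than from a mesh peer, it can help to pull the distinct source addresses out of the calico-node logs. This is a minimal sketch, assuming the log lines have exactly the format shown above; the function name unexpected_bgp_sources is hypothetical.

```shell
# Hypothetical helper: read calico-node log text on stdin and print each
# distinct source address that BIRD rejected as an unknown BGP peer.
unexpected_bgp_sources() {
  # Keep only lines with the BIRD message, capture the IPv4 address after it.
  sed -n 's/.*Unexpected connect from unknown address \([0-9.]*\).*/\1/p' | sort -u
}
```

One way to use it is to pipe the output of kubectl logs -n kube-system <calico-node-pod> into the function (the pod name is a placeholder). Any address it prints that is not one of the cluster's node IPs suggests that the peer's source address is being rewritten somewhere along the path.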
192.168.250.112 (node 1):
era@server-node-1:~$ sudo calicoctl node status
Calico process is running.
IPv4 BGP status
+-----------------+-------------------+-------+----------+--------------------------------+
|  PEER ADDRESS   |     PEER TYPE     | STATE |  SINCE   |              INFO              |
+-----------------+-------------------+-------+----------+--------------------------------+
| 192.168.250.111 | node-to-node mesh | up    | 19:54:47 | Established                    |
| 192.168.240.111 | node-to-node mesh | start | 19:54:35 | Active Socket: Connection      |
|                 |                   |       |          | reset by peer                  |
| 192.168.230.111 | node-to-node mesh | up    | 20:42:31 | Established                    |
| 192.168.240.112 | node-to-node mesh | start | 19:54:35 | Active Socket: Connection      |
|                 |                   |       |          | reset by peer                  |
| 192.168.230.112 | node-to-node mesh | up    | 20:42:30 | Established                    |
+-----------------+-------------------+-------+----------+--------------------------------+
IPv6 BGP status
No IPv6 peers found.
era@server-node-1:~$
192.168.240.112 (node 2):
era@server-node-2:~$ sudo calicoctl node status
Calico process is running.
IPv4 BGP status
+-----------------+-------------------+-------+----------+--------------------------------+
|  PEER ADDRESS   |     PEER TYPE     | STATE |  SINCE   |              INFO              |
+-----------------+-------------------+-------+----------+--------------------------------+
| 192.168.250.111 | node-to-node mesh | start | 19:52:09 | Passive                        |
| 192.168.240.111 | node-to-node mesh | up    | 19:54:37 | Established                    |
| 192.168.230.111 | node-to-node mesh | start | 19:52:09 | Active Socket: Connection      |
|                 |                   |       |          | reset by peer                  |
| 192.168.250.112 | node-to-node mesh | start | 19:52:09 | Passive                        |
| 192.168.230.112 | node-to-node mesh | start | 19:52:09 | Active Socket: Connection      |
|                 |                   |       |          | reset by peer                  |
+-----------------+-------------------+-------+----------+--------------------------------+
IPv6 BGP status
No IPv6 peers found.
era@server-node-2:~$
192.168.230.112 (node 3):
era@server-node-3:~$ sudo calicoctl node status
Calico process is running.
IPv4 BGP status
+-----------------+-------------------+-------+----------+-------------+
|  PEER ADDRESS   |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+-----------------+-------------------+-------+----------+-------------+
| 192.168.250.111 | node-to-node mesh | up    | 20:42:31 | Established |
| 192.168.240.111 | node-to-node mesh | start | 19:51:59 | Passive     |
| 192.168.230.111 | node-to-node mesh | up    | 19:54:25 | Established |
| 192.168.250.112 | node-to-node mesh | up    | 20:42:30 | Established |
| 192.168.240.112 | node-to-node mesh | start | 19:51:59 | Passive     |
+-----------------+-------------------+-------+----------+-------------+
IPv6 BGP status
No IPv6 peers found.
era@server-node-3:~$
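The three tables above can be reduced to a plain list of peers whose session is not up, which makes the pattern across subnets easier to see. This is a minimal sketch, assuming the table layout that calicoctl node status prints above; the function name down_peers is hypothetical.

```shell
# Hypothetical helper: read `calicoctl node status` output on stdin and print
# the address of every BGP peer whose STATE column is not "up".
down_peers() {
  awk -F'|' '
    /^[|]/ {                       # table rows only; border lines start with "+"
      addr = $2; state = $4
      gsub(/[ \t]/, "", addr)      # strip cell padding
      gsub(/[ \t]/, "", state)
      # Header and wrapped continuation rows have no IPv4 address in column 1.
      if (addr ~ /^[0-9.]+$/ && state != "up") print addr
    }'
}
```

Running sudo calicoctl node status | down_peers on node 2, for example, would list the four peers in 192.168.250.x and 192.168.230.x from the table above.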
I tried pinning the exact network interface to see whether that would help. It did not:
era@server-master-1:~$ kubectl set env daemonset/calico-node -n kube-system IP_AUTODETECTION_METHOD=interface=ens3
daemonset.apps/calico-node env updated
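I also used nc to test port 179 from every node and master to every other node and master, and the connections succeeded.
The OS is Ubuntu 18.04.
Is there anything else in Calico I can debug to resolve this? Any hint that gets me closer to a solution would help.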
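Update
I found that the problem is related to missing routes.
Below is the output from 192.168.250.112. It cannot reach the nodes and masters in 192.168.240.x because there is no route to them: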
era@server-node-1:~$ ip route | grep tun
10.233.76.0/24 via 192.168.230.112 dev tunl0 proto bird onlink
10.233.77.0/24 via 192.168.230.111 dev tunl0 proto bird onlink
10.233.79.0/24 via 192.168.250.111 dev tunl0 proto bird onlink
era@server-node-1:~$ sudo calicoctl node status
Calico process is running.
IPv4 BGP status
+-----------------+-------------------+-------+----------+--------------------------------+
|  PEER ADDRESS   |     PEER TYPE     | STATE |  SINCE   |              INFO              |
+-----------------+-------------------+-------+----------+--------------------------------+
| 192.168.250.111 | node-to-node mesh | up    | 21:39:05 | Established                    |
| 192.168.240.111 | node-to-node mesh | start | 19:54:35 | Connect Socket: Connection     |
|                 |                   |       |          | reset by peer                  |
| 192.168.230.111 | node-to-node mesh | up    | 20:42:31 | Established                    |
| 192.168.240.112 | node-to-node mesh | start | 19:54:35 | Connect Socket: Connection     |
|                 |                   |       |          | reset by peer                  |
| 192.168.230.112 | node-to-node mesh | up    | 20:42:30 | Established                    |
+-----------------+-------------------+-------+----------+--------------------------------+
IPv6 BGP status
No IPv6 peers found.
era@server-node-1:~$
Below is the output from 192.168.240.112. It cannot reach the nodes and masters in 192.168.250.x and 192.168.230.x because there is no route to them:
era@server-node-2:~$ ip r | grep tunl
10.233.66.0/24 via 192.168.240.111 dev tunl0 proto bird onlink
era@server-node-2:~$ sudo calicoctl node status
Calico process is running.
IPv4 BGP status
+-----------------+-------------------+-------+----------+--------------------------------+
|  PEER ADDRESS   |     PEER TYPE     | STATE |  SINCE   |              INFO              |
+-----------------+-------------------+-------+----------+--------------------------------+
| 192.168.250.111 | node-to-node mesh | start | 19:52:10 | Passive                        |
| 192.168.240.111 | node-to-node mesh | up    | 19:54:38 | Established                    |
| 192.168.230.111 | node-to-node mesh | start | 22:05:18 | Active Socket: Connection      |
|                 |                   |       |          | reset by peer                  |
| 192.168.250.112 | node-to-node mesh | start | 19:52:10 | Passive                        |
| 192.168.230.112 | node-to-node mesh | start | 22:05:22 | Active Socket: Connection      |
|                 |                   |       |          | reset by peer                  |
+-----------------+-------------------+-------+----------+--------------------------------+
IPv6 BGP status
No IPv6 peers found.
era@server-node-2:~$
Below is the output from 192.168.230.112. It cannot reach the nodes and masters in 192.168.240.x because there is no route to them:
era@server-node-3:~$ ip r | grep tunl
10.233.77.0/24 via 192.168.230.111 dev tunl0 proto bird onlink
10.233.79.0/24 via 192.168.250.111 dev tunl0 proto bird onlink
10.233.100.0/24 via 192.168.250.112 dev tunl0 proto bird onlink
era@server-node-3:~$ sudo calicoctl node status
Calico process is running.
IPv4 BGP status
+-----------------+-------------------+-------+----------+-------------+
|  PEER ADDRESS   |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+-----------------+-------------------+-------+----------+-------------+
| 192.168.250.111 | node-to-node mesh | up    | 21:36:50 | Established |
| 192.168.240.111 | node-to-node mesh | start | 19:51:59 | Passive     |
| 192.168.230.111 | node-to-node mesh | up    | 19:54:25 | Established |
| 192.168.250.112 | node-to-node mesh | up    | 20:42:30 | Established |
| 192.168.240.112 | node-to-node mesh | start | 19:51:59 | Passive     |
+-----------------+-------------------+-------+----------+-------------+
IPv6 BGP status
No IPv6 peers found.
era@server-node-3:~$
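To compare the installed routes with the peer table, the next-hop addresses can be pulled out of the tunl0 routes that BIRD programmed. This is a minimal sketch, assuming ip route output in the format shown above; the function name routed_next_hops is hypothetical.

```shell
# Hypothetical helper: read `ip route` output on stdin and print the distinct
# next-hop addresses of routes that BIRD installed on the tunl0 device.
routed_next_hops() {
  awk '/dev tunl0/ && /proto bird/ {
         # The next hop is the word that follows "via".
         for (i = 1; i < NF; i++) if ($i == "via") print $(i + 1)
       }' | sort -u
}
```

Running ip route | routed_next_hops on each node gives the set of hosts that node has pod routes for. Any cluster host missing from that list is one whose pod subnet was never learned over BGP.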
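So why are these routes missing, and how can I change this behavior so that they get added? If I add them manually, they are removed again automatically.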
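Answer 1
The problem was that NAT was being applied on the VPN TUN (layer 3) link. Calico does not support that (or at least I am not aware of a solution that works through NAT).
Solution: use routing instead of NAT.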