(This question was migrated from Stack Overflow.)
First of all, sorry for the long post, but I think it is best to give as much detail as possible.
- Host OS: Win10
- Guest OS: Ubuntu 20.10 (Groovy)
- Docker CE: 5:19.03.15~3-0~ubuntu-bionic
- Kubernetes: 1.20.4-00
- VirtualBox: 6.1.18 on Win10
- eth0: NAT
- eth1: host-only (192.168.50.1/24)
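I have three control plane nodes, each with a keepalived/haproxy combination installed as a "load balancer" on the IP 192.168.50.100. The apiserver entry point is therefore "poc-lb:8443", which in turn is distributed across the control plane nodes on port 6443. /etc/hosts on every node looks like this: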
- 192.168.50.10 poc-ctrl-1
- 192.168.50.11 poc-ctrl-2
- 192.168.50.12 poc-ctrl-3
- 192.168.50.100 poc-lb
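For reference, the haproxy half of such a setup could look roughly like the fragment below. This is a sketch, not the poster's actual configuration: it assumes haproxy in plain TCP mode, the frontend/backend names are hypothetical, the global/defaults sections (timeouts etc.) are omitted, and keepalived, which moves 192.168.50.100 between the nodes, is not shown. It writes to a local file so the fragment can be inspected before being merged into /etc/haproxy/haproxy.cfg.

```shell
#!/usr/bin/env bash
# Sketch only: write a haproxy TCP load-balancer fragment for the kube-apiservers.
# Assumptions: TCP mode, hypothetical section names, no global/defaults sections.
set -eu

cat > haproxy-apiserver.cfg <<'EOF'
# Clients connect to the keepalived VIP poc-lb (192.168.50.100) on port 8443 ...
frontend k8s-apiserver
    bind *:8443
    mode tcp
    option tcplog
    default_backend k8s-apiservers

# ... and haproxy spreads the TCP connections over the three apiservers on 6443.
backend k8s-apiservers
    mode tcp
    balance roundrobin
    server poc-ctrl-1 192.168.50.10:6443 check
    server poc-ctrl-2 192.168.50.11:6443 check
    server poc-ctrl-3 192.168.50.12:6443 check
EOF
```

"haproxy -c -f haproxy-apiserver.cfg" checks a configuration without starting the proxy; with the defaults section left out it may warn about missing timeouts.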
I initialize the k8s cluster on poc-ctrl-1 with the following command:
sudo kubeadm init --apiserver-advertise-address 192.168.50.10 --control-plane-endpoint poc-lb:8443 --upload-certs
After initialization on that node, I deploy the weave CNI plugin with:
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
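After deploying the weave plugin on the first control plane node, I join the second and third control plane nodes (poc-ctrl-2 and poc-ctrl-3) with "kubeadm join" (--token, --discovery-token-ca-cert-hash and --certificate-key removed for brevity):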
sudo kubeadm join poc-lb:8443 --control-plane --apiserver-advertise-address 192.168.50.11
sudo kubeadm join poc-lb:8443 --control-plane --apiserver-advertise-address 192.168.50.12
The nodes join without problems, but the weave pods don't seem too happy. Here is the log of the "weave" container on poc-ctrl-1:
DEBU: 2021/03/08 15:03:32.486479 [kube-peers] Checking peer "1e:85:5b:9b:50:c5" against list &{[]}
Peer not in list; removing persisted data
INFO: 2021/03/08 15:03:32.561859 Command line options: map[conn-limit:200 datapath:datapath db-prefix:/weavedb/weave-net docker-api: expect-npc:true http-addr:127.0.0.1:6784 ipalloc-init:consensus=0 ipalloc-range:10.32.0.0/12 metrics-addr:0.0.0.0:6782 name:1e:85:5b:9b:50:c5 nickname:poc-ctrl-1 no-dns:true no-masq-local:true port:6783]
INFO: 2021/03/08 15:03:32.561901 weave 2.8.1
INFO: 2021/03/08 15:03:33.216812 Bridge type is bridged_fastdp
INFO: 2021/03/08 15:03:33.216846 Communication between peers is unencrypted.
INFO: 2021/03/08 15:03:33.224064 Our name is 1e:85:5b:9b:50:c5(poc-ctrl-1)
INFO: 2021/03/08 15:03:33.224115 Launch detected - using supplied peer list: []
INFO: 2021/03/08 15:03:33.224149 Using "no-masq-local" LocalRangeTracker
INFO: 2021/03/08 15:03:33.224155 Checking for pre-existing addresses on weave bridge
INFO: 2021/03/08 15:03:33.233984 [allocator 1e:85:5b:9b:50:c5] No valid persisted data
INFO: 2021/03/08 15:03:33.262924 [allocator 1e:85:5b:9b:50:c5] Initialising via deferred consensus
INFO: 2021/03/08 15:03:33.263027 Sniffing traffic on datapath (via ODP)
INFO: 2021/03/08 15:03:33.265856 Listening for HTTP control messages on 127.0.0.1:6784
INFO: 2021/03/08 15:03:33.266928 Listening for metrics requests on 0.0.0.0:6782
INFO: 2021/03/08 15:03:33.401417 Error checking version: Get "https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=5.8.0-41-generic&os=linux&signature=aQyw2dVd0f8HNRaTeZ8N3lnlww9j0P3J5P359AkeBBk%3D&version=2.8.1": dial tcp: lookup checkpoint-api.weave.works on 10.96.0.10:53: write udp 10.0.2.15:46287->10.96.0.10:53: write: operation not permitted
INFO: 2021/03/08 15:03:33.578810 [kube-peers] Added myself to peer list &{[{1e:85:5b:9b:50:c5 poc-ctrl-1}]}
DEBU: 2021/03/08 15:03:33.588343 [kube-peers] Nodes that have disappeared: map[]
INFO: 2021/03/08 15:03:33.599543 Assuming quorum size of 1
INFO: 2021/03/08 15:03:33.599784 adding entry 10.32.0.0/12 to weaver-no-masq-local of 0
INFO: 2021/03/08 15:03:33.599809 added entry 10.32.0.0/12 to weaver-no-masq-local of 0
10.32.0.1
DEBU: 2021/03/08 15:03:33.684752 registering for updates for node delete events
INFO: 2021/03/08 15:20:34.605758 ->[192.168.50.12:57361] connection accepted
INFO: 2021/03/08 15:20:34.620605 ->[192.168.50.12:57361|a2:18:ea:75:33:ca(poc-ctrl-3)]: connection ready; using protocol version 2
INFO: 2021/03/08 15:20:34.620811 overlay_switch ->[a2:18:ea:75:33:ca(poc-ctrl-3)] using fastdp
INFO: 2021/03/08 15:20:34.620830 ->[192.168.50.12:57361|a2:18:ea:75:33:ca(poc-ctrl-3)]: connection added (new peer)
INFO: 2021/03/08 15:20:34.634204 ->[192.168.50.12:57361|a2:18:ea:75:33:ca(poc-ctrl-3)]: connection fully established
INFO: 2021/03/08 15:20:34.723969 sleeve ->[192.168.50.12:6783|a2:18:ea:75:33:ca(poc-ctrl-3)]: Effective MTU verified at 1438
INFO: 2021/03/08 15:20:35.742452 Discovered remote MAC a2:18:ea:75:33:ca at a2:18:ea:75:33:ca(poc-ctrl-3)
INFO: 2021/03/08 15:20:36.352445 Discovered remote MAC ee:27:39:76:a7:5d at a2:18:ea:75:33:ca(poc-ctrl-3)
INFO: 2021/03/08 15:20:36.510082 Discovered remote MAC be:c8:b2:c2:d2:cf at a2:18:ea:75:33:ca(poc-ctrl-3)
INFO: 2021/03/08 15:21:04.875787 adding entry 10.32.0.0/13 to weaver-no-masq-local of 0
INFO: 2021/03/08 15:21:04.875840 added entry 10.32.0.0/13 to weaver-no-masq-local of 0
INFO: 2021/03/08 15:21:04.876883 adding entry 10.40.0.0/14 to weaver-no-masq-local of 0
INFO: 2021/03/08 15:21:04.876905 added entry 10.40.0.0/14 to weaver-no-masq-local of 0
INFO: 2021/03/08 15:21:04.877778 deleting entry 10.32.0.0/12 from weaver-no-masq-local of 0
INFO: 2021/03/08 15:21:04.877792 deleted entry 10.32.0.0/12 from weaver-no-masq-local of 0
Here is the log of the "weave" container on poc-ctrl-2:
DEBU: 2021/03/08 15:40:06.625988 [kube-peers] Checking peer "9a:7c:0f:a1:76:36" against list &{[{1e:85:5b:9b:50:c5 poc-ctrl-1}]}
Peer not in list; removing persisted data
FATA: 2021/03/08 15:40:36.654217 [kube-peers] Could not get Kubernetes version: Get "https://10.96.0.1:443/version?timeout=32s": dial tcp 10.96.0.1:443: i/o timeout
And finally, the log of the "weave" container on poc-ctrl-3:
FATA: 2021/03/08 15:21:04.964921 [kube-peers] Could not update peer list: Unable to fetch ConfigMap kube-system/weave-net: Get "https://10.96.0.1:443/api/v1/namespaces/kube-system/configmaps/weave-net": dial tcp 10.96.0.1:443: i/o timeout
INFO: 2021/03/08 15:21:04.981699 adding entry 10.44.0.0/14 to weaver-no-masq-local of 0
INFO: 2021/03/08 15:21:04.981948 added entry 10.44.0.0/14 to weaver-no-masq-local of 0
10.44.0.0
INFO: 2021/03/08 15:21:16.935459 ->[192.168.50.11:6783] attempting connection
INFO: 2021/03/08 15:21:16.936059 ->[192.168.50.11:6783] error during connection attempt: dial tcp :0->192.168.50.11:6783: connect: connection refused
FATA: 2021/03/08 15:21:35.037984 [kube-peers] could not set node status: Patch "https://10.96.0.1:443/api/v1/nodes/poc-ctrl-3/status": dial tcp 10.96.0.1:443: i/o timeout
INFO: 2021/03/08 15:21:40.255913 ->[192.168.50.11:6783] attempting connection
INFO: 2021/03/08 15:21:40.256478 ->[192.168.50.11:6783] error during connection attempt: dial tcp :0->192.168.50.11:6783: connect: connection refused
INFO: 2021/03/08 15:21:59.917279 Discovered remote MAC 4a:0d:3e:de:62:b4 at 1e:85:5b:9b:50:c5(poc-ctrl-1)
INFO: 2021/03/08 15:22:30.157989 ->[192.168.50.11:6783] attempting connection
INFO: 2021/03/08 15:22:30.158579 ->[192.168.50.11:6783] error during connection attempt: dial tcp :0->192.168.50.11:6783: connect: connection refused
INFO: 2021/03/08 15:23:25.508244 ->[192.168.50.11:6783] attempting connection
INFO: 2021/03/08 15:23:25.508785 ->[192.168.50.11:6783] error during connection attempt: dial tcp :0->192.168.50.11:6783: connect: connection refused
INFO: 2021/03/08 15:24:57.982083 ->[192.168.50.11:6783] attempting connection
INFO: 2021/03/08 15:24:57.982653 ->[192.168.50.11:6783] error during connection attempt: dial tcp :0->192.168.50.11:6783: connect: connection refused
INFO: 2021/03/08 15:26:10.300785 ->[192.168.50.11:6783] attempting connection
INFO: 2021/03/08 15:26:10.301685 ->[192.168.50.11:6783] error during connection attempt: dial tcp :0->192.168.50.11:6783: connect: connection refused
INFO: 2021/03/08 15:27:42.395131 ->[192.168.50.11:6783] attempting connection
INFO: 2021/03/08 15:27:42.395556 ->[192.168.50.11:6783] error during connection attempt: dial tcp :0->192.168.50.11:6783: connect: connection refused
INFO: 2021/03/08 15:34:00.374000 ->[192.168.50.11:6783] attempting connection
INFO: 2021/03/08 15:34:00.374547 ->[192.168.50.11:6783] error during connection attempt: dial tcp :0->192.168.50.11:6783: connect: connection refused
INFO: 2021/03/08 15:40:56.090626 ->[192.168.50.11:6783] attempting connection
INFO: 2021/03/08 15:40:56.091130 ->[192.168.50.11:6783] error during connection attempt: dial tcp :0->192.168.50.11:6783: connect: connection refused
All nodes have "br_netfilter" loaded and net.bridge.bridge-nf-call-iptables = 1.
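The IP 10.96.0.1 belongs to the kubernetes service on 443/tcp, and weave uses ports 6783/tcp and 6783-6784/udp. Based on the output above, I suspect some iptables-related problem, or are the packets leaving via the default route (the eth0 interface)?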
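The logs show two different failures: an i/o timeout towards the service IP 10.96.0.1:443, and "connection refused" on 192.168.50.11:6783. A small probe run from each node can tell these apart. This is a sketch that assumes bash (for its /dev/tcp redirection) and the coreutils "timeout" command, which exits with status 124 when the time limit is hit; the function name probe is hypothetical.

```shell
#!/usr/bin/env bash
# probe HOST PORT [SECONDS]: attempt a TCP connect and classify the result.
# Relies on bash's /dev/tcp redirection and coreutils `timeout` (exit 124 on timeout).
probe() {
  local host=$1 port=$2 secs=${3:-3}
  timeout "$secs" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
  case $? in
    0)   echo open ;;     # TCP handshake completed
    124) echo timeout ;;  # no answer at all: packets dropped or replies lost
    *)   echo error ;;    # failed fast: connection refused, no route to host, ...
  esac
}

# Hypothetical usage on each node, with the addresses from this question:
#   probe 10.96.0.1 443        # kubernetes service ClusterIP
#   probe 192.168.50.10 6443   # apiserver on poc-ctrl-1, bypassing the service IP
#   probe 192.168.50.11 6783   # weave control port on poc-ctrl-2
```

A timeout for 10.96.0.1 443 while 192.168.50.10 6443 answers "open" would point at how the service traffic is routed rather than at the apiserver itself.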
"ip route" on poc-ctrl-1 gives:
default via 10.0.2.2 dev eth0 proto dhcp src 10.0.2.15 metric 100
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15
10.0.2.2 dev eth0 proto dhcp scope link src 10.0.2.15 metric 100
10.32.0.0/12 dev weave proto kernel scope link src 10.32.0.1
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.50.0/24 dev eth1 proto kernel scope link src 192.168.50.10
What am I missing here?
Answer 1
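After going through the iptables rules, I had the feeling that traffic to the IP assigned to the k8s service must be routed out of the "wrong" interface. I ran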
sudo ip route add 10.96.0.1 dev eth1
and weave started working!
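Why a single host route is enough follows from how Linux picks a route: the most specific (longest-prefix) match wins. In the "ip route" table from the question, 10.96.0.1 matches none of the specific prefixes, so it falls through to the default route on eth0, the VirtualBox NAT interface with source address 10.0.2.15 (the same source seen in the "write udp 10.0.2.15:..." line of the poc-ctrl-1 weave log). A plausible reading, not confirmed in the thread, is that this source address is kept after kube-proxy rewrites the destination to a real apiserver on 192.168.50.x; since VirtualBox NAT gives every VM 10.0.2.15 by default, replies from another node never come back. The /32 route via eth1 makes the kernel choose the 192.168.50.x address instead. The sketch below models only the prefix match (no metrics, policy rules or route types); on a real node "ip route get 10.96.0.1" gives the authoritative answer. The names ip2int, route_dev and ROUTES are hypothetical.

```shell
#!/usr/bin/env bash
# Simplified longest-prefix-match route lookup (sketch: ignores metrics,
# routing rules and route types; `ip route get` is the real tool).

ip2int() {  # dotted quad -> 32-bit integer
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# route_dev DEST: read `ip route`-style lines on stdin and print the device
# of the most specific route whose prefix contains DEST.
route_dev() {
  local dst best_len=-1 best_dev="" dev net len mask i
  local -a f
  dst=$(ip2int "$1")
  while read -r -a f; do
    if (( ${#f[@]} == 0 )); then continue; fi
    dev=""
    for (( i = 1; i < ${#f[@]} - 1; i++ )); do
      if [[ ${f[i]} == dev ]]; then dev=${f[i+1]}; fi
    done
    if [[ ${f[0]} == default ]]; then
      net=0; len=0                      # default == 0.0.0.0/0
    else
      net=$(ip2int "${f[0]%/*}")
      if [[ ${f[0]} == */* ]]; then len=${f[0]#*/}; else len=32; fi
    fi
    mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
    if (( (dst & mask) == (net & mask) && len > best_len )); then
      best_len=$len; best_dev=$dev      # more specific match wins
    fi
  done
  echo "$best_dev"
}

# Routing table of poc-ctrl-1 as posted in the question.
ROUTES='default via 10.0.2.2 dev eth0 proto dhcp src 10.0.2.15 metric 100
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15
10.0.2.2 dev eth0 proto dhcp scope link src 10.0.2.15 metric 100
10.32.0.0/12 dev weave proto kernel scope link src 10.32.0.1
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.50.0/24 dev eth1 proto kernel scope link src 192.168.50.10'
```

Usage: "route_dev 10.96.0.1 <<< "$ROUTES"" prints eth0 (the default route), while appending the line "10.96.0.1 dev eth1 scope link" to the table, which is roughly what the "ip route add" above creates, makes it print eth1.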