After upgrading to v1.24.0 (following the removal of Dockershim), I had to install cri-docker, and then I ran the following:
sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --cri-socket=unix:///var/run/cri-dockerd.sock --apiserver-advertise-address=192.168.0.196
I chose flannel as the network plugin:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
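Up to this point everything went as expected, but after enabling scheduling on the master node, joining the worker nodes, and deploying my Pods and Services, I noticed a strange networking problem: NodePort and ClusterIP Services did not work across nodes (there was no problem with a single node).
Later I found that the Pods were getting their IPs from the docker network (172.17.0.*) instead of from --pod-network-cidr=10.244.0.0/16: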
masterzulu@master-zulu:~$ kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
django-space django-588cb669d4-46b4w 1/1 Running 0 3m35s 172.17.0.4 master-zulu
django-space postgres-deployment-b58d5ff94-hs7t4 1/1 Running 0 3m35s 172.17.0.5 master-zulu
kube-system coredns-6d4b75cb6d-8gw6c 1/1 Running 0 7m9s 172.17.0.2 master-zulu
kube-system coredns-6d4b75cb6d-nxlq9 1/1 Running 0 7m9s 172.17.0.3 master-zulu
The flannel DaemonSet is running:
kube-system kube-flannel-ds-tqgvk 1/1 Running 0 5m51s 192.168.3.132 master-zulu
And the podCIDR is set:
masterzulu@master-zulu:~$ kubectl get no master-zulu -o json | jq '.spec.podCIDR'
"10.244.0.0/24"
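One quick way to confirm the mismatch is to test each Pod IP against the cluster CIDR. The following is a minimal POSIX shell sketch, not part of the original post; the helper names ip_to_int and ip_in_cidr are made up for illustration, and it handles IPv4 only.

```shell
# Hypothetical helpers (not from the original post), IPv4 only.

# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Return 0 if IPv4 address $1 lies inside CIDR $2 (e.g. 10.244.0.0/16).
ip_in_cidr() (
  net=${2%/*}
  bits=${2#*/}
  # Build the netmask; a /0 prefix matches every address.
  if [ "$bits" -eq 0 ]; then
    mask=0
  else
    mask=$(( (4294967295 << (32 - $bits)) & 4294967295 ))
  fi
  [ $(( $(ip_to_int "$1") & $mask )) -eq $(( $(ip_to_int "$net") & $mask )) ]
)

# Example use against a live cluster (commented out; needs kubectl access):
# kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.podIP}{"\n"}{end}' |
#   while read -r name ip; do
#     [ -n "$ip" ] || continue
#     ip_in_cidr "$ip" 10.244.0.0/16 || echo "$name has $ip, outside the pod CIDR"
#   done
```

Pods that run with hostNetwork (the flannel DaemonSet, for example) report the node IP, so they will be flagged too and can be ignored.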
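I tried adding the --network-plugin=cni flag to the kubelet startup configuration, but got an error, because that flag was removed in v1.24.0 together with dockershim and several other flags.
This is the cri-docker status: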
● cri-docker.service - CRI Interface for Docker Application Container Engine
Loaded: loaded (/etc/systemd/system/cri-docker.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2022-05-25 21:36:57 BST; 5h 34min ago
TriggeredBy: ● cri-docker.socket
Docs: https://docs.mirantis.com
Main PID: 1098 (cri-dockerd)
Tasks: 15
Memory: 53.4M
CGroup: /system.slice/cri-docker.service
└─1098 /usr/local/bin/cri-dockerd --container-runtime-endpoint fd:// --network-plugin=
May 26 01:51:56 master-zulu cri-dockerd[1098]: time="2022-05-26T01:51:56+01:00" level=info msg="Failed to read pod IP from plugin/docker: Couldn't find network status for kube-system/coredns-6d4b75cb6d-nxlq9 through plugin: invalid network status for"
May 26 01:51:56 master-zulu cri-dockerd[1098]: time="2022-05-26T01:51:56+01:00" level=info msg="Failed to read pod IP from plugin/docker: Couldn't find network status for kube-system/coredns-6d4b75cb6d-nxlq9 through plugin: invalid network status for"
May 26 01:51:56 master-zulu cri-dockerd[1098]: time="2022-05-26T01:51:56+01:00" level=info msg="Failed to read pod IP from plugin/docker: Couldn't find network status for kube-system/coredns-6d4b75cb6d-8gw6c through plugin: invalid network status for"
May 26 01:53:13 master-zulu cri-dockerd[1098]: time="2022-05-26T01:53:13+01:00" level=info msg="Will attempt to re-write config file /var/lib/docker/containers/8ee7640d48c129058259b4b7632a0f6173ad8a9e2d5368cf3c9f29d1ea7db13e/resolv.conf as [nameserver 192.168.3.48 nameserver 192.168.0.1]"
May 26 01:55:30 master-zulu cri-dockerd[1098]: time="2022-05-26T01:55:30+01:00" level=info msg="Will attempt to re-write config file /var/lib/docker/containers/f378aff3d077030215ef664d72132b189f8412a8d432e5a554cdbfbb37c3ea19/resolv.conf as [nameserver 10.96.0.10 search django-space.svc.cluster.local svc.cluster.local cluster.local options ndots:5]"
May 26 01:55:30 master-zulu cri-dockerd[1098]: time="2022-05-26T01:55:30+01:00" level=info msg="Failed to read pod IP from plugin/docker: Couldn't find network status for django-space/django-588cb669d4-46b4w through plugin: invalid network status for"
May 26 01:55:31 master-zulu cri-dockerd[1098]: time="2022-05-26T01:55:31+01:00" level=info msg="Failed to read pod IP from plugin/docker: Couldn't find network status for django-space/django-588cb669d4-46b4w through plugin: invalid network status for"
May 26 01:55:43 master-zulu cri-dockerd[1098]: time="2022-05-26T01:55:43+01:00" level=info msg="Will attempt to re-write config file /var/lib/docker/containers/9523255b7991855027185cecbc8420bbe1268fcef21c2ddcb4d76851bce7e3a0/resolv.conf as [nameserver 10.96.0.10 search django-space.svc.cluster.local svc.cluster.local cluster.local options ndots:5]"
May 26 01:55:43 master-zulu cri-dockerd[1098]: time="2022-05-26T01:55:43+01:00" level=info msg="Failed to read pod IP from plugin/docker: Couldn't find network status for django-space/postgres-deployment-b58d5ff94-hs7t4 through plugin: invalid network status for"
May 26 01:55:43 master-zulu cri-dockerd[1098]: time="2022-05-26T01:55:43+01:00" level=info msg="Failed to read pod IP from plugin/docker: Couldn't find network status for django-space/postgres-deployment-b58d5ff94-hs7t4 through plugin: invalid network status for"
Does anyone know what I should do to fix this?
Update:
The cni0 interface is missing on the k8s master:
masterzulu@master-zulu:~$ ifconfig -a
docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
inet6 fe80::42:e9ff:fec1:dd1b prefixlen 64 scopeid 0x20<link>
ether 02:42:e9:c1:dd:1b txqueuelen 0 (Ethernet)
RX packets 5140 bytes 418818 (418.8 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 5475 bytes 522703 (522.7 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.0.196 netmask 255.255.255.0 broadcast 192.168.0.255
inet6 fe80::e808:144d:a0dc:60a6 prefixlen 64 scopeid 0x20<link>
ether 98:40:bb:3e:f2:1c txqueuelen 1000 (Ethernet)
RX packets 6332 bytes 515688 (515.6 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 6684 bytes 631167 (631.1 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 10.244.0.0 netmask 255.255.255.255 broadcast 0.0.0.0
inet6 fe80::494:d8ff:fe1b:4aab prefixlen 64 scopeid 0x20<link>
ether 06:94:d8:1b:4a:ab txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 129 overruns 0 carrier 0 collisions 0
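For checking the same thing on several nodes, the interface names can be pulled out of the ifconfig output and searched for cni0. This is a small sketch under the assumption that the output has the "name: flags=..." layout shown above; list_ifaces is a hypothetical helper name.

```shell
# Hypothetical helper: print the interface names found in `ifconfig -a`
# style output read from stdin (lines of the form "name: flags=...").
list_ifaces() {
  sed -n 's/^\([^ :]*\): flags=.*/\1/p'
}

# Example use on a node (commented out; needs ifconfig installed):
# ifconfig -a | list_ifaces | grep -qx cni0 || echo "cni0 is missing on $(hostname)"
```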
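Answer 1
After some investigation, I found that the cri-dockerd service was missing some arguments (note the empty --network-plugin= value):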
CGroup: /system.slice/cri-docker.service
└─1098 /usr/local/bin/cri-dockerd --container-runtime-endpoint fd:// --network-plugin=
I added them manually to /etc/systemd/system/cri-docker.service:
...
ExecStart=/usr/local/bin/cri-dockerd --network-plugin=cni --cni-bin-dir=/opt/cni/bin --cni-cache-dir=/var/lib/cni/cache --cni-conf-dir=/etc/cni/net.d --pod-infra-container-image=k8s.gcr.io/pause:3.7
...
Reload systemd and restart the service:
sudo systemctl daemon-reload
sudo systemctl restart cri-docker.service
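After the restart it is worth confirming that the running process really picked up the flag. A minimal sketch, assuming the command line is passed in as one string; uses_cni is a hypothetical helper name, not something shipped with cri-dockerd.

```shell
# Hypothetical helper: return 0 if the given cri-dockerd command line
# contains the exact argument --network-plugin=cni.
uses_cni() {
  case " $1 " in
    *" --network-plugin=cni "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example use on a node (commented out; needs a running cri-dockerd):
# uses_cni "$(ps -o args= -C cri-dockerd)" || echo "cri-dockerd is not using CNI"
```

With the original command line (ending in an empty --network-plugin=) the check fails; with the edited ExecStart it passes.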
At this point cri-dockerd was configured correctly, but the problem persisted. Later I noticed that /opt/cni/bin was empty (no container networking plugins):
masterzulu@master-zulu:~$ sudo /usr/local/bin/cri-dockerd
INFO[0000] Connecting to docker on the Endpoint unix:///var/run/docker.sock
INFO[0000] Start docker client with request timeout 0s
INFO[0000] Hairpin mode is set to none
ERRO[0000] Error validating CNI config list ({
"name": "cbr0",
"cniVersion": "0.3.1",
"plugins": [
{
"type": "flannel",
"delegate": {
"hairpinMode": true,
"isDefaultGateway": true
}
},
{
"type": "portmap",
"capabilities": {
"portMappings": true
}
}
]
}
): [failed to find plugin "portmap" in path [/opt/cni/bin]]
INFO[0000] Docker cri networking managed by network plugin kubernetes.io/no-op
...
INFO[0000] Setting cgroupDriver cgroupfs
INFO[0000] Docker cri received runtime config &RuntimeConfig{NetworkConfig:&NetworkConfig{PodCidr:,},}
INFO[0000] Starting the GRPC backend for the Docker CRI interface.
INFO[0000] Start cri-dockerd grpc backend
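The same validation can be done by hand before starting cri-dockerd: list every plugin "type" named in the conflist and check that a binary of that name exists in the plugin directory. The sketch below is an assumption-laden illustration; missing_cni_plugins is a hypothetical helper, and it uses a simple text match instead of a real JSON parser, so it expects the "type": "name" layout shown above.

```shell
# Hypothetical helper: print each plugin "type" named in the CNI conflist $1
# that has no executable of the same name in the plugin directory $2.
# Uses a plain text match, not a JSON parser.
missing_cni_plugins() {
  grep -o '"type"[[:space:]]*:[[:space:]]*"[^"]*"' "$1" |
    sed 's/.*"\([^"]*\)"$/\1/' |
    while read -r plugin; do
      [ -x "$2/$plugin" ] || echo "$plugin"
    done
}

# Example use on a node (commented out; paths taken from the post):
# missing_cni_plugins /etc/cni/net.d/10-flannel.conflist /opt/cni/bin
```

An empty result means every referenced plugin binary is present.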
I think I had deleted the contents of /opt/cni/bin by mistake, so I added them again (from the latest release of the CNI plugins):
cd /tmp && mkdir cni-plugins && wget https://github.com/containernetworking/plugins/releases/download/v1.1.1/cni-plugins-linux-amd64-v1.1.1.tgz && cd cni-plugins && tar zxfv ../cni-plugins-linux-amd64-v1.1.1.tgz
sudo cp /tmp/cni-plugins/* /opt/cni/bin/
ls /opt/cni/bin
bandwidth bridge dhcp firewall flannel host-device host-local ipvlan loopback macvlan portmap ptp sbr static tuning vlan vrf
After restarting the cri-docker service, everything started working as expected:
masterzulu@master-zulu:~$ kubectl get pods -Ao wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
django-space django-588cb669d4-4zz7f 1/1 Running 0 11s 10.244.0.4 master-zulu
django-space postgres-deployment-b58d5ff94-scmrx 1/1 Running 0 12s 10.244.0.5 master-zulu
kube-system coredns-6d4b75cb6d-rnjlm 1/1 Running 0 73m 10.244.0.2 master-zulu
kube-system coredns-6d4b75cb6d-s6zl7 1/1 Running 0 73m 10.244.0.3 master-zulu
The cni0 interface is up:
masterzulu@master-zulu:~$ ifconfig -a
cni0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 10.244.0.1 netmask 255.255.255.0 broadcast 10.244.0.255
inet6 fe80::8c8:84ff:fe78:d999 prefixlen 64 scopeid 0x20<link>
ether 0a:c8:84:78:d9:99 txqueuelen 1000 (Ethernet)
RX packets 27714 bytes 5010722 (5.0 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 26936 bytes 2898949 (2.8 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
cri-docker status:
masterzulu@master-zulu:~$ sudo systemctl status cri-docker
● cri-docker.service - CRI Interface for Docker Application Container Engine
Loaded: loaded (/etc/systemd/system/cri-docker.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2022-05-27 22:39:06 BST; 1h 57min ago
TriggeredBy: ● cri-docker.socket
Docs: https://docs.mirantis.com
Main PID: 187399 (cri-dockerd)
Tasks: 11
Memory: 17.1M
CGroup: /system.slice/cri-docker.service
└─187399 /usr/local/bin/cri-dockerd --network-plugin=cni --cni-bin-dir=/opt/cni/bin --cni-cache-dir=/var/lib/cni/cache --cni-conf-dir=/etc/cni/net.d --po>
May 28 00:36:20 master-zulu cri-dockerd[187399]: time="2022-05-28T00:36:20+01:00" level=info msg="Using CNI configuration file /etc/cni/net.d/10-flannel.conflist"
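My conclusion
A missing --network-plugin=cni startup argument for cri-docker, or any other problem with the CNI configuration (such as missing plugin binaries in /opt/cni/bin), can cause this issue. In that case cri-docker treats CNI as unavailable and falls back to the no-op network plugin (kubernetes.io/no-op in the log above), so containers stay on the docker0 bridge and Pods get their IPs from the 172.17.0.x range.
I hope this helps anyone who runs into the same problem.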