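I am following the Linux Foundation "Kubernetes Administrator" course guide, but I am stuck deploying a simple application. I think the trouble starts earlier than the application deployment.
I have created a master and a worker, and both look fine: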
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ubuntu-training-server-1 Ready master 63m v1.19.1
ubuntu-training-server-2 Ready <none> 57m v1.19.1
But there is a problem here:
$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default nginx-6799fc88d8-556z4 0/1 ContainerCreating 0 50m
kube-system calico-kube-controllers-69496d8b75-thcl8 1/1 Running 1 63m
kube-system calico-node-gl885 0/1 CrashLoopBackOff 20 58m
kube-system calico-node-jvc59 1/1 Running 1 63m
kube-system coredns-f9fd979d6-hjfst 1/1 Running 1 64m
kube-system coredns-f9fd979d6-kvx42 1/1 Running 1 64m
kube-system etcd-ubuntu-training-server-1 1/1 Running 1 64m
kube-system kube-apiserver-ubuntu-training-server-1 1/1 Running 1 64m
kube-system kube-controller-manager-ubuntu-training-server-1 1/1 Running 1 64m
kube-system kube-proxy-9899t 1/1 Running 1 58m
kube-system kube-proxy-z6b22 1/1 Running 1 64m
kube-system kube-scheduler-ubuntu-training-server-1 1/1 Running 1 64m
That is, not all of the pods are ready.
If I try to get details about the failing calico-node pod, I see:
$ kubectl logs -n kube-system calico-node-gl885
Error from server (NotFound): the server could not find the requested resource ( pods/log calico-node-gl885)
When I try to deploy nginx, I get:
$ kubectl create deployment nginx --image=nginx
deployment.apps/nginx created
and
$ kubectl get deployments
NAME READY UP-TO-DATE AVAILABLE AGE
nginx 0/1 1 0 52m
The events show the problem as well:
$ kubectl get events
...
92s Warning FailedCreatePodSandBox pod/nginx-6799fc88d8-556z4 Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "4c451bc2c92f555c930f84e4e8b7082a03dd2824cf50948d348893ebea488d93" network for pod "nginx-6799fc88d8-556z4": networkPlugin cni failed to set up pod "nginx-6799fc88d8-556z4_default" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
...
I do not see /var/lib/calico/nodename on the worker node, only on the master, and the guide only mentions running kubectl apply -f calico.yaml on the master.
Can anyone help me get rid of the Calico error? I searched and found similar cases, but they seem to be about different things.
Update
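I found that there might be a network conflict (the Calico configuration contains 192.168.0.0/16 and my VirtualBox adapter is on 192.168.56.0/24), so I reset the cluster, changed both the Calico configuration and networking/podSubnet in kubeadm-config.yaml to 192.168.0.0/24, and initialized the cluster again.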
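As an illustration of the conflict described above, the following is a minimal POSIX shell sketch that checks whether two IPv4 CIDR blocks share any addresses. The helper names (ip_to_int, cidr_range, cidrs_overlap) are hypothetical and are not part of kubeadm, Calico, or VirtualBox. The sketch assumes well-formed input without leading zeros in the octets.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1            # split the address into its four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Print the first and last address of a CIDR block as integers.
cidr_range() {
  addr=$(ip_to_int "${1%/*}")   # part before the slash
  len=${1#*/}                   # prefix length after the slash
  mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  start=$(( addr & mask ))
  end=$(( start | (~mask & 0xFFFFFFFF) ))
  echo "$start $end"
}

# Exit status 0 when the two CIDR blocks share at least one address.
cidrs_overlap() {
  # After this line: $1=start1 $2=end1 $3=start2 $4=end2
  set -- $(cidr_range "$1") $(cidr_range "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

# Report the result for a pair of blocks.
check() {
  if cidrs_overlap "$1" "$2"; then
    echo "$1 and $2 overlap"
  else
    echo "$1 and $2 do not overlap"
  fi
}

check 192.168.0.0/16 192.168.56.0/24   # original Calico pool vs. VirtualBox network
check 192.168.0.0/24 192.168.56.0/24   # narrowed pod subnet vs. VirtualBox network
```

The first call reports an overlap and the second does not, which matches the reasoning for narrowing the pod subnet to 192.168.0.0/24.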
The new state is as follows.
This looks fine:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ubuntu-training-server-1 Ready master 39m v1.19.1
ubuntu-training-server-2 Ready <none> 38m v1.19.1
This looks fine too:
$ kubectl get events
LAST SEEN TYPE REASON OBJECT MESSAGE
36m Normal Starting node/ubuntu-training-server-1 Starting kubelet.
36m Normal NodeHasSufficientMemory node/ubuntu-training-server-1 Node ubuntu-training-server-1 status is now: NodeHasSufficientMemory
36m Normal NodeHasNoDiskPressure node/ubuntu-training-server-1 Node ubuntu-training-server-1 status is now: NodeHasNoDiskPressure
36m Normal NodeHasSufficientPID node/ubuntu-training-server-1 Node ubuntu-training-server-1 status is now: NodeHasSufficientPID
36m Normal NodeAllocatableEnforced node/ubuntu-training-server-1 Updated Node Allocatable limit across pods
36m Normal NodeReady node/ubuntu-training-server-1 Node ubuntu-training-server-1 status is now: NodeReady
35m Normal RegisteredNode node/ubuntu-training-server-1 Node ubuntu-training-server-1 event: Registered Node ubuntu-training-server-1 in Controller
35m Normal Starting node/ubuntu-training-server-1 Starting kube-proxy.
35m Normal Starting node/ubuntu-training-server-2 Starting kubelet.
35m Normal NodeHasSufficientMemory node/ubuntu-training-server-2 Node ubuntu-training-server-2 status is now: NodeHasSufficientMemory
35m Normal NodeHasNoDiskPressure node/ubuntu-training-server-2 Node ubuntu-training-server-2 status is now: NodeHasNoDiskPressure
35m Normal NodeHasSufficientPID node/ubuntu-training-server-2 Node ubuntu-training-server-2 status is now: NodeHasSufficientPID
35m Normal NodeAllocatableEnforced node/ubuntu-training-server-2 Updated Node Allocatable limit across pods
22s Normal CIDRNotAvailable node/ubuntu-training-server-2 Node ubuntu-training-server-2 status is now: CIDRNotAvailable
35m Normal Starting node/ubuntu-training-server-2 Starting kube-proxy.
35m Normal RegisteredNode node/ubuntu-training-server-2 Node ubuntu-training-server-2 event: Registered Node ubuntu-training-server-2 in Controller
35m Normal NodeReady node/ubuntu-training-server-2 Node ubuntu-training-server-2 status is now: NodeReady
And here is a new problem: calico-kube-controllers-69496d8b75-gdbd7 has been stuck in ContainerCreating for more than half an hour:
$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-69496d8b75-gdbd7 0/1 ContainerCreating 0 37m
kube-system calico-node-8xjsm 0/1 CrashLoopBackOff 13 37m
kube-system calico-node-zktsh 1/1 Running 0 37m
kube-system coredns-f9fd979d6-7bkwn 1/1 Running 0 39m
kube-system coredns-f9fd979d6-rsws5 1/1 Running 0 39m
kube-system etcd-ubuntu-training-server-1 1/1 Running 0 39m
kube-system kube-apiserver-ubuntu-training-server-1 1/1 Running 0 39m
kube-system kube-controller-manager-ubuntu-training-server-1 1/1 Running 0 39m
kube-system kube-proxy-2tvjp 1/1 Running 0 39m
kube-system kube-proxy-jkzbz 1/1 Running 0 39m
kube-system kube-scheduler-ubuntu-training-server-1 1/1 Running 0 39m
Update 2
Details about my setup.
$ cat kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: 1.19.1
controlPlaneEndpoint: "k8smaster:6443"
networking:
podSubnet: 192.168.0.0/24
The cluster was initialized with:
kubeadm init --config=kubeadm-config.yaml --upload-certs
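Answer 1
Got it. In my setup I have two VirtualBox VMs, each with two network interfaces: one for reaching the outside world (10.0.2.15) and one for talking to each other (192.168.56.104, 192.168.56.105). In the kubeadm init log I found that it was using the first one, so I explicitly told kubeadm to use the internal IP. This is the command with which I successfully created the cluster and deployed a simple application in it: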
kubeadm init --apiserver-advertise-address=192.168.56.104 --apiserver-cert-extra-sans=192.168.56.104 --node-name k8smaster --pod-network-cidr=192.168.0.0/24 --kubernetes-version=1.19.1
The sad part is that I could not find how to put the options I used on the command line into the config file.
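For what it is worth, the flags in the command above appear to map onto fields of the kubeadm v1beta2 configuration API. The sketch below writes such a file. It is an untested assumption about how the mapping would look for this setup, not a verified configuration: the field names come from the v1beta2 InitConfiguration and ClusterConfiguration types, and the values are copied from the command line above.

```shell
# Write a kubeadm config file mirroring the command-line flags used above.
cat > kubeadm-config.yaml <<'EOF'
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.56.104   # --apiserver-advertise-address
nodeRegistration:
  name: k8smaster                    # --node-name
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: 1.19.1            # --kubernetes-version
apiServer:
  certSANs:
  - "192.168.56.104"                 # --apiserver-cert-extra-sans
networking:
  podSubnet: 192.168.0.0/24          # --pod-network-cidr
EOF

# Then, on the master (not executed by this sketch):
#   kubeadm init --config=kubeadm-config.yaml
```

Note that the advertise address and node name are per-node settings, so they live in InitConfiguration rather than in ClusterConfiguration, which is why the file contains two YAML documents.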
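Answer 2
Thank you very much, Dmitriy. This absolutely saved my day. I am also using VirtualBox to set up a cluster for the LFD course and ran into the same error: the calico pod was not running on the worker node, and the error message was the same.
I strongly recommend using a network other than 192.168 for the IP addresses of the VMs on VirtualBox. It is simple. All you need to do is create a new host-only network in VirtualBox and add it to your VMs as a network adapter. I created a 172.16.16.0/24 network for my two VMs, so I did not need to change any configuration. All I had to do was update the kubeadm init command and put the correct IP address there.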