I just installed kubernetes on a clean Ubuntu machine, following the steps below (from Linux Academy):
As root:
apt install -y docker.io
cat << EOF >/etc/docker/daemon.json
{
"exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
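For the cgroup driver change to take effect, Docker has to be restarted. As a sanity check (my addition, not part of the Linux Academy steps), the active driver can be confirmed like this:
systemctl restart docker
# should report "Cgroup Driver: systemd", matching what kubelet will use
docker info | grep -i 'cgroup driver'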
curl -s http://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
cat << EOF >/etc/apt/sources.list.d/kubernetes.list
deb http://apt.kubernetes.io/ kubernetes-xenial main
EOF
apt update -y
apt install -y kubeadm kubectl kubelet
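Not part of the original steps, but a common precaution worth sketching here: pinning these packages keeps an unattended apt upgrade from moving kubeadm/kubelet out of step with the running cluster.
# optional: stop apt from upgrading the kubernetes packages automatically
apt-mark hold kubeadm kubectl kubelet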
Then I initialized kubernetes as a regular user:
sudo kubeadm init --pod-network-cidr=10.244.0.0/16
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
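As a quick check that the kubeconfig copy worked (my addition), kubectl should now answer without sudo; the node is expected to report NotReady until a pod network is installed:
kubectl get nodes
# expect STATUS NotReady for the node until the CNI plugin is running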
At this point the node is set up but not yet Ready, so I ran the flannel yml to create the daemonset, service, and pods:
sudo kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.9.1/Documentation/kube-flannel.yml
However, I cannot see the pods, and the daemonset's desired pod count is 0.
No pods (coredns is waiting for flannel to be set up):
NAMESPACE     NAME                                                 READY   STATUS    RESTARTS   AGE
kube-system   coredns-576cbf47c7-j2rbn                             0/1     Pending   0          2m25s
kube-system   coredns-576cbf47c7-lqhrj                             0/1     Pending   0          2m25s
kube-system   etcd-webdriver1.mylabserver.com                      1/1     Running   0          118s
kube-system   kube-apiserver-webdriver1.mylabserver.com            1/1     Running   0          94s
kube-system   kube-controller-manager-webdriver1.mylabserver.com   1/1     Running   0          89s
kube-system   kube-proxy-fzh97                                     1/1     Running   0          2m25s
kube-system   kube-scheduler-webdriver1.mylabserver.com            1/1     Running   0          90s
And the daemonsets:
NAMESPACE     NAME                             DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                   AGE
kube-system   daemonset.apps/kube-flannel-ds   0         0         0       0            0           beta.kubernetes.io/arch=amd64   3m49s
kube-system   daemonset.apps/kube-proxy        1         1         1       1            1           <none>                          48m
Any help? How can I debug this?
Answer 1
The config file you used to create the flannel resources contains a daemonset toleration that is too restrictive, so the pods will not be scheduled on any node. Add a general scheduling toleration to the flannel daemonset (as in the canal config) and they will be scheduled as expected. You can do this in one of two ways:
(1) Patch the existing config:
kubectl patch daemonset kube-flannel-ds \
--namespace=kube-system \
--patch='{"spec":{"template":{"spec":{"tolerations":[{"key": "node-role.kubernetes.io/master", "operator": "Exists", "effect": "NoSchedule"},{"effect":"NoSchedule","operator":"Exists"}]}}}}'
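After the patch, the daemonset controller should schedule a pod on the node right away; you can confirm that DESIRED moved from 0 to 1 (the app=flannel label below is the one used by the v0.9.1 manifest):
kubectl --namespace=kube-system get daemonset kube-flannel-ds
kubectl --namespace=kube-system get pods -l app=flannel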
(2) Modify the config file to include the following before applying it:
...
spec:
  template:
    ...
    spec:
      ...
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      - operator: Exists
        effect: NoSchedule
...
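To see which taints the tolerations actually need to cover (an extra debugging step, not from the original answer), inspect the node. My understanding is that on newer kubeadm versions an unready node carries a node.kubernetes.io/not-ready:NoSchedule taint, which the v0.9.1 flannel manifest does not tolerate; the blanket toleration above covers it:
# list the taints currently set on the node(s)
kubectl describe nodes | grep -A2 Taints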
This problem appeared on my cluster when upgrading from k8s v1.10; it seems to be related to changes in taints/tolerations between k8s versions.