I've tried to install HA Kubernetes following various different guides, but can't get anything to work.
My basic setup is 3 master nodes; Keepalived creates a virtual IP (10.10.20.30) and sends traffic to whichever node's kubelet is up. I run this script on a fresh Ubuntu install and plan to join more masters with the master join command it prints out.
My understanding was that I could use port 8443 on HAProxy and forward it to port 6443 on the other masters.
Here is the script I put together; it covers all the steps. I run it on a fresh Ubuntu 20.04 install.
export KUBE_VERSION="1.24.6-00"
# SWAP
## this session
sudo swapoff -a
## permanently
sudo sed -i '/swap/d' /etc/fstab
# NETWORKING
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
sudo systemctl disable --now ufw || echo "already disabled"
## sysctl params required by setup, params persist across reboots
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
#Apply sysctl params without reboot
sudo sysctl --system
# INSTALLS
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl
sudo curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-mark unhold kubelet kubeadm kubectl
sudo apt-get install -y kubelet=${KUBE_VERSION} kubeadm=${KUBE_VERSION} kubectl=${KUBE_VERSION} --allow-downgrades
sudo apt-mark hold kubelet kubeadm kubectl
## Make sure no kube clusters are joined/created
sudo kubeadm reset -f
rm /home/skadmin/.kube/config || echo "already removed"
sudo systemctl stop kubelet || echo "kubelet service doesn't exist"
# DISABLE apparmor
sudo systemctl stop apparmor
sudo systemctl disable apparmor
sudo systemctl restart containerd.service || echo "containerd service doesn't exist"
# CONTAINER RUNTIME
sudo apt install -y containerd
sudo mkdir -p /etc/containerd || echo "already exists"
containerd config default | sudo tee /etc/containerd/config.toml
sudo systemctl restart containerd
sudo systemctl enable containerd
# KEEPALIVED
## /etc/keepalived/keepalived.conf
sudo apt-get install -y keepalived
export STATE="BACKUP"
export INTERFACE="ens18"
export ROUTER_ID=51
export PRIORITY=100
export AUTH_PASS=42
export APISERVER_VIP=10.10.20.30
cat <<EOF | sudo tee /etc/keepalived/keepalived.conf
global_defs {
    router_id LVS_DEVEL
}
vrrp_script check_apiserver {
    script "/etc/keepalived/check_apiserver.sh"
    interval 3
    weight -2
    fall 5
    rise 2
}
vrrp_instance VI_1 {
    state ${STATE}
    interface ${INTERFACE}
    virtual_router_id ${ROUTER_ID}
    priority ${PRIORITY}
    advert_int 5
    authentication {
        auth_type PASS
        auth_pass ${AUTH_PASS}
    }
    virtual_ipaddress {
        ${APISERVER_VIP}
    }
    track_script {
        check_apiserver
    }
}
EOF
## /etc/keepalived/check_apiserver.sh
export APISERVER_DEST_PORT=8443
cat <<EOF | sudo tee /etc/keepalived/check_apiserver.sh
#!/bin/sh
errorExit() {
    echo "*** $*" 1>&2
    exit 1
}

curl --silent --max-time 2 --insecure https://localhost:${APISERVER_DEST_PORT}/ -o /dev/null || errorExit "Error GET https://localhost:${APISERVER_DEST_PORT}/"
if ip addr | grep -q ${APISERVER_VIP}; then
    curl --silent --max-time 2 --insecure https://${APISERVER_VIP}:${APISERVER_DEST_PORT}/ -o /dev/null || errorExit "Error GET https://${APISERVER_VIP}:${APISERVER_DEST_PORT}/"
fi
EOF
sudo chmod +x /etc/keepalived/check_apiserver.sh
sudo systemctl enable --now keepalived
sudo systemctl restart keepalived
# HA PROXY
sudo apt-get install -y haproxy
## /etc/haproxy/haproxy.cfg
export APISERVER_SRC_PORT=6443
export HOST1_ID=k8-1
export HOST1_ADDRESS=10.10.20.21
export HOST2_ID=k8-2
export HOST2_ADDRESS=10.10.20.22
export HOST3_ID=k8-3
export HOST3_ADDRESS=10.10.20.23
cat <<EOF | sudo tee /etc/haproxy/haproxy.cfg
frontend kubernetes-frontend
    bind *:${APISERVER_DEST_PORT}
    mode tcp
    option tcplog
    default_backend kubernetes-backend

backend kubernetes-backend
    option httpchk GET /healthz
    http-check expect status 200
    mode tcp
    option ssl-hello-chk
    #option httpchk HEAD /
    balance roundrobin
    server ${HOST2_ID} ${HOST2_ADDRESS}:${APISERVER_SRC_PORT} check fall 3 rise 2
    # server ${HOST3_ID} ${HOST3_ADDRESS}:${APISERVER_SRC_PORT} check fall 3 rise 2
    # server ${HOST4_ID} ${HOST4_ADDRESS}:${APISERVER_SRC_PORT} check fall 3 rise 2
EOF
sudo systemctl enable haproxy
sudo systemctl restart haproxy
# INIT CLUSTER
cat <<EOF | tee cluster_config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: stable
controlPlaneEndpoint: "kube.sk.stringking.com:${APISERVER_DEST_PORT}"
kubernetesVersion: v1.24.6
networking:
  podSubnet: 10.0.0.0/16
localAPIEndpoint:
  advertiseAddress: 10.10.20.21
  bindPort: 8443
cgroupDriver: cgroupfs
EOF
sudo kubeadm init --upload-certs --config cluster_config.yaml
mkdir /home/skadmin/.kube/ || echo "can't create"
sudo cp /etc/kubernetes/admin.conf /home/skadmin/.kube/config
sudo chown skadmin /home/skadmin/.kube/config
sudo chgrp skadmin /home/skadmin/.kube/config
sudo chmod 700 /home/skadmin/.kube/config
# INSTALL CALICO
kubectl apply -f https://docs.projectcalico.org/manifests/calico-typha.yaml
After running it, I get:
$ sudo kubeadm init --upload-certs --config cluster_config.yaml
W1019 05:30:57.257489 18686 initconfiguration.go:306] error unmarshaling configuration schema.GroupVersionKind{Group:"kubeadm.k8s.io", Version:"v1beta3", Kind:"ClusterConfiguration"}: strict decoding error: yaml: unmarshal errors:
line 5: key "kubernetesVersion" already set in map, unknown field "cgroupDriver", unknown field "localAPIEndpoint"
[init] Using Kubernetes version: v1.24.6
[preflight] Running pre-flight checks
[WARNING SystemVerification]: missing optional cgroups: blkio
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8-1 kube.sk.stringking.com kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.10.20.21]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8-1 localhost] and IPs [10.10.20.21 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8-1 localhost] and IPs [10.10.20.21 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
W1019 05:30:59.451551 18686 endpoint.go:57] [endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "admin.conf" kubeconfig file
W1019 05:30:59.544831 18686 endpoint.go:57] [endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "kubelet.conf" kubeconfig file
W1019 05:30:59.646279 18686 endpoint.go:57] [endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
W1019 05:31:00.043652 18686 endpoint.go:57] [endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
- 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
And in my journalctl:
journalctl -flu kubelet
Oct 19 05:50:45 k8-1 kubelet[18783]: E1019 05:50:45.963575 18783 event.go:276] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"k8-1.171f61621a018184", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ZZZ_DeprecatedClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"k8-1", UID:"k8-1", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasSufficientPID", Message:"Node k8-1 status is now: NodeHasSufficientPID", Source:v1.EventSource{Component:"kubelet", Host:"k8-1"}, FirstTimestamp:time.Date(2022, time.October, 19, 5, 31, 1, 121368452, time.Local), LastTimestamp:time.Date(2022, time.October, 19, 5, 31, 1, 249038541, time.Local), Count:3, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Patch "https://kube.sk.stringking.com:8443/api/v1/namespaces/default/events/k8-1.171f61621a018184": EOF'(may retry after sleeping)
Oct 19 05:50:46 k8-1 kubelet[18783]: E1019 05:50:46.057764 18783 kubelet.go:2424] "Error getting node" err="node \"k8-1\" not found"
Oct 19 05:50:46 k8-1 kubelet[18783]: E1019 05:50:46.159048 18783 kubelet.go:2424]
Answer 1
You are getting lost because the usual approach is to break the problem down first, and only then add complexity to what you are doing.
The main idea behind HA Kubernetes is to build an HA control plane first, and only then add worker nodes.
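If it helps, the shape of that manual flow is roughly this (a sketch only, reusing the load-balanced endpoint and pod CIDR from the script in the question; tokens, hashes and certificate keys come from your own init output):

# on the first master: point kubeadm at the load-balanced endpoint, never at the node itself
sudo kubeadm init \
  --control-plane-endpoint "kube.sk.stringking.com:8443" \
  --upload-certs \
  --pod-network-cidr 10.0.0.0/16

# on each additional master: run the join command printed by the init above, which looks like
sudo kubeadm join kube.sk.stringking.com:8443 \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane \
  --certificate-key <key>

# workers join last, with the same command minus --control-plane and --certificate-key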
According to the official docs, you first need to stand up the load balancer in front of the API servers, and only then create the master nodes. You can check the HA'd apiserver endpoint by hand with something as simple as netcat (just to see that failover actually happens when a node dies), and only then move on to the next step.
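For example, before kubeadm init ever runs, you can prove the balanced endpoint works from any machine (a sketch using the VIP and port from the question; with no apiserver behind it yet, "connection refused" is fine, while a timeout means the VIP/balancer path is broken):

# does anything answer on the VIP at all?
nc -v 10.10.20.30 8443

# once at least one apiserver is up behind it, this should return a status
curl -k https://10.10.20.30:8443/healthz

# then stop haproxy (or the apiserver) on the node currently holding the VIP
# and repeat, to see whether failover actually happens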
Instead, you took an auto-magic script you don't understand and expected magical results. It doesn't work that way: you have to walk through every step manually, and only then can you turn that experience into a script of your own.
I would drop the keepalived idea and go for something simpler, like haproxy on an external balancer, which doesn't juggle anything as fragile as a floating IP.
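Something like this on a separate balancer box is all it takes (a sketch that reuses the three master addresses from the script in the question; unlike the config above it lists all three backends and uses a plain TCP check instead of mixing HTTP and SSL checks):

frontend kubernetes-frontend
    bind *:8443
    mode tcp
    option tcplog
    default_backend kubernetes-backend

backend kubernetes-backend
    mode tcp
    option tcp-check
    balance roundrobin
    server k8-1 10.10.20.21:6443 check fall 3 rise 2
    server k8-2 10.10.20.22:6443 check fall 3 rise 2
    server k8-3 10.10.20.23:6443 check fall 3 rise 2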
Advice you probably won't take, but here it is anyway: forget about grabbing automation scripts off the internet and expecting them to just work. At best they are sample material to borrow ideas from.