While initializing the first control plane node (3 control plane nodes + 3 worker nodes), I get the following error:
root@k8s-eu-1-control-plane-node-1:~# sudo kubeadm init --control-plane-endpoint k82-eu-1-load-balancer-dns-1:53 --upload-certs --v=8 --ignore-preflight-errors=Port-6443
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
- 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock logs CONTAINERID'
Here is the full output of the failed initialization:
https://drive.google.com/file/d/1iEnu34unu7xnsh556eTbY5EAJ-2zBrkr/view?usp=sharing
Output of journalctl -xeu kubelet: https://drive.google.com/file/d/1d_Z6ic2xjyXu1QIpdba655yS4bvcfhkl/view?usp=drive_link
root@k8s-eu-1-control-plane-node-1:~# crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock ps -a | grep kube | grep -v pause
579c2b9e5d17a 7fe0e6f37db33 41 seconds ago Exited kube-apiserver 50 7d52f351045d2 kube-apiserver-k8s-eu-1-control-plane-node-1
9db9a2fe179e3 e3db313c6dbc0 16 minutes ago Running kube-scheduler 25 d55a5e9d9be56 kube-scheduler-k8s-eu-1-control-plane-node-1
d3887c919854f d058aa5ab969c 16 minutes ago Running kube-controller-manager 18 e61c1eb6a8700 kube-controller-manager-k8s-eu-1-control-plane-node-1
root@k8s-eu-1-control-plane-node-1:~#
root@k8s-eu-1-control-plane-node-1:~# crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock logs 579c2b9e5d17a
I1128 16:58:28.080267 1 options.go:220] external host was not specified, using 38.242.249.60
I1128 16:58:28.081342 1 server.go:148] Version: v1.28.4
I1128 16:58:28.081365 1 server.go:150] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
E1128 16:58:28.081652 1 run.go:74] "command failed" err="failed to create listener: failed to listen on 0.0.0.0:6443: listen tcp 0.0.0.0:6443: bind: address already in use"
root@k8s-eu-1-control-plane-node-1:~# ps xa | grep 6443
33348 pts/0 R+ 0:00 grep --color=auto 6443
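Note that ps xa | grep 6443 only matches the string "6443" in process command lines, so it does not show which process actually owns the socket. A check that would reveal the listener (a suggestion only; I have not included its output above) would be:
ss -lntp | grep ':6443'     # list listening TCP sockets together with the owning process
lsof -i :6443               # alternative, if lsof is installed on the node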
If I instead run kubeadm init --pod-network-cidr=192.168.0.0/16, the initialization process completes without problems.
Following what is described here: https://github.com/kubernetes/kubeadm/blob/main/docs/ha-considerations.md#keepalived-configuration, I defined the following files:
/etc/haproxy/haproxy.cfg:
# https://github.com/kubernetes/kubeadm/blob/main/docs/ha-considerations.md#haproxy-configuration
# /etc/haproxy/haproxy.cfg
#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
    log /dev/log local0
    log /dev/log local1 notice
    daemon
#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
    mode http
    log global
    option httplog
    option dontlognull
    option http-server-close
    option forwardfor except 127.0.0.0/8
    option redispatch
    retries 1
    timeout http-request 10s
    timeout queue 20s
    timeout connect 5s
    timeout client 20s
    timeout server 20s
    timeout http-keep-alive 10s
    timeout check 10s
#---------------------------------------------------------------------
# apiserver frontend which proxys to the control plane nodes
#---------------------------------------------------------------------
frontend apiserver
    bind *:6445
    mode tcp
    option tcplog
    default_backend apiserverbackend
#---------------------------------------------------------------------
# round robin balancing for apiserver
#---------------------------------------------------------------------
# https://github.com/kubernetes/kubeadm/blob/main/docs/ha-considerations.md#bootstrap-the-cluster
backend apiserverbackend
    #option httpchk GET /healthz
    option httpchk GET /livez
    http-check expect status 200
    mode tcp
    option ssl-hello-chk
    balance roundrobin
        server k82-eu-1-load-balancer-dns-1 ppp.pp.ppp.pp:53
        server k82-eu-1-load-balancer-dns-2 yyy.yy.yyy.yy:53
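(As a side note, the file can be syntax-checked before use, assuming the haproxy binary is available on the node or inside the container image; this is only a suggestion, not output from my attempts above:)
haproxy -c -f /etc/haproxy/haproxy.cfg    # parse the configuration and report errors without starting the proxy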
/etc/keepalived/keepalived.conf:
# https://github.com/kubernetes/kubeadm/blob/main/docs/ha-considerations.md#keepalived-configuration
# https://www.server-world.info/en/note?os=Ubuntu_22.04&p=keepalived&f=1
! /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
    router_id LVS_DEVEL
    enable_script_security
}
vrrp_script check_apiserver {
    script "/etc/keepalived/check_apiserver.sh"
    interval 3
    weight -2
    fall 10
    rise 2
}
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 101
    authentication {
        auth_type PASS
        auth_pass 42
    }
    virtual_ipaddress {
        10.0.0.30
    }
    track_script {
        check_apiserver
    }
}
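(To verify whether keepalived has actually claimed the virtual IP defined above on eth0, the interface can be inspected; again just a diagnostic sketch, not output I have captured:)
ip addr show eth0 | grep -w 10.0.0.30     # the VIP should be listed here on the current MASTER node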
/etc/keepalived/check_apiserver.sh:
#!/bin/sh
# https://github.com/kubernetes/kubeadm/blob/main/docs/ha-considerations.md#keepalived-configuration
# https://www.server-world.info/en/note?os=Ubuntu_22.04&p=keepalived&f=1

errorExit() {
    echo "*** $*" 1>&2
    exit 1
}

APISERVER_DEST_PORT=6445
APISERVER_VIP=10.0.0.30

curl --silent --max-time 2 --insecure https://localhost:${APISERVER_DEST_PORT}/ -o /dev/null || errorExit "Error GET https://localhost:${APISERVER_DEST_PORT}/"
if ip addr | grep -q ${APISERVER_VIP}; then
    curl --silent --max-time 2 --insecure https://${APISERVER_VIP}:${APISERVER_DEST_PORT}/ -o /dev/null || errorExit "Error GET https://${APISERVER_VIP}:${APISERVER_DEST_PORT}/"
fi
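The script exits non-zero when the endpoint is unreachable, which makes keepalived apply the weight -2 from the vrrp_script block and lower the node's priority. It can be exercised by hand like this (illustrative only):
sh /etc/keepalived/check_apiserver.sh && echo "check passed" || echo "check failed"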
/etc/kubernetes/manifests/haproxy.yaml:
# https://github.com/kubernetes/kubeadm/blob/main/docs/ha-considerations.md#option-2-run-the-services-as-static-pods
apiVersion: v1
kind: Pod
metadata:
  name: haproxy
  namespace: kube-system
spec:
  containers:
  - image: haproxy:2.1.4
    name: haproxy
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: localhost
        path: /healthz
        port: 6445
        scheme: HTTPS
    volumeMounts:
    - mountPath: /usr/local/etc/haproxy/haproxy.cfg
      name: haproxyconf
      readOnly: true
  hostNetwork: true
  volumes:
  - hostPath:
      path: /etc/haproxy/haproxy.cfg
      type: FileOrCreate
    name: haproxyconf
status: {}
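Because the pod uses hostNetwork: true, HAProxy should end up listening on the host's port 6445. A quick way to confirm the static pod and the listener (these commands are suggestions, not output I have collected):
crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock ps -a | grep haproxy
ss -lntp | grep ':6445'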
/etc/kubernetes/manifests/keepalived.yaml:
# https://github.com/kubernetes/kubeadm/blob/main/docs/ha-considerations.md#option-2-run-the-services-as-static-pods
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  name: keepalived
  namespace: kube-system
spec:
  containers:
  - image: osixia/keepalived:2.0.17
    name: keepalived
    resources: {}
    securityContext:
      capabilities:
        add:
        - NET_ADMIN
        - NET_BROADCAST
        - NET_RAW
    volumeMounts:
    - mountPath: /usr/local/etc/keepalived/keepalived.conf
      name: config
    - mountPath: /etc/keepalived/check_apiserver.sh
      name: check
  hostNetwork: true
  volumes:
  - hostPath:
      path: /etc/keepalived/keepalived.conf
    name: config
  - hostPath:
      path: /etc/keepalived/check_apiserver.sh
    name: check
status: {}
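As with the HAProxy manifest, the keepalived static pod can be inspected through the container runtime (CONTAINERID is a placeholder, as in the kubeadm hint above):
crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock ps -a | grep keepalived
crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock logs CONTAINERID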
What am I doing wrong? How can I get the initialization process to work?