Multi-master Kubernetes cluster with HAProxy LB not working after a master node restart (unable to run kubectl commands)

I have installed a multi-master cluster, following a guide on setting up a k8s multi-master cluster.

The setup details are as follows.

Load balancer: HAProxy

frontend kubernetes-frontend
    bind 192.168.1.11:6443
    mode tcp
    option tcplog
    default_backend kubernetes-backend

backend kubernetes-backend
    mode tcp
    option tcp-check
    balance roundrobin
    server master21.server 192.168.1.21:6443 check fall 3 rise 2
    server master22.server 192.168.1.22:6443 check fall 3 rise 2
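
The checks above are plain TCP checks, so HAProxy only verifies that port 6443 accepts connections. To see which backends HAProxy currently considers up, the runtime API can be queried; this assumes a stats socket is configured in the global section (not shown in my config above):

# assumes haproxy.cfg contains, in the global section:
#   stats socket /var/run/haproxy.sock mode 660 level admin
echo "show servers state kubernetes-backend" | sudo socat stdio /var/run/haproxy.sock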

Kubernetes version: v1.25.0

Number of masters: 2
Number of workers: 2

Docker version: 23.0.1

cri-dockerd v0.3.0

Environment: VMware virtual servers running CentOS 8

After the installation and cluster setup everything worked fine, and I also deployed a sample pod. Then I wanted to test the cluster's high availability by shutting down one of the masters, and that's where the problem started: as soon as I shut one master down, kubectl commands stop working. I tried restarting and switching masters, but kubectl still does not work. When a command times out it gives the following error (though not always):

error: Get "https://192.168.1.11:6443/api?timeout=32s": net/http: TLS handshake timeout - error from a previous attempt: EOF
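
To separate a load-balancer problem from an apiserver problem, kubectl can also be pointed straight at the surviving master, bypassing the VIP (just a diagnostic, not part of my setup):

# bypass the HAProxy VIP and talk to the remaining master directly
kubectl --server https://192.168.1.21:6443 get nodes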

I tried curl with and without https, with the following results:

[***@master21 ~]$ curl -v https://192.168.1.11:6443/api?timeout=32s
*   Trying 192.168.1.11...
* TCP_NODELAY set
* Connected to 192.168.1.11 (192.168.1.11) port 6443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 192.168.1.11:6443
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 192.168.1.11:6443

[***@master21 ~]$ curl -v http://192.168.1.11:6443/api?timeout=32s
*   Trying 192.168.1.11...
* TCP_NODELAY set
* Connected to 192.168.1.11 (192.168.1.11) port 6443 (#0)
> GET /api?timeout=32s HTTP/1.1
> Host: 192.168.1.11:6443
> User-Agent: curl/7.61.1
> Accept: */*
>
* Empty reply from server

Can someone help me solve this? I believe some TLS configuration is needed on HAProxy, but I don't understand how to configure it to match the existing SSL setup in the k8s cluster.
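
For what it's worth: with mode tcp, HAProxy only forwards the encrypted bytes and never terminates TLS, so the load balancer itself should not need any certificates; the TLS session runs end-to-end between the client and kube-apiserver. What has to be true is that the apiserver certificate carries the LB address (192.168.1.11) as a SAN, which kubeadm adds when the address is given as the controlPlaneEndpoint. One way to check the SANs through the VIP (a diagnostic sketch):

# print the SANs of the certificate presented behind the VIP
openssl s_client -connect 192.168.1.11:6443 </dev/null 2>/dev/null \
  | openssl x509 -noout -text | grep -A1 "Subject Alternative Name"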

Output of curl -kv https://192.168.1.21:6443/healthz with one master shut down (master22.server, the entire VM):

[***@master21 ~]$ curl -kv https://192.168.1.21:6443/healthz
*   Trying 192.168.1.21...
* TCP_NODELAY set
* Connected to 192.168.1.21 (192.168.1.21) port 6443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Request CERT (13):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, [no content] (0):
* TLSv1.3 (OUT), TLS handshake, Certificate (11):
* TLSv1.3 (OUT), TLS handshake, [no content] (0):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=kube-apiserver
*  start date: Mar 23 08:10:26 2023 GMT
*  expire date: Mar 22 08:10:26 2024 GMT
*  issuer: CN=kubernetes
*  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* TLSv1.3 (OUT), TLS app data, [no content] (0):
* TLSv1.3 (OUT), TLS app data, [no content] (0):
* TLSv1.3 (OUT), TLS app data, [no content] (0):
* Using Stream ID: 1 (easy handle 0x5605644bf690)
* TLSv1.3 (OUT), TLS app data, [no content] (0):
> GET /healthz HTTP/2
> Host: 192.168.1.21:6443
> User-Agent: curl/7.61.1
> Accept: */*
>
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS app data, [no content] (0):
* Connection state changed (MAX_CONCURRENT_STREAMS == 250)!
* TLSv1.3 (OUT), TLS app data, [no content] (0):
* TLSv1.3 (IN), TLS app data, [no content] (0):
* TLSv1.3 (IN), TLS app data, [no content] (0):
< HTTP/2 403
< audit-id: 930660ff-c7ee-4226-9b98-8fdaed13a251
< cache-control: no-cache, private
< content-type: application/json
< x-content-type-options: nosniff
< x-kubernetes-pf-flowschema-uid:
< x-kubernetes-pf-prioritylevel-uid:
< content-length: 224
< date: Fri, 24 Mar 2023 06:46:01 GMT
<
* TLSv1.3 (IN), TLS app data, [no content] (0):
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/healthz\"",
  "reason": "Forbidden",
  "details": {},
  "code": 403
* Connection #0 to host 192.168.1.21 left intact
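
The 403 above is actually encouraging: the TLS handshake completed and the apiserver responded; the anonymous user is simply not authorized to read /healthz. The same check with the admin credentials from ~/.kube/config would be:

# issues the request with the kubeconfig's client certificate instead of anonymously
kubectl get --raw /healthz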

On further inspection, I noticed the problem only occurs when I shut a master down completely (the entire VM). When I stop only the kubelet service, kubectl commands give the expected output below:

[***@master22 ~]$  kubectl get nodes
NAME              STATUS     ROLES           AGE   VERSION
master21.server   Ready      control-plane   22h   v1.25.0
master22.server   NotReady   control-plane   22h   v1.25.0
worker31.server   Ready      <none>          22h   v1.25.0
worker32.server   Ready      <none>          22h   v1.25.0
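
Could etcd quorum be the difference here? Stopping only kubelet leaves the static pods (kube-apiserver, etcd) running as containers, while powering off the whole VM also stops that node's stacked etcd member, and a two-member etcd cluster cannot tolerate the loss of either member. With both masters up again, something like the following should report member health (assuming kubeadm's default certificate paths inside the etcd pod and its etcd-<nodeName> pod naming):

kubectl -n kube-system exec etcd-master21.server -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint health --cluster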
