'curl' command works intermittently

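I am trying to install and configure a high-availability RKE2 (Rancher Kubernetes Engine 2) cluster. My setup consists of 4 virtual machines: one configured with DNS and a load balancer, one configured as a server node that is up and running, and 2 VMs that are meant to join the cluster as additional nodes.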

The logs from the agent node show:

    Jul 27 13:10:22 ha-rancher-2 rke2[31465]: time="2023-07-27T13:10:22Z" level=fatal msg="starting kubernetes: preparing server: failed to get CA certs: https://rancher.inwi.priv:9345/cacerts: 503 Service Unavailable"
    Jul 27 13:10:22 ha-rancher-2 systemd[1]: rke2-server.service: main process exited, code=exited, status=1/FAILURE
    Jul 27 13:10:22 ha-rancher-2 systemd[1]: Failed to start Rancher Kubernetes Engine v2 (server).
    Jul 27 13:10:22 ha-rancher-2 systemd[1]: Unit rke2-server.service entered failed state.
    Jul 27 13:10:22 ha-rancher-2 systemd[1]: rke2-server.service failed.
    Jul 27 13:10:28 ha-rancher-2 systemd[1]: rke2-server.service holdoff time over, scheduling restart.
    Jul 27 13:10:28 ha-rancher-2 systemd[1]: Stopped Rancher Kubernetes Engine v2 (server).
    Jul 27 13:10:28 ha-rancher-2 systemd[1]: Starting Rancher Kubernetes Engine v2 (server)...
    Jul 27 13:10:28 ha-rancher-2 sh[31479]: + /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service
    Jul 27 13:10:28 ha-rancher-2 sh[31479]: Failed to get unit file state for nm-cloud-setup.service: No such file or directory
    Jul 27 13:10:28 ha-rancher-2 rke2[31485]: time="2023-07-27T13:10:28Z" level=warning msg="not running in CIS mode"
    Jul 27 13:10:28 ha-rancher-2 rke2[31485]: time="2023-07-27T13:10:28Z" level=info msg="Starting rke2 v1.24.15+rke2r1 (8cf3a75d5ccd6e2aa0a99cdf869426f1decd970d)"
    Jul 27 13:10:28 ha-rancher-2 rke2[31485]: time="2023-07-27T13:10:28Z" level=info msg="Managed etcd cluster not yet initialized"
    Jul 27 13:10:28 ha-rancher-2 rke2[31485]: time="2023-07-27T13:10:28Z" level=fatal msg="starting kubernetes: preparing server: failed to validate server configuration: CA cert validation failed: https://rancher.inwi.priv:9345/cacerts: 503 Service Unavailable"
    Jul 27 13:10:28 ha-rancher-2 systemd[1]: rke2-server.service: main process exited, code=exited, status=1/FAILURE
    Jul 27 13:10:28 ha-rancher-2 systemd[1]: Failed to start Rancher Kubernetes Engine v2 (server).
    Jul 27 13:10:28 ha-rancher-2 systemd[1]: Unit rke2-server.service entered failed state.
    Jul 27 13:10:28 ha-rancher-2 systemd[1]: rke2-server.service failed.

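I tried to investigate the issue, and what I found odd is that the 'curl' command against the same URL works only intermittently, which is confusing. Two consecutive runs, about two seconds apart, returned 200 OK and then 503 Service Unavailable: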

    [root@HA-Rancher-2 ~]# curl -vks https://rancher.inwi.priv:9345/cacerts
    * About to connect() to rancher.inwi.priv port 9345 (#0)
    *   Trying 172.20.10.210...
    * Connected to rancher.inwi.priv (172.20.10.210) port 9345 (#0)
    * Initializing NSS with certpath: sql:/etc/pki/nssdb
    * skipping SSL peer certificate verification
    * NSS: client certificate not found (nickname not specified)
    * SSL connection using TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256
    * Server certificate:
    *       subject: CN=rke2,O=rke2
    *       start date: Jul 25 19:12:23 2023 GMT
    *       expire date: Jul 25 22:54:08 2024 GMT
    *       common name: rke2
    *       issuer: CN=rke2-server-ca@1690312343
    > GET /cacerts HTTP/1.1
    > User-Agent: curl/7.29.0
    > Host: rancher.inwi.priv:9345
    > Accept: */*
    >
    < HTTP/1.1 200 OK
    < Content-Type: text/plain
    < Date: Thu, 27 Jul 2023 14:34:58 GMT
    < Content-Length: 570
    <
    -----BEGIN CERTIFICATE-----
    MIIBeTCCAR+gAwIBAgIBADAKBggqhkjOPQQDAjAkMSIwIAYDVQQDDBlya2UyLXNl
    cnZlci1jYUAxNjkwMzEyMzQzMB4XDTIzMDcyNTE5MTIyM1oXDTMzMDcyMjE5MTIy
    M1owJDEiMCAGA1UEAwwZcmtlMi1zZXJ2ZXItY2FAMTY5MDMxMjM0MzBZMBMGByqG
    SM49AgEGCCqGSM49AwEHA0IABJdeIAgxOwLhgv7IH4hloybTf...
    -----END CERTIFICATE-----
    * Connection #0 to host rancher.inwi.priv left intact
    [root@HA-Rancher-2 ~]# curl -vks https://rancher.inwi.priv:9345/cacerts
    * About to connect() to rancher.inwi.priv port 9345 (#0)
    *   Trying 172.20.10.210...
    * Connected to rancher.inwi.priv (172.20.10.210) port 9345 (#0)
    * Initializing NSS with certpath: sql:/etc/pki/nssdb
    * skipping SSL peer certificate verification
    * NSS: client certificate not found (nickname not specified)
    * SSL connection using TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256
    * Server certificate:
    *       subject: CN=rke2,O=rke2
    *       start date: Jul 25 19:12:23 2023 GMT
    *       expire date: Jul 25 22:54:20 2024 GMT
    *       common name: rke2
    *       issuer: CN=rke2-server-ca@1690312343
    > GET /cacerts HTTP/1.1
    > User-Agent: curl/7.29.0
    > Host: rancher.inwi.priv:9345
    > Accept: */*
    >
    < HTTP/1.1 503 Service Unavailable
    < Content-Type: text/plain; charset=utf-8
    < X-Content-Type-Options: nosniff
    < Date: Thu, 27 Jul 2023 14:35:00 GMT
    < Content-Length: 9
    <
    starting
    * Connection #0 to host rancher.inwi.priv left intact
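
To put a number on "intermittently", the request can be repeated and the status codes tallied. The following is only a rough sketch, not part of my setup: the helper names `fetch_status`, `probe`, and `tally` are made up for illustration, and TLS verification is skipped in the same way as `curl -k`.

```python
import ssl
import time
import urllib.error
import urllib.request
from collections import Counter


def fetch_status(url, timeout=5.0):
    """Do one GET and return (HTTP status, start of body).

    Certificate verification is disabled, mirroring `curl -k`.
    Connection-level failures (urllib.error.URLError) are not caught here.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            return resp.status, resp.read(40).decode("ascii", "replace").strip()
    except urllib.error.HTTPError as err:
        # Non-2xx responses such as 503 arrive as HTTPError; keep code and body.
        return err.code, err.read(40).decode("ascii", "replace").strip()


def probe(url, attempts=10, delay=1.0):
    """Call fetch_status repeatedly, pausing `delay` seconds between calls."""
    results = []
    for _ in range(attempts):
        results.append(fetch_status(url))
        time.sleep(delay)
    return results


def tally(results):
    """Count how often each HTTP status code occurred."""
    return Counter(status for status, _ in results)


# Example (requires network access to the load balancer):
#   print(tally(probe("https://rancher.inwi.priv:9345/cacerts")))
```

If the counts split roughly evenly between 200 and 503, that would be consistent with requests being spread over more than one backend, but that is an assumption to verify, not a conclusion.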

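Using the 'netstat' command on the LB VM, I can see that connections on the port I am using (port 9345) stay in the 'TIME_WAIT' state for more than 2 minutes instead of transitioning to 'ESTABLISHED'.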

    Active Internet connections (w/o servers)
    Proto Recv-Q Send-Q Local Address           Foreign Address         State
    tcp        0      0 rancher:48916           172.20.10.11:9345       TIME_WAIT
    tcp        0      0 rancher:48898           172.20.10.11:9345       TIME_WAIT
    tcp        0      0 rancher:48958           172.20.10.11:9345       TIME_WAIT
    tcp        0      0 rancher:56558           172.20.10.14:9345       TIME_WAIT
    tcp        0      0 rancher:48988           172.20.10.11:9345       TIME_WAIT
    tcp        0      0 rancher:ssh             172.20.10.200:44748     ESTABLISHED
    tcp        0      0 rancher:48978           172.20.10.11:9345       TIME_WAIT
    tcp        0      0 rancher:56568           172.20.10.14:9345       TIME_WAIT
    tcp        0      0 rancher:9345            172.20.10.13:40856      TIME_WAIT
    tcp        0      0 rancher:9345            172.20.10.13:40892      TIME_WAIT
    tcp        0      0 rancher:56538           172.20.10.14:9345       TIME_WAIT
    tcp        0      0 rancher:48924           172.20.10.11:9345       TIME_WAIT
    tcp        0      0 rancher:56526           172.20.10.14:9345       TIME_WAIT
    udp        0      0 rancher:34185           8.8.8.8:domain          ESTABLISHED
    udp        0      0 rancher:41489           8.8.4.4:domain          ESTABLISHED
    udp        0      0 rancher:47731           8.8.8.8:domain          ESTABLISHED
    udp        0      0 rancher:40760           8.8.8.8:domain          ESTABLISHED
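
To make this output easier to read, the port-9345 connections can be grouped by peer and state. The sketch below is illustrative only; `count_connections` is a hypothetical helper, and it assumes the column layout shown above. Note that `TIME_WAIT` is the state a TCP socket enters on the side that closes a connection first, so a list of `TIME_WAIT` entries may simply reflect many short-lived connections that have already completed rather than connections that failed to establish.

```python
from collections import Counter


def count_connections(netstat_text, port="9345"):
    """Group TCP connections involving `port` by (role, peer host, state).

    role is "backend" when the remote side uses the port (LB -> server node)
    and "client" when the local side uses it (node -> LB).
    """
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip the headers, UDP rows, and anything without a state column.
        if len(fields) < 6 or fields[0] != "tcp":
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        peer_host = foreign.rsplit(":", 1)[0]
        if foreign.endswith(":" + port):
            counts[("backend", peer_host, state)] += 1
        elif local.endswith(":" + port):
            counts[("client", peer_host, state)] += 1
    return counts
```

Applied to the output above, this shows the LB opening connections to two backends, 172.20.10.11 and 172.20.10.14. Requesting `/cacerts` from each of those addresses directly, bypassing the LB, might show whether the 200 and 503 responses each come from one specific backend.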
