The problem is that after enabling IPVS mode in kube-proxy, everything works fine. But as soon as I install Traefik, I immediately lose the connection to Kubernetes.
OS: CentOS 7.9
$ uname -rs
Linux 3.10.0-1160.71.1.el7.x86_64
Kubernetes: 1.22.2, CNI: Calico
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
srv-dev-is-elt-01 Ready control-plane,master 21h v1.22.2
srv-dev-is-zabbix-01 Ready control-plane,master 21h v1.22.2
srv-nt-bpmtest-postgres-01 Ready control-plane,master 21h v1.22.2
srv-nt-bpmtest-postgres-02 Ready <none> 21h v1.22.2
srv-rnt-rrsys-minio Ready <none> 21h v1.22.2
Below is the sequence of steps after which the connection to Kubernetes is lost.
IPVS virtual server support is enabled on all nodes
$ lsmod | grep -e ip_vs -e nf_conntrack_ipv4
nf_conntrack_ipv4 15053 32
nf_defrag_ipv4 12729 1 nf_conntrack_ipv4
ip_vs_lc 12516 0
ip_vs_sh 12688 0
ip_vs_wrr 12697 0
ip_vs_rr 12600 0
ip_vs 145458 8 ip_vs_lc,ip_vs_rr,ip_vs_sh,ip_vs_wrr
nf_conntrack 139264 10 ip_vs,nf_nat,nf_nat_ipv4,nf_nat_ipv6,xt_conntrack,nf_nat_masquerade_ipv4,nf_nat_masquerade_ipv6,nf_conntrack_netlink,nf_conntrack_ipv4,nf_conntrack_ipv6
libcrc32c 12644 3 ip_vs,nf_nat,nf_conntrack
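The module check above can be repeated on every node with a small helper. This is only a sketch: the function name `missing_ipvs_modules` and the `REQUIRED_MODULES` list are illustrative, and the list simply mirrors the modules visible in the `lsmod` output above rather than any authoritative requirement.

```shell
#!/bin/sh
# Assumption: the modules this setup relies on are the ones shown in the
# lsmod output above. Adjust the list for other kernels or schedulers.
REQUIRED_MODULES="ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh ip_vs_lc nf_conntrack_ipv4"

# Reads `lsmod` output on stdin and prints each required module that is
# not listed in the first column.
missing_ipvs_modules() {
    loaded=$(awk '{ print $1 }')
    for mod in $REQUIRED_MODULES; do
        printf '%s\n' "$loaded" | grep -qx "$mod" || printf '%s\n' "$mod"
    done
}

# Usage on a node (loading modules needs root):
#   lsmod | missing_ipvs_modules
```

An empty result means every module in the list is already loaded.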
I make sure that Traefik is not installed via the Helm chart
$ helm list -A
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
Then I switch kube-proxy to IPVS mode
$ kubectl edit configmap kube-proxy -n kube-system
iptables:
  masqueradeAll: false
  masqueradeBit: null
  minSyncPeriod: 0s
  syncPeriod: 0s
ipvs:
  excludeCIDRs: null
  minSyncPeriod: 0s
  scheduler: "lc"
  strictARP: false
  syncPeriod: 0s
  tcpFinTimeout: 0s
  tcpTimeout: 0s
  udpTimeout: 0s
kind: KubeProxyConfiguration
metricsBindAddress: ""
mode: "ipvs"
nodePortAddresses: null
oomScoreAdj: null
Here, instead of mode: "" I have set mode: "ipvs", and instead of scheduler: "" I have set scheduler: "lc". The balancing mode is least connection.
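To confirm that the edit was actually saved, the two values can be read back from the ConfigMap. A minimal sketch follows; the function name `proxy_mode_and_scheduler` is made up for illustration, and the usage line assumes the configuration is stored under the `config.conf` key, as in a kubeadm-style cluster.

```shell
#!/bin/sh
# Reads a KubeProxyConfiguration document on stdin and prints the values of
# the `mode` and `scheduler` keys with the surrounding quotes removed.
proxy_mode_and_scheduler() {
    awk '
        /^[ \t]*mode:/      { gsub(/"/, "", $2); mode = $2 }
        /^[ \t]*scheduler:/ { gsub(/"/, "", $2); sched = $2 }
        END { printf "mode=%s scheduler=%s\n", mode, sched }
    '
}

# Usage (assumes the ConfigMap key is config.conf):
#   kubectl -n kube-system get configmap kube-proxy \
#     -o jsonpath='{.data.config\.conf}' | proxy_mode_and_scheduler
```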
After looking at the kube-proxy logs, I confirmed that IPVS mode was enabled successfully; see the line "Using ipvs Proxier."
I0816 02:53:24.689691 1 node.go:172] Successfully retrieved node IP: 172.24.17.16
I0816 02:53:24.689748 1 server_others.go:140] Detected node IP 172.24.17.16
I0816 02:53:24.744103 1 server_others.go:206] kube-proxy running in dual-stack mode, IPv4-primary
I0816 02:53:24.744172 1 server_others.go:274] Using ipvs Proxier.
I0816 02:53:24.744205 1 server_others.go:276] creating dualStackProxier for ipvs.
W0816 02:53:24.744254 1 server_others.go:495] detect-local-mode set to ClusterCIDR, but no IPv6 cluster CIDR defined, , defaulting to no-op detect-local for IPv6
E0816 02:53:24.744604 1 proxier.go:381] "can't set sysctl net/ipv4/vs/conn_reuse_mode, kernel version must be at least 4.1"
E0816 02:53:24.745339 1 proxier.go:381] "can't set sysctl net/ipv4/vs/conn_reuse_mode, kernel version must be at least 4.1"
W0816 02:53:24.745504 1 ipset.go:113] ipset name truncated; [KUBE-6-LOAD-BALANCER-SOURCE-CIDR] -> [KUBE-6-LOAD-BALANCER-SOURCE-CID]
W0816 02:53:24.745543 1 ipset.go:113] ipset name truncated; [KUBE-6-NODE-PORT-LOCAL-SCTP-HASH] -> [KUBE-6-NODE-PORT-LOCAL-SCTP-HAS]
I0816 02:53:24.745919 1 server.go:649] Version: v1.22.2
I0816 02:53:24.753389 1 conntrack.go:52] Setting nf_conntrack_max to 262144
I0816 02:53:24.753935 1 config.go:315] Starting service config controller
I0816 02:53:24.753967 1 config.go:224] Starting endpoint slice config controller
I0816 02:53:24.753988 1 shared_informer.go:240] Waiting for caches to sync for service config
I0816 02:53:24.754010 1 shared_informer.go:240] Waiting for caches to sync for endpoint slice config
E0816 02:53:24.759251 1 event_broadcaster.go:253] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"srv-nt-bpmtest-postgres-01.170bb3a40a
I0816 02:53:24.854413 1 shared_informer.go:247] Caches are synced for endpoint slice config
I0816 02:53:24.854494 1 shared_informer.go:247] Caches are synced for service config
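The two `conn_reuse_mode` errors in this log say the kernel must be at least 4.1, while the nodes run 3.10. Whether that has anything to do with the connectivity loss is not established here. The sketch below only shows how the same comparison could be made on each node; `kernel_at_least` is a hypothetical helper name, not part of any tool.

```shell
#!/bin/sh
# Succeeds if a `uname -r` style release string is at least MAJOR.MINOR.
# Only the first two numeric components are compared.
kernel_at_least() {
    release=$1
    want_major=$2
    want_minor=$3
    major=${release%%.*}      # text before the first dot
    rest=${release#*.}        # text after the first dot
    minor=${rest%%.*}         # second component
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}

# Usage on a node:
#   kernel_at_least "$(uname -r)" 4 1 || echo "kernel older than 4.1"
```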
I also checked that all virtual services now use the lc scheduler
$ sudo ipvsadm -ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 10.96.0.1:443 lc
-> 172.24.16.225:6443 Masq 1 0 0
-> 172.24.16.226:6443 Masq 1 0 0
-> 172.24.17.16:6443 Masq 1 1 0
TCP 10.96.0.10:53 lc
-> 192.168.236.7:53 Masq 1 0 0
-> 192.168.236.10:53 Masq 1 0 0
TCP 10.96.0.10:9153 lc
-> 192.168.236.7:9153 Masq 1 0 0
-> 192.168.236.10:9153 Masq 1 0 0
TCP 10.101.219.53:9094 lc
-> 192.168.236.8:9094 Masq 1 0 0
TCP 10.105.217.135:443 lc
-> 192.168.236.6:5443 Masq 1 0 0
-> 192.168.236.9:5443 Masq 1 0 0
TCP 10.107.33.224:5473 lc
-> 172.24.16.225:5473 Masq 1 0 0
-> 172.24.17.16:5473 Masq 1 0 0
-> 172.24.17.17:5473 Masq 1 0 0
UDP 10.96.0.10:53 lc
-> 192.168.236.7:53 Masq 1 0 0
-> 192.168.236.10:53 Masq 1 0 0
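Rather than reading the table by eye, the scheduler column can be filtered. This is a rough sketch that assumes the `ipvsadm -ln` layout shown above (protocol, address, scheduler on each virtual service line); `services_not_using` is an illustrative name.

```shell
#!/bin/sh
# Reads `ipvsadm -ln` output on stdin and prints every virtual service whose
# scheduler differs from the expected one (first argument, default "lc").
services_not_using() {
    awk -v want="${1:-lc}" '
        ($1 == "TCP" || $1 == "UDP" || $1 == "SCTP") && $3 != want {
            print $1, $2, $3
        }
    '
}

# Usage on a node:
#   sudo ipvsadm -ln | services_not_using lc
```

No output means every virtual service uses the expected scheduler.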
I check once more that the connection to Kubernetes is working
$ kubectl get all -A
NAMESPACE NAME READY STATUS RESTARTS AGE
calico-apiserver pod/calico-apiserver-6b9c675d9-9kwgs 1/1 Running 198 (64m ago) 21h
calico-apiserver pod/calico-apiserver-6b9c675d9-9lkpj 1/1 Running 198 (64m ago) 21h
calico-system pod/calico-kube-controllers-6f875db9f6-lkz5q 1/1 Running 198 (65m ago) 21h
calico-system pod/calico-node-9lwdx 1/1 Running 2 (73m ago) 111m
calico-system pod/calico-node-fwx2q 1/1 Running 3 (64m ago) 111m
calico-system pod/calico-node-jfmpn 1/1 Running 2 (73m ago) 112m
calico-system pod/calico-node-nm2wv 1/1 Running 1 (99m ago) 112m
calico-system pod/calico-node-rfslp 1/1 Running 1 (100m ago) 112m
calico-system pod/calico-typha-694b7cc975-4gwdp 1/1 Running 2 (100m ago) 21h
calico-system pod/calico-typha-694b7cc975-9w7rd 1/1 Running 9 (73m ago) 21h
calico-system pod/calico-typha-694b7cc975-kchjm 1/1 Running 21 (64m ago) 21h
kube-system pod/coredns-78fcd69978-4fnhn 1/1 Running 6 (64m ago) 21h
kube-system pod/coredns-78fcd69978-r4wf5 1/1 Running 6 (64m ago) 21h
kube-system pod/kube-apiserver-srv-dev-is-elt-01 1/1 Running 215 (68m ago) 21h
kube-system pod/kube-apiserver-srv-dev-is-zabbix-01 1/1 Running 201 (68m ago) 21h
kube-system pod/kube-apiserver-srv-nt-bpmtest-postgres-01 1/1 Running 217 (64m ago) 21h
kube-system pod/kube-controller-manager-srv-dev-is-elt-01 1/1 Running 15 (73m ago) 21h
kube-system pod/kube-controller-manager-srv-dev-is-zabbix-01 1/1 Running 8 (73m ago) 21h
kube-system pod/kube-controller-manager-srv-nt-bpmtest-postgres-01 1/1 Running 14 (64m ago) 21h
kube-system pod/kube-proxy-49xzk 1/1 Running 2 (99m ago) 21h
kube-system pod/kube-proxy-ftrdk 1/1 Running 2 (73m ago) 21h
kube-system pod/kube-proxy-jj5zw 1/1 Running 2 (73m ago) 21h
kube-system pod/kube-proxy-pht8d 1/1 Running 2 (100m ago) 21h
kube-system pod/kube-proxy-pwgnm 1/1 Running 3 (64m ago) 106m
kube-system pod/kube-scheduler-srv-dev-is-elt-01 1/1 Running 16 (73m ago) 21h
kube-system pod/kube-scheduler-srv-dev-is-zabbix-01 1/1 Running 8 (73m ago) 21h
kube-system pod/kube-scheduler-srv-nt-bpmtest-postgres-01 1/1 Running 16 (64m ago) 21h
tigera-operator pod/tigera-operator-57b5454687-2rfmt 1/1 Running 15 (64m ago) 21h
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
calico-apiserver service/calico-api ClusterIP 10.105.217.135 <none> 443/TCP 21h
calico-system service/calico-kube-controllers-metrics ClusterIP 10.101.219.53 <none> 9094/TCP 21h
calico-system service/calico-typha ClusterIP 10.107.33.224 <none> 5473/TCP 21h
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 21h
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 21h
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
calico-system daemonset.apps/calico-node 5 5 5 5 5 kubernetes.io/os=linux 21h
kube-system daemonset.apps/kube-proxy 5 5 5 5 5 kubernetes.io/os=linux 21h
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
calico-apiserver deployment.apps/calico-apiserver 2/2 2 2 21h
calico-system deployment.apps/calico-kube-controllers 1/1 1 1 21h
calico-system deployment.apps/calico-typha 3/3 3 3 21h
kube-system deployment.apps/coredns 2/2 2 2 21h
tigera-operator deployment.apps/tigera-operator 1/1 1 1 21h
NAMESPACE NAME DESIRED CURRENT READY AGE
calico-apiserver replicaset.apps/calico-apiserver-6b9c675d9 2 2 2 21h
calico-system replicaset.apps/calico-kube-controllers-6f875db9f6 1 1 1 21h
calico-system replicaset.apps/calico-typha-694b7cc975 3 3 3 21h
kube-system replicaset.apps/coredns-78fcd69978 2 2 2 21h
tigera-operator replicaset.apps/tigera-operator-57b5454687 1 1 1 21h
Finally, I try to install Traefik, but first I edit the externalIPs entry in values.yaml and add the array of my cluster's IP addresses there:
loadBalancerSourceRanges: []
# - 192.168.0.1/32
# - 172.16.0.0/16
externalIPs:
  - 172.24.17.16 # master1
  - 172.24.16.226 # master2
  - 172.24.16.225 # master3
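One thing that may be worth checking at this step: the addresses under externalIPs are the masters' own node addresses. As far as I understand, kube-proxy in IPVS mode binds service addresses, externalIPs included, to the kube-ipvs0 dummy interface on each node, so an overlap with real node IPs could matter. The sketch below only detects such an overlap and does not prove it is the cause; `overlapping_ips` is a hypothetical helper.

```shell
#!/bin/sh
# Prints every address that appears in both space-separated lists
# (first argument: externalIPs, second argument: node IPs).
overlapping_ips() {
    for ext in $1; do
        for node in $2; do
            [ "$ext" = "$node" ] && printf '%s\n' "$ext"
        done
    done
    return 0
}

# Usage:
#   node_ips=$(kubectl get nodes \
#     -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}')
#   overlapping_ips "172.24.17.16 172.24.16.226 172.24.16.225" "$node_ips"
```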
I start the Traefik installation
$ helm install traefik traefik/ -n traefik
Error: failed post-install: warning: Hook post-install traefik/templates/dashboard-hook-ingressroute.yaml failed: rpc error: code = Unavailable desc = error reading from server: read tcp 172.24.17.16:38452->172.24.16.225:2379: read: connection reset by peer
And:
$ kubectl get all -A
Error from server: etcdserver: request timed out
Error from server: etcdserver: request timed out
Error from server: etcdserver: request timed out
Error from server: etcdserver: request timed out
The connection to the server 172.24.18.188:6443 was refused - did you specify the right host or port?
The connection to the server 172.24.18.188:6443 was refused - did you specify the right host or port?
The connection to the server 172.24.18.188:6443 was refused - did you specify the right host or port?
The connection to the server 172.24.18.188:6443 was refused - did you specify the right host or port?
The connection to the server 172.24.18.188:6443 was refused - did you specify the right host or port?
The connection to the server 172.24.18.188:6443 was refused - did you specify the right host or port?
etcd:
[root@srv-nt-bpmtest-postgres-01 m.kostromin]# systemctl status etcd
● etcd.service - Etcd Server
Loaded: loaded (/usr/lib/systemd/system/etcd.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2022-08-16 07:58:43 MSK; 15s ago
Main PID: 3719 (etcd)
Tasks: 21
Memory: 56.0M
CGroup: /system.slice/etcd.service
└─3719 /usr/bin/etcd --name=etcd1 --data-dir=/opt/etcd-data/etcd1.etcd --listen-client-urls=https://172.24.17.16:2379,https://127.0.0.1:2379
Aug 16 07:58:43 srv-nt-bpmtest-postgres-01 etcd[3719]: serving client requests on 127.0.0.1:2379
Aug 16 07:58:43 srv-nt-bpmtest-postgres-01 etcd[3719]: serving client requests on 172.24.17.16:2379
Aug 16 07:58:46 srv-nt-bpmtest-postgres-01 bash[3719]: proto: no coders for int
Aug 16 07:58:46 srv-nt-bpmtest-postgres-01 bash[3719]: proto: no encoder for ValueSize int [GetProperties]
Aug 16 07:58:48 srv-nt-bpmtest-postgres-01 etcd[3719]: health check for peer c721fffd85ddc9e0 could not connect: dial tcp 172.24.16.226:2380: connect: no route to host (prober "ROUND_TRIPPER_SNAPSHOT")
Aug 16 07:58:48 srv-nt-bpmtest-postgres-01 etcd[3719]: health check for peer c721fffd85ddc9e0 could not connect: dial tcp 172.24.16.226:2380: connect: no route to host (prober "ROUND_TRIPPER_RAFT_MESSAGE")
Aug 16 07:58:53 srv-nt-bpmtest-postgres-01 etcd[3719]: health check for peer c721fffd85ddc9e0 could not connect: dial tcp 172.24.16.226:2380: i/o timeout (prober "ROUND_TRIPPER_SNAPSHOT")
Aug 16 07:58:53 srv-nt-bpmtest-postgres-01 etcd[3719]: health check for peer c721fffd85ddc9e0 could not connect: dial tcp 172.24.16.226:2380: i/o timeout (prober "ROUND_TRIPPER_RAFT_MESSAGE")
Aug 16 07:58:58 srv-nt-bpmtest-postgres-01 etcd[3719]: health check for peer c721fffd85ddc9e0 could not connect: dial tcp 172.24.16.226:2380: connect: no route to host (prober "ROUND_TRIPPER_SNAPSHOT")
Aug 16 07:58:58 srv-nt-bpmtest-postgres-01 etcd[3719]: health check for peer c721fffd85ddc9e0 could not connect: dial tcp 172.24.16.226:2380: connect: no route to host (prober "ROUND_TRIPPER_RAFT_MESSAGE")
Could you please tell me what the problem might be?