kubectl times out when reaching a cluster inside a VPN

2024-6-20

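I have a very strange problem - my kubectl CLI cannot reach a cluster inside the VPN.

My context:

OS: Arch Linux
kubectl client version: v1.28.1 (latest)
On my machine, curl to the API endpoint works fine
Opening the API in a browser also works
I even wrote a small Go module that uses the go_client transport, and that works too
I tried downgrading the kubectl client to match the remote cluster version, but that did not help either.

Everything seems to work except kubectl.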

kubectl get pods --request-timeout='5s' -n my-ns -v=99
I0913 17:43:34.500572   60035 loader.go:395] Config loaded from file:  /home/alexander/.config/kube/config
I0913 17:43:34.501699   60035 round_trippers.go:466] curl -v -XGET  -H "Accept: application/json;g=apidiscovery.k8s.io;v=v2beta1;as=APIGroupDiscoveryList,application/json" -H "User-Agent: kubectl/v1.28.1 (linux/amd64) kubernetes/8dc49c4" 'https://172.16.10.10:6443/api?timeout=5s'
I0913 17:43:34.556074   60035 round_trippers.go:510] HTTP Trace: Dial to tcp:172.16.10.10:6443 succeed
I0913 17:43:39.503205   60035 round_trippers.go:553] GET https://172.16.10.10:6443/api?timeout=5s  in 5001 milliseconds
I0913 17:43:39.503231   60035 round_trippers.go:570] HTTP Statistics: DNSLookup 0 ms Dial 43 ms TLSHandshake 47 ms Duration 5001 ms
I0913 17:43:39.503241   60035 round_trippers.go:577] Response Headers:
E0913 17:43:39.503298   60035 memcache.go:265] couldn't get current server API group list: Get "https://172.16.10.10:6443/api?timeout=5s": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
I0913 17:43:39.503312   60035 cached_discovery.go:120] skipped caching discovery info due to Get "https://172.16.10.10:6443/api?timeout=5s": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
I0913 17:43:39.503398   60035 round_trippers.go:466] curl -v -XGET  -H "Accept: application/json;g=apidiscovery.k8s.io;v=v2beta1;as=APIGroupDiscoveryList,application/json" -H "User-Agent: kubectl/v1.28.1 (linux/amd64) kubernetes/8dc49c4" 'https://172.16.10.10:6443/api?timeout=5s'

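My iptables rules contain some Docker-related configuration using the 172.17 and 172.18 ranges, but I am not sure how that would matter, since curl and the browser work fine.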

The IP routes also look fine - traffic is routed through the VPN's tun interface.

default via 192.168.91.249 dev tun0 proto static metric 50
default via 192.168.10.254 dev enp5s0 proto dhcp src 192.168.10.134 metric 100
default via 192.168.10.254 dev wlp3s0 proto dhcp src 192.168.10.119 metric 600
172.16.100.0/24 via 192.168.91.249 dev tun0 proto static metric 50
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
172.18.0.0/16 dev br-53500e17974d proto kernel scope link src 172.18.0.1
172.19.0.0/16 dev br-c1725b2d64cd proto kernel scope link src 172.19.0.1 linkdown
172.28.0.0/16 dev br-788b95918c9a proto kernel scope link src 172.28.0.1
192.168.10.0/24 dev enp5s0 proto kernel scope link src 192.168.10.134 metric 100
192.168.10.0/24 dev wlp3s0 proto kernel scope link src 192.168.10.119 metric 600
192.168.10.254 dev enp5s0 proto static scope link metric 50
192.168.91.1 via 192.168.91.249 dev tun0 proto static metric 50
192.168.91.249 dev tun0 proto kernel scope link src 192.168.91.250 metric 50
196.179.246.130 via 192.168.10.254 dev enp5s0 proto static metric 50
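
One way to double-check the routing claim for the API server address seen in the kubectl log (172.16.10.10) is to ask the kernel which route it would pick. This is a minimal sketch, assuming iproute2 and awk are available. route_dev is a hypothetical helper name, not part of any tool.

```shell
# Print the interface named after the "dev" keyword in `ip route get` output.
# route_dev is a hypothetical helper; it only parses text read from stdin.
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Usage (172.16.10.10 is the API server address from the kubectl log):
#   ip route get 172.16.10.10 | route_dev
```

The table above has no specific route for 172.16.10.10 (172.16.100.0/24 does not cover it), so the lowest-metric default route via tun0 would be expected to apply. If the helper names a different interface, routing is worth a closer look.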

Answer 1

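After a lot of tinkering, we found that the UDP network connection to the server was clogged, so we ended up adding "mssfix 1400" to the client configuration, which solved the problem.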

If I understand correctly, this value limits the size of the packets sent through the tunnel, which resolved the issue in our case.
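
A packet-size problem of this kind can sometimes be confirmed from the client before touching the VPN configuration, by sending pings of different sizes with the "don't fragment" flag set. The following is a rough sketch, assuming Linux iputils ping and a target that answers ICMP echo (here the API server address from the question). icmp_payload_for and probe_size are hypothetical helper names.

```shell
# ICMP echo payload that produces an IPv4 packet of the given total size:
# total = 20-byte IPv4 header (no options) + 8-byte ICMP header + payload.
icmp_payload_for() {
    echo $(( $1 - 28 ))
}

# Send three echo requests of the given total size with "don't fragment" set.
# -M do is the iputils option that prohibits fragmentation; -W 2 waits 2 s per reply.
probe_size() {
    ping -M do -c 3 -W 2 -s "$(icmp_payload_for "$2")" "$1"
}

# Usage:
#   probe_size 172.16.10.10 1500
#   probe_size 172.16.10.10 1300
```

If the smaller size gets replies while the larger one is lost or rejected with a "message too long" error, large packets are not making it through the tunnel. That would match a connection that opens but stalls once more data is sent. The sizes here describe packets inside the tunnel, so a threshold found this way is not the same number as the mssfix value, which refers to the encapsulated UDP packet.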

There is also a "fragment" option that many people recommend, but it was not enabled on the server side.
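
For reference, the client-side change described in this answer would look roughly like the fragment below. This is a sketch rather than the poster's actual file. It assumes the VPN is OpenVPN, whose mssfix and fragment options match the ones named here, and the file name client.ovpn is an assumption. The fragment directive is left out because the server did not enable it.

```
# client.ovpn (file name is an assumption)
# Ask TCP sessions inside the tunnel to limit their segment size so that the
# encapsulated UDP packets OpenVPN sends do not exceed 1400 bytes.
mssfix 1400
```

The client has to reconnect with the updated configuration for the setting to take effect.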
