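I'm trying to run a simple WireGuard container as part of a BitTorrent stack on a bare-metal cluster, but I'm running into a connectivity problem that is specific to Kubernetes: the same configuration works perfectly in Docker.
Because the WireGuard container needs net.ipv4.conf.all.src_valid_mark=1 for client mode, and I also want IPv6 forwarding, I bootstrap the cluster with the following kubeadm init configuration: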
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
  kubeletExtraArgs:
    allowed-unsafe-sysctls: "net.ipv4.conf.all.src_valid_mark,net.ipv6.conf.all.forwarding"
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
networking:
  podSubnet: 192.168.0.0/16
I then deploy the following, along with various Services and an nginx gateway:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bittorrent
  annotations:
    keel.sh/policy: all
    keel.sh/trigger: poll
    keel.sh/pollSchedule: "@hourly"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: bittorrent
  template:
    metadata:
      labels:
        app: bittorrent
    spec:
      nodeSelector:
        kubernetes.io/hostname: obsidiana
      securityContext:
        sysctls:
          - name: net.ipv4.conf.all.src_valid_mark
            value: "1"
          - name: net.ipv6.conf.all.forwarding
            value: "1"
      containers:
        - name: airvpn
          image: lscr.io/linuxserver/wireguard:latest
          livenessProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - "wg show | grep -q transfer"
            initialDelaySeconds: 65
            periodSeconds: 120
          securityContext:
            privileged: true
            capabilities:
              add: ["NET_ADMIN", "SYS_MODULE"]
          env:
            - name: PUID
              value: "1000"
            - name: PGID
              value: "1000"
            - name: TZ
              value: America/Los_Angeles
          volumeMounts:
            - name: airvpn-config
              mountPath: /etc/wireguard/
            - name: lib-modules
              mountPath: /lib/modules
          ports:
            - containerPort: 9091
              protocol: TCP
        - name: transmission
          image: lscr.io/linuxserver/transmission:latest
          livenessProbe:
            httpGet:
              path: /rpc
              port: 9091
              httpHeaders:
                - name: Authorization
                  value: Basic <redacted>
          env:
            - name: PUID
              value: "1000"
            - name: PGID
              value: "1000"
            - name: TZ
              value: America/Los_Angeles
            - name: USER
              valueFrom:
                secretKeyRef:
                  name: transmission-secrets
                  key: USER
            - name: PASS
              valueFrom:
                secretKeyRef:
                  name: transmission-secrets
                  key: PASS
          volumeMounts:
            - name: transmission-config
              mountPath: /config
            - name: downloads
              mountPath: /downloads
      volumes:
        - name: transmission-config
          hostPath:
            path: /srv/bittorrent/transmission/config
        - name: airvpn-config
          hostPath:
            path: /srv/bittorrent/airvpn
        - name: lib-modules
          hostPath:
            path: /lib/modules
        - name: downloads
          hostPath:
            path: /downloads
The WireGuard container uses the following wg0.conf:
[Interface]
Address = 10.145.<redacted>/32, fd7d:76ee:e68f:a993:<redacted>/128
PrivateKey = <redacted>
MTU = 1320
DNS = 10.128.0.1, fd7d:76ee:e68f:a993::1
[Peer]
PublicKey = <redacted>
PresharedKey = <redacted>
Endpoint = america3.vpn.airdns.org:1637
AllowedIPs = 0.0.0.0/0, ::/0
PersistentKeepalive = 15
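I have also tried various other servers, such as ca3 and europe3, with the same result: it always works in Docker and almost never works in Kubernetes. The WireGuard client does occasionally connect, but the vast majority of the time its stdout looks like this: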
Uname info: Linux bittorrent-6db8674f9-6bv9s 6.2.0-39-generic #40-Ubuntu SMP PREEMPT_DYNAMIC Tue Nov 14 14:18:00 UTC 2023 x86_64 GNU/Linux
** It seems the wireguard module is already active. Skipping kernel header install and module compilation. **
** Client mode selected. **
[custom-init] No custom files found, skipping...
** Disabling CoreDNS **
** Found WG conf /config/wg_confs/wg0.conf, adding to list **
** Activating tunnel /config/wg_confs/wg0.conf **
[#] ip link add wg0 type wireguard
[#] wg setconf wg0 /dev/fd/63
Try again: `america3.vpn.airdns.org:1637'. Trying again in 1.00 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 1.20 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 1.44 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 1.73 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 2.07 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 2.49 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 2.99 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 3.58 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 4.30 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 5.16 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 6.19 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 7.43 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 8.92 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 10.70 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 12.84 seconds...
Try again: `america3.vpn.airdns.org:1637'
Configuration parsing error
[#] ip link delete dev wg0
** Tunnel /config/wg_confs/wg0.conf failed, will stop all others! **
** All tunnels are now down. Please fix the tunnel config /config/wg_confs/wg0.conf and restart the container **
[ls.io-init] done.
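As a side note on the failing log above, the retry delays appear to grow by a factor of 1.2 each time, starting at one second. The Python sketch below reproduces that schedule and totals it. The growth factor, the whole-microsecond arithmetic, and the names `retry_delays` and `LOGGED` are my own assumptions read off the log, not something taken from the WireGuard tooling's documentation.

```python
# Reproduce the retry delays seen in the failing log.
# Assumption: each delay is 6/5 of the previous one, tracked in whole
# microseconds. This is inferred from the log, not from any documentation.

# Delays (in seconds) exactly as they are printed in the log.
LOGGED = [
    "1.00", "1.20", "1.44", "1.73", "2.07", "2.49", "2.99", "3.58",
    "4.30", "5.16", "6.19", "7.43", "8.92", "10.70", "12.84",
]


def retry_delays(count, first_us=1_000_000):
    """Return `count` delays in microseconds, each 6/5 of the previous one."""
    delays = []
    delay_us = first_us
    for _ in range(count):
        delays.append(delay_us)
        delay_us = delay_us * 6 // 5  # integer growth by a factor of 1.2
    return delays


delays_us = retry_delays(len(LOGGED))
formatted = [f"{d / 1_000_000:.2f}" for d in delays_us]
total_seconds = sum(delays_us) / 1_000_000

print("matches log:", formatted == LOGGED)
print(f"total time spent waiting between attempts: {total_seconds:.2f} s")
```

Under those assumptions the client spends a little over 72 seconds just waiting between name-resolution attempts before it gives up, which is longer than the 65-second initialDelaySeconds on the liveness probe.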
For reference, here is the working docker-compose file:
version: "3.9"
services:
  airvpn:
    image: linuxserver/wireguard:latest
    container_name: airvpn
    cap_add:
      - NET_ADMIN
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=America/Los_Angeles
    volumes:
      - ./airvpn/wg0.conf:/config/wg0.conf
      - /lib/modules:/lib/modules
    sysctls:
      net.ipv4.conf.all.src_valid_mark: 1
      net.ipv6.conf.all.disable_ipv6: 0
    ports:
      - 9091:9091
    privileged: true
    restart: always
  transmission:
    image: linuxserver/transmission:latest
    container_name: transmission
    network_mode: service:airvpn
    depends_on:
      - airvpn
    volumes:
      - ./transmission/config:/config:rw
      - /downloads:/downloads:rw
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=America/Los_Angeles
    env_file:
      - ./.env
    restart: always
Here are the logs produced when I connect with the Docker container. This is the behavior I expect to see in Kubernetes as well.
Uname info: Linux 5b3141a4c699 6.2.0-39-generic #40-Ubuntu SMP PREEMPT_DYNAMIC Tue Nov 14 14:18:00 UTC 2023 x86_64 GNU/Linux
**** It seems the wireguard module is already active. Skipping kernel header install and module compilation. ****
**** Performing migration to new folder structure for confs. Please see the image changelog 2023-10-03 entry for more details. ****
rm: cannot remove '/config/wg0.conf': Resource busy
**** Client mode selected. ****
[custom-init] No custom files found, skipping...
**** Disabling CoreDNS ****
**** Found WG conf /config/wg_confs/wg0.conf, adding to list ****
**** Activating tunnel /config/wg_confs/wg0.conf ****
[#] ip link add wg0 type wireguard
[#] wg setconf wg0 /dev/fd/63
[#] ip -4 address add 10.145<redacted>/32 dev wg0
[#] ip -6 address add fd7d:76ee:e68f:a993:<redacted>/128 dev wg0
[#] ip link set mtu 1320 up dev wg0
[#] resolvconf -a wg0 -m 0 -x
s6-rc: fatal: unable to take locks: Resource busy
[#] wg set wg0 fwmark 5182x
[#] ip -6 route add ::/0 dev wg0 table 5182x
[#] ip -6 rule add not fwmark 5182x table 5182x
[#] ip -6 rule add table main suppress_prefixlength 0
[#] ip6tables-restore -n
[#] ip -4 route add 0.0.0.0/0 dev wg0 table 5182x
[#] ip -4 rule add not fwmark 5182x table 5182x
[#] ip -4 rule add table main suppress_prefixlength 0
[#] iptables-restore -n
**** All tunnels are now active ****
[ls.io-init] done.
My custom nettools test pod resolves the address without any problem:
-> % kubectl exec -it nettools-test-674f556b96-2vv5j -- nslookup america3.vpn.airdns.org
Server: 10.96.0.10
Address: 10.96.0.10#53
Non-authoritative answer:
Name: america3.vpn.airdns.org
Address: 184.75.223.205
However, the WireGuard container itself cannot resolve kubernetes.default or ping the DNS server:
-> % kubectl exec bittorrent-6db8674f9-mbszb -c airvpn -- nslookup america3.vpn.airdns.org
;; connection timed out; no servers could be reached
command terminated with exit code 1
-> % kubectl exec bittorrent-6db8674f9-mbszb -c airvpn -- nslookup google.com
;; connection timed out; no servers could be reached
command terminated with exit code 1
-> % kubectl exec bittorrent-6db8674f9-mbszb -c airvpn -- nslookup kubernetes.default
;; connection timed out; no servers could be reached
command terminated with exit code 1
-> % kubectl exec bittorrent-6db8674f9-mbszb -c airvpn -- ping -c3 10.96.0.10
PING 10.96.0.10 (10.96.0.10) 56(84) bytes of data.
--- 10.96.0.10 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2054ms
command terminated with exit code 1
It does, however, have a default route during these connection attempts:
-> % kubectl exec -it bittorrent-6459cfc9ff-ds8zk -c airvpn -- ip route
default via 169.254.1.1 dev eth0
169.254.1.1 dev eth0 scope link
/etc/resolv.conf looks healthy:
-> % kubectl exec -it bittorrent-6459cfc9ff-ds8zk -c airvpn -- cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.96.0.10
options ndots:5
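One thing I can check offline is how the addresses above relate to the AllowedIPs in wg0.conf. The sketch below uses Python's standard ipaddress module to show that the cluster DNS address and the resolved endpoint address both fall inside 0.0.0.0/0. The helper name `covered_by` is hypothetical, and whether this overlap matters depends on the routing rules actually present in the pod, which I have not captured here.

```python
import ipaddress

# AllowedIPs exactly as written in wg0.conf.
ALLOWED_IPS = ["0.0.0.0/0", "::/0"]


def covered_by(address, networks):
    """Return the networks (as strings) that contain `address`.

    Hypothetical helper for illustration; an IPv4 address never matches
    an IPv6 network and vice versa.
    """
    ip = ipaddress.ip_address(address)
    return [n for n in networks if ip in ipaddress.ip_network(n)]


# Addresses taken from the output shown in this post.
for label, addr in [
    ("cluster DNS", "10.96.0.10"),
    ("resolved endpoint", "184.75.223.205"),
    ("tunnel DNS (IPv6)", "fd7d:76ee:e68f:a993::1"),
]:
    print(label, addr, "->", covered_by(addr, ALLOWED_IPS))
```

So with these AllowedIPs, nothing in the WireGuard configuration itself excludes the cluster DNS address from the tunnel.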
I really don't know what to try next. Any suggestions or help would be greatly appreciated!