WireGuard client works perfectly in Docker, no DNS in Kubernetes

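I am trying to run a simple WireGuard container as part of a BitTorrent stack on a bare-metal cluster, but I have hit a connectivity problem that is specific to Kubernetes: the same configuration works perfectly in Docker.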

Because the WireGuard container needs net.ipv4.conf.all.src_valid_mark=1 in client mode, and because I want IPv6 forwarding, I bootstrap the cluster with the following kubeadm init configuration:

apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
  kubeletExtraArgs:
    allowed-unsafe-sysctls: "net.ipv4.conf.all.src_valid_mark,net.ipv6.conf.all.forwarding"
--- 
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration 
networking: 
  podSubnet: 192.168.0.0/16

I then deploy the following, along with various Services and an nginx gateway.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: bittorrent 
  annotations:
    keel.sh/policy: all
    keel.sh/trigger: poll
    keel.sh/pollSchedule: "@hourly"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: bittorrent
  template:
    metadata:
      labels:
        app: bittorrent
    spec:
      nodeSelector:
        kubernetes.io/hostname: obsidiana
      securityContext:
        sysctls:
        - name: net.ipv4.conf.all.src_valid_mark
          value: "1"
        - name: net.ipv6.conf.all.forwarding
          value: "1"
      containers:
      - name: airvpn
        image: lscr.io/linuxserver/wireguard:latest
        livenessProbe:
          exec:
            command:
              - /bin/sh
              - -c
              - "wg show | grep -q transfer"
          initialDelaySeconds: 65
          periodSeconds: 120
        securityContext:
          privileged: true
          capabilities:
            add: ["NET_ADMIN", "SYS_MODULE"]
        env:
        - name: PUID
          value: "1000"
        - name: PGID
          value: "1000"
        - name: TZ
          value: America/Los_Angeles
        volumeMounts:
        - name: airvpn-config
          mountPath: /etc/wireguard/
        - name: lib-modules
          mountPath: /lib/modules
        ports:
        - containerPort: 9091
          protocol: TCP
      - name: transmission
        image: lscr.io/linuxserver/transmission:latest
        livenessProbe:
          httpGet:
            path: /rpc
            port: 9091
            httpHeaders:
              - name: Authorization
                value: Basic <redacted>
        env:
        - name: PUID
          value: "1000"
        - name: PGID
          value: "1000"
        - name: TZ
          value: America/Los_Angeles
        - name: USER
          valueFrom:
            secretKeyRef:
              name: transmission-secrets
              key: USER
        - name: PASS
          valueFrom:
            secretKeyRef:
              name: transmission-secrets
              key: PASS
        volumeMounts:
        - name: transmission-config
          mountPath: /config
        - name: downloads
          mountPath: /downloads
      volumes:
      - name: transmission-config
        hostPath:
          path: /srv/bittorrent/transmission/config
      - name: airvpn-config
        hostPath: 
          path: /srv/bittorrent/airvpn
      - name: lib-modules
        hostPath:
          path: /lib/modules 
      - name: downloads 
        hostPath:
          path: /downloads

The WireGuard container uses the following wg0.conf:

[Interface]
Address = 10.145.<redacted>/32, fd7d:76ee:e68f:a993:<redacted>/128
PrivateKey = <redacted>
MTU = 1320
DNS = 10.128.0.1, fd7d:76ee:e68f:a993::1

[Peer]
PublicKey = <redacted>
PresharedKey = <redacted>
Endpoint = america3.vpn.airdns.org:1637
AllowedIPs = 0.0.0.0/0, ::/0
PersistentKeepalive = 15
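
As a side note on AllowedIPs, here is a minimal Python sketch that checks which addresses mentioned in this post fall inside the configured ranges. The helper name covered_by and the ALLOWED_IPS and targets variables are hypothetical; the addresses themselves come from the configs in this post. It only shows that 0.0.0.0/0 also covers the cluster's own ranges. Whether that plays any part in the failure described here is not established.

```python
import ipaddress

# AllowedIPs from wg0.conf in this post.
ALLOWED_IPS = [ipaddress.ip_network(n) for n in ("0.0.0.0/0", "::/0")]


def covered_by(target: str, allowed: list) -> bool:
    """Return True if target (an address or a CIDR) lies inside any allowed network."""
    net = ipaddress.ip_network(target, strict=False)
    # subnet_of() raises TypeError on mixed IP versions, so compare versions first.
    return any(net.version == a.version and net.subnet_of(a) for a in allowed)


# Addresses taken from this post: cluster DNS, podSubnet, and the VPN's DNS server.
targets = {
    "cluster DNS": "10.96.0.10",
    "pod subnet": "192.168.0.0/16",
    "VPN DNS": "10.128.0.1",
}

for label, target in targets.items():
    print(f"{label:12} {target:16} covered: {covered_by(target, ALLOWED_IPS)}")
```

With the catch-all ranges above, every target is reported as covered, so the cluster DNS address is not excluded from the tunnel by AllowedIPs alone.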

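I have also tried various other servers, such as ca3 and europe3, with the same result: it always works in Docker and almost never works in Kubernetes. That is, the WireGuard client does occasionally connect, but the vast majority of the time its stdout looks like this: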

Uname info: Linux bittorrent-6db8674f9-6bv9s 6.2.0-39-generic #40-Ubuntu SMP PREEMPT_DYNAMIC Tue Nov 14 14:18:00 UTC 2023 x86_64 GNU/Linux
** It seems the wireguard module is already active. Skipping kernel header install and module compilation. **
** Client mode selected. **
[custom-init] No custom files found, skipping...
** Disabling CoreDNS **
** Found WG conf /config/wg_confs/wg0.conf, adding to list **
** Activating tunnel /config/wg_confs/wg0.conf **
[#] ip link add wg0 type wireguard
[#] wg setconf wg0 /dev/fd/63
Try again: `america3.vpn.airdns.org:1637'. Trying again in 1.00 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 1.20 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 1.44 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 1.73 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 2.07 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 2.49 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 2.99 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 3.58 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 4.30 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 5.16 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 6.19 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 7.43 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 8.92 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 10.70 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 12.84 seconds...
Try again: `america3.vpn.airdns.org:1637'
Configuration parsing error
[#] ip link delete dev wg0
** Tunnel /config/wg_confs/wg0.conf failed, will stop all others! **
** All tunnels are now down. Please fix the tunnel config /config/wg_confs/wg0.conf and restart the container **
[ls.io-init] done.
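
To put a number on how long the endpoint lookup keeps retrying before it gives up, the delays in the log above can be reproduced with a short Python sketch. The starting delay of 1 second, the growth factor of 1.2, and the count of 15 waits are inferred from the log lines, not taken from the WireGuard source, and the function name retry_delays is hypothetical.

```python
def retry_delays(first: float = 1.0, factor: float = 1.2, attempts: int = 15) -> list:
    """Geometric backoff: each wait is `factor` times the previous one.

    The defaults are assumptions read off the log output in this post.
    """
    return [first * factor**i for i in range(attempts)]


delays = retry_delays()
print(", ".join(f"{d:.2f}" for d in delays))
print(f"total announced wait: {sum(delays):.2f} s")
```

The printed delays match the "Trying again in ..." values in the log, and they add up to roughly 72 seconds. That is a lower bound on how long name resolution was failing, because each lookup also takes some time to fail.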

For reference, here is the working docker-compose file:

version: "3.9"
services:
  airvpn:
    image: linuxserver/wireguard:latest
    container_name: airvpn
    cap_add:
      - NET_ADMIN
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=America/Los_Angeles
    volumes:
      - ./airvpn/wg0.conf:/config/wg0.conf
      - /lib/modules:/lib/modules
    sysctls:
      net.ipv4.conf.all.src_valid_mark: 1
      net.ipv6.conf.all.disable_ipv6: 0
    ports:
      - 9091:9091
    privileged: true
    restart: always

  transmission:
    image: linuxserver/transmission:latest
    container_name: transmission
    network_mode: service:airvpn
    depends_on:
      - airvpn
    volumes:
      - ./transmission/config:/config:rw
      - /downloads:/downloads:rw
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=America/Los_Angeles
    env_file:
      - ./.env
    restart: always

Here are the logs produced when I connect using the Docker container. This is the behavior I expect to see in Kubernetes.

Uname info: Linux 5b3141a4c699 6.2.0-39-generic #40-Ubuntu SMP PREEMPT_DYNAMIC Tue Nov 14 14:18:00 UTC 2023 x86_64 GNU/Linux
**** It seems the wireguard module is already active. Skipping kernel header install and module compilation. ****
**** Performing migration to new folder structure for confs. Please see the image changelog 2023-10-03 entry for more details. ****
rm: cannot remove '/config/wg0.conf': Resource busy
**** Client mode selected. ****
[custom-init] No custom files found, skipping...
**** Disabling CoreDNS ****
**** Found WG conf /config/wg_confs/wg0.conf, adding to list ****
**** Activating tunnel /config/wg_confs/wg0.conf ****
[#] ip link add wg0 type wireguard
[#] wg setconf wg0 /dev/fd/63
[#] ip -4 address add 10.145<redacted>/32 dev wg0
[#] ip -6 address add fd7d:76ee:e68f:a993:<redacted>/128 dev wg0
[#] ip link set mtu 1320 up dev wg0
[#] resolvconf -a wg0 -m 0 -x
s6-rc: fatal: unable to take locks: Resource busy
[#] wg set wg0 fwmark 5182x
[#] ip -6 route add ::/0 dev wg0 table 5182x
[#] ip -6 rule add not fwmark 5182x table 5182x
[#] ip -6 rule add table main suppress_prefixlength 0
[#] ip6tables-restore -n
[#] ip -4 route add 0.0.0.0/0 dev wg0 table 5182x
[#] ip -4 rule add not fwmark 5182x table 5182x
[#] ip -4 rule add table main suppress_prefixlength 0
[#] iptables-restore -n
**** All tunnels are now active ****
[ls.io-init] done.

My custom nettools test pod resolves the address without any problem:

-> % kubectl exec -it nettools-test-674f556b96-2vv5j -- nslookup america3.vpn.airdns.org
Server:     10.96.0.10
Address:    10.96.0.10#53

Non-authoritative answer:
Name:   america3.vpn.airdns.org
Address: 184.75.223.205

However, the WireGuard container itself cannot resolve any name, including kubernetes.default, and cannot ping the DNS server:

-> % kubectl exec bittorrent-6db8674f9-mbszb -c airvpn -- nslookup america3.vpn.airdns.org
;; connection timed out; no servers could be reached

command terminated with exit code 1

-> % kubectl exec bittorrent-6db8674f9-mbszb -c airvpn -- nslookup google.com             
;; connection timed out; no servers could be reached

command terminated with exit code 1

-> % kubectl exec bittorrent-6db8674f9-mbszb -c airvpn -- nslookup kubernetes.default
;; connection timed out; no servers could be reached

command terminated with exit code 1

-> % kubectl exec bittorrent-6db8674f9-mbszb -c airvpn -- ping -c3 10.96.0.10        
PING 10.96.0.10 (10.96.0.10) 56(84) bytes of data.

--- 10.96.0.10 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2054ms

command terminated with exit code 1

Yet it does have a default route during these connection attempts:

-> % kubectl exec -it bittorrent-6459cfc9ff-ds8zk -c airvpn -- ip route
default via 169.254.1.1 dev eth0
169.254.1.1 dev eth0 scope link

And /etc/resolv.conf looks healthy:

-> % kubectl exec -it bittorrent-6459cfc9ff-ds8zk -c airvpn -- cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.96.0.10
options ndots:5
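
For context on what this resolv.conf asks of the resolver, the following Python sketch lists the queries a stub resolver would try for a given name. It assumes the search-list and ndots semantics documented in resolv.conf(5); the resolver inside the container image may behave differently. The names parse_resolv_conf and candidate_queries are hypothetical.

```python
RESOLV_CONF = """\
search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.96.0.10
options ndots:5
"""


def parse_resolv_conf(text: str):
    """Extract the search list and the ndots value (default 1) from resolv.conf text."""
    search, ndots = [], 1
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "search":
            search = fields[1:]
        elif fields[0] == "options":
            for opt in fields[1:]:
                if opt.startswith("ndots:"):
                    ndots = int(opt.split(":", 1)[1])
    return search, ndots


def candidate_queries(name: str, search: list, ndots: int) -> list:
    """Return the fully qualified names tried, in order, for `name`."""
    if name.endswith("."):
        return [name]  # already absolute, so the search list is not applied
    with_search = [f"{name}.{domain}." for domain in search]
    absolute = f"{name}."
    # Names with fewer than `ndots` dots go through the search list first.
    if name.count(".") >= ndots:
        return [absolute] + with_search
    return with_search + [absolute]


search, ndots = parse_resolv_conf(RESOLV_CONF)
for query in candidate_queries("america3.vpn.airdns.org", search, ndots):
    print(query)
```

Because america3.vpn.airdns.org has only three dots, the three cluster search domains are tried before the name itself. Every one of those queries goes to 10.96.0.10, so nothing resolves while that server is unreachable.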

I really don't know what to try next. Any suggestions or help would be greatly appreciated!
