Using iptables to correctly redirect traffic from 80/443 to Traefik's 30080/30443 exposed via NodePort on Kubernetes

In a Kubernetes setup, Traefik is deployed with the following HelmRelease:

apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: traefik-default
  namespace: kube-system
spec:
  chart:
    repository: https://containous.github.io/traefik-helm-chart
    name: traefik
    version: 9.2.1
  releaseName: traefik-default
  values:
    ingressRoute:
      dashboard:
        enabled: false
    persistence:
      enabled: false
      accessMode: ReadWriteOnce
      size: 100Mi
      storageClass: "ceph-replicated"
      path: /data
      annotations: {}
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 0
    additionalArguments:
      - "--log.level=INFO"
      - "--serverstransport.maxidleconnsperhost=0"
      - "--certificatesresolvers.dns-cloudflare.acme.dnschallenge=true"
      - "--certificatesresolvers.dns-cloudflare.acme.dnschallenge.provider=cloudflare"
      - "--certificatesresolvers.dns-cloudflare.acme.dnschallenge.delaybeforecheck=60"
      - "--certificatesresolvers.dns-cloudflare.acme.email=redacted"
      - "--certificatesresolvers.dns-cloudflare.acme.storage=/data/dns-cloudflare.json"
      - "--certificatesresolvers.tls.acme.tlschallenge=true"
      - "--certificatesresolvers.tls.acme.email=redacted"
      - "--certificatesresolvers.tls.acme.storage=/data/tls.json"
      - "--providers.kubernetescrd.throttleduration=15"
      - "--accesslog=true"
    ports:
      traefik:
        expose: true
      web:
        port: 30080
        nodePort: 30080
      websecure:
        port: 30443
        nodePort: 30443
    service:
      type: NodePort
    nodeSelector:
      load-balancer: cloudflare
    resources:
      requests:
        cpu: "500m"
        memory: "100Mi"
      limits:
        cpu: "1000m"
        memory: "250Mi"

I believe the relevant facts are:

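  • In this 4-node cluster, only node4 matches the node selector, so as soon as the Kubernetes cluster is up and the HelmRelease above is applied, the Traefik Pod runs on node4.
  • Since I expose Traefik using NodePort, these ports are open and accepting traffic on all 4 nodes. netstat -tunlp confirms this: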
...
tcp6       0      0 :::30080                :::*                    LISTEN      5050/kube-proxy
tcp6       0      0 :::30443                :::*                    LISTEN      5050/kube-proxy
tcp6       0      0 :::32100                :::*                    LISTEN      5050/kube-proxy
tcp6       0      0 :::32709                :::*                    LISTEN      5050/kube-proxy
...
  • I use the following iptables rules on all nodes (managed via systemctl restart iptables-restore.service, which is tied to /var/lib/iptables/rules-save):
● iptables-restore.service - Restore iptables firewall rules
   Loaded: loaded (/usr/lib/systemd/system/iptables-restore.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Sun 2020-10-25 18:53:27 UTC; 17min ago
  Process: 31234 ExecStart=/sbin/iptables-restore -w -- /var/lib/iptables/rules-save (code=exited, status=0/SUCCESS)
 Main PID: 31234 (code=exited, status=0/SUCCESS)

On the local network, IP 10.1.44.40 corresponds to node4, since it ends in 40. node1's IP on the local network is 10.1.44.10, node2's is 10.1.44.20, and node3's is 10.1.44.30.

*filter
# Set default policies.
-F INPUT
-P INPUT DROP
# Enable loopback interface.
-A INPUT  -i lo -j ACCEPT
-A OUTPUT -o lo -j ACCEPT
# Enable established connections.
-A INPUT -i br0 -m state --state ESTABLISHED,RELATED -j ACCEPT
# Disable fake packets.
-A INPUT -s 224.0.0.0/4 -j DROP
-A INPUT -s 240.0.0.0/5 -j DROP
-A INPUT -s 255.255.255.255 -j DROP
-A INPUT -d 0.0.0.0 -j DROP
-A INPUT -s 0.0.0.0/8 -j DROP
-A INPUT -s 169.254.0.0/16 -j DROP
-A INPUT -s 192.0.2.0/24 -j DROP
-A INPUT -s 224.0.0.0/3 -j DROP
# Enable local IP addresses.
-A INPUT -s 10.0.0.0/8 -j ACCEPT
-A INPUT -s 172.16.0.0/12 -j ACCEPT
-A INPUT -s 192.168.0.0/16 -j ACCEPT
# Local services.
-A INPUT -i br0 -p tcp --dport 22 -j ACCEPT
-A INPUT -i br0 -p tcp --dport 80 -j ACCEPT
-A INPUT -i br0 -p tcp --dport 443 -j ACCEPT
-A INPUT -i br0 -p tcp --dport 6443 -j ACCEPT
-A INPUT -i br0 -p tcp --dport 30000:32767 -j ACCEPT
-A INPUT -i br0 -p udp --dport 30000:32767 -j ACCEPT
-A INPUT -i br0 -p icmp --icmp-type 0 -j ACCEPT
-A INPUT -i br0 -p icmp --icmp-type 3 -j ACCEPT
-A INPUT -i br0 -p icmp --icmp-type 11 -j ACCEPT
# Port 22 is rate-limited.
-I INPUT ! -s 10.0.0.0/8  -p tcp --dport 22 -i br0 -m state --state NEW -m recent --set
-I INPUT -p tcp --dport 22 -i br0 -m state --state NEW -m recent --update --seconds 60 --hitcount 2 -j REJECT
# SYN-flood protection.
-A FORWARD -p tcp --syn -m limit --limit 1/s -j ACCEPT
# Port-scan protection.
-A FORWARD -p tcp --tcp-flags SYN,ACK,FIN,RST RST -m limit --limit 1/s -j ACCEPT
# Ping-of-death protection.
-A FORWARD -p icmp --icmp-type echo-request -m limit --limit 1/s -j ACCEPT
-A FORWARD -p tcp -d 10.1.44.40 --dport 80 -m state --state NEW,ESTABLISHED,RELATED -j ACCEPT
-A FORWARD -p tcp -d 10.1.44.40 --dport 443 -m state --state NEW,ESTABLISHED,RELATED -j ACCEPT
# These I also tried.
# -A FORWARD -p tcp -d redacted-IP --dport 80 -m state --state NEW,ESTABLISHED,RELATED -j ACCEPT
# -A FORWARD -p tcp -d redacted-IP --dport 443 -m state --state NEW,ESTABLISHED,RELATED -j ACCEPT
COMMIT
*nat
# This is for Traefik.
-A PREROUTING -p TCP -i br0 --dport 443 -j REDIRECT --to-port 30443
-A PREROUTING -p TCP -i br0 --dport 80 -j REDIRECT --to-port 30080
# These I also tried. Since the local network IP ends with 40, these example iptables rules are for node4.
# -A PREROUTING -i br0 -p tcp --dport 80 -j DNAT --to 10.1.44.40:30080
# -A PREROUTING -i br0 -p tcp --dport 443 -j DNAT --to 10.1.44.40:30443
COMMIT
  • In /etc/systemd/network on each of node1-4 I have:
network # cat 10-bond0.netdev
[NetDev]
Name=bond0
Kind=bond

[Bond]
Mode=802.3ad
TransmitHashPolicy=layer2+3
MIIMonitorSec=1s
LACPTransmitRate=fast
network # cat 10-bond0.network
[Match]
Name=en*

[Network]
Bond=bond0
network # cat 15-br0.netdev
[NetDev]
Name=br0
Kind=bridge
network # cat 15-br0.network
[Match]
Name=bond0

[Network]
Bridge=br0
network # cat 20-static.network
[Match]
Name=br0

[Network]
DNS=redacted-IP
Address=redacted-IP/24
Gateway=redacted-IP

Address=10.1.44.20/24 # As you can see, this is `node2`

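All services are configured to go through the Traefik instance running on node4, so all services resolve to node4's public (redacted) IP.

Something is not quite right with the iptables rules, because: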

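  • When Pod A runs on node4 (where Traefik also runs), Pod A can never connect to any of Traefik's services via the public (redacted) IP. Pod A gets connection refused.
  • When Pod B runs on any node other than node4, Pod B can connect to any of Traefik's services via the public (redacted) IP.
  • When services exposed by Traefik are accessed from Pods running on the cluster, the network is very unreliable and connection attempts frequently drop. When the same services are called from a remote network, everything works fine.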
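
The symptoms above might come from the -i br0 match: it only applies to packets that arrive on br0, while traffic from a Pod on the same node typically enters through the CNI's own interface and so may never hit the REDIRECT rules. A minimal sketch for checking this by watching the nat rule counters on node4 follows. The function names are hypothetical helpers, and POD_A and PUBLIC_IP are placeholders, not values from this setup.

```shell
#!/bin/sh
# Sketch: check whether the nat/PREROUTING REDIRECT rules are being hit.
# Intended to be run as root on node4. Function names are hypothetical.

# Show the nat/PREROUTING rules with exact packet and byte counters.
show_prerouting_counters() {
    iptables -t nat -L PREROUTING -n -v -x --line-numbers
}

# Reset the nat/PREROUTING counters so the next test starts from zero.
zero_prerouting_counters() {
    iptables -t nat -Z PREROUTING
}

# Usage (placeholders, not executed here):
#   zero_prerouting_counters
#   kubectl exec POD_A -- curl -sv https://PUBLIC_IP/   # from a Pod on node4
#   show_prerouting_counters
```

If the counters on the two REDIRECT rules stay at zero while the request from the Pod on node4 fails, the packets are probably not arriving on br0, which would be consistent with the connection-refused symptom.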

Thanks!

Answer 1

The following works. Notable changes:

    ports:
      traefik:
        expose: true
      web:
        port: 80
        nodePort: 30080
      websecure:
        port: 443
        nodePort: 30443
    securityContext:
      capabilities:
        drop: [ ALL ]
        add: [ NET_BIND_SERVICE ]
      readOnlyRootFilesystem: true
      runAsGroup: 0
      runAsNonRoot: false
      runAsUser: 0

The complete HelmRelease:

apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: traefik
  namespace: kube-system
spec:
  chart:
    repository: https://helm.traefik.io/traefik
    name: traefik
    version: 9.12.3
  releaseName: traefik
  values:
    deployment:
      kind: Deployment
    experimental:
      kubernetesGateway:
        enabled: false
    ingressRoute:
      dashboard:
        enabled: false
    persistence:
      enabled: false
      accessMode: ReadWriteOnce
      size: 100Mi
      storageClass: "ceph-replicated"
      path: /data
      annotations: {}
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 0
    additionalArguments:
      - "--log.level=INFO"
      - "--certificatesresolvers.dns-cloudflare.acme.dnschallenge=true"
      - "--certificatesresolvers.dns-cloudflare.acme.dnschallenge.provider=cloudflare"
      - "--certificatesresolvers.dns-cloudflare.acme.dnschallenge.delaybeforecheck=60"
      - "--certificatesresolvers.dns-cloudflare.acme.email=redacted"
      - "--certificatesresolvers.dns-cloudflare.acme.storage=/data/dns-cloudflare.json"
      - "--certificatesresolvers.tls.acme.tlschallenge=true"
      - "--certificatesresolvers.tls.acme.email=redacted"
      - "--certificatesresolvers.tls.acme.storage=/data/tls.json"
      - "--providers.kubernetescrd.throttleduration=15"
      - "--accesslog=true"
      - "--serversTransport.insecureSkipVerify=true"
    env:
      - name: CF_API_EMAIL
        value: redacted
      - name: CLOUDFLARE_API_KEY
        value: redacted
    ports:
      traefik:
        expose: true
      web:
        port: 80
        nodePort: 30080
      websecure:
        port: 443
        nodePort: 30443
    securityContext:
      capabilities:
        drop: [ ALL ]
        add: [ NET_BIND_SERVICE ]
      readOnlyRootFilesystem: true
      runAsGroup: 0
      runAsNonRoot: false
      runAsUser: 0
    hostNetwork: true
    service:
      type: NodePort
    nodeSelector:
      kubernetes.io/hostname: sigma04
    resources:
      requests:
        cpu: "1000m"
        memory: "200Mi"
      limits:
        cpu: "2000m"
        memory: "400Mi"
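
With hostNetwork: true and the web/websecure ports set to 80 and 443, Traefik should bind those ports directly on the node. A small sketch for confirming that is shown here. It assumes ss and kubectl are available, and the function names are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: confirm that Traefik binds 80/443 directly on the node.
# Function names are hypothetical.

# List TCP listeners on ports 80 and 443 (run as root on the Traefik node
# so that the owning process is shown).
show_web_listeners() {
    ss -tlnp | grep -E ':(80|443)[[:space:]]'
}

# Show the Traefik Pod and the node it was scheduled on.
show_traefik_pods() {
    kubectl -n kube-system get pods -o wide | grep traefik
}

# Usage (not executed here):
#   show_traefik_pods
#   show_web_listeners
```

If the listener lines name the traefik process rather than kube-proxy, requests to the node's IP on 80/443 reach Traefik without depending on the REDIRECT rules.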
