IP packets stuck at the routing decision

First, here is what my infrastructure looks like and how it works:

[infrastructure diagram]

Controller1/2 and Compute1/2 all run VMs and are connected to each other over a VPN. On each server, the br-ext interface is plugged into the ext interface (the VPN interface). All servers can reach each other, and the VMs can communicate over their private interfaces.

I have two Ubuntu 16.04 routers (the two boxes with ETH3 and BR-ext). Only one is active at a time (the second is a failover managed by keepalived); the active one holds both the public subnet (51.38.X.Y/27) and the IP 10.38.166.190 (which acts as the gateway for all the VMs).

I use iptables and iproute2 to allow traffic to (say) 51.38.X.YYA to reach 10.38.X.YYA, and traffic from 10.38.X.YYA to go out via 51.38.X.YYA.
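The 1:1 NAT described above can be sketched roughly as follows (a sketch reconstructed from the NAT counters shown further down, not my exact commands; VM1's addresses are used as the example):

```shell
# Inbound: traffic addressed to the public IP is DNATed to the VM's private IP.
iptables -t nat -A PREROUTING -d 51.38.166.167 \
    -m comment --comment "112 NAT for 10.38.166.167" \
    -j DNAT --to-destination 10.38.166.167

# Outbound: traffic from the VM's private IP is SNATed to its public IP.
iptables -t nat -A POSTROUTING -s 10.38.166.167 \
    -j SNAT --to-source 51.38.166.167
```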

From one of the VMs I can reach the outside without any problem, and if I run curl ifconfig.co I am shown my public IP, which is the behavior I want.

My problem:

If I try to reach VM2 from VM1 using its public IP, it does not work at all.

I will use two VMs to illustrate my problem and provide all the relevant configuration:

VM1: 10.38.166.167 / 51.38.166.167
VM2: 10.38.166.166 / 51.38.166.166

What I currently have:

On router1:

ETH1 = main interface (management)
ETH3 = interface holding all the public IPs NATed to the VMs
br-ext = bridge containing the VPN interface
ext = VPN interface (plugged into the bridge br-ext)

[root@network3] ~# ip a l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever

3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether fa:16:3e:19:3e:41 brd ff:ff:ff:ff:ff:ff
    inet 51.38.166.162/32 brd 51.38.x.162 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe19:3e41/64 scope link
       valid_lft forever preferred_lft forever

5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether fa:16:3e:72:94:cb brd ff:ff:ff:ff:ff:ff
    inet 51.38.166.163/32 brd 51.38.x.163 scope global eth3
       valid_lft forever preferred_lft forever
    inet 51.38.166.166/32 scope global eth3
       valid_lft forever preferred_lft forever
    inet 51.38.166.167/32 scope global eth3
       valid_lft forever preferred_lft forever


7: br-ext: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether d2:f8:64:36:64:f2 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.103/9 brd 10.127.255.255 scope global br-ext
       valid_lft forever preferred_lft forever
    inet 10.0.0.120/32 scope global br-ext
       valid_lft forever preferred_lft forever
    inet 10.38.166.190/32 scope global br-ext
       valid_lft forever preferred_lft forever

10: ext: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master br-ext state UNKNOWN group default qlen 1000
    link/ether d2:f8:64:36:64:f2 brd ff:ff:ff:ff:ff:ff

I have set up a set of routes to allow packets coming from outside to 51.38.x.160/27 to be routed to 10.38.x.y/27:

[root@network3] ~# ip ru l | grep "lookup 103"
9997:   from 10.38.x.167 lookup 103
9998:   from 10.38.x.166 lookup 103

# rules that tell each IP of the /27 to use table 103
10301:  from 51.38.166.163 lookup 103
10302:  from all to 51.38.166.163 lookup 103
10307:  from 51.38.166.166 lookup 103
10308:  from all to 51.38.166.166 lookup 103
10309:  from 51.38.166.167 lookup 103
10310:  from all to 51.38.166.167 lookup 103

[root@network3] ~# ip r s table 103
default via 51.38.166.190 dev eth3
51.38.166.160/27 dev eth3  scope link

[root@network3] ~# ip r s
default via 51.38.166.190 dev eth1 onlink
10.0.0.0/9 dev br-ext  proto kernel  scope link  src 10.0.0.103
172.16.0.0/16 dev br-manag  proto kernel  scope link  src 172.16.0.103
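The policy-routing setup shown above could be created roughly as follows (a sketch reconstructed from the "ip ru l" and "ip r s table 103" output, not my exact provisioning commands; VM1's addresses are used as the example):

```shell
# Route traffic from the VM's private IP, and from/to its public IP,
# through the dedicated table 103 (priorities match the rule list above).
ip rule add from 10.38.166.167 lookup 103 priority 9997
ip rule add from 51.38.166.167 lookup 103 priority 10309
ip rule add to 51.38.166.167 lookup 103 priority 10310

# Table 103 keeps the public /27 on-link via eth3 and sends everything
# else to the provider gateway.
ip route add 51.38.166.160/27 dev eth3 scope link table 103
ip route add default via 51.38.166.190 dev eth3 table 103
```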

My iptables rules look like this:

[root@network3] ~# iptables -nvL
Chain INPUT (policy ACCEPT 21334 packets, 1015K bytes)
 pkts bytes target     prot opt in     out     source               destination
91877 4376K ACCEPT     icmp --  *      *       0.0.0.0/0            0.0.0.0/0            /* 000 accept all icmp */
   18  1564 ACCEPT     all  --  lo     *       0.0.0.0/0            0.0.0.0/0            /* 001 accept all to lo interface */
    0     0 REJECT     all  --  !lo    *       0.0.0.0/0            127.0.0.0/8          /* 002 reject local traffic not on loopback interface */ reject-with icmp-port-unreachable
 343K  123M ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            state ESTABLISHED /* 003 accept related established rules */
  243 14472 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            multiport dports 1022 /* 030 allow SSH */
 481M   42G ACCEPT     udp  --  *      *       0.0.0.0/0            0.0.0.0/0            multiport dports 3210:3213 /* 031 allow VPNtunnel */
 4155  241K DROP       all  --  eth0   *       0.0.0.0/0            0.0.0.0/0            /* 999 drop all */

Chain FORWARD (policy ACCEPT 98325 packets, 8874K bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 964M packets, 93G bytes)
 pkts bytes target     prot opt in     out     source               destination

iptables NAT rules:

[root@network3] ~# iptables -t nat -nvL --line
Chain PREROUTING (policy ACCEPT 156K packets, 6455K bytes)
num   pkts bytes target     prot opt in     out     source               destination
31   11228  771K DNAT       all  --  *      *       0.0.0.0/0            51.38.166.166        /* 112 NAT for 10.38.166.166 */ to:10.38.166.166
32   11624  809K DNAT       all  --  *      *       0.0.0.0/0            51.38.166.167        /* 112 NAT for 10.38.166.167 */ to:10.38.166.167

Chain INPUT (policy ACCEPT 85077 packets, 3527K bytes)
num   pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 16505 packets, 1294K bytes)
num   pkts bytes target     prot opt in     out     source               destination

Chain POSTROUTING (policy ACCEPT 105K packets, 4357K bytes)
num   pkts bytes target     prot opt in     out     source               destination
31      17  1196 SNAT       all  --  *      *       10.38.166.166        0.0.0.0/0             to:51.38.166.166
32       8   549 SNAT       all  --  *      *       10.38.166.167        0.0.0.0/0             to:51.38.166.167

I have also inserted some rules into the raw table to help me trace the packets:

[root@network3] ~# iptables -t raw -nvL
Chain PREROUTING (policy ACCEPT 3765 packets, 227K bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 TRACE      all  --  *      *       51.38.166.167        0.0.0.0/0
  185 12988 TRACE      all  --  *      *       0.0.0.0/0            51.38.166.167

Chain OUTPUT (policy ACCEPT 7941 packets, 837K bytes)
 pkts bytes target     prot opt in     out     source               destination
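The raw-table TRACE rules counted above can be sketched as follows (a sketch, assuming the nf_log_ipv4 logging backend is what routes TRACE output to kern.log on this kernel):

```shell
# Make the TRACE target log via the legacy/kernel-log backend so the
# trace lines show up in /var/log/kern.log.
modprobe nf_log_ipv4
sysctl -w net.netfilter.nf_log.2=nf_log_ipv4

# Trace every packet from or to VM1's public IP as early as possible
# (the raw table's PREROUTING chain runs before conntrack and NAT).
iptables -t raw -A PREROUTING -s 51.38.166.167 -j TRACE
iptables -t raw -A PREROUTING -d 51.38.166.167 -j TRACE
```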

Testing from VM1:

ubuntu@test-1:~$ ip a l dev ens3
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UP group default qlen 1000
    link/ether fa:16:3e:51:0a:0b brd ff:ff:ff:ff:ff:ff
    inet 10.38.166.167/24 brd 10.38.166.255 scope global ens3
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe51:a0b/64 scope link
       valid_lft forever preferred_lft forever

ubuntu@test-1:~$ curl ifconfig.co
51.38.166.167

ubuntu@test-1:~$ ping 51.38.166.166 -c 4
PING 51.38.166.166 (51.38.166.166) 56(84) bytes of data.

--- 51.38.166.166 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3031ms

Testing from VM2:

ubuntu@test-2:~$ ip a l dev ens3
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UP group default qlen 1000
    link/ether fa:16:3e:9d:79:ce brd ff:ff:ff:ff:ff:ff
    inet 10.38.166.166/24 brd 10.38.166.255 scope global ens3
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe9d:79ce/64 scope link
       valid_lft forever preferred_lft forever

ubuntu@test-2:~$ curl ifconfig.co
51.38.166.166

ubuntu@test-2:~$ ping 51.38.166.167 -c 4
PING 51.38.166.167 (51.38.166.167) 56(84) bytes of data.

--- 51.38.166.167 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3023ms

Logs from network3:

[root@network3] ~# tail -f /var/log/kern.log | grep "SRC=10.38.166.166 DST=51.38.166.167"
Jul  5 11:58:12 network3 kernel: [79540.314496] TRACE: nat:PREROUTING:rule:32 IN=br-ext OUT= MAC=de:01:31:2d:47:18:fa:16:3e:9d:79:ce:08:00 SRC=10.38.166.166 DST=51.38.166.167 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49094 DF PROTO=ICMP TYPE=8 CODE=0 ID=4992 SEQ=57
Jul  5 11:58:13 network3 kernel: [79541.322501] TRACE: raw:PREROUTING:policy:3 IN=br-ext OUT= MAC=de:01:31:2d:47:18:fa:16:3e:9d:79:ce:08:00 SRC=10.38.166.166 DST=51.38.166.167 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49203 DF PROTO=ICMP TYPE=8 CODE=0 ID=4992 SEQ=58
Jul  5 11:58:13 network3 kernel: [79541.322543] TRACE: mangle:PREROUTING:policy:1 IN=br-ext OUT= MAC=de:01:31:2d:47:18:fa:16:3e:9d:79:ce:08:00 SRC=10.38.166.166 DST=51.38.166.167 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49203 DF PROTO=ICMP TYPE=8 CODE=0 ID=4992 SEQ=58
Jul  5 11:58:13 network3 kernel: [79541.322574] TRACE: nat:PREROUTING:rule:32 IN=br-ext OUT= MAC=de:01:31:2d:47:18:fa:16:3e:9d:79:ce:08:00 SRC=10.38.166.166 DST=51.38.166.167 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49203 DF PROTO=ICMP TYPE=8 CODE=0 ID=4992 SEQ=58
Jul  5 11:58:14 network3 kernel: [79542.330582] TRACE: raw:PREROUTING:policy:3 IN=br-ext OUT= MAC=de:01:31:2d:47:18:fa:16:3e:9d:79:ce:08:00 SRC=10.38.166.166 DST=51.38.166.167 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49367 DF PROTO=ICMP TYPE=8 CODE=0 ID=4992 SEQ=59
Jul  5 11:58:14 network3 kernel: [79542.330615] TRACE: mangle:PREROUTING:policy:1 IN=br-ext OUT= MAC=de:01:31:2d:47:18:fa:16:3e:9d:79:ce:08:00 SRC=10.38.166.166 DST=51.38.166.167 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49367 DF PROTO=ICMP TYPE=8 CODE=0 ID=4992 SEQ=59
Jul  5 11:58:14 network3 kernel: [79542.330639] TRACE: nat:PREROUTING:rule:32 IN=br-ext OUT= MAC=de:01:31:2d:47:18:fa:16:3e:9d:79:ce:08:00 SRC=10.38.166.166 DST=51.38.166.167 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49367 DF PROTO=ICMP TYPE=8 CODE=0 ID=4992 SEQ=59
^C

Since the IP ID does not change for a given ICMP SEQ, I can search the logs for everything related to this ID/SEQ pair:

[root@network3] ~# grep "ID=49367" /var/log/kern.log
Jul  5 11:58:14 network3 kernel: [79542.330582] TRACE: raw:PREROUTING:policy:3 IN=br-ext OUT= MAC=de:01:31:2d:47:18:fa:16:3e:9d:79:ce:08:00 SRC=10.38.166.166 DST=51.38.166.167 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49367 DF PROTO=ICMP TYPE=8 CODE=0 ID=4992 SEQ=59
Jul  5 11:58:14 network3 kernel: [79542.330615] TRACE: mangle:PREROUTING:policy:1 IN=br-ext OUT= MAC=de:01:31:2d:47:18:fa:16:3e:9d:79:ce:08:00 SRC=10.38.166.166 DST=51.38.166.167 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49367 DF PROTO=ICMP TYPE=8 CODE=0 ID=4992 SEQ=59
Jul  5 11:58:14 network3 kernel: [79542.330639] TRACE: nat:PREROUTING:rule:32 IN=br-ext OUT= MAC=de:01:31:2d:47:18:fa:16:3e:9d:79:ce:08:00 SRC=10.38.166.166 DST=51.38.166.167 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49367 DF PROTO=ICMP TYPE=8 CODE=0 ID=4992 SEQ=59

If I refer to this diagram: http://inai.de/images/nf-packet-flow.png

it seems the packet gets stuck at the routing decision. (I have ruled out getting stuck at the bridging decision, because if I do exactly the same thing without any bridge involved, the behavior is identical.)

Another possibility is that the packet matches NAT PREROUTING rule 32 but the rule is not applied, though I cannot see why that would happen.

Is there a clue I am missing here?

Answer 1

The most common reason for packets being dropped at the routing decision is rp_filter.

Check the output of the command ip route get 51.38.166.167 from 10.38.166.166 iif br-ext. Normally it should return a valid route. An "invalid cross-device link" result means the packet would be dropped by rp_filter. Also check the output of nstat -az TcpExtIPReversePathFilter; it is the counter of packets dropped this way.

Check the current rp_filter mode with the ip netconf show dev br-ext command.

Use the sysctl command to adjust this parameter.
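The check-and-adjust steps above can be sketched as follows (a sketch; whether loose mode or disabling the filter is appropriate depends on your security requirements):

```shell
# Inspect the current reverse-path-filter mode
# (0 = off, 1 = strict, 2 = loose).
sysctl net.ipv4.conf.all.rp_filter
sysctl net.ipv4.conf.br-ext.rp_filter

# The kernel uses the maximum of the "all" and per-interface values,
# so relaxing only the interface is not enough if "all" is strict.
sysctl -w net.ipv4.conf.all.rp_filter=2
sysctl -w net.ipv4.conf.br-ext.rp_filter=2

# Persist the change, e.g. in /etc/sysctl.d/, so it survives a reboot.
```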
