I'm trying to build a lab in GNS3 with Docker containers to learn more about spanning tree. My lab is very simple: two Linux/Alpine containers with two links between them:
--------                                          --------
| SW-1 | eth2 ------------------------------ eth2 | SW-2 |
|      | eth3 ------------------------------ eth3 |      |
--------                                          --------
Each bridge (br0) is configured as follows:
ifconfig eth2 down
ifconfig eth3 down
brctl addbr br0
brctl addif br0 eth2
brctl addif br0 eth3
brctl stp br0 on
ifconfig eth2 0.0.0.0 up
ifconfig eth3 0.0.0.0 up
ifconfig br0 up
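The bridges are up, the modules are loaded, and STP appears to be running on each host, but the bridges never converge. All ports stay in the forwarding state and L2 packets loop forever: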
# BOTH SW-1 and SW-2:
# lsmod | egrep -i 'bridge|stp'
bridge 352256 1 br_netfilter
stp 16384 1 bridge
llc 16384 2 bridge,stp
SW-1:
br0
bridge id 8000.8615aca70489
designated root 8000.8615aca70489 <<== SW-1 believes it is the root
root port 0 path cost 0
max age 20.00 bridge max age 20.00
hello time 2.00 bridge hello time 2.00
forward delay 15.00 bridge forward delay 15.00
ageing time 300.00
hello timer 0.36 tcn timer 0.00
topology change timer 0.00 gc timer 116.61
flags
eth3 (2)
port id 8002 state forwarding
designated root 8000.8615aca70489 path cost 100
designated bridge 8000.8615aca70489 message age timer 0.00
designated port 8002 forward delay timer 0.00
designated cost 0 hold timer 0.00
flags
eth2 (1)
port id 8001 state forwarding
designated root 8000.8615aca70489 path cost 100
designated bridge 8000.8615aca70489 message age timer 0.00
designated port 8001 forward delay timer 0.00
designated cost 0 hold timer 0.00
flags
SW-2:
br0
bridge id 8000.16d0f207e210
designated root 8000.16d0f207e210 <<== SW-2 believes it is the root
root port 0 path cost 0
max age 20.00 bridge max age 20.00
hello time 2.00 bridge hello time 2.00
forward delay 15.00 bridge forward delay 15.00
ageing time 300.00
hello timer 0.57 tcn timer 0.00
topology change timer 0.00 gc timer 116.61
flags
eth3 (2)
port id 8002 state forwarding
designated root 8000.16d0f207e210 path cost 100
designated bridge 8000.16d0f207e210 message age timer 0.00
designated port 8002 forward delay timer 0.00
designated cost 0 hold timer 0.00
flags
eth2 (1)
port id 8001 state forwarding
designated root 8000.16d0f207e210 path cost 100
designated bridge 8000.16d0f207e210 message age timer 0.00
designated port 8001 forward delay timer 0.00
designated cost 0 hold timer 0.00
flags
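Both dumps show the same symptom: the designated root of each bridge equals its own bridge id. A minimal sketch for checking this mechanically, assuming output in the `brctl showstp` layout shown above (the helper name `is_self_root` is made up for illustration):

```shell
# Succeed if the bridge described on stdin lists itself as designated root.
# Expects `brctl showstp <bridge>` output; only the first "bridge id" and the
# first "designated root" line (the bridge-level ones) are considered.
is_self_root() {
    awk '
        $1 == "bridge" && $2 == "id" && bid == "" { bid = $3 }
        $1 == "designated" && $2 == "root" && root == "" { root = $3 }
        # Appending "" forces a string comparison even for all-digit ids.
        END { exit !(bid != "" && (bid "") == (root "")) }
    '
}

# Example:
#   brctl showstp br0 | is_self_root && echo "br0 believes it is the root"
```

In a healthy two-bridge topology this check should succeed on exactly one of the two bridges once STP has converged; here it succeeds on both.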
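When I run tcpdump on eth2 and eth3 on both devices, I can see BPDUs being sent and received, but each device apparently ignores the BPDUs coming from the other one (incidentally, the load average on my machine spikes because of the loop):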
SW-1:
Spanning Tree Protocol
Protocol Identifier: Spanning Tree Protocol (0x0000)
Protocol Version Identifier: Spanning Tree (0)
BPDU Type: Configuration (0x00)
BPDU flags: 0x00
0... .... = Topology Change Acknowledgment: No
.... ...0 = Topology Change: No
Root Identifier: 32768 / 0 / 86:15:ac:a7:04:89
Root Bridge Priority: 32768
Root Bridge System ID Extension: 0
Root Bridge System ID: 86:15:ac:a7:04:89 (86:15:ac:a7:04:89)
Root Path Cost: 0
Bridge Identifier: 32768 / 0 / 86:15:ac:a7:04:89
Bridge Priority: 32768
Bridge System ID Extension: 0
Bridge System ID: 86:15:ac:a7:04:89 (86:15:ac:a7:04:89)
Port identifier: 0x8002
Message Age: 0
Max Age: 20
Hello Time: 2
Forward Delay: 15
SW-2:
Spanning Tree Protocol
Protocol Identifier: Spanning Tree Protocol (0x0000)
Protocol Version Identifier: Spanning Tree (0)
BPDU Type: Configuration (0x00)
BPDU flags: 0x00
0... .... = Topology Change Acknowledgment: No
.... ...0 = Topology Change: No
Root Identifier: 32768 / 0 / 16:d0:f2:07:e2:10
Root Bridge Priority: 32768
Root Bridge System ID Extension: 0
Root Bridge System ID: 16:d0:f2:07:e2:10 (16:d0:f2:07:e2:10)
Root Path Cost: 0
Bridge Identifier: 32768 / 0 / 16:d0:f2:07:e2:10
Bridge Priority: 32768
Bridge System ID Extension: 0
Bridge System ID: 16:d0:f2:07:e2:10 (16:d0:f2:07:e2:10)
Port identifier: 0x8002
Message Age: 0
Max Age: 20
Hello Time: 2
Forward Delay: 15
Each bridge tells the other that it is the root bridge, and waiting several minutes makes no difference. dmesg shows nothing except:
[37533.507941] br0: received packet on eth2 with own address as source address (addr:86:15:ac:a7:04:89, vlan:0)
[37533.507942] br0: received packet on eth3 with own address as source address (addr:86:15:ac:a7:04:89, vlan:0)
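As far as I know, the bridge is not VLAN-aware. I tried setting default_pvid on this bridge to 0, but it had no effect. No ebtables filtering rules are applied, and I have also zeroed all the files under /proc/sys/net/bridge/. I don't understand why the BPDUs are not being consumed so that the devices eventually converge.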
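I tried the same experiment with a single link between the bridges (i.e. no loop) and a host behind each bridge, attached to another interface. With static IP addresses configured on the hosts, they can ping each other successfully, so the bridges are switching packets: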
--------                                          --------
| SW-1 | eth2 ------------------------------ eth2 | SW-2 |
--------                                          --------
  eth1                                              eth1
   |                                                  |
 host1                                              host2
I also tried replacing the containers with openvswitch images, and that works fine. Any ideas?
Answer 1
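This question was interesting, so I spent some time looking into it. It turns out it has been asked before, and an earlier answer describes the problem:
"The root cause is that STP messages are sent correctly from the bridge slaves, but the receive routine is restricted to init_ns in net/llc/llc_input.c, line 166 (linux-source-5.15.0...)"
The same condition appears to exist in current 6.1.x kernels.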
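To check whether a given process (and therefore its bridge) lives outside the initial network namespace, one option is to compare the namespace links under /proc. The following is a minimal sketch; the helper names `netns_of` and `same_netns` are made up for illustration, and the optional proc-root argument exists only so the helpers can be exercised without root:

```shell
# Print the network-namespace link of a process, e.g. "net:[4026531840]".
# $1: PID (or "self"); $2: alternative proc root (assumption: for testing only).
netns_of() {
    readlink "${2:-/proc}/$1/ns/net"
}

# Succeed if the two given PIDs share a network namespace.
# Returns 2 if either namespace link cannot be read.
same_netns() {
    a=$(netns_of "$1" "${3:-}") || return 2
    b=$(netns_of "$2" "${3:-}") || return 2
    [ "$a" = "$b" ]
}

# Example (run as root on the host, where PID 1 is in the initial namespace;
# CONTAINER_PID is a placeholder for the container's PID as seen from the host):
#   same_netns 1 "$CONTAINER_PID" || echo "container has its own netns"
```

Note that inside a container with its own PID namespace, PID 1 is the container's own init, so the comparison is only meaningful when run from the host.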
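You can verify that this is the problem by leaving the bridges in the global network namespace. I'm not familiar with GNS3, so I set up a test environment from the command line instead.
In this setup, sw1-br0 and sw2-br0 are bridge devices, sw1 and sw2 are network namespaces, and everything else is a veth device. This is largely equivalent to your example (two bridges connected by a pair of links), except that the bridges live in the global namespace. A namespace interface is attached to each bridge so that we can test end-to-end connectivity.
I set everything up with this script: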
#!/bin/sh
set -ex

for dev in 1 2; do
    ns=sw$dev

    # create namespace
    ip netns add $ns

    # create bridge device
    ip link add $ns-br0 type bridge stp_state 1
    ip link set $ns-br0 up

    # create link from namespace to bridge
    ip link add $ns-int type veth peer name $ns-ext

    # configure internal device
    ip link set netns $ns dev $ns-int
    ip -n $ns link set up dev $ns-int
    ip -n $ns addr add 100.64.10.$(( dev * 10 ))/24 dev $ns-int

    # add external device to bridge
    ip link set master $ns-br0 dev $ns-ext
    ip link set up dev $ns-ext
done

# create links between bridge devices
for port in port0 port1; do
    ip link add sw1-$port type veth peer name sw2-$port
    ip link set master sw1-br0 sw1-$port
    ip link set master sw2-br0 sw2-$port
done
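Because the script creates namespaces, bridges, and veth pairs on the host, it helps to be able to undo it between experiments. A minimal teardown sketch follows, assuming the device names used by the script above; the `run` helper and the `DRY_RUN` variable are illustrative additions, not part of the original setup:

```shell
# Run a command, or just print it when DRY_RUN is set (assumed convention).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Remove everything the setup script created.
teardown() {
    for dev in 1 2; do
        # Deleting the namespace destroys the veth end inside it,
        # which also removes its peer attached to the bridge.
        run ip netns del "sw$dev"
        run ip link del "sw$dev-br0"
    done
    for port in port0 port1; do
        # Removing one end of a veth pair removes the other end too.
        run ip link del "sw1-$port"
    done
}
```

Calling `DRY_RUN=1 teardown` previews the commands; calling `teardown` as root actually runs them.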
When the script finishes, the bridge-to-bridge links are all disabled, while the namespace-to-bridge links are up. This gives us:
$ bridge link | grep br0
5090: sw1-ext@if5091: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master sw1-br0 state forwarding priority 32 cost 2
5093: sw2-ext@if5094: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master sw2-br0 state forwarding priority 32 cost 2
5095: sw2-port0@sw1-port0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 master sw2-br0 state disabled priority 32 cost 2
5096: sw1-port0@sw2-port0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 master sw1-br0 state disabled priority 32 cost 2
5097: sw2-port1@sw1-port1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 master sw2-br0 state disabled priority 32 cost 2
5098: sw1-port1@sw2-port1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 master sw1-br0 state disabled priority 32 cost 2
If I bring up the port0 interfaces:
ip link set sw1-port0 up
ip link set sw2-port0 up
we eventually see:
5090: sw1-ext@if5091: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master sw1-br0 state forwarding priority 32 cost 2
5093: sw2-ext@if5094: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master sw2-br0 state forwarding priority 32 cost 2
5095: sw2-port0@sw1-port0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master sw2-br0 state forwarding priority 32 cost 2
5096: sw1-port0@sw2-port0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master sw1-br0 state forwarding priority 32 cost 2
5097: sw2-port1@sw1-port1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 master sw2-br0 state disabled priority 32 cost 2
5098: sw1-port1@sw2-port1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 master sw1-br0 state disabled priority 32 cost 2
When I finally bring up the port1 links and wait for STP to converge:
ip link set sw1-port1 up
ip link set sw2-port1 up
we see:
5090: sw1-ext@if5091: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master sw1-br0 state forwarding priority 32 cost 2
5093: sw2-ext@if5094: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master sw2-br0 state forwarding priority 32 cost 2
5095: sw2-port0@sw1-port0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master sw2-br0 state forwarding priority 32 cost 2
5096: sw1-port0@sw2-port0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master sw1-br0 state forwarding priority 32 cost 2
5097: sw2-port1@sw1-port1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master sw2-br0 state blocking priority 32 cost 2
5098: sw1-port1@sw2-port1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master sw1-br0 state forwarding priority 32 cost 2
Here you can see that the bridges have successfully detected the loop and marked one of the ports as blocking.
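To watch for this state change without reading the full listing, the relevant columns can be pulled out of the `bridge link` output. A minimal sketch, assuming the one-line-per-port format shown above; `port_states` is a hypothetical helper name:

```shell
# Read `bridge link` output on stdin and print "<port> <stp-state>" per line.
port_states() {
    awk '{
        name = $2
        sub(/[@:].*/, "", name)          # strip "@peer:" or trailing ":"
        for (i = 3; i < NF; i++)
            if ($i == "state") print name, $(i + 1)
    }'
}

# Example:
#   bridge link | port_states
```

For the final listing above, this would report sw2-port1 as blocking and the other five ports as forwarding.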
We can verify connectivity between the two namespaces:
# ip netns exec sw1 ping -c2 100.64.10.20
PING 100.64.10.20 (100.64.10.20) 56(84) bytes of data.
64 bytes from 100.64.10.20: icmp_seq=1 ttl=64 time=0.052 ms
64 bytes from 100.64.10.20: icmp_seq=2 ttl=64 time=0.067 ms
--- 100.64.10.20 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1004ms
rtt min/avg/max/mdev = 0.052/0.059/0.067/0.007 ms
By running tcpdump on any of the links, we can verify that there is no loop.