Linux: STP does not converge between Linux containers

I am trying to build a lab in GNS3 with Docker containers to learn more about spanning tree. My lab is very simple: two Linux/Alpine containers with two links between them:

--------                                          --------
| SW-1 | et2 -------------------------------- et2 | SW-2 |
|      | et3 -------------------------------- et3 |      |
--------                                          --------

Each bridge, br0, is configured as follows:

ifconfig eth2 down
ifconfig eth3 down
brctl addbr br0
brctl addif br0 eth2
brctl addif br0 eth3
brctl stp br0 on
ifconfig eth2 0.0.0.0 up
ifconfig eth3 0.0.0.0 up
ifconfig br0 up
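
For reference, the same bridge can be built with iproute2 instead of brctl/ifconfig. This is only a sketch, assuming the full iproute2 package is installed (the BusyBox ip applet may not accept these options). The names run, setup_stp_bridge and DRY_RUN are hypothetical helpers for illustration, and it does not change the behaviour described below.

```shell
#!/bin/sh
# Sketch: iproute2 equivalent of the brctl/ifconfig sequence above.
# run, setup_stp_bridge and DRY_RUN are illustrative names, not standard tools.

run() {
    # Print the command instead of executing it when DRY_RUN is set.
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# setup_stp_bridge BRIDGE PORT...: create BRIDGE with STP enabled,
# enslave every PORT to it, and bring everything up.
setup_stp_bridge() {
    br=$1
    shift
    run ip link add name "$br" type bridge stp_state 1
    for port in "$@"; do
        run ip link set dev "$port" master "$br"
        run ip link set dev "$port" up
    done
    run ip link set dev "$br" up
}

# Example (needs root): setup_stp_bridge br0 eth2 eth3
```

Setting DRY_RUN=1 before calling setup_stp_bridge prints the commands without running them.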

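The bridges are up, the modules are loaded, and STP appears to be running fine on each host, but the two bridges never converge. All ports stay in the forwarding state and L2 packets loop forever: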

# BOTH SW-1 and SW-2:
# lsmod | egrep -i 'bridge|stp'
bridge                352256  1 br_netfilter
stp                    16384  1 bridge
llc                    16384  2 bridge,stp

SW-1:
br0
 bridge id              8000.8615aca70489
 designated root        8000.8615aca70489    <<== SW-1 believes it is the root
 root port                 0                    path cost                  0
 max age                  20.00                 bridge max age            20.00
 hello time                2.00                 bridge hello time          2.00
 forward delay            15.00                 bridge forward delay      15.00
 ageing time             300.00
 hello timer               0.36                 tcn timer                  0.00
 topology change timer     0.00                 gc timer                 116.61
 flags


eth3 (2)
 port id                8002                    state                forwarding
 designated root        8000.8615aca70489       path cost                100
 designated bridge      8000.8615aca70489       message age timer          0.00
 designated port        8002                    forward delay timer        0.00
 designated cost           0                    hold timer                 0.00
 flags

eth2 (1)
 port id                8001                    state                forwarding
 designated root        8000.8615aca70489       path cost                100
 designated bridge      8000.8615aca70489       message age timer          0.00
 designated port        8001                    forward delay timer        0.00
 designated cost           0                    hold timer                 0.00
 flags


SW-2:
br0
 bridge id              8000.16d0f207e210
 designated root        8000.16d0f207e210    <<== SW-2 believes it is the root
 root port                 0                    path cost                  0
 max age                  20.00                 bridge max age            20.00
 hello time                2.00                 bridge hello time          2.00
 forward delay            15.00                 bridge forward delay      15.00
 ageing time             300.00
 hello timer               0.57                 tcn timer                  0.00
 topology change timer     0.00                 gc timer                 116.61
 flags


eth3 (2)
 port id                8002                    state                forwarding
 designated root        8000.16d0f207e210       path cost                100
 designated bridge      8000.16d0f207e210       message age timer          0.00
 designated port        8002                    forward delay timer        0.00
 designated cost           0                    hold timer                 0.00
 flags

eth2 (1)
 port id                8001                    state                forwarding
 designated root        8000.16d0f207e210       path cost                100
 designated bridge      8000.16d0f207e210       message age timer          0.00
 designated port        8001                    forward delay timer        0.00
 designated cost           0                    hold timer                 0.00
 flags

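When I run tcpdump on eth2 and eth3 on both devices, I see BPDUs being sent and received, but apparently each device ignores the BPDUs from the other one (by the way, the load average on my machine spikes because of the loop):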

SW-1:
Spanning Tree Protocol
    Protocol Identifier: Spanning Tree Protocol (0x0000)
    Protocol Version Identifier: Spanning Tree (0)
    BPDU Type: Configuration (0x00)
    BPDU flags: 0x00
        0... .... = Topology Change Acknowledgment: No
        .... ...0 = Topology Change: No
    Root Identifier: 32768 / 0 / 86:15:ac:a7:04:89
        Root Bridge Priority: 32768
        Root Bridge System ID Extension: 0
        Root Bridge System ID: 86:15:ac:a7:04:89 (86:15:ac:a7:04:89)
    Root Path Cost: 0
    Bridge Identifier: 32768 / 0 / 86:15:ac:a7:04:89
        Bridge Priority: 32768
        Bridge System ID Extension: 0
        Bridge System ID: 86:15:ac:a7:04:89 (86:15:ac:a7:04:89)
    Port identifier: 0x8002
    Message Age: 0
    Max Age: 20
    Hello Time: 2
    Forward Delay: 15

SW-2:
Spanning Tree Protocol
    Protocol Identifier: Spanning Tree Protocol (0x0000)
    Protocol Version Identifier: Spanning Tree (0)
    BPDU Type: Configuration (0x00)
    BPDU flags: 0x00
        0... .... = Topology Change Acknowledgment: No
        .... ...0 = Topology Change: No
    Root Identifier: 32768 / 0 / 16:d0:f2:07:e2:10
        Root Bridge Priority: 32768
        Root Bridge System ID Extension: 0
        Root Bridge System ID: 16:d0:f2:07:e2:10 (16:d0:f2:07:e2:10)
    Root Path Cost: 0
    Bridge Identifier: 32768 / 0 / 16:d0:f2:07:e2:10
        Bridge Priority: 32768
        Bridge System ID Extension: 0
        Bridge System ID: 16:d0:f2:07:e2:10 (16:d0:f2:07:e2:10)
    Port identifier: 0x8002
    Message Age: 0
    Max Age: 20
    Hello Time: 2
    Forward Delay: 15

Each one keeps telling the other that it is the root bridge, no matter how many minutes I wait. dmesg shows nothing except:

[37533.507941] br0: received packet on eth2 with own address as source address (addr:86:15:ac:a7:04:89, vlan:0)
[37533.507942] br0: received packet on eth3 with own address as source address (addr:86:15:ac:a7:04:89, vlan:0)
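
For reference, with both bridges at the default priority (0x8000) the election should be decided by the lowest MAC address, so a converged pair would agree on a single root. A minimal sketch of that comparison, assuming bridge IDs in the brctl showstp format (4 hex digits of priority, a dot, 12 hex digits of MAC); lower_bridge_id is a hypothetical helper name:

```shell
#!/bin/sh
# lower_bridge_id A B: print whichever bridge ID is numerically lower.
# In STP the bridge with the lowest ID (priority first, then MAC) becomes root.
lower_bridge_id() {
    # Drop the dot and normalise case so both IDs are fixed-width lowercase hex.
    a=$(printf '%s' "$1" | tr -d '.' | tr 'A-F' 'a-f')
    b=$(printf '%s' "$2" | tr -d '.' | tr 'A-F' 'a-f')
    # Fixed-width lowercase hex sorts in numeric order under the C locale.
    first=$(printf '%s\n%s\n' "$a" "$b" | LC_ALL=C sort | head -n 1)
    if [ "$first" = "$a" ]; then
        echo "$1"
    else
        echo "$2"
    fi
}

lower_bridge_id 8000.8615aca70489 8000.16d0f207e210
```

With the two IDs above this prints 8000.16d0f207e210, so SW-2 would be expected to win the election once the BPDUs are actually processed.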

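As far as I know, the bridge is not VLAN-aware. I tried setting default_pvid to 0 on this bridge, but it had no effect. No ebtables filtering rules are applied, and I have also zeroed every file under /proc/sys/net/bridge/. I don't understand why the BPDUs are not consumed so that the devices eventually converge.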

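I tried the same experiment with only one link between the bridges (i.e. no loop) and a host behind each bridge attached to another interface. With static IP addresses configured on the hosts, they ping each other successfully, so the bridges are switching packets: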

--------                                          --------
| SW-1 | et2 -------------------------------- et2 | SW-2 |
--------                                          --------
   et1                                              et1
    |                                                |
   host1                                            host2

I have also tried replacing the containers with openvswitch-specific images, and that works fine. Any ideas?

Answer 1

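This was an interesting question, so I spent some time looking into it. It turns out it has been asked before, and this answer describes the problem: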

The root cause is that STP messages are sent correctly from the bridge_slaves, but the rcv routine is restricted to init_ns in net/llc/llc_input.c, line 166 (linux-source-5.15.0...)

The same condition appears to exist in current 6.1.x kernels (the original answer links to the relevant source).
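
One way to see which case applies is to compare the network namespace of a process with that of PID 1. A rough sketch, assuming it is run on the host (inside a container, PID 1 is the container's own init, so the comparison says nothing there) and with enough privileges to read /proc/1/ns/net; netns_id and shares_init_netns are hypothetical helper names:

```shell
#!/bin/sh
# netns_id PID: print the network-namespace link of a process
# (something like "net:[...]"), or nothing if it cannot be read.
netns_id() {
    readlink "/proc/$1/ns/net" 2>/dev/null || true
}

# shares_init_netns PID: succeed only if PID is in the same network
# namespace as PID 1. Fails if either link is unreadable.
shares_init_netns() {
    target=$(netns_id "$1")
    init=$(netns_id 1)
    [ -n "$target" ] && [ "$target" = "$init" ]
}

# Example (on the host, as root), with <pid> being the PID of a process
# running inside the container:
#   shares_init_netns <pid> && echo "initial netns" || echo "separate netns"
```

A process inside one of the containers would be expected to report a separate namespace, which is the situation the quoted answer describes.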

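You can verify that this is the problem by leaving the bridges in the global network namespace. I'm not familiar with GNS3, so I set up a test environment from the command line; I ended up with something like this: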

[Image: diagram of the test environment]

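In this diagram, sw1-br0 and sw2-br0 are bridge devices, sw1 and sw2 are network namespaces, and everything else is a veth device. This is largely equivalent to your example (two bridges connected by a pair of links), except that the bridges live in the global namespace. A namespace interface is attached to each bridge so that we can test end-to-end connectivity.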

I set everything up with this script:

#!/bin/sh

set -ex

for dev in 1 2; do
        ns=sw$dev

        # create namespace
        ip netns add $ns

        # create bridge device
        ip link add $ns-br0 type bridge stp_state 1
        ip link set $ns-br0 up

        # create link from namespace to bridge
        ip link add $ns-int type veth peer name $ns-ext

        # configure internal device
        ip link set netns $ns dev $ns-int
        ip -n $ns link set up dev $ns-int
        ip -n $ns addr add 100.64.10.$(( dev * 10))/24 dev $ns-int

        # add external device to bridge
        ip link set master $ns-br0 dev $ns-ext
        ip link set up dev $ns-ext
done

# create links between bridge devices
for port in port0 port1; do
        ip link add sw1-$port type veth peer name sw2-$port
        ip link set master sw1-br0 sw1-$port
        ip link set master sw2-br0 sw2-$port
done

When this script finishes, the bridge-to-bridge links are all disabled, while the namespace-to-bridge links are up. That gives us:

$ bridge link | grep br0
5090: sw1-ext@if5091: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master sw1-br0 state forwarding priority 32 cost 2
5093: sw2-ext@if5094: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master sw2-br0 state forwarding priority 32 cost 2
5095: sw2-port0@sw1-port0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 master sw2-br0 state disabled priority 32 cost 2
5096: sw1-port0@sw2-port0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 master sw1-br0 state disabled priority 32 cost 2
5097: sw2-port1@sw1-port1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 master sw2-br0 state disabled priority 32 cost 2
5098: sw1-port1@sw2-port1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 master sw1-br0 state disabled priority 32 cost 2

If I bring up the port0 interfaces:

ip link set sw1-port0 up
ip link set sw2-port0 up

We eventually see:

5090: sw1-ext@if5091: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master sw1-br0 state forwarding priority 32 cost 2
5093: sw2-ext@if5094: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master sw2-br0 state forwarding priority 32 cost 2
5095: sw2-port0@sw1-port0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master sw2-br0 state forwarding priority 32 cost 2
5096: sw1-port0@sw2-port0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master sw1-br0 state forwarding priority 32 cost 2
5097: sw2-port1@sw1-port1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 master sw2-br0 state disabled priority 32 cost 2
5098: sw1-port1@sw2-port1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 master sw1-br0 state disabled priority 32 cost 2

When I finally bring up the port1 links and wait for STP to converge:

ip link set sw1-port1 up
ip link set sw2-port1 up

We see:

5090: sw1-ext@if5091: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master sw1-br0 state forwarding priority 32 cost 2
5093: sw2-ext@if5094: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master sw2-br0 state forwarding priority 32 cost 2
5095: sw2-port0@sw1-port0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master sw2-br0 state forwarding priority 32 cost 2
5096: sw1-port0@sw2-port0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master sw1-br0 state forwarding priority 32 cost 2
5097: sw2-port1@sw1-port1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master sw2-br0 state blocking priority 32 cost 2
5098: sw1-port1@sw2-port1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master sw1-br0 state forwarding priority 32 cost 2

Here you can see that the bridges have successfully detected the loop and marked one of the ports as blocking.
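
To pick the blocked port out of a longer listing, the output can be filtered by state. A small sketch, assuming the line format shown above ("<index>: <port>@<peer>: <flags> ... master <bridge> state <state> ..."); blocking_ports is a hypothetical helper name:

```shell
#!/bin/sh
# blocking_ports: read `bridge link` output on stdin and print
# "<port> <bridge>" for every port whose state is "blocking".
blocking_ports() {
    awk '{
        # Field 2 is "<port>@<peer>:" (or "<port>:"); keep only the port name.
        port = $2
        sub(/@.*/, "", port)
        sub(/:$/, "", port)
        br = ""
        st = ""
        # Pick up the values that follow the "master" and "state" keywords.
        for (i = 3; i < NF; i++) {
            if ($i == "master") br = $(i + 1)
            if ($i == "state")  st = $(i + 1)
        }
        if (st == "blocking") print port, br
    }'
}

# Example: bridge link | blocking_ports
```

Fed the listing above, this would print "sw2-port1 sw2-br0".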

We can verify connectivity between the two namespaces:

# ip netns exec sw1 ping -c2 100.64.10.20
PING 100.64.10.20 (100.64.10.20) 56(84) bytes of data.
64 bytes from 100.64.10.20: icmp_seq=1 ttl=64 time=0.052 ms
64 bytes from 100.64.10.20: icmp_seq=2 ttl=64 time=0.067 ms

--- 100.64.10.20 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1004ms
rtt min/avg/max/mdev = 0.052/0.059/0.067/0.007 ms

By running tcpdump on any of the links, we can verify that there is no loop.
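
When you are done, the test environment can be removed again. A possible teardown sketch, assuming the device and namespace names created by the setup script above; run, teardown_lab and DRY_RUN are hypothetical helpers for illustration:

```shell
#!/bin/sh
# Sketch: undo the setup script above (needs root unless DRY_RUN is set).

run() {
    # Print the command instead of executing it when DRY_RUN is set.
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

teardown_lab() {
    # Deleting one end of a veth pair removes its peer as well.
    for port in port0 port1; do
        run ip link del "sw1-$port"
    done
    for dev in 1 2; do
        ns=sw$dev
        run ip link del "$ns-ext"   # also removes $ns-int inside the namespace
        run ip link del "$ns-br0"
        run ip netns del "$ns"
    done
}

# Example (needs root): teardown_lab
```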
