STP packets are dropped (1 packet every 2 seconds) when an unrelated multicast group is joined on the interface

2024-5-31

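I am trying to understand a strange packet drop issue that appears when a specific multicast group is joined.

I believe the issue is related to a patch introduced in kernel version 2.6.37: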

Beginning with kernel 2.6.37, it has been changed the meaning of dropped
packet count. Before, dropped packets was most likely due to an error.
Now, the rx_dropped counter shows statistics for dropped frames because 
of:

Softnet backlog full -- (Measured from /proc/net/softnet_stat)
Bad / Unintended VLAN tags
Unknown / Unregistered protocols
IPv6 frames when the server is not configured for IPv6

If any frames meet those conditions, they are dropped before the 
protocol stack and the rx_dropped counter is incremented.
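To rule out the first cause in the list above (softnet backlog full), the per-CPU drop counters in /proc/net/softnet_stat can be checked. The following is a minimal sketch, assuming the usual layout of that file: one line per CPU, hexadecimal columns, with the second column holding the backlog drop count. The function name is made up for illustration.

```python
def parse_softnet_dropped(text):
    """Return the per-CPU backlog drop counts from softnet_stat-style text.

    Assumes one line per CPU with hexadecimal columns, where column 0 is
    the number of processed frames and column 1 the number of frames
    dropped because the backlog queue was full.
    """
    dropped = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        dropped.append(int(fields[1], 16))
    return dropped


def read_softnet_dropped(path="/proc/net/softnet_stat"):
    """Read the live counters from the kernel (Linux only)."""
    with open(path) as f:
        return parse_softnet_dropped(f.read())
```

If these counters stay constant while rx_dropped keeps growing, the backlog is unlikely to be the cause.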

On a clean SLES11 SP3 installation I managed to reproduce this by joining the STP multicast group (01:80:c2:00:00:00).

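Without any changes, no dropped packets show up in /proc/net/dev (RX) or in netstat -i, because my system has not joined the STP multicast group and therefore ignores those frames.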
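The RX drop counter mentioned above can be watched programmatically. This is a minimal sketch that parses /proc/net/dev, assuming the standard column order for the receive side (bytes, packets, errs, drop, ...); the function names are made up for illustration.

```python
def parse_rx_dropped(text):
    """Return {interface: rx_dropped} parsed from /proc/net/dev-style text."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, stats = line.split(":", 1)
        fields = stats.split()
        # RX columns: bytes packets errs drop fifo frame compressed multicast
        counters[name.strip()] = int(fields[3])
    return counters


def read_rx_dropped(path="/proc/net/dev"):
    """Read the live counters from the kernel (Linux only)."""
    with open(path) as f:
        return parse_rx_dropped(f.read())
```

Calling read_rx_dropped() twice, ten seconds apart, and subtracting the eth1 values should give a difference of roughly 5 if one frame is dropped every 2 seconds.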

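When I join the STP multicast group, I can see packets being dropped (1 packet every 2 seconds). I believe these frames are dropped because of the patch introduced in kernel 2.6.37 (unknown/unregistered protocol), which is expected.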

hostname:~ # ip maddr add 01:80:c2:00:00:00 dev eth1

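My understanding is that once I modprobe the llc/stp module into the kernel, the kernel recognizes the protocol and therefore stops dropping the frames (testing proved me right).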

Modprobing the llc module or the stp module (which depends on llc) "fixes" the packet drop issue.

Now the problem:

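I have an application that joins multiple multicast groups at startup. For some reason, one specific join triggers the packet drop issue (1 packet every 2 seconds).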

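The catch is that it is not the STP multicast address 01:80:c2:00:00:00 but a completely different one (01:00:5e:46:ac:04, aka 239.70.172.4). Inserting the llc/stp module "fixes" the dropped packet counter increments. None of the other multicast groups cause this issue, for example 01:00:5e:46:ac:02 and many others.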
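For reference, the MAC address above follows from the standard IPv4 multicast mapping: the fixed prefix 01:00:5e followed by the low 23 bits of the group address. The sketch below reproduces that mapping; the function name is made up for illustration.

```python
import ipaddress


def multicast_ip_to_mac(ip):
    """Map an IPv4 multicast group to its Ethernet multicast MAC address.

    The MAC is the fixed prefix 01:00:5e followed by the low 23 bits of
    the IPv4 address, so 32 different groups share each MAC address.
    """
    addr = ipaddress.IPv4Address(ip)
    if not addr.is_multicast:
        raise ValueError(f"{ip} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF
    octets = [0x01, 0x00, 0x5E,
              (low23 >> 16) & 0x7F, (low23 >> 8) & 0xFF, low23 & 0xFF]
    return ":".join(f"{o:02x}" for o in octets)


print(multicast_ip_to_mac("239.70.172.4"))  # 01:00:5e:46:ac:04
```

The result lies in the 01:00:5e range used for IPv4 multicast, which is distinct from the 01:80:c2:00:00:00 address that the STP frames are sent to.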

The STP frame is the only frame that shows up on the interface every 2 seconds, but its destination MAC address is 01:80:c2:00:00:00.

00:21:1b:4f:a3:bf > 01:80:c2:00:00:00, 802.3, length 119: LLC, dsap STP (0x42) Individual, ssap STP (0x42) Command, ctrl 0x03: STP 802.1s, Rapid STP, CIST Flags [Learn, Forward]

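How is this possible? Why would joining the 01:00:5e:46:ac:04 multicast group trigger this behavior, as if it were somehow related to the STP group and let the frames/packets travel further up the stack?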
