Linux server dropping RX packets in __netif_receive_skb_core

2024-6-1

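My Ubuntu 18.04 server is dropping received packets, and I cannot work out why.

Here is a graph of the dropped packets from netdata: [graph not reproduced]

The server runs several Docker containers and networks, so there are a number of Linux bridges and veth interfaces. The problem, however, concerns only the physical interface. No VLANs are configured.

The machine has no iptables rules other than those generated by Docker.

The NIC is an Intel I210 (igb driver).

Copying data over TCP (rsync) runs at 1G line rate, so not many TCP packets can be affected. (I would expect TCP drops to hurt transfer performance badly because of the reduced window size.)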

# uname -a
Linux epyc 5.3.0-51-generic #44~18.04.2-Ubuntu SMP Thu Apr 23 14:27:18 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

You can see RX-DRP on eno1, the machine's only physical interface. It increases by about 2 packets/sec even with very light traffic (admin SSH and a few DNS queries).

# netstat -ni
Kernel Interface table
Iface      MTU    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
br-69eb0  1500    45768      0      0 0         41072      0      0      0 BMRU
br-bf2c7  1500       71      0      0 0            85      0      0      0 BMRU
br-f4e34  1500   187676      0      0 0        192128      0      0      0 BMRU
docker0   1500    62739      0      0 0         70194      0      0      0 BMRU
eno1      1500 55517866      0 271391 35     19796132      0      0      0 BMRU
lo       65536     7381      0      0 0          7381      0      0      0 LRU
veth078d  1500    40657      0      0 0         48148      0      0      0 BMRU
veth231e  1500     2582      0      0 0          2323      0      0      0 BMRU
veth2f4f  1500       19      0      0 0           164      0      0      0 BMRU
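
When a host has many bridges and veth interfaces, it can help to filter the table down to the interfaces that actually report drops. The following is a minimal sketch, not part of the original post: the function name rx_drops and the SAMPLE text are illustrative, and the sample rows are copied from the table above.

```python
def rx_drops(netstat_output):
    """Return {interface: RX-DRP} for interfaces with a non-zero RX-DRP."""
    lines = netstat_output.strip().splitlines()
    # Locate the header row so the column position is not hard-coded.
    header_idx = next(i for i, line in enumerate(lines)
                      if line.startswith("Iface"))
    drp_col = lines[header_idx].split().index("RX-DRP")
    drops = {}
    for line in lines[header_idx + 1:]:
        fields = line.split()
        if len(fields) > drp_col and int(fields[drp_col]) > 0:
            drops[fields[0]] = int(fields[drp_col])
    return drops


# Three rows copied from the `netstat -ni` table above.
SAMPLE = """\
Kernel Interface table
Iface      MTU    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
docker0   1500    62739      0      0 0         70194      0      0      0 BMRU
eno1      1500 55517866      0 271391 35     19796132      0      0      0 BMRU
lo       65536     7381      0      0 0          7381      0      0      0 LRU
"""

print(rx_drops(SAMPLE))  # {'eno1': 271391}
```

Running it twice a few seconds apart on live `netstat -ni` output and subtracting the values would give the drop rate per interface.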

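NIC settings (ethtool)

I tried disabling every RX hardware offload setting I can edit, but it did not help.

I increased the buffers, but that did not help either.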

# ethtool -S eno1 | grep rx
     rx_packets: 55580744
     rx_bytes: 76852450760
     rx_broadcast: 294019
     rx_multicast: 228993
     rx_crc_errors: 0
     rx_no_buffer_count: 0
     rx_missed_errors: 0
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     rx_align_errors: 0
     rx_flow_control_xon: 0
     rx_flow_control_xoff: 0
     rx_long_byte_count: 76852450760
     rx_smbus: 66009
     os2bmc_rx_by_bmc: 19137
     os2bmc_rx_by_host: 190
     rx_hwtstamp_cleared: 0
     rx_errors: 0
     rx_length_errors: 0
     rx_over_errors: 0
     rx_frame_errors: 0
     rx_fifo_errors: 35
     rx_queue_0_packets: 16271369
     rx_queue_0_bytes: 22437386945
     rx_queue_0_drops: 0
     rx_queue_0_csum_err: 0
     rx_queue_0_alloc_failed: 0
     rx_queue_1_packets: 5913593
     rx_queue_1_bytes: 6458814275
     rx_queue_1_drops: 0
     rx_queue_1_csum_err: 1
     rx_queue_1_alloc_failed: 0
     rx_queue_2_packets: 29208019
     rx_queue_2_bytes: 42357497354
     rx_queue_2_drops: 35
     rx_queue_2_csum_err: 0
     rx_queue_2_alloc_failed: 0
     rx_queue_3_packets: 4121883
     rx_queue_3_bytes: 5366292094
     rx_queue_3_drops: 0
     rx_queue_3_csum_err: 0
     rx_queue_3_alloc_failed: 0
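
The interesting rows in this long list are the error and drop counters that are not zero. As a rough sketch (the keyword list SUSPECT and the function name are my own assumptions, not anything ethtool defines), the output can be filtered like this. On the sample lines taken from above, the driver-level counters only account for 35 drops, far fewer than the 271391 shown as RX-DRP by netstat, which would be consistent with most drops being counted further up the stack rather than in the NIC.

```python
# Substrings that mark a counter as "suspicious"; this list is an assumption.
SUSPECT = ("drop", "err", "miss", "no_buffer", "failed")


def nonzero_suspects(ethtool_output):
    """Return {counter: value} for non-zero drop/error style counters."""
    result = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.partition(":")
        name, value = name.strip(), value.strip()
        if not sep or not value.isdigit():
            continue  # skip headers and anything that is not "name: number"
        if int(value) > 0 and any(key in name for key in SUSPECT):
            result[name] = int(value)
    return result


# A subset of the `ethtool -S eno1 | grep rx` lines shown above.
SAMPLE = """\
     rx_crc_errors: 0
     rx_no_buffer_count: 0
     rx_missed_errors: 0
     rx_fifo_errors: 35
     rx_queue_1_csum_err: 1
     rx_queue_2_packets: 29208019
     rx_queue_2_drops: 35
     rx_queue_2_alloc_failed: 0
"""

for counter, value in nonzero_suspects(SAMPLE).items():
    print(counter, value)
```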

# ethtool -k eno1 | grep -vE 'tx|fixed'
Features for eno1:
rx-checksumming: off
scatter-gather: off
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: off
rx-vlan-offload: off
ntuple-filters: off
receive-hashing: off
rx-all: off
hw-tc-offload: on

# ethtool -g eno1
Ring parameters for eno1:
Pre-set maximums:
RX:     4096
RX Mini:    0
RX Jumbo:   0
TX:     4096
Current hardware settings:
RX:     256
RX Mini:    0
RX Jumbo:   0
TX:     256

Dropwatch

From this blog post I found the tool dropwatch, which gives the following output:

# sudo ./dropwatch -l kas
Initalizing kallsyms db
dropwatch> start
Enabling monitoring...
Kernel monitoring activated.
Issue Ctrl-C to stop monitoring
6 drops at __netif_receive_skb_core+4a0 (0xffffffff979002d0)
1 drops at icmpv6_rcv+310 (0xffffffff97a2e6a0)
1 drops at ip_rcv_finish_core.isra.18+1b4 (0xffffffff97976644)
1 drops at __udp4_lib_rcv+a34 (0xffffffff979b0fc4)
3 drops at __udp4_lib_rcv+a34 (0xffffffff979b0fc4)
1 drops at unix_release_sock+1a7 (0xffffffff979f9977)
1 drops at unix_release_sock+1a7 (0xffffffff979f9977)
1 drops at sk_stream_kill_queues+4d (0xffffffff978eeffd)
2 drops at unix_stream_connect+2e5 (0xffffffff979fae75)
12 drops at __netif_receive_skb_core+4a0 (0xffffffff979002d0)
1 drops at sk_stream_kill_queues+4d (0xffffffff978eeffd)
1 drops at sk_stream_kill_queues+4d (0xffffffff978eeffd)
2 drops at __udp4_lib_rcv+a34 (0xffffffff979b0fc4)
2 drops at unix_stream_connect+2e5 (0xffffffff979fae75)
6 drops at ip_forward+1b5 (0xffffffff97978615)
1 drops at unix_release_sock+1a7 (0xffffffff979f9977)
1 drops at __udp4_lib_rcv+a34 (0xffffffff979b0fc4)
1 drops at sk_stream_kill_queues+4d (0xffffffff978eeffd)
1 drops at sk_stream_kill_queues+4d (0xffffffff978eeffd)
2 drops at unix_stream_connect+2e5 (0xffffffff979fae75)
2 drops at unix_stream_connect+2e5 (0xffffffff979fae75)
1 drops at unix_release_sock+1a7 (0xffffffff979f9977)
12 drops at __netif_receive_skb_core+4a0 (0xffffffff979002d0)
6 drops at ip_forward+1b5 (0xffffffff97978615)
1 drops at tcp_v6_rcv+16c (0xffffffff97a3829c)
2 drops at unix_stream_connect+2e5 (0xffffffff979fae75)
12 drops at __netif_receive_skb_core+4a0 (0xffffffff979002d0)
1 drops at sk_stream_kill_queues+4d (0xffffffff978eeffd)
2 drops at unix_stream_connect+2e5 (0xffffffff979fae75)
^C
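
Because dropwatch prints one line per reporting interval, the same symbol shows up many times. Summing the counts per kernel function makes the dominant drop location easier to see. This is only a sketch: the regular expression is an assumption based on the line format in the capture above, and the sample contains five of those lines. Applied to the whole capture, __netif_receive_skb_core adds up to 42 drops, more than any other symbol.

```python
import re
from collections import Counter

# Matches lines such as "6 drops at ip_forward+1b5 (0xffffffff97978615)".
DROP_LINE = re.compile(r"^(\d+) drops at (\S+)\+[0-9a-f]+ \(0x[0-9a-f]+\)$")


def drops_by_symbol(dropwatch_output):
    """Sum the drop counts per kernel symbol, ignoring non-matching lines."""
    totals = Counter()
    for line in dropwatch_output.splitlines():
        match = DROP_LINE.match(line.strip())
        if match:
            totals[match.group(2)] += int(match.group(1))
    return totals


# Five lines copied from the dropwatch capture above.
SAMPLE = """\
6 drops at __netif_receive_skb_core+4a0 (0xffffffff979002d0)
1 drops at ip_rcv_finish_core.isra.18+1b4 (0xffffffff97976644)
3 drops at __udp4_lib_rcv+a34 (0xffffffff979b0fc4)
12 drops at __netif_receive_skb_core+4a0 (0xffffffff979002d0)
6 drops at ip_forward+1b5 (0xffffffff97978615)
"""

for symbol, count in drops_by_symbol(SAMPLE).most_common():
    print(count, symbol)
```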

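I read this as: most drops happen in __netif_receive_skb_core.

The Red Hat Enterprise Linux Network Performance Tuning Guide says (chapter "Adapter Queue"):

The netif_receive_skb() kernel function will find the corresponding CPU for a packet, and enqueue packets in that CPU's queue. If the queue for that processor is full and already at maximum size, packets will be dropped. To tune this setting, first determine whether the backlog needs increasing. The /proc/net/softnet_stat file contains a counter in the 2nd column that is incremented when the netdev backlog queue overflows. If this value is incrementing over time, then netdev_max_backlog needs to be increased.

Increasing netdev_max_backlog did not help, but it led me to SoftIRQs:

SoftIRQ

According to the Red Hat document, there are several interesting aspects to SoftIRQ: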

# cat /proc/net/softnet_stat
00024f83 00000000 000000e8 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
000152c0 00000000 0000008d 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00010238 00000000 00000061 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00010d8c 00000000 00000081 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
000f3cb3 00000000 00000d83 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0009e391 00000000 0000050d 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0025265b 00000000 00001023 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00111a24 00000000 0000095a 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
008bcbf0 00000000 0000355d 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
004875d8 00000000 00002408 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0001c93c 00000000 000000cc 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00025fdb 00000000 000000fa 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0005d1e5 00000000 000005f2 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
000f9bfd 00000000 00000b9e 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
000448bc 00000000 00000407 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00044f25 00000000 00000415 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

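As you can see, the second column is always 0, which does not match the statement above about netif_receive_skb(), but I do see non-zero values in the third column.

This is described under "SoftIRQ Misses". Quoting the Red Hat document again:

Occasionally, it is necessary to increase the time that SoftIRQs are allowed to run on the CPU. This is known as the netdev_budget. The default value of the budget is 300. This value can be doubled if the 3rd column in /proc/net/softnet_stat is increasing, which indicates that the SoftIRQ did not get enough CPU time.

So I increased net.core.netdev_budget to 600. Nothing changed.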
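
The hexadecimal columns of /proc/net/softnet_stat are awkward to compare by eye. Here is a small sketch that decodes the first three columns of each row, one row per CPU. The field names processed, dropped and time_squeeze are the labels commonly used for these columns and are an assumption here, since the file itself has no header. The two sample rows are the first and ninth rows of the output above.

```python
def parse_softnet_stat(text):
    """Decode the first three hex columns of each row (one row per CPU)."""
    stats = []
    for cpu, line in enumerate(text.strip().splitlines()):
        cols = [int(field, 16) for field in line.split()]
        stats.append({
            "cpu": cpu,
            "processed": cols[0],     # packets processed
            "dropped": cols[1],       # backlog queue overflows (2nd column)
            "time_squeeze": cols[2],  # budget/time ran out (3rd column)
        })
    return stats


# First and ninth rows of the /proc/net/softnet_stat output above.
SAMPLE = """\
00024f83 00000000 000000e8 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
008bcbf0 00000000 0000355d 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
"""

for row in parse_softnet_stat(SAMPLE):
    print(row["cpu"], row["dropped"], row["time_squeeze"])
```

On a live system the same function could be fed with the contents of /proc/net/softnet_stat, sampled twice, to see whether the third column is still growing after changing netdev_budget.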

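Other things I tried

The statistics still show BMC traffic on the shared interface. I have since moved the BMC to a dedicated interface, but that did not improve things.

This document from SUSE gives some plausible causes for the drops and a way to confirm that they are harmless: the drops should disappear when the interface is put into PROMISC mode, because they are caused by unknown protocols or wrong VLAN tags. I enabled promisc mode, but packets are still dropped.

Suspecting oversized jumbo frames, I changed the MTU to 9000. That did not help.

Answer 1

I finally found the source of the problem: these are Ethernet frames with an unknown EtherType. I thought such drops should disappear in PROMISC mode, but apparently they do not.

In my case it was frames with EtherType 0x8912 and 0x88e1, which AVM FritzBox routers send to detect powerline adapters. To confirm, I blocked these frames via nftables with the following ruleset in /etc/nftables.conf: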

table netdev filter {
    chain ingress {
        type filter hook ingress device eno1 priority 0; policy accept;
        meta protocol {0x8912, 0x88e1} drop
    }
}

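After that, the drops disappeared! Even when not blocked, these frames are harmless and do not interfere with my server. I will block them anyway to keep monitoring clean and to be able to see real interface drops/performance issues.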
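
To check which EtherTypes actually arrive on an interface before writing such a rule, the two-byte type field can be read from each raw frame. The sketch below only shows that decoding step and is not from the original answer: the function name, the WATCHED set and the hand-built sample frame (with a made-up source MAC address) are all illustrative assumptions.

```python
import struct

WATCHED = {0x8912, 0x88E1}  # the two EtherTypes named in the answer
VLAN_TAG = 0x8100           # 802.1Q tag; the real EtherType follows it


def ethertype(frame):
    """Return the EtherType of a raw Ethernet frame (bytes)."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    # Bytes 0-5: destination MAC, 6-11: source MAC, 12-13: EtherType.
    (etype,) = struct.unpack_from("!H", frame, 12)
    if etype == VLAN_TAG and len(frame) >= 18:
        # Skip the 4-byte VLAN tag and read the inner EtherType.
        (etype,) = struct.unpack_from("!H", frame, 16)
    return etype


# Hand-built example frame: broadcast destination, hypothetical source MAC.
SAMPLE_FRAME = (
    bytes.fromhex("ffffffffffff")
    + bytes.fromhex("001122334455")
    + struct.pack("!H", 0x88E1)
    + bytes(46)  # zero padding in place of a real payload
)

etype = ethertype(SAMPLE_FRAME)
print(hex(etype), etype in WATCHED)
```

On Linux, frames for this function could come from a raw AF_PACKET socket opened as root, or from a packet capture taken with a tool such as tcpdump.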

More information about these frames can be found here:
