Increasing the PCIe bandwidth available to a 10 GbE NIC

I'm running Red Hat Enterprise Linux 8.0 (RHEL 8.0) on a Dell 7920 Rack. The machine has 8 GB of RAM, and adding more is not an option (its hardware configuration is locked down).

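The machine also has a Myricom 10G-PCIE2-8B2-2S 2-port 10 GbE NIC installed. I'm not using jumbo frames (the device connected to this machine doesn't use them), so the MTU is set to 1500. The maximum RX ring buffer size this NIC supports is 512, and that is what the RX ring buffer is set to (as shown by ethtool -g eth0). The NIC only receives data (it is used to capture packets from the connected device), and the received traffic is always UDP.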
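
To put a 512-entry RX ring in perspective, here is a rough Python calculation of how quickly it would fill if full-size frames arrived back-to-back at 10 Gb/s and nothing drained it. It assumes standard Ethernet wire overhead (14-byte header, 4-byte FCS, 8-byte preamble/SFD, 12-byte inter-frame gap); the function names are illustrative only.

```python
# Rough estimate: how long does a 512-entry RX ring last at 10 GbE line rate
# with MTU 1500 frames, if the host stops draining it? (Assumes standard
# Ethernet framing overhead; not specific to the Myricom NIC.)

ETH_HEADER = 14    # destination MAC + source MAC + EtherType
FCS = 4            # frame check sequence
PREAMBLE_SFD = 8   # preamble + start-of-frame delimiter
IFG = 12           # minimum inter-frame gap

def wire_bytes_per_frame(mtu: int) -> int:
    """Bytes a full-size frame occupies on the wire, including preamble and gap."""
    return mtu + ETH_HEADER + FCS + PREAMBLE_SFD + IFG

def max_frames_per_second(link_bps: float, mtu: int) -> float:
    """Upper bound on full-size frames per second at the given line rate."""
    return link_bps / (wire_bytes_per_frame(mtu) * 8)

def ring_fill_time_us(ring_entries: int, link_bps: float, mtu: int) -> float:
    """Microseconds until `ring_entries` descriptors are consumed at line rate."""
    return ring_entries / max_frames_per_second(link_bps, mtu) * 1e6

if __name__ == "__main__":
    print(f"{max_frames_per_second(10e9, 1500):,.0f} frames/s")  # 812,744 frames/s
    print(f"{ring_fill_time_us(512, 10e9, 1500):.0f} us")        # 630 us
```

With only about 0.63 ms of headroom at full line rate, any delay in servicing the ring longer than that would be expected to cost frames, regardless of how much PCIe bandwidth is available.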

From here on, I'll refer to the connected device as the device under test (DUT).

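If I take the DUT out of the picture and instead connect an identically configured Dell 7920 Rack running RHEL 8.0, iperf3 produces suboptimal results. The following is typical: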

# Server Invocation
[root]# iperf3 -s -V --udp-counters-64bit

# Client Invocation
[root] # iperf3 -u -V -b 0 --udp-counters-64bit -t 30 -c 192.168.0.1

# Result
[  5]   0.00-30.04  sec  19.6 GBytes  5.61 Gbits/sec  0.004 ms  2002/14543900 (0.014%)  receiver

So roughly 5.6 Gbps is achieved, and some frames are lost.
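
The receiver line can be cross-checked, and turned into a per-datagram rate, with a few lines of Python. The only inputs are the figures copied from the iperf3 output; the function names are illustrative.

```python
# Figures copied from the iperf3 receiver summary line.
LOST, TOTAL, SECONDS = 2002, 14_543_900, 30.04

def loss_percent(lost: int, total: int) -> float:
    """Percentage of datagrams that never reached the iperf3 server."""
    return 100.0 * lost / total

def datagrams_per_second(total: int, seconds: float) -> float:
    """Average datagram arrival rate over the whole test."""
    return total / seconds

if __name__ == "__main__":
    rate = datagrams_per_second(TOTAL, SECONDS)
    print(f"loss: {loss_percent(LOST, TOTAL):.3f}%")  # loss: 0.014%
    print(f"rate: {rate:,.0f} datagrams/s")
    print(f"time budget per datagram: {1e6 / rate:.2f} us")
```

The average works out to roughly 484,000 datagrams per second, or about 2 µs per datagram if a single core handles all of them. That per-packet budget, rather than the 10 Gb/s line rate, may be what the receiver is struggling with.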

Frame loss also occurs with the DUT connected. Solving that frame loss is the goal of this post.

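(As an aside: with the DUT out of the picture and the MTU on both machines set to 9000, i.e. jumbo frames, iperf3 throughput jumps to about 9.4 Gbps. Unfortunately, as I said, the DUT only supports an MTU of 1500.)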
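
The jump to about 9.4 Gbps with jumbo frames is consistent with a per-packet bottleneck: for the same goodput, an MTU of 9000 needs roughly six times fewer packets. A rough Python comparison, assuming IPv4 without options plus an 8-byte UDP header (so the UDP payload is the MTU minus 28 bytes); the datagram size iperf3 actually uses may be somewhat smaller.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes

def udp_payload(mtu: int) -> int:
    """Largest UDP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - UDP_HEADER

def packets_per_second(goodput_bps: float, mtu: int) -> float:
    """Packets per second needed to carry `goodput_bps` of UDP payload."""
    return goodput_bps / (udp_payload(mtu) * 8)

if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: {packets_per_second(9.4e9, mtu):,.0f} packets/s for 9.4 Gb/s")
```

That comes to roughly 798,000 packets/s at MTU 1500 versus about 131,000 at MTU 9000.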

Since the MTU and RX ring buffer size are already set to the best available values, I've been looking elsewhere; specifically, at the PCIe bus.
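
Before concluding that the PCIe link is the bottleneck, it may help to find out where packets are being dropped. This Python sketch reads the kernel's UDP counters from /proc/net/snmp (RcvbufErrors increments when a socket's receive buffer is full) and a few standard per-interface counters from /sys/class/net/<iface>/statistics. The interface name eth0 is an assumption, whether myri10ge populates every one of these counters is also an assumption, and ethtool -S eth0 shows the driver's own statistics.

```python
from pathlib import Path

def parse_snmp_udp(text: str) -> dict[str, int]:
    """Return the 'Udp:' counters from text in /proc/net/snmp format.

    The file contains a header line and a value line, both prefixed 'Udp:'.
    """
    rows = [line.split()[1:] for line in text.splitlines() if line.startswith("Udp:")]
    header, values = rows[0], rows[1]
    return dict(zip(header, (int(v) for v in values)))

def nic_drop_counters(iface: str) -> dict[str, int]:
    """Read standard receive-side drop counters for an interface, if present."""
    stats = Path("/sys/class/net") / iface / "statistics"
    names = ("rx_dropped", "rx_missed_errors", "rx_fifo_errors", "rx_over_errors")
    return {n: int((stats / n).read_text()) for n in names if (stats / n).exists()}

if __name__ == "__main__":
    snmp = Path("/proc/net/snmp")
    if snmp.exists():
        udp = parse_snmp_udp(snmp.read_text())
        # Datagrams dropped because a socket receive buffer was full.
        print("Udp RcvbufErrors:", udp.get("RcvbufErrors"))
        print("Udp InErrors:", udp.get("InErrors"))
    # "eth0" is an assumption; substitute the capture interface name.
    print("eth0:", nic_drop_counters("eth0"))
```

Taking a snapshot before and after an iperf3 run and comparing the deltas would indicate whether drops happen at the socket (RcvbufErrors rising) or further down at the NIC or driver (interface counters rising).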

The following is an excerpt from the dmidecode output:

Handle  Designation  Type                  Current Usage  Length  ID  Bus Address
0x0900  PCIe Slot 1  x16 PCI Express 3     In Use         Long    1   0000:65:00.0
0x0901  PCIe Slot 2  x8 PCI Express 3 x16  In Use         Long    2   0000:b5:00.0
0x0902  PCIe Slot 3  x8 PCI Express 3 x16  In Use         Long    3   0000:b3:00.0
0x0903  PCIe Slot 4  x16 PCI Express 3     Available      Long    4
0x0904  PCIe Slot 5  x8 PCI Express 3 x16  Available      Long    5
0x0905  PCIe Slot 6  x8 PCI Express 3 x16  In Use         Long    6   0000:17:00.0
0x0906  PCIe Slot 7  x8 PCI Express 3 x16  Available      Long    7
0x0907  PCIe Slot 8  x16 PCI Express 3     Available      Long    8

I also noticed the following in the dmesg output:

32.000 Gb/s available PCIe bandwidth, limited by 5 GT/s x8 link at 0000:b2:02.0 (capable of 63.008 Gb/s with 8 GT/s x8 link)
16.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x8 link at 0000:b6:02.0 (capable of 63.008 Gb/s with 8 GT/s x8 link)
16.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x8 link at 0000:b6:04.0 (capable of 63.008 Gb/s with 8 GT/s x8 link)
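
The figures in these messages follow from the link rate, the lane count and the line encoding (8b/10b for 2.5 and 5 GT/s, 128b/130b for 8 GT/s). The Python sketch below reproduces them; truncating the per-lane rate to whole Mb/s is an assumption made so the result matches the printed 63.008, and the names are illustrative.

```python
# Usable per-lane rate in Mb/s = transfer rate x encoding efficiency,
# truncated to an integer (assumed, to match the kernel's printed values).
PER_LANE_MBPS = {
    2.5: 2500 * 8 // 10,     # 8b/10b    -> 2000 Mb/s
    5.0: 5000 * 8 // 10,     # 8b/10b    -> 4000 Mb/s
    8.0: 8000 * 128 // 130,  # 128b/130b -> 7876 Mb/s
}

def link_bandwidth_gbps(gt_per_s: float, lanes: int) -> float:
    """Usable link bandwidth in Gb/s for a given speed (GT/s) and width."""
    return PER_LANE_MBPS[gt_per_s] * lanes / 1000

if __name__ == "__main__":
    for speed in (5.0, 2.5, 8.0):
        print(f"{speed} GT/s x8: {link_bandwidth_gbps(speed, 8):.3f} Gb/s")
    # 5.0 GT/s x8: 32.000 Gb/s
    # 2.5 GT/s x8: 16.000 Gb/s
    # 8.0 GT/s x8: 63.008 Gb/s
```

By this arithmetic a 2.5 GT/s x8 link (16 Gb/s) still exceeds a single 10 GbE port's 10 Gb/s, although every PCIe transaction also carries header, sequence-number and CRC bytes (plus ECRC, which the lspci output shows enabled), so the usable share is lower than the raw figure.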

Finally, here is the relevant part of the lspci -nnvv output:

b7:00.0 Ethernet controller [0200]: MYRICOM Inc. Myri-10G Dual-Protocol NIC [14c1:0008] (rev 01)
        Subsystem: MYRICOM Inc. 10G-PCIE-8B [14c1:000a]
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 115
        NUMA node: 0
        Region 0: Memory at b8000000 (64-bit, prefetchable) [size=16M]
        Region 2: Memory at b9100000 (64-bit, non-prefetchable) [size=1M]
        Expansion ROM at <ignored> [disabled]
        Capabilities: [44] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee00378  Data: 0000
        Capabilities: [54] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [5c] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
                DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 256 bytes, MaxReadReq 4096 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x8, ASPM L0s, Exit Latency L0s unlimited
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x8, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range BCD, TimeoutDis+, LTR-, OBFF Not Supported
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-, LTR-, OBFF Disabled
                         AtomicOpsCtl: ReqEn-
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [a0] Vendor Specific Information: Len=20 <?>
        Capabilities: [d0] MSI-X: Enable- Count=128 Masked-
                Vector table: BAR=2 offset=000f0000
                PBA: BAR=2 offset=000f9000
        Capabilities: [e0] Vital Product Data
                Product Name: 10G-PCIE2-8B2-2S
                Read-only fields:
                        [PN] Part number: 09-04244
                        [SN] Serial number: XXXXXX
                        [RV] Reserved: checksum good, 0 byte(s) reserved
                End
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol+
                UESvrt: DLP+ SDES- TLP+ FCP+ CmpltTO+ CmpltAbrt+ UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn+ ECRCChkCap+ ECRCChkEn+
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [1a8 v1] Device Serial Number XX-XX-XX-XX-XX-XX-XX-XX
        Kernel driver in use: myri10ge
        Kernel modules: myri10ge

b8:00.0 Ethernet controller [0200]: MYRICOM Inc. Myri-10G Dual-Protocol NIC [14c1:0008] (rev 01)
        Subsystem: MYRICOM Inc. 10G-PCIE-8B [14c1:000a]
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 114
        NUMA node: 0
        Region 0: Memory at b7000000 (64-bit, prefetchable) [size=16M]
        Region 2: Memory at b9000000 (64-bit, non-prefetchable) [size=1M]
        Expansion ROM at <ignored> [disabled]
        Capabilities: [44] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee00358  Data: 0000
        Capabilities: [54] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [5c] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
                DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 256 bytes, MaxReadReq 4096 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x8, ASPM L0s, Exit Latency L0s unlimited
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x8, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range BCD, TimeoutDis+, LTR-, OBFF Not Supported
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-, LTR-, OBFF Disabled
                         AtomicOpsCtl: ReqEn-
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [a0] Vendor Specific Information: Len=20 <?>
        Capabilities: [d0] MSI-X: Enable- Count=128 Masked-
                Vector table: BAR=2 offset=000f0000
                PBA: BAR=2 offset=000f9000
        Capabilities: [e0] Vital Product Data
                Product Name: 10G-PCIE2-8B2-2S
                Read-only fields:
                        [PN] Part number: 09-04244
                        [SN] Serial number: XXXXXX
                        [RV] Reserved: checksum good, 0 byte(s) reserved
                End
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol+
                UESvrt: DLP+ SDES- TLP+ FCP+ CmpltTO+ CmpltAbrt+ UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn+ ECRCChkCap+ ECRCChkEn+
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [1a8 v1] Device Serial Number XX-XX-XX-XX-XX-XX-XX-XX
        Kernel driver in use: myri10ge
        Kernel modules: myri10ge
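
To compare what each device advertises with what it actually negotiated, a small Python parser over lspci -vv text can pull out the LnkCap and LnkSta speed/width pairs. The regular expressions assume the field layout shown above (lspci formatting can vary between versions), lspci needs to run as root to print the capability details, and the function names are illustrative.

```python
import re

# Device header lines look like "b7:00.0 Ethernet controller ..." (domain prefix optional).
DEVICE_RE = re.compile(r"^((?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) ")
# Matches "LnkCap: ... Speed 2.5GT/s, Width x8" and "LnkSta: Speed 2.5GT/s, Width x8".
LINK_RE = re.compile(r"^\s+(LnkCap|LnkSta):.*?Speed ([\d.]+)GT/s.*?Width x(\d+)")

Link = tuple[float, int]  # (speed in GT/s, lane count)

def link_summary(lspci_text: str) -> dict[str, dict[str, Link]]:
    """Map each PCI address to its LnkCap and LnkSta (speed, width) pairs."""
    result: dict[str, dict[str, Link]] = {}
    current = None
    for line in lspci_text.splitlines():
        dev = DEVICE_RE.match(line)
        if dev:
            current = dev.group(1)
            result[current] = {}
            continue
        link = LINK_RE.match(line)
        if link and current is not None:
            result[current][link.group(1)] = (float(link.group(2)), int(link.group(3)))
    return result

def downgraded(summary: dict[str, dict[str, Link]]) -> list[str]:
    """Devices whose negotiated speed or width is below their advertised capability."""
    flagged = []
    for dev, links in summary.items():
        cap, sta = links.get("LnkCap"), links.get("LnkSta")
        if cap and sta and (sta[0] < cap[0] or sta[1] < cap[1]):
            flagged.append(dev)
    return flagged
```

Applied to the two entries above, both devices report LnkCap and LnkSta of 2.5 GT/s x8, so neither is flagged: the endpoints trained at the maximum link speed they advertise. That suggests the 2.5 GT/s limit comes from the card's own capability rather than from the slot it sits in.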

My questions

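  1. How can I allocate more bandwidth to the PCIe slot the NIC is in?
  2. Apart from the PCIe bus, is there anything in the data above that could help me overcome the frame loss I'm seeing?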
