I am running Red Hat Enterprise Linux 8.0 (RHEL 8.0) on a Dell 7920 Rack. The machine has 8 GB of RAM. Adding more RAM is not an option (the hardware configuration is locked).
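The machine also has a Myricom 10G-PCIE2-8B2-2S dual-port 10 GbE NIC installed. I am not using jumbo frames (because the device connected to this machine does not use them), so the MTU is set to 1500. The maximum RX ring buffer size this NIC supports is 512, and that is what the RX ring buffer is set to (as reported by ethtool -g eth0). This NIC only receives data (it is used to capture packets from the connected device). The received traffic is always UDP.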
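To put the 512-entry RX ring in perspective, here is a rough Python sketch of how long such a ring can absorb full-size frames at 10 GbE line rate if the host drains nothing in the meantime. It assumes one ring entry per frame and standard Ethernet framing overhead; neither assumption comes from the NIC documentation, and the function names are purely illustrative.

```python
# Per-frame overhead on the wire for standard Ethernet, in bytes:
# 14 (header) + 4 (FCS) + 8 (preamble/SFD) + 12 (inter-frame gap).
ETH_OVERHEAD_BYTES = 14 + 4 + 8 + 12


def frames_per_second(link_bps: float, mtu: int) -> float:
    """Maximum number of full-size frames per second at line rate."""
    wire_bits = (mtu + ETH_OVERHEAD_BYTES) * 8
    return link_bps / wire_bits


def ring_fill_time_us(ring_entries: int, link_bps: float, mtu: int) -> float:
    """Microseconds until the ring is full, assuming one entry per frame
    and no entries being freed by the host during that time."""
    return ring_entries / frames_per_second(link_bps, mtu) * 1e6


if __name__ == "__main__":
    fps = frames_per_second(10e9, 1500)
    fill = ring_fill_time_us(512, 10e9, 1500)
    print(f"{fps:,.0f} frames/s at line rate")
    print(f"{fill:.0f} us until a 512-entry ring is full")
```

Under these assumptions the ring covers well under a millisecond of traffic at line rate, so any stall in servicing the NIC longer than that would be enough to drop frames.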
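From here on, I will refer to the connected device as the device under test (DUT).
If I take the DUT out of the picture and instead connect an identically configured Dell 7920 Rack running RHEL 8.0, iperf3 produces suboptimal results. The following is typical.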
# Server Invocation
[root]# iperf3 -s -V --udp-counters-64bit
# Client Invocation
[root] # iperf3 -u -V -b 0 --udp-counters-64bit -t 30 -c 192.168.0.1
# Result
[ 5] 0.00-30.04 sec 19.6 GBytes 5.61 Gbits/sec 0.004 ms 2002/14543900 (0.014%) receiver
So ~5.6 Gbps is achieved, with some frame loss.
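As a sanity check on the result line, the loss percentage and the average datagram rate can be recomputed from the counters iperf3 printed. This is only a small Python sketch; the helper names are made up for illustration.

```python
def loss_percent(lost: int, total: int) -> float:
    """Share of datagrams reported lost, in percent."""
    return 100.0 * lost / total


def datagrams_per_second(total: int, seconds: float) -> float:
    """Average datagram rate over the whole test interval."""
    return total / seconds


if __name__ == "__main__":
    # Values copied from the receiver line of the iperf3 result above.
    lost, total, seconds = 2002, 14543900, 30.04
    print(f"loss: {loss_percent(lost, total):.3f} %")
    print(f"rate: {datagrams_per_second(total, seconds):,.0f} datagrams/s")
    print(f"roughly 1 datagram lost in every {total // lost}")
```

The loss figure matches the 0.014% that iperf3 reports, and the average rate works out to a little under half a million datagrams per second.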
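The same frame loss occurs with the DUT attached. Resolving this frame loss is the goal of this post.
(As an aside, with the DUT out of the picture and the MTU on both machines set to 9000, i.e. jumbo frames, iperf3 throughput jumps to about 9.4 Gbps. Unfortunately, as I said, the DUT only supports an MTU of 1500.)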
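One way to read the jumbo-frame comparison is in packets per second rather than bits per second. The Python sketch below converts both throughput figures into approximate datagram rates. The payload sizes (1448 bytes at MTU 1500, 8948 bytes at MTU 9000) are assumptions about what iperf3 picks by default, not values taken from the output above.

```python
def packets_per_second(throughput_bps: float, payload_bytes: int) -> float:
    """Datagram rate implied by a payload throughput and a payload size."""
    return throughput_bps / (payload_bytes * 8)


if __name__ == "__main__":
    # Assumed iperf3 UDP payload sizes; adjust if -l was given explicitly.
    pps_1500 = packets_per_second(5.61e9, 1448)
    pps_9000 = packets_per_second(9.4e9, 8948)
    print(f"MTU 1500: {pps_1500:,.0f} datagrams/s")
    print(f"MTU 9000: {pps_9000:,.0f} datagrams/s")
    print(f"ratio: {pps_1500 / pps_9000:.1f}x")
```

If those payload sizes hold, the jumbo-frame run carries more data with several times fewer packets, which would be consistent with a per-packet limit rather than a raw bandwidth limit, although the numbers alone do not prove that.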
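Since the MTU and the RX ring buffer size are already set to the best available values, I have been looking elsewhere. Specifically, I have been looking at the PCIe bus.
The following is excerpted from the dmidecode output: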
Handle | Designation | Type | Current Usage | Length | ID | Bus Address |
---|---|---|---|---|---|---|
0x0900 | PCIe Slot 1 | x16 PCI Express 3 | In Use | Long | 1 | 0000:65:00.0 |
0x0901 | PCIe Slot 2 | x8 PCI Express 3 x16 | In Use | Long | 2 | 0000:b5:00.0 |
0x0902 | PCIe Slot 3 | x8 PCI Express 3 x16 | In Use | Long | 3 | 0000:b3:00.0 |
0x0903 | PCIe Slot 4 | x16 PCI Express 3 | Available | Long | 4 | |
0x0904 | PCIe Slot 5 | x8 PCI Express 3 x16 | Available | Long | 5 | |
0x0905 | PCIe Slot 6 | x8 PCI Express 3 x16 | In Use | Long | 6 | 0000:17:00.0 |
0x0906 | PCIe Slot 7 | x8 PCI Express 3 x16 | Available | Long | 7 | |
0x0907 | PCIe Slot 8 | x16 PCI Express 3 | Available | Long | 8 | |
I also noticed the following in the dmesg output:
32.000 Gb/s available PCIe bandwidth, limited by 5 GT/s x8 link at 0000:b2:02.0 (capable of 63.008 Gb/s with 8 GT/s x8 link)
16.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x8 link at 0000:b6:02.0 (capable of 63.008 Gb/s with 8 GT/s x8 link)
16.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x8 link at 0000:b6:04.0 (capable of 63.008 Gb/s with 8 GT/s x8 link)
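The figures in these messages follow from the per-lane transfer rate, the line encoding (8b/10b at 2.5 and 5 GT/s, 128b/130b at 8 GT/s) and the lane count. Here is a rough Python sketch of that arithmetic. The integer truncation per lane is my assumption about why the log shows 63.008 rather than 63.015 Gb/s, and the table and function names are illustrative only.

```python
# Per-lane transfer rate in MT/s and the line-encoding ratio per PCIe speed.
ENCODING = {
    "2.5": (2500, 8, 10),     # Gen1, 8b/10b
    "5": (5000, 8, 10),       # Gen2, 8b/10b
    "8": (8000, 128, 130),    # Gen3, 128b/130b
}


def link_mbps(speed_gt: str, width: int) -> int:
    """Usable link bandwidth in Mb/s, truncating per lane before multiplying
    by the lane count (assumed to mirror how the logged value is derived)."""
    rate, num, den = ENCODING[speed_gt]
    return (rate * num // den) * width


if __name__ == "__main__":
    for speed, width in (("2.5", 8), ("5", 8), ("8", 8)):
        gbps = link_mbps(speed, width) / 1000
        print(f"{speed} GT/s x{width}: {gbps:.3f} Gb/s")
```

With these inputs the sketch reproduces the 16.000, 32.000 and 63.008 Gb/s values from the messages above.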
Finally, here is the relevant portion of the output of lspci -nnvv:
b7:00.0 Ethernet controller [0200]: MYRICOM Inc. Myri-10G Dual-Protocol NIC [14c1:0008] (rev 01)
Subsystem: MYRICOM Inc. 10G-PCIE-8B [14c1:000a]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 115
NUMA node: 0
Region 0: Memory at b8000000 (64-bit, prefetchable) [size=16M]
Region 2: Memory at b9100000 (64-bit, non-prefetchable) [size=1M]
Expansion ROM at <ignored> [disabled]
Capabilities: [44] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000fee00378 Data: 0000
Capabilities: [54] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [5c] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported+
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 256 bytes, MaxReadReq 4096 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x8, ASPM L0s, Exit Latency L0s unlimited
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x8, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range BCD, TimeoutDis+, LTR-, OBFF Not Supported
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-, LTR-, OBFF Disabled
AtomicOpsCtl: ReqEn-
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [a0] Vendor Specific Information: Len=20 <?>
Capabilities: [d0] MSI-X: Enable- Count=128 Masked-
Vector table: BAR=2 offset=000f0000
PBA: BAR=2 offset=000f9000
Capabilities: [e0] Vital Product Data
Product Name: 10G-PCIE2-8B2-2S
Read-only fields:
[PN] Part number: 09-04244
[SN] Serial number: XXXXXX
[RV] Reserved: checksum good, 0 byte(s) reserved
End
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol+
UESvrt: DLP+ SDES- TLP+ FCP+ CmpltTO+ CmpltAbrt+ UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn+ ECRCChkCap+ ECRCChkEn+
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [1a8 v1] Device Serial Number XX-XX-XX-XX-XX-XX-XX-XX
Kernel driver in use: myri10ge
Kernel modules: myri10ge
b8:00.0 Ethernet controller [0200]: MYRICOM Inc. Myri-10G Dual-Protocol NIC [14c1:0008] (rev 01)
Subsystem: MYRICOM Inc. 10G-PCIE-8B [14c1:000a]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 114
NUMA node: 0
Region 0: Memory at b7000000 (64-bit, prefetchable) [size=16M]
Region 2: Memory at b9000000 (64-bit, non-prefetchable) [size=1M]
Expansion ROM at <ignored> [disabled]
Capabilities: [44] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000fee00358 Data: 0000
Capabilities: [54] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [5c] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported+
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 256 bytes, MaxReadReq 4096 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x8, ASPM L0s, Exit Latency L0s unlimited
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x8, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range BCD, TimeoutDis+, LTR-, OBFF Not Supported
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-, LTR-, OBFF Disabled
AtomicOpsCtl: ReqEn-
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [a0] Vendor Specific Information: Len=20 <?>
Capabilities: [d0] MSI-X: Enable- Count=128 Masked-
Vector table: BAR=2 offset=000f0000
PBA: BAR=2 offset=000f9000
Capabilities: [e0] Vital Product Data
Product Name: 10G-PCIE2-8B2-2S
Read-only fields:
[PN] Part number: 09-04244
[SN] Serial number: XXXXXX
[RV] Reserved: checksum good, 0 byte(s) reserved
End
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol+
UESvrt: DLP+ SDES- TLP+ FCP+ CmpltTO+ CmpltAbrt+ UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn+ ECRCChkCap+ ECRCChkEn+
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [1a8 v1] Device Serial Number XX-XX-XX-XX-XX-XX-XX-XX
Kernel driver in use: myri10ge
Kernel modules: myri10ge
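Because the listing is long, it can help to pull out just the link capability (LnkCap) and negotiated link status (LnkSta) for each function and compare them. The following Python sketch does that for text in the format shown above. It is a minimal parser written for this excerpt, not a general lspci parser, and link_summary is a hypothetical helper name.

```python
import re
from typing import Dict, List, Tuple

# Matches "LnkCap:" / "LnkSta:" lines only; "LnkSta2:" and "LnkCtl2:" are
# skipped because the colon must directly follow the name.
LINK_RE = re.compile(r"(LnkCap|LnkSta):.*?Speed ([\d.]+)GT/s.*?Width x(\d+)")


def link_summary(lspci_text: str) -> List[Dict[str, Tuple[float, int]]]:
    """Return one dict per device, mapping 'LnkCap' and 'LnkSta' to
    (speed in GT/s, lane count). A new device starts at each LnkCap line."""
    devices: List[Dict[str, Tuple[float, int]]] = []
    for line in lspci_text.splitlines():
        match = LINK_RE.search(line)
        if not match:
            continue
        kind = match.group(1)
        value = (float(match.group(2)), int(match.group(3)))
        if kind == "LnkCap" or not devices:
            devices.append({})
        devices[-1][kind] = value
    return devices


if __name__ == "__main__":
    sample = (
        "LnkCap: Port #0, Speed 2.5GT/s, Width x8, ASPM L0s, Exit Latency L0s unlimited\n"
        "LnkSta: Speed 2.5GT/s, Width x8, TrErr- Train- SlotClk- DLActive-\n"
    )
    for dev in link_summary(sample):
        print(dev, "downgraded" if dev.get("LnkSta") != dev.get("LnkCap") else "at capability")
```

In the excerpt above, both functions report the same speed and width for LnkCap and LnkSta, so each endpoint link appears to be running at what the endpoint itself advertises.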
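My questions
- How can I allocate more bandwidth to the PCIe slot that the NIC is in?
- Apart from the PCIe bus, is there anything in the data above that could help me overcome the frame loss I am seeing?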