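I'm using an M.2 PCIe I225-V NIC (http://www.iocrest.com/index.php?id=2316) via PCI passthrough on my Proxmox host, passed to a pfSense VM (running FreeBSD 14, but the same thing happens on FreeBSD 13). For a while now it has been misbehaving: the interface keeps going down and coming back up, especially under load. dmesg shows: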
igc0: link state changed to DOWN
igc0: link state changed to UP
igc0: link state changed to DOWN
igc0: link state changed to UP
igc0: link state changed to DOWN
igc0: link state changed to UP
...
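It's my WAN interface, so of course the network drops every time this happens. I've double-checked the cabling and connections, and everything looks fine.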
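To turn "keeps going down and up" into a number that can be compared before and after a load test, here is a minimal sketch that counts DOWN transitions per interface in saved dmesg text. It assumes the exact `igcN: link state changed to DOWN/UP` format shown above; `count_flaps` is a hypothetical helper, and because these lines carry no timestamps it only counts events rather than timing them.

```python
import re
from collections import Counter

# Matches kernel messages such as "igc0: link state changed to DOWN".
LINK_RE = re.compile(r"^(?P<ifname>\w+): link state changed to (?P<state>UP|DOWN)\s*$")


def count_flaps(log_text: str) -> Counter:
    """Count DOWN transitions per interface in dmesg-style text.

    Each DOWN counts as one flap; lines that are not link-state
    messages (such as "...") are ignored.
    """
    flaps: Counter = Counter()
    for line in log_text.splitlines():
        m = LINK_RE.match(line.strip())
        if m and m.group("state") == "DOWN":
            flaps[m.group("ifname")] += 1
    return flaps


if __name__ == "__main__":
    sample = """\
igc0: link state changed to DOWN
igc0: link state changed to UP
igc0: link state changed to DOWN
igc0: link state changed to UP
igc0: link state changed to DOWN
igc0: link state changed to UP
...
"""
    print(dict(count_flaps(sample)))  # {'igc0': 3}
```

The script can run on any machine against pasted dmesg text, so nothing extra needs to be installed on the firewall itself.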
Output of lspci -s 03:00.0 -vv on the Proxmox host:
03:00.0 Ethernet controller: Intel Corporation Ethernet Controller I225-V (rev 03)
	Subsystem: Intel Corporation Ethernet Controller I225-V
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 16
	IOMMU group: 10
	Region 0: Memory at c1100000 (32-bit, non-prefetchable) [size=1M]
	Region 3: Memory at c1200000 (32-bit, non-prefetchable) [size=16K]
	Expansion ROM at c1000000 [disabled] [size=1M]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
		Address: 0000000000000000 Data: 0000
		Masking: 00000000 Pending: 00000000
	Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
		Vector table: BAR=3 offset=00000000
		PBA: BAR=3 offset=00002000
	Capabilities: [a0] Express (v2) Endpoint, MSI 00
		DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
		DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
			MaxPayload 256 bytes, MaxReadReq 512 bytes
		DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
		LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L1, Exit Latency L1 <4us
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta: Speed 5GT/s (ok), Width x1 (ok)
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
			10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
			EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			FRS- TPHComp- ExtTPHComp-
			AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ OBFF Disabled,
			AtomicOpsCtl: ReqEn-
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
			Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
			EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
			Retimer- 2Retimers- CrosslinkRes: unsupported
	Capabilities: [100 v2] Advanced Error Reporting
		UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
		CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [140 v1] Device Serial Number 88-c9-b3-ff-ff-bf-77-ab
	Capabilities: [1c0 v1] Latency Tolerance Reporting
		Max snoop latency: 3145728ns
		Max no snoop latency: 3145728ns
	Capabilities: [1f0 v1] Precision Time Measurement
		PTMCap: Requester:+ Responder:- Root:-
		PTMClockGranularity: 4ns
		PTMControl: Enabled:+ RootSelected:-
		PTMEffectiveGranularity: 4ns
	Capabilities: [1e0 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2- PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1+ L1_PM_Substates+
		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
		L1SubCtl2:
	Kernel driver in use: vfio-pci
	Kernel modules: igc
This looks fine to me.
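Rather than eyeballing the whole dump, the fields most relevant to a flapping passthrough NIC can be pulled out and compared: negotiated link speed/width against the capability, the ASPM setting, whether MSI-X is enabled, and any AER status bits that are set. This is a minimal sketch; `parse_lspci`, `link_problems` and the choice of fields are my own, not something lspci provides. It only reads the host-side view of a device bound to vfio-pci, and a single clean snapshot does not rule out transient events, so capturing it again right after a flap is more informative.

```python
import re

# Matches "Speed 5GT/s, Width x1" (LnkCap) and "Speed 5GT/s (ok), Width x1 (ok)" (LnkSta).
SPEED_WIDTH = re.compile(r"Speed ([\d.]+GT/s)[^,]*, Width (x\d+)")

# Excerpt of the `lspci -s 03:00.0 -vv` output above, indentation removed.
SAMPLE = """\
Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L1, Exit Latency L1 <4us
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
LnkSta: Speed 5GT/s (ok), Width x1 (ok)
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
"""


def parse_lspci(text: str) -> dict:
    """Extract a few link-health fields from `lspci -vv` output for one device."""
    info = {"aer_uncorrectable": [], "aer_correctable": []}
    for raw in text.splitlines():
        line = raw.strip()
        if "MSI-X:" in line:
            info["msix_enabled"] = "Enable+" in line
            continue
        key, _, rest = line.partition(":")
        if key in ("LnkCap", "LnkSta"):
            m = SPEED_WIDTH.search(rest)
            if m:
                prefix = "cap" if key == "LnkCap" else "sta"
                info[f"{prefix}_speed"], info[f"{prefix}_width"] = m.groups()
        elif key == "LnkCtl":
            m = re.search(r"ASPM ([^;]+);", rest)
            if m:
                info["aspm"] = m.group(1)
        elif key in ("UESta", "CESta"):
            # A trailing "+" means that error status bit is set.
            flags = [tok[:-1] for tok in rest.split() if tok.endswith("+")]
            field = "aer_uncorrectable" if key == "UESta" else "aer_correctable"
            info[field] = flags
    return info


def link_problems(info: dict) -> list:
    """Return human-readable warnings; an empty list means nothing stood out."""
    problems = []
    for what in ("speed", "width"):
        cap, sta = info.get(f"cap_{what}"), info.get(f"sta_{what}")
        if cap != sta:
            problems.append(f"link {what} is {sta}, device supports {cap}")
    if info["aer_uncorrectable"]:
        problems.append(f"uncorrectable AER errors: {info['aer_uncorrectable']}")
    if info["aer_correctable"]:
        problems.append(f"correctable AER errors: {info['aer_correctable']}")
    if info.get("msix_enabled") is False:
        problems.append("MSI-X is disabled")
    return problems


if __name__ == "__main__":
    parsed = parse_lspci(SAMPLE)
    print(parsed["aspm"], link_problems(parsed))  # Disabled []
```

On the excerpt from this post nothing is flagged, which matches the impression that the host-side PCIe link itself looks healthy.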
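Here is what I've tried:
- Disabling/enabling offloads (through the pfSense GUI): disabling hardware checksum offload makes things worse, and so does enabling TSO or LRO, even though the igc driver supports them.
- Disabling EEE with the following entry in /boot/loader.conf: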
hw.igc.eee_enable=0
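- Manually setting speed/duplex to 2500Base-T in the pfSense GUI instead of autoselect.
- Watching the lspci output on the Proxmox host for power-management changes. Even when the connection drops, the PMv3 status stays at D0.
- Checking the pfSense GUI for in/out errors: there are none. Interrupts also look normal (roughly 100-200/s).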
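Loader tunables are only read at boot, and as far as I know an unrecognized name is simply ignored rather than reported, so it can be worth confirming the entry is present exactly as intended. This is a minimal sketch with hypothetical helpers (`parse_loader_conf`, `check_tunable`); it only checks the text of the file, not whether the igc driver actually reads a tunable under that name. For that, the igc(4) manual page is the place to look, and `kenv hw.igc.eee_enable` on the running system shows whether the loader passed the value at all. Note also that pfSense's documentation recommends /boot/loader.conf.local for custom tunables, since /boot/loader.conf can be rewritten by the system.

```python
def parse_loader_conf(text: str) -> dict:
    """Parse loader.conf-style lines such as name="value" or name=value.

    Simplified on purpose: anything after '#' is treated as a comment
    (even inside quotes), and a later assignment to the same name
    replaces an earlier one in the returned dict.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if "=" not in line:
            continue
        name, _, value = line.partition("=")
        settings[name.strip()] = value.strip().strip('"')
    return settings


def check_tunable(settings: dict, name: str, expected: str):
    """Return None if `name` is set to `expected`, else a short description."""
    actual = settings.get(name)
    if actual is None:
        return f"{name} is not set"
    if actual != expected:
        return f"{name} is {actual!r}, expected {expected!r}"
    return None


if __name__ == "__main__":
    good = "hw.igc.eee_enable=0\n"
    typo = 'hw.igc.eee_enabled="0"  # note the extra "d"\n'
    print(check_tunable(parse_loader_conf(good), "hw.igc.eee_enable", "0"))  # None
    print(check_tunable(parse_loader_conf(typo), "hw.igc.eee_enable", "0"))  # hw.igc.eee_enable is not set
```

A typo in the tunable name shows up as "not set" rather than as an error, which is exactly the case the loader itself would not complain about.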
So I'm out of ideas. How can I diagnose this further, and what else could be causing it? Thanks!