PCIe errors on a new system, need help debugging

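I'm getting some errors and hope someone can help me debug them. First, what do they mean? Second, if possible, how should I go about debugging further and reaching a complete solution?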

I'm running an Aorus Gaming 7 motherboard with a Threadripper 1950X CPU and an Nvidia 1070, with the latest drivers.

Here is the output I pasted:

system log
-------------------------
8/23/17 9:30 PM -x399   kernel  [19510.161819] dpc 0000:00:01.1:pcie010: DPC containment event, status:0x1f00 source:0x0000
8/23/17 9:30 PM -x399   kernel  [19510.161833] pcieport 0000:00:01.1: AER: Corrected error received: id=0000
8/23/17 9:30 PM -x399   kernel  [19510.161837] pcieport 0000:00:01.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0009(Receiver ID)
8/23/17 9:30 PM -x399   kernel  [19510.161840] pcieport 0000:00:01.1:   device [1022:1453] error status/mask=00000040/00006000
8/23/17 9:30 PM -x399   kernel  [19510.161842] pcieport 0000:00:01.1:    [ 6] Bad TLP               
8/23/17 9:31 PM -x399   kernel  [19539.323943] dpc 0000:00:01.1:pcie010: DPC containment event, status:0x1f00 source:0x0000
8/23/17 9:31 PM -x399   kernel  [19539.323957] pcieport 0000:00:01.1: AER: Corrected error received: id=0000
8/23/17 9:31 PM -x399   kernel  [19539.323961] pcieport 0000:00:01.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0009(Receiver ID)
8/23/17 9:31 PM -x399   kernel  [19539.323964] pcieport 0000:00:01.1:   device [1022:1453] error status/mask=00000040/00006000
8/23/17 9:31 PM -x399   kernel  [19539.323967] pcieport 0000:00:01.1:    [ 6] Bad TLP               
8/23/17 9:42 PM -x399   kernel  [20194.657679] dpc 0000:00:01.1:pcie010: DPC containment event, status:0x1f00 source:0x0000
8/23/17 9:42 PM -x399   kernel  [20194.657692] pcieport 0000:00:01.1: AER: Corrected error received: id=0000
8/23/17 9:42 PM -x399   kernel  [20194.657696] pcieport 0000:00:01.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0009(Receiver ID)
8/23/17 9:42 PM -x399   kernel  [20194.657699] pcieport 0000:00:01.1:   device [1022:1453] error status/mask=00000040/00006000
8/23/17 9:42 PM -x399   kernel  [20194.657702] pcieport 0000:00:01.1:    [ 6] Bad TLP
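The "error status/mask" pair in each report is the AER Correctable Error Status and Mask register, printed as hex. A minimal bash sketch for decoding such a word by hand, assuming the bit layout of the PCIe Correctable Error Status/Mask register (the same names the kernel prints); the function name decode_aer_correctable is hypothetical:

```bash
#!/usr/bin/env bash
# Decode the hex words the kernel prints as "error status/mask=XXXXXXXX/XXXXXXXX"
# for a corrected AER error. Bit positions/names follow the PCIe Correctable
# Error Status/Mask register layout.

decode_aer_correctable() {
  local word=$(( 16#${1#0x} ))   # accept "00000040" or "0x00000040"
  # Sparse indexed array: bit number -> error name.
  local names=(
    [0]="Receiver Error"
    [6]="Bad TLP"
    [7]="Bad DLLP"
    [8]="REPLAY_NUM Rollover"
    [12]="Replay Timer Timeout"
    [13]="Advisory Non-Fatal Error"
    [14]="Corrected Internal Error"
    [15]="Header Log Overflow"
  )
  local bit
  # Indices come back in ascending order; print each set bit as "[ N] Name".
  for bit in "${!names[@]}"; do
    if (( word & (1 << bit) )); then
      printf '[%2d] %s\n' "$bit" "${names[$bit]}"
    fi
  done
}

# Example with the values from the log above:
#   decode_aer_correctable 00000040   # status word
#   decode_aer_correctable 00006000   # mask word
```

For this log, status 00000040 decodes to bit 6, Bad TLP, matching the kernel's own line. Mask 00006000 decodes to bits 13 and 14 (Advisory Non-Fatal Error and Corrected Internal Error), i.e. those two error types are masked and not signaled.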

lspci output
-------------------------
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1450
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Device 1451
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1453
00:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1453
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452
00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452
00:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452
00:07.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452
00:07.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1454
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1454
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 59)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1460
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1461
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1462
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1463
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1464
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1465
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1466
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1467
00:19.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1460
00:19.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1461
00:19.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1462
00:19.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1463
00:19.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1464
00:19.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1465
00:19.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1466
00:19.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1467
01:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] Device 43ba (rev 02)
01:00.1 SATA controller: Advanced Micro Devices, Inc. [AMD] Device 43b6 (rev 02)
01:00.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43b1 (rev 02)
02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43b4 (rev 02)
02:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43b4 (rev 02)
02:03.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43b4 (rev 02)
02:04.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43b4 (rev 02)
03:00.0 USB controller: ASMedia Technology Inc. Device 1343
04:00.0 Network controller: Intel Corporation Device 24fd (rev 78)
05:00.0 Ethernet controller: Qualcomm Atheros Device e0b1 (rev 10)
07:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd Device a804
08:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device 145a
08:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Device 1456
08:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Device 145c
09:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device 1455
09:00.2 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
09:00.3 Audio device: Advanced Micro Devices, Inc. [AMD] Device 1457
40:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1450
40:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Device 1451
40:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452
40:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452
40:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452
40:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1453
40:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452
40:07.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452
40:07.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1454
40:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452
40:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1454
41:00.0 VGA compatible controller: NVIDIA Corporation Device 1b81 (rev a1)
41:00.1 Audio device: NVIDIA Corporation Device 10f0 (rev a1)
42:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device 145a
42:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Device 1456
42:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Device 145c
43:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device 1455
43:00.2 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
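All the reports come from bridge 00:01.1 (ID 1022:1453, "Device 1453" in the list above), but the flat lspci list does not show what sits behind it. One way to narrow that down is to read the bridge's secondary/subordinate bus numbers and keep only devices on those buses. A bash sketch, assuming plain lspci output without a domain prefix and the "Bus: primary=.., secondary=.., subordinate=.." line that lspci -vv prints for bridges; bridge_bus_range and devices_in_bus_range are hypothetical helper names:

```bash
#!/usr/bin/env bash
# Find the devices downstream of a PCIe bridge such as 00:01.1.

# Read `lspci -vv` text for ONE bridge on stdin; print "SECONDARY SUBORDINATE".
bridge_bus_range() {
  sed -n -E 's/.*secondary=([0-9a-fA-F]+), subordinate=([0-9a-fA-F]+).*/\1 \2/p' | head -n 1
}

# Read plain `lspci` lines on stdin; keep those whose bus is in [lo, hi] (hex).
devices_in_bus_range() {
  local lo=$(( 16#$1 )) hi=$(( 16#$2 )) line bus
  while IFS= read -r line; do
    [[ $line =~ ^([0-9a-fA-F]{2}): ]] || continue   # skip non-device lines
    bus=$(( 16#${BASH_REMATCH[1]} ))
    if (( bus >= lo && bus <= hi )); then
      printf '%s\n' "$line"
    fi
  done
}

# Usage on the affected machine (lspci -vv may need sudo for full output):
#   read -r sec sub < <(lspci -s 00:01.1 -vv | bridge_bus_range)
#   lspci | devices_in_bus_range "$sec" "$sub"
```

For a quick visual check, lspci -tv prints the same hierarchy as a tree.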

Answer 1

Update: after upgrading the BIOS to version F12, the problem is gone, with no GRUB change needed.

This problem seems to occur on many motherboards, from Intel X99 to AMD X399 boards.

Although I can't fully explain what is going on, I can at least provide some details.

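I initially thought TLP referred to some kind of power issue, but after some research I found that it actually stands for Transaction Layer Packet (TLP).

The hardware detects the bad packet and the Linux kernel reports it as a message. Because the severity is Corrected, the link recovered on its own (a TLP that fails the data link layer checks is retransmitted), so the kernel is only logging the event.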

The kernel option pci=nommconf disables memory-mapped PCI configuration space. You can add it by editing the GRUB configuration with this command:

sudo nano /etc/default/grub

Find the GRUB_CMDLINE_LINUX_DEFAULT variable and add the following at the end, inside the quotes:

pci=nommconf

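Mine looked like this afterwards:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=nommconf"

On Ubuntu, then run sudo update-grub and reboot so the new kernel command line takes effect.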
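The same edit can be scripted instead of done by hand in nano, which also makes it easy to undo later. A minimal bash sketch, assuming GNU sed (for -i without a suffix), a single double-quoted GRUB_CMDLINE_LINUX_DEFAULT line like the one above, and a parameter that contains no "/" or regex metacharacters; add_kernel_param and remove_kernel_param are hypothetical helper names:

```bash
#!/usr/bin/env bash
# Add or remove a kernel parameter in GRUB_CMDLINE_LINUX_DEFAULT of FILE.

add_kernel_param() {
  local file=$1 param=$2
  # Do nothing if PARAM is already one of the words inside the quotes.
  if grep -Eq "^GRUB_CMDLINE_LINUX_DEFAULT=\"([^\"]* )?${param}( [^\"]*)?\"" "$file"; then
    return 0
  fi
  # Insert " PARAM" just before the closing quote.
  sed -i -E "s/^(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*)\"/\1 ${param}\"/" "$file"
}

remove_kernel_param() {
  local file=$1 param=$2
  # Drop PARAM whether it is a later word, the first word, or the only word.
  sed -i -E "/^GRUB_CMDLINE_LINUX_DEFAULT=/{
s/ ${param}([ \"])/\1/
s/\"${param} /\"/
s/\"${param}\"/\"\"/
}" "$file"
}

# Usage (as root):
#   add_kernel_param /etc/default/grub pci=nommconf
#   update-grub                                          # Ubuntu/Debian
#   remove_kernel_param /etc/default/grub pci=nommconf   # to revert later
```

Running the functions on a copy of /etc/default/grub first lets you inspect the result before touching the real file.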

This could be a hardware fault in the device or the controller, or something else entirely.

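This is not a real solution that fixes the error rather than just suppressing it, but it needs little technical knowledge and seems like a fine workaround. Personally, I would keep an eye out for further motherboard BIOS updates as well as kernel updates, and temporarily remove the change to see whether the problem has been resolved.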
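To check whether the messages come back once the workaround is removed (or after a BIOS or kernel update), a small bash sketch like the following can help. It assumes a systemd system where journalctl -k shows kernel messages from the current boot; has_cmdline_param and count_aer_corrected are hypothetical helper names:

```bash
#!/usr/bin/env bash
# Succeeds if PARAM appears as a whole word on the kernel command line.
# The second argument defaults to /proc/cmdline; pass another file to test.
has_cmdline_param() {
  local param=$1 cmdline_file=${2:-/proc/cmdline}
  tr ' ' '\n' < "$cmdline_file" | grep -qxF -- "$param"
}

# Count corrected-error reports in kernel log text read from stdin.
# Prints 0 (and still succeeds) when there are none.
count_aer_corrected() {
  grep -c 'AER: Corrected error received' || true
}

# Typical use after rebooting:
#   has_cmdline_param pci=nommconf && echo "workaround still active"
#   journalctl -k | count_aer_corrected
```

A count of 0 after a normal session without pci=nommconf suggests the problem is resolved rather than hidden, though it may also just mean the triggering workload did not run.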
