35 GB per day of "PCIe Bus Error: severity=Corrected, type=Data Link Layer..." in syslog

  • Ubuntu 20.04.4 LTS
  • Dell XPS 8940
  • Latest BIOS 2.4.0

Fresh install. The system seems to run fine, but I'm getting 35 GB of error messages in syslog every day, such as:

Feb 25 00:00:10 mumsilar kernel: [32409.088886] pcieport 0000:00:01.0: AER: Multiple Corrected error received: 0000:00:01.0
Feb 25 00:00:10 mumsilar kernel: [32409.088907] pcieport 0000:00:01.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Feb 25 00:00:10 mumsilar kernel: [32409.088908] pcieport 0000:00:01.0:   device [8086:4c01] error status/mask=00000040/00002000
Feb 25 00:00:10 mumsilar kernel: [32409.088910] pcieport 0000:00:01.0:    [ 6] BadTLP
Feb 25 00:00:19 mumsilar kernel: [32418.024062] pcieport 0000:00:01.0: AER: Multiple Corrected error received: 0000:00:01.0
Feb 25 00:00:19 mumsilar kernel: [32418.024100] pcieport 0000:00:01.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Feb 25 00:00:19 mumsilar kernel: [32418.024102] pcieport 0000:00:01.0:   device [8086:4c01] error status/mask=00000001/00002000
Feb 25 00:00:19 mumsilar kernel: [32418.024103] pcieport 0000:00:01.0:    [ 0] RxErr
Feb 25 00:00:20 mumsilar kernel: [32418.431966] pcieport 0000:00:01.0: AER: Multiple Corrected error received: 0000:00:01.0
Feb 25 00:00:20 mumsilar kernel: [32418.432012] pcieport 0000:00:01.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Feb 25 00:00:20 mumsilar kernel: [32418.432014] pcieport 0000:00:01.0:   device [8086:4c01] error status/mask=00000001/00002000
Feb 25 00:00:20 mumsilar kernel: [32418.432016] pcieport 0000:00:01.0:    [ 0] RxErr
Feb 25 00:00:20 mumsilar kernel: [32418.443484] pcieport 0000:00:01.0: AER: Corrected error received: 0000:00:01.0
Feb 25 00:00:20 mumsilar kernel: [32418.443492] pcieport 0000:00:01.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Feb 25 00:00:20 mumsilar kernel: [32418.443494] pcieport 0000:00:01.0:   device [8086:4c01] error status/mask=00000040/00002000
Feb 25 00:00:20 mumsilar kernel: [32418.443495] pcieport 0000:00:01.0:    [ 6] BadTLP
...
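The two hex words after "error status/mask=" are the root port's AER Correctable Error Status and Mask registers. As a sketch, they can be decoded bit by bit; decode_aer_cor is a made-up helper name, and the bit positions and names are assumed to follow the PCIe correctable-error register layout as the Linux kernel prints it (the same names that appear in the log, e.g. "[ 6] BadTLP"):

```sh
#!/bin/sh
# Hypothetical helper: decode a PCIe AER Correctable Error Status (or Mask)
# value, as printed in "error status/mask=XXXXXXXX/YYYYYYYY", into the
# bit names the Linux kernel uses in its log messages.
decode_aer_cor() {
    val=$((0x$1))          # hex string without the 0x prefix, e.g. 00000040
    names=""
    for pair in 0:RxErr 6:BadTLP 7:BadDLLP 8:Rollover 12:Timeout \
                13:NonFatalErr 14:CorrIntErr 15:HeaderOF; do
        bit=${pair%%:*}    # bit position (text before the colon)
        name=${pair#*:}    # bit name (text after the colon)
        if [ $((val & (1 << bit))) -ne 0 ]; then
            names="$names $name"
        fi
    done
    echo "${names# }"      # drop the leading space
}

decode_aer_cor 00000040   # status in the Data Link Layer entries: prints BadTLP
decode_aer_cor 00000001   # status in the Physical Layer entries: prints RxErr
decode_aer_cor 00002000   # the mask: prints NonFatalErr (advisory errors masked)
```

Read this way, the port keeps seeing receiver errors and bad TLPs on the link, which the hardware corrects, and only Advisory Non-Fatal errors are masked.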

Here is the output of the following commands:

sudo lspci -nn
sudo lspci -tv
sudo lshw -C network
sudo sysctl vm.swappiness
inxi -Fxxxrz
sudo lspci -s 00:01.0 -vvv

https://pastebin.com/XnvMbxm5 (added the lspci -s 00:01.0 -vvv output)
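In the lspci -s 00:01.0 -vvv output, the lines that matter most here are the link state (including whether ASPM is enabled) and the AER correctable-error registers. A minimal filter, assuming the LnkCap, LnkCtl, LnkSta, CESta and CEMsk field labels that pciutils' lspci prints; aer_summary is a hypothetical helper name:

```sh
#!/bin/sh
# Hypothetical helper: keep only the link capability/control/status lines
# (LnkCtl shows the ASPM setting) and the AER correctable-error status and
# mask lines from "lspci -vvv" output read on stdin.
aer_summary() {
    grep -E '^[[:space:]]*(LnkCap|LnkCtl|LnkSta|CESta|CEMsk):'
}

# Real use (root is needed to read the extended AER capability):
#   sudo lspci -s 00:01.0 -vvv | aer_summary
```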

Thanks for any help you can offer. Cheers.

P.S.

I have added the following to /etc/modprobe.d/alsa-base.conf:

# apparently after power saving shuts down the audio, the next time it turns on
# it will audibly pop.  Turn off shutting down the audio to prevent the popping.
# see https://superuser.com/questions/1493096/linux-ubuntu-speakers-popping-every-few-seconds#:~:text=The%20operation%20system's%20default%20behavior,value%20from%201%20to%200.
options snd-hda-intel power_save=0 power_save_controller=N
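After a reboot, the values actually in effect can be read back from sysfs to confirm the option line was picked up. A small sketch, assuming the standard /sys/module/<module>/parameters/ layout (sysfs spells the module snd_hda_intel, with underscores); show_params is a hypothetical helper whose optional argument only exists so it can be pointed at a test directory:

```sh
#!/bin/sh
# Hypothetical helper: print the two power-saving parameters of
# snd-hda-intel as "name=value" lines.
show_params() {
    dir=${1:-/sys/module/snd_hda_intel/parameters}
    for f in "$dir"/power_save "$dir"/power_save_controller; do
        # Skip a parameter file that does not exist or is unreadable.
        [ -r "$f" ] && printf '%s=%s\n' "${f##*/}" "$(cat "$f")"
    done
}
```

With the options above applied, show_params should report power_save=0 and power_save_controller=N.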

Answer 1

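It looks like your Nvidia card is causing the syslog problem: the errors are reported by the PCI bridge at 00:01.0, and the only devices on that bridge's secondary bus (02) are the Nvidia card's functions.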


Feb 25 00:00:10 mumsilar kernel: [32409.088886] pcieport 0000:00:01.0: AER: Multiple Corrected error received: 0000:00:01.0
Feb 25 00:00:10 mumsilar kernel: [32409.088907] pcieport 0000:00:01.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Feb 25 00:00:10 mumsilar kernel: [32409.088908] pcieport 0000:00:01.0:   device [8086:4c01] error status/mask=00000040/00002000

00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:4c01] (rev 01)

-[0000:00]-+-00.0-[ff]--
           +-01.0-[02]--+-00.0  NVIDIA Corporation TU116 [GeForce GTX 1660 Ti]
           |            +-00.1  NVIDIA Corporation TU116 High Definition Audio Controller
           |            +-00.2  NVIDIA Corporation TU116 USB 3.1 Host Controller
           |            \-00.3  NVIDIA Corporation TU116 [GeForce GTX 1650 SUPER]
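The same bridge-to-device relationship can be checked directly in sysfs, where each function on a bridge's secondary bus appears as a subdirectory of the bridge's device directory. A sketch; devices_behind is a hypothetical helper, and its optional second argument only exists so it can be pointed at a test directory:

```sh
#!/bin/sh
# Hypothetical helper: list the PCI functions directly below a bridge or
# root port. The glob matches only full DDDD:BB:DD.F names, so port-service
# entries such as "0000:00:01.0:pcie002" and dirs like "power" are skipped.
devices_behind() {
    base=${2:-/sys/bus/pci/devices}
    for d in "$base/$1"/[0-9a-f][0-9a-f][0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f].[0-7]; do
        [ -d "$d" ] && printf '%s\n' "${d##*/}"
    done
}
```

On the asker's machine, devices_behind 0000:00:01.0 should list the 0000:02:00.x functions shown in the tree above.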

Try the Nvidia 510.54 driver. If you have another graphics card available, try that one instead.
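To check whether the driver you are running is already 510.54 or newer, a small version comparison helps. This is a sketch: version_ge is a hypothetical helper built on GNU sort -V, and the nvidia-smi query in the comment assumes the proprietary driver is installed and loaded:

```sh
#!/bin/sh
# Hypothetical helper: succeed if version $1 is at least version $2,
# using GNU sort's version ordering (-V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example with the running driver:
#   v=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader)
#   version_ge "$v" 510.54 && echo "driver is 510.54 or newer"
```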

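To silence the syslog noise, do the following. Note that pci=noaer turns off PCIe Advanced Error Reporting, so it hides these messages rather than fixing the link errors behind them.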

sudo -H gedit /etc/default/grub  # edit this file

Look for:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"

Change it to:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=noaer"

Save the file.

sudo update-grub  # update GRUB

reboot  # restart the computer
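The manual edit above can also be done non-interactively. A sketch, assuming GNU sed (for -i) and a single GRUB_CMDLINE_LINUX_DEFAULT line with a double-quoted value; add_kernel_param is a hypothetical helper, and because the parameter is placed into a regular expression unescaped, it should be a plain word such as pci=noaer:

```sh
#!/bin/sh
# Hypothetical helper: append kernel parameter $2 to the
# GRUB_CMDLINE_LINUX_DEFAULT line of GRUB defaults file $1,
# unless the parameter is already there.
add_kernel_param() {
    file=$1
    param=$2
    # Already present as a whole word inside the quotes? Leave the file alone.
    if grep -Eq "^GRUB_CMDLINE_LINUX_DEFAULT=\"([^\"]* )?$param( [^\"]*)?\"" "$file"; then
        return 0
    fi
    # Insert before the closing quote (keeping a copy as "$file.bak"), then
    # drop the leading space left behind when the value was empty.
    sed -i.bak \
        -e "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 $param\"/" \
        -e "s/^GRUB_CMDLINE_LINUX_DEFAULT=\" /GRUB_CMDLINE_LINUX_DEFAULT=\"/" \
        "$file"
}
```

From a root shell where the function is defined, add_kernel_param /etc/default/grub pci=noaer followed by update-grub and a reboot has the same effect as the steps above; the previous file stays in /etc/default/grub.bak.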

Update #1:

Fixed by disabling ASPM (Active State Power Management) with the kernel parameter pcie_aspm=off:

https://forums.developer.nvidia.com/t/pcie-bus-error-severity-corrected-on-jetson-nano/155780
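After rebooting with pcie_aspm=off, it is worth confirming that the parameter actually reached the kernel and that the errors stopped. A minimal sketch; has_kernel_param is a hypothetical helper that reads /proc/cmdline, with an optional file argument only for testing:

```sh
#!/bin/sh
# Hypothetical helper: succeed if the kernel command line contains
# parameter $1 as a whole word (e.g. "pcie_aspm=off").
has_kernel_param() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# After rebooting:
#   has_kernel_param pcie_aspm=off && echo "pcie_aspm=off is active"
#   journalctl -k -b | grep -c 'PCIe Bus Error'   # AER messages this boot
```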
