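I recently upgraded my PC (which has an Nvidia GeForce GTX 1080 Ti GPU) from Ubuntu 16.04 to 18.04, and since then my file system has been filling up with huge log files. Every time I shut the computer down, I get an endless stream of pcieport messages until the PC finally powers off.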
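Last night I left the machine running, and when I came back a notification said the disk was completely full: /dev/sda1 was at 100% usage. Using the du command, I traced the problem to the log files in the /var/log/ folder, which took up more than 350 GB.
Right now the log files are accumulating again and already occupy about 150 GB. The files causing the problem are syslog.1, syslog and kern.log.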
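The du check can be reproduced with a small Python sketch that walks a directory and lists its largest regular files. Note that it reports apparent file size (st_size), whereas du reports allocated disk blocks, so the numbers may differ slightly. The function name largest_files is hypothetical and not part of any existing tool.

```python
import os
import stat
from pathlib import Path


def largest_files(root, limit=5):
    """Return (size_in_bytes, path) pairs for the largest regular files under root."""
    sizes = []
    # os.walk does not descend into symlinked directories by default,
    # and it silently skips directories it cannot read.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                # lstat does not follow symlinks, so links are not counted as files
                st = path.lstat()
            except OSError:
                # file vanished or is unreadable (e.g. permission denied)
                continue
            if stat.S_ISREG(st.st_mode):
                sizes.append((st.st_size, str(path)))
    # biggest first, then keep only the top `limit` entries
    sizes.sort(reverse=True)
    return sizes[:limit]
```

Run it as root, for example largest_files("/var/log"), so that unreadable files are not silently skipped.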
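My question is: what is causing this, and how can I fix it?
Below are some details about my system and a few lines from the log files. I will delete the logs again, but deleting them over and over does not seem like a good long-term solution.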
Distributor ID: Ubuntu
Description: Ubuntu 18.04.3 LTS
Release: 18.04
Codename: bionic
nvidia-smi
Thu Aug 15 09:25:53 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.40 Driver Version: 430.40 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:01:00.0 On | N/A |
| 24% 58C P0 67W / 250W | 1373MiB / 11177MiB | 1% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1298 G /usr/lib/xorg/Xorg 89MiB |
| 0 1337 G /usr/bin/gnome-shell 50MiB |
| 0 2258 G /usr/lib/xorg/Xorg 726MiB |
| 0 2465 G /usr/bin/gnome-shell 189MiB |
| 0 14914 G ...e --type=gpu-process --field-trial-hand 154MiB |
| 0 18206 C /usr/lib/libreoffice/program/soffice.bin 137MiB |
+-----------------------------------------------------------------------------+
lspci -vt
-[0000:00]-+-00.0 Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers
+-01.0-[01]--+-00.0 NVIDIA Corporation GP102 [GeForce GTX 1080 Ti]
| \-00.1 NVIDIA Corporation GP102 HDMI Audio Controller
+-02.0 Intel Corporation Device 3e92
+-14.0 Intel Corporation 200 Series/Z370 Chipset Family USB 3.0 xHCI Controller
+-16.0 Intel Corporation 200 Series PCH CSME HECI #1
+-17.0 Intel Corporation 200 Series PCH SATA controller [AHCI mode]
+-1b.0-[02]--
+-1c.0-[03]--
+-1c.4-[04]----00.0 ASMedia Technology Inc. Device 2142
+-1c.7-[05]----00.0 Realtek Semiconductor Co., Ltd. RTL8812AE 802.11ac PCIe Wireless Network Adapter
+-1d.0-[06]--
+-1f.0 Intel Corporation Z370 Chipset LPC/eSPI Controller
+-1f.2 Intel Corporation 200 Series/Z370 Chipset Family Power Management Controller
+-1f.3 Intel Corporation 200 Series PCH HD Audio
+-1f.4 Intel Corporation 200 Series/Z370 Chipset Family SMBus Controller
\-1f.6 Intel Corporation Ethernet Connection (2) I219-V
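In the lspci tree above, root port 1c.7 leads to bus 05, where the Realtek RTL8812AE wireless adapter is attached, and 0000:00:1c.7 is also the port named in the log lines below. One way to confirm which device sits behind a given port is to look in sysfs. This is a sketch that assumes the usual Linux layout, in which devices below a bridge appear as subdirectories of the bridge's directory; devices_behind_port is a hypothetical helper written for illustration.

```python
import re
from pathlib import Path

# A PCI address in sysfs form, domain:bus:device.function, e.g. 0000:05:00.0.
# Anchored at both ends so port-service entries like "0000:00:1c.7:pcie001" are ignored.
PCI_ADDR = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def devices_behind_port(port, sysfs_root="/sys/bus/pci/devices"):
    """List the PCI addresses of devices directly below a bridge or root port."""
    port_dir = Path(sysfs_root) / port
    # Child devices show up as subdirectories named after their PCI address;
    # attribute files (vendor, device, ...) and other entries do not match.
    return sorted(p.name for p in port_dir.iterdir() if PCI_ADDR.match(p.name))
```

On the machine above, devices_behind_port("0000:00:1c.7") would be expected to list the wireless adapter's address on bus 05, which can then be passed to lspci -s to see its name.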
syslog.1
Aug 14 10:14:03 user kernel: [ 10.680132] pcieport 0000:00:1c.7: AER: Corrected error received: 0000:00:1c.7
Aug 14 10:14:03 user kernel: [ 10.680135] pcieport 0000:00:1c.7: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 14 10:14:03 user kernel: [ 10.680135] pcieport 0000:00:1c.7: device [8086:a297] error status/mask=00000001/00002000
Aug 14 10:14:03 user kernel: [ 10.680136] pcieport 0000:00:1c.7: [ 0] RxErr
Aug 14 10:14:03 user kernel: [ 10.680187] pcieport 0000:00:1c.7: AER: Corrected error received: 0000:00:1c.7
Aug 14 10:14:03 user kernel: [ 10.680190] pcieport 0000:00:1c.7: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 14 10:14:03 user kernel: [ 10.680190] pcieport 0000:00:1c.7: device [8086:a297] error status/mask=00000001/00002000
Aug 14 10:14:03 user kernel: [ 10.680191] pcieport 0000:00:1c.7: [ 0] RxErr
Aug 14 10:14:03 user kernel: [ 10.680281] pcieport 0000:00:1c.7: AER: Corrected error received: 0000:00:1c.7
Aug 14 10:14:03 user kernel: [ 10.680284] pcieport 0000:00:1c.7: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 14 10:14:03 user kernel: [ 10.680284] pcieport 0000:00:1c.7: device [8086:a297] error status/mask=00000001/00002000
Aug 14 10:14:03 user kernel: [ 10.680285] pcieport 0000:00:1c.7: [ 0] RxErr
Aug 14 10:14:03 user kernel: [ 10.680374] pcieport 0000:00:1c.7: AER: Multiple Corrected error received: 0000:00:1c.7
Aug 14 10:14:03 user kernel: [ 10.680378] pcieport 0000:00:1c.7: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 14 10:14:03 user kernel: [ 10.680379] pcieport 0000:00:1c.7: device [8086:a297] error status/mask=00000001/00002000
Aug 14 10:14:03 user kernel: [ 10.680380] pcieport 0000:00:1c.7: [ 0] RxErr
Aug 14 10:14:03 user kernel: [ 10.680586] pcieport 0000:00:1c.7: AER: Multiple Corrected error received: 0000:00:1c.7
Aug 14 10:14:03 user kernel: [ 10.680590] pcieport 0000:00:1c.7: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 14 10:14:03 user kernel: [ 10.680591] pcieport 0000:00:1c.7: device [8086:a297] error status/mask=00000001/00002000
Aug 14 10:14:03 user kernel: [ 10.680591] pcieport 0000:00:1c.7: [ 0] RxErr
syslog
Aug 15 09:04:23 user kernel: [ 307.590656] pcieport 0000:00:1c.7: [ 0] RxErr
Aug 15 09:04:23 user kernel: [ 307.590836] pcieport 0000:00:1c.7: AER: Multiple Corrected error received: 0000:00:1c.7
Aug 15 09:04:23 user kernel: [ 307.590841] pcieport 0000:00:1c.7: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 15 09:04:23 user kernel: [ 307.590843] pcieport 0000:00:1c.7: device [8086:a297] error status/mask=00000001/00002000
Aug 15 09:04:23 user kernel: [ 307.590844] pcieport 0000:00:1c.7: [ 0] RxErr
Aug 15 09:04:23 user kernel: [ 307.591125] pcieport 0000:00:1c.7: AER: Multiple Corrected error received: 0000:00:1c.7
Aug 15 09:04:23 user kernel: [ 307.591134] pcieport 0000:00:1c.7: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 15 09:04:23 user kernel: [ 307.591135] pcieport 0000:00:1c.7: device [8086:a297] error status/mask=00000001/00002000
Aug 15 09:04:23 user kernel: [ 307.591136] pcieport 0000:00:1c.7: [ 0] RxErr
Aug 15 09:04:23 user kernel: [ 307.591414] pcieport 0000:00:1c.7: AER: Multiple Corrected error received: 0000:00:1c.7
Aug 15 09:04:23 user kernel: [ 307.591419] pcieport 0000:00:1c.7: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 15 09:04:23 user kernel: [ 307.591420] pcieport 0000:00:1c.7: device [8086:a297] error status/mask=00000001/00002000
Aug 15 09:04:23 user kernel: [ 307.591422] pcieport 0000:00:1c.7: [ 0] RxErr
Aug 15 09:04:23 user kernel: [ 307.591607] pcieport 0000:00:1c.7: AER: Multiple Corrected error received: 0000:00:1c.7
Aug 15 09:04:23 user kernel: [ 307.591614] pcieport 0000:00:1c.7: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 15 09:04:23 user kernel: [ 307.591616] pcieport 0000:00:1c.7: device [8086:a297] error status/mask=00000001/00002000
Aug 15 09:04:23 user kernel: [ 307.591617] pcieport 0000:00:1c.7: [ 0] RxErr
Aug 15 09:04:23 user kernel: [ 307.591896] pcieport 0000:00:1c.7: AER: Multiple Corrected error received: 0000:00:1c.7
Aug 15 09:04:23 user kernel: [ 307.591901] pcieport 0000:00:1c.7: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
kern.log
Aug 14 10:14:03 user kernel: [ 11.219257] pcieport 0000:00:1c.7: AER: Corrected error received: 0000:00:1c.7
Aug 14 10:14:03 user kernel: [ 11.219259] pcieport 0000:00:1c.7: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 14 10:14:03 user kernel: [ 11.219260] pcieport 0000:00:1c.7: device [8086:a297] error status/mask=00000001/00002000
Aug 14 10:14:03 user kernel: [ 11.219260] pcieport 0000:00:1c.7: [ 0] RxErr
Aug 14 10:14:03 user kernel: [ 11.219443] pcieport 0000:00:1c.7: AER: Multiple Corrected error received: 0000:00:1c.7
Aug 14 10:14:03 user kernel: [ 11.219448] pcieport 0000:00:1c.7: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 14 10:14:03 user kernel: [ 11.219448] pcieport 0000:00:1c.7: device [8086:a297] error status/mask=00000001/00002000
Aug 14 10:14:03 user kernel: [ 11.219449] pcieport 0000:00:1c.7: [ 0] RxErr
Aug 14 10:14:03 user kernel: [ 11.219714] pcieport 0000:00:1c.7: AER: Corrected error received: 0000:00:1c.7
Aug 14 10:14:03 user kernel: [ 11.219717] pcieport 0000:00:1c.7: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 14 10:14:03 user kernel: [ 11.219718] pcieport 0000:00:1c.7: device [8086:a297] error status/mask=00000001/00002000
Aug 14 10:14:03 user kernel: [ 11.219718] pcieport 0000:00:1c.7: [ 0] RxErr
Aug 14 10:14:03 user kernel: [ 11.219916] pcieport 0000:00:1c.7: AER: Multiple Corrected error received: 0000:00:1c.7
Aug 14 10:14:03 user kernel: [ 11.219922] pcieport 0000:00:1c.7: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 14 10:14:03 user kernel: [ 11.219923] pcieport 0000:00:1c.7: device [8086:a297] error status/mask=00000001/00002000
Aug 14 10:14:03 user kernel: [ 11.219924] pcieport 0000:00:1c.7: [ 0] RxErr
Aug 14 10:14:03 user kernel: [ 11.220101] pcieport 0000:00:1c.7: AER: Corrected error received: 0000:00:1c.7
Aug 14 10:14:03 user kernel: [ 11.220104] pcieport 0000:00:1c.7: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 14 10:14:03 user kernel: [ 11.220105] pcieport 0000:00:1c.7: device [8086:a297] error status/mask=00000001/00002000
Aug 14 10:14:03 user kernel: [ 11.220105] pcieport 0000:00:1c.7: [ 0] RxErr
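To check whether every error really comes from the same port, the kernel log can be scanned for AER reports. This minimal sketch counts "Corrected error received" lines per port. A "Multiple Corrected error received" line stands for more than one error but is counted once here, so the result is a count of reports, not of individual errors. The regular expression assumes the message format shown in the excerpts above, and count_corrected_errors is a hypothetical name.

```python
import re
from collections import Counter

# Matches both "pcieport 0000:00:1c.7: AER: Corrected error received: ..."
# and "pcieport 0000:00:1c.7: AER: Multiple Corrected error received: ...".
AER_LINE = re.compile(r"pcieport (\S+): AER: (Multiple )?Corrected error received")


def count_corrected_errors(lines):
    """Count AER corrected-error reports per PCIe port in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        m = AER_LINE.search(line)
        if m:
            # group(1) is the port address, e.g. "0000:00:1c.7"
            counts[m.group(1)] += 1
    return counts
```

Because it consumes an iterable, it can be given an open file directly, for example with open("/var/log/kern.log", errors="replace") as f: print(count_corrected_errors(f)), without loading the whole file into memory.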
Answer 1
I have (I think) solved the problem by following the instructions here:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1521173
Workaround: add pci=noaer to the kernel command line:
(1) Edit /etc/default/grub and add pci=noaer to the line that starts with GRUB_CMDLINE_LINUX_DEFAULT. It will look like this:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=noaer"
(2) Run "sudo update-grub"
(3) Reboot
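Step (1) can also be scripted. The sketch below is an illustration only, not part of the linked bug report: it takes the text of /etc/default/grub, appends pci=noaer to the quoted GRUB_CMDLINE_LINUX_DEFAULT value, and leaves the text unchanged if the parameter is already present. It assumes the value is wrapped in double quotes, as in the stock Ubuntu file; add_kernel_param is a hypothetical name.

```python
import re

# Only the GRUB_CMDLINE_LINUX_DEFAULT line matches; GRUB_CMDLINE_LINUX is left alone.
_DEFAULT_LINE = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")', re.MULTILINE)


def add_kernel_param(grub_text, param="pci=noaer"):
    """Return grub_text with param appended to GRUB_CMDLINE_LINUX_DEFAULT (idempotent)."""

    def repl(m):
        args = m.group(2).split()
        if param not in args:  # do not add the parameter twice
            args.append(param)
        return m.group(1) + " ".join(args) + m.group(3)

    return _DEFAULT_LINE.sub(repl, grub_text, count=1)
```

Writing the result back to /etc/default/grub requires root, and steps (2) and (3) are still needed before the change takes effect.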
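After these steps I no longer get the pcieport messages when shutting down, and the log files have stopped growing so dramatically.
However, I don't know whether this fixes the root cause of the error messages. Since pci=noaer disables PCIe Advanced Error Reporting (AER), the kernel may simply have stopped reporting the errors rather than the errors having gone away...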
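After rebooting, one way to confirm that the parameter actually reached the kernel is to check /proc/cmdline, which holds the command line the running kernel was booted with. This is a minimal sketch; both helper names are hypothetical.

```python
from pathlib import Path


def has_kernel_param(cmdline, param="pci=noaer"):
    """True if param appears as a whole word in a kernel command line string."""
    return param in cmdline.split()


def running_kernel_has_param(param="pci=noaer"):
    """Check the command line of the currently running kernel (Linux only)."""
    return has_kernel_param(Path("/proc/cmdline").read_text(), param)
```

running_kernel_has_param() should only return True once update-grub has been run and the machine has been rebooted.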