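For some time now I have been getting unexpected reboots with no apparent cause. When I bought the computer about a year ago, everything was stable.
Checking the reboot log, the previous session is still reported as running: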
$ ~ ❯ last reboot
reboot system boot 5.15.0-50-generi Sat Oct 15 05:11 still running
reboot system boot 5.15.0-50-generi Wed Oct 12 13:21 still running
reboot system boot 5.15.0-48-generi Mon Oct 10 19:53 - 13:21 (1+17:27)
reboot system boot 5.15.0-48-generi Thu Oct 6 08:08 - 13:21 (6+05:13)
reboot system boot 5.15.0-47-generi Thu Sep 29 10:13 - 13:21 (13+03:07)
Looking further into the wtmp file shows that crashes occurred.
$ /var/log ❯ last -f wtmp
andrea tty2 tty2 Sat Oct 15 05:11 still logged in
reboot system boot 5.15.0-50-generi Sat Oct 15 05:11 still running
andrea tty2 tty2 Wed Oct 12 13:22 - crash (2+15:49)
andrea tty2 tty2 Wed Oct 12 13:21 - 13:21 (00:00)
reboot system boot 5.15.0-50-generi Wed Oct 12 13:21 still running
andrea tty2 tty2 Mon Oct 10 19:53 - down (1+17:27)
reboot system boot 5.15.0-48-generi Mon Oct 10 19:53 - 13:21 (1+17:27)
andrea tty2 tty2 Thu Oct 6 08:08 - crash (4+11:45)
reboot system boot 5.15.0-48-generi Thu Oct 6 08:08 - 13:21 (6+05:13)
andrea tty2 tty2 Thu Sep 29 10:19 - crash (6+21:48)
andrea tty2 tty2 Thu Sep 29 10:17 - 10:19 (00:01)
reboot system boot 5.15.0-47-generi Thu Sep 29 10:13 - 13:21 (13+03:07)
andrea tty2 tty2 Thu Sep 29 09:33 - down (00:40)
reboot system boot 5.15.0-47-generi Thu Sep 29 09:29 - 10:13 (00:44)
andrea tty2 tty2 Thu Aug 25 09:40 - down (34+23:48)
reboot system boot 5.15.0-46-generi Thu Aug 25 09:37 - 09:29 (34+23:51)
andrea tty3 Sat Jul 2 15:13 - 15:13 (00:00)
andrea tty2 tty2 Sat Jul 2 13:57 - down (53+19:40)
reboot system boot 5.15.0-40-generi Sat Jul 2 13:56 - 09:37 (53+19:40)
andrea tty2 tty2 Sat Jul 2 10:52 - down (03:04)
reboot system boot 5.15.0-40-generi Sat Jul 2 10:51 - 13:56 (03:04)
andrea tty2 tty2 Sat Jul 2 11:17 - down (-00:28)
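To pull just the crashed sessions out of a listing like the one above, a small filter can help. This is only a sketch: it assumes the text layout that `last` prints here (a session end of `- crash`, `- down` or `- HH:MM` followed by the duration in parentheses), and the function name `crashed_sessions` is made up for illustration.

```python
import re

# `last` closes each finished session with "- crash", "- down" or "- HH:MM",
# followed by the session duration in parentheses.
SESSION_END = re.compile(r"\s-\s+(crash|down|\d{2}:\d{2})\s+\(")


def crashed_sessions(last_output):
    """Return the user-session lines of `last` output that ended in 'crash'."""
    crashes = []
    for line in last_output.splitlines():
        fields = line.split()
        # Skip blank lines and the pseudo-user "reboot" (the boot records).
        if not fields or fields[0] == "reboot":
            continue
        match = SESSION_END.search(line)
        if match and match.group(1) == "crash":
            crashes.append(line.strip())
    return crashes
```

Fed the output of `last -f /var/log/wtmp`, it keeps only the lines marked `crash`; in the listing above those are the sessions that started on Oct 12, Oct 6 and Sep 29.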
I am running the 5.15.0-50-generic kernel, and a hardware snapshot follows. Full details are available here.
H/W path          Device          Class          Description
=====================================================================================
                                  system         MACHD-WXX9 (C100)
/0                                bus            MACHD-WXX9-PCB
/0/0                              memory         128KiB BIOS
/0/4                              processor      11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
/0/4/6                            memory         128KiB L1 cache
/0/4/7                            memory         5MiB L2 cache
/0/4/8                            memory         8MiB L3 cache
/0/5                              memory         192KiB L1 cache
/0/d                              memory         16GiB System Memory
/0/d/0                            memory         2GiB Row of chips LPDDR4 Synchronous 4267 MHz (0.2 ns)
/0/d/1                            memory         2GiB Row of chips LPDDR4 Synchronous 4267 MHz (0.2 ns)
/0/d/2                            memory         2GiB Row of chips LPDDR4 Synchronous 4267 MHz (0.2 ns)
/0/d/3                            memory         2GiB Row of chips LPDDR4 Synchronous 4267 MHz (0.2 ns)
/0/d/4                            memory         2GiB Row of chips LPDDR4 Synchronous 4267 MHz (0.2 ns)
/0/d/5                            memory         2GiB Row of chips LPDDR4 Synchronous 4267 MHz (0.2 ns)
/0/d/6                            memory         2GiB Row of chips LPDDR4 Synchronous 4267 MHz (0.2 ns)
/0/d/7                            memory         2GiB Row of chips LPDDR4 Synchronous 4267 MHz (0.2 ns)
/0/100                            bridge         11th Gen Core Processor Host Bridge/DRAM Registers
/0/100/2          /dev/fb0        display        TigerLake-LP GT2 [Iris Xe Graphics]
/0/100/2/0        input15         input          DP-3
/0/100/4                          generic        TigerLake-LP Dynamic Tuning Processor Participant
/0/100/7                          bridge         Tiger Lake-LP Thunderbolt 4 PCI Express Root Port #0
/0/100/7.2                        bridge         Tiger Lake-LP Thunderbolt 4 PCI Express Root Port #2
/0/100/d                          bus            Tiger Lake-LP Thunderbolt 4 USB Controller
/0/100/d/0        usb1            bus            xHCI Host Controller
/0/100/d/1        usb2            bus            xHCI Host Controller
/0/100/d/1/3                      bus            USB3.0 Hub
/0/100/d.2                        bus            Tiger Lake-LP Thunderbolt 4 NHI #0
/0/100/d.3                        bus            Tiger Lake-LP Thunderbolt 4 NHI #1
/0/100/14                         bus            Tiger Lake-LP USB 3.2 Gen 2x1 xHCI Host Controller
/0/100/14/0       usb3            bus            xHCI Host Controller
/0/100/14/0/4                     bus            USB2.0 Hub
/0/100/14/0/7     input16         multimedia     HD Camera: HD Camera
/0/100/14/0/a                     communication  AX201 Bluetooth
/0/100/14/1       usb4            bus            xHCI Host Controller
/0/100/14.2                       memory         RAM memory
/0/100/14.3       wlp0s20f3       network        Wi-Fi 6 AX201
/0/100/15                         bus            Tiger Lake-LP Serial IO I2C Controller #0
/0/100/15.1                       bus            Tiger Lake-LP Serial IO I2C Controller #1
/0/100/16                         communication  Tiger Lake-LP Management Engine Interface
/0/100/1d                         bridge         Tiger Lake-LP PCI Express Root Port #9
/0/100/1d/0       /dev/nvme0      storage        SAMSUNG MZVLB512HBJQ-00000
/0/100/1d/0/0     hwmon3          disk           NVMe disk
/0/100/1d/0/2     /dev/ng0n1      disk           NVMe disk
/0/100/1d/0/1     /dev/nvme0n1    disk           512GB NVMe disk
/0/100/1d/0/1/1   /dev/nvme0n1p1  volume         199MiB Windows FAT volume
/0/100/1d/0/1/2   /dev/nvme0n1p2  volume         15MiB reserved partition
/0/100/1d/0/1/3   /dev/nvme0n1p3  volume         79GiB Windows NTFS volume
/0/100/1d/0/1/4   /dev/nvme0n1p4  volume         511MiB Windows FAT volume
/0/100/1d/0/1/5   /dev/nvme0n1p5  volume         17GiB Windows NTFS volume
/0/100/1d/0/1/6   /dev/nvme0n1p6  volume         1023MiB Windows NTFS volume
/0/100/1d/0/1/7   /dev/nvme0n1p7  volume         347GiB EXT4 volume
/0/100/1d/0/1/8   /dev/nvme0n1p8  volume         29GiB Linux swap volume
/0/100/1e                         communication  Tiger Lake-LP Serial IO UART Controller #0
/0/100/1e.3                       bus            Tiger Lake-LP Serial IO SPI Controller #1
/0/100/1f                         bridge         Tiger Lake-LP LPC Controller
/0/100/1f/0                       system         PnP device PNP0c02
/0/100/1f/1                       generic        PnP device INT3f0d
/0/100/1f/2                       input          PnP device PNP0303
/0/100/1f/3                       system         PnP device PNP0c02
/0/100/1f/4                       system         PnP device PNP0c02
/0/100/1f/5                       system         PnP device PNP0c02
/0/100/1f/6                       system         PnP device PNP0c02
/0/100/1f.3       card0           multimedia     Tiger Lake-LP Smart Sound Technology Audio Controller
/0/100/1f.4                       bus            Tiger Lake-LP SMBus Controller
/0/100/1f.5                       bus            Tiger Lake-LP SPI Controller
/1                                power          HB4593R1ECW-22T0
/2                input0          input          Lid Switch
/3                input1          input          Power Button
/4                input10         input          GXTP7863:00 27C6:01E0 Touchpad
/5                input12         input          SYNA2393:00 06CB:19AC
/6                input14         input          Video Bus
/7                input17         input          sof-hda-dsp Headphone
/8                input18         input          sof-hda-dsp HDMI/DP,pcm=3
/9                input19         input          sof-hda-dsp HDMI/DP,pcm=4
/a                input2          input          AT Translated Set 2 keyboard
/b                input20         input          sof-hda-dsp HDMI/DP,pcm=5
/c                input23         input          Paris Keyboard
/d                input26         input          Paris Mouse
/e                input8          input          Huawei WMI hotkeys
/f                input9          input          GXTP7863:00 27C6:01E0 Mouse
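The full HTML report shows two sections in red: memory and the serial bus controller.
I am not sure whether red indicates a problem, but I remember fixing a couple of computers that crashed suddenly because of 1) faulty memory or 2) an incompatibility between CPU and memory. I am not sure whether 1) is the issue here as well.
The /proc/sys/kernel/panic file contains 0; I guess that means no faulty driver is causing the reboots?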
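For context on that value: as far as the kernel's sysctl documentation goes, `kernel.panic` is a reboot timeout rather than a record of past panics, so a 0 only says the kernel would not reboot by itself after a panic. The sketch below reads and interprets it; the function names are made up for illustration.

```python
from pathlib import Path


def describe_panic_timeout(value):
    """Translate a kernel.panic value into what the kernel does on a panic."""
    if value == 0:
        return "no automatic reboot after a panic (the kernel halts and waits)"
    if value < 0:
        return "reboot immediately after a panic"
    return f"reboot {value} seconds after a panic"


def read_panic_timeout(path="/proc/sys/kernel/panic"):
    """Return the current setting as an int, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None
```

Calling `describe_panic_timeout(read_panic_timeout())` on a machine where the file holds 0 returns the "no automatic reboot" description. Read that way, the setting by itself neither confirms nor rules out a driver problem.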
Any help with investigating further and resolving the issue would be much appreciated.
Answer 1
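If you are seeing random software problems, i.e. ones that "cannot be analyzed by reading the logs",
then you might consider looking for a hardware problem instead.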
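Before writing the logs off, the end of the previous boot's journal is worth one look: if it stops abruptly with no shutdown messages, that is at least consistent with a hard reset. A minimal sketch, assuming systemd's `journalctl` is installed and the journal is kept across reboots; the helper names are made up for illustration.

```python
import subprocess


def previous_boot_tail_cmd(lines=50, priority=None):
    """Build a journalctl command for the last lines of the previous boot."""
    cmd = ["journalctl", "--boot=-1", f"--lines={lines}", "--no-pager"]
    if priority is not None:
        # e.g. "err" limits the output to errors and worse
        cmd.append(f"--priority={priority}")
    return cmd


def previous_boot_tail(lines=50, priority=None):
    """Run the command and return its output, or None if journalctl is missing."""
    try:
        result = subprocess.run(
            previous_boot_tail_cmd(lines, priority),
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        return None
    return result.stdout
```

`print(previous_boot_tail(30))` shows the final 30 entries before the machine went down; passing `priority="err"` narrows it to error-level messages.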
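In my experience, erratic behaviour "always" comes down to a "nasty" connection between pieces of hardware;
for example, RAM modules with dirty connectors (clean them with a pencil eraser!) and
faulty old Molex-type PSU feeds (replace them!).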
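The smaller the computer case, the more important it is to ensure adequate cooling.
Regularly remove all accumulated dust,
make sure cooling air can flow freely while the machine is in use;
always avoid blockages "at any cost".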
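If the problem persists after the dirt has been removed, your electronics may have been permanently damaged by overheating too many times: in that case there is no remedy other than replacement.
(Such a fault is hard to pin down; there may be "lost timing" between circuits, which usually cannot be fixed by simple means.)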
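To see whether cooling is actually a problem, the temperatures the kernel exposes can be read while the machine is under load. This is only a sketch: it assumes the standard sysfs thermal zones under `/sys/class/thermal`, which report millidegrees Celsius, and the function name is made up for illustration.

```python
from pathlib import Path


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {"zone name (type)": degrees Celsius} for each readable zone."""
    readings = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            kind = (zone / "type").read_text().strip()
            millidegrees = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # zone vanished or reported something unparsable
        readings[f"{zone.name} ({kind})"] = millidegrees / 1000.0
    return readings
```

Printing `read_thermal_zones()` every few seconds during a heavy task shows how the readings climb; values that keep rising until the machine cuts out would point toward the overheating described above.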