启动失败后如何读取这些日志条目?

启动失败后如何读取这些日志条目?

我的系统今天多次无法启动。幸运的是,在完全断开电源后,我设法将其重新打开,但我想了解发生了什么。

从用户的角度来看,情况是这样的:我打开电脑,风扇比平时更吵,启动过程有时会卡在启动屏幕上,有时会卡在 GRUB 之后,有时会卡在映像出现的短时间内。我的显卡显示出来了。

我检查了日记,发现了以下几行以前没有出现过的内容。这是其中一只失败靴子的完整日志条目。其他条目看起来类似。

-- Boot 4c3326651829453c89c08358e88b8071 --
Jan 22 01:42:53 hostname kernel: Linux version 5.10.9-arch1-1 (linux@archlinux) (gcc (GCC) 10.2.0, GNU ld (GNU Binutils) 2.35.1) #1 SMP PREEMPT Tue, 19 Jan 2021 22:06:06 +0000
Jan 22 01:42:53 hostname kernel: Command line: BOOT_IMAGE=/vmlinuz-linux root=UUID=d825234f-4397-494f-9c61-e719a008ecbd rw loglevel=3 quiet
Jan 22 01:38:09 hostname kernel: input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:02.1/0000:01:00.1/sound/card1/input15
Jan 22 01:38:09 hostname kernel: input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:02.1/0000:01:00.1/sound/card1/input16
Jan 22 01:38:09 hostname kernel: input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:02.1/0000:01:00.1/sound/card1/input17
Jan 22 01:38:09 hostname kernel: input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:02.1/0000:01:00.1/sound/card1/input18
Jan 22 01:38:09 hostname kernel: input: HDA NVidia HDMI/DP,pcm=10 as /devices/pci0000:00/0000:00:02.1/0000:01:00.1/sound/card1/input19
Jan 22 01:38:09 hostname kernel: input: HDA NVidia HDMI/DP,pcm=11 as /devices/pci0000:00/0000:00:02.1/0000:01:00.1/sound/card1/input20
Jan 22 01:38:09 hostname kernel: input: HDA NVidia HDMI/DP,pcm=12 as /devices/pci0000:00/0000:00:02.1/0000:01:00.1/sound/card1/input21
Jan 22 01:38:09 hostname kernel: input: HD-Audio Generic Front Mic as /devices/pci0000:00/0000:00:14.2/sound/card0/input22
Jan 22 01:38:09 hostname kernel: input: HD-Audio Generic Rear Mic as /devices/pci0000:00/0000:00:14.2/sound/card0/input23
Jan 22 01:38:09 hostname kernel: input: HD-Audio Generic Line as /devices/pci0000:00/0000:00:14.2/sound/card0/input24
Jan 22 01:38:09 hostname kernel: input: HD-Audio Generic Line Out as /devices/pci0000:00/0000:00:14.2/sound/card0/input25
Jan 22 01:38:09 hostname audit[425]: NETFILTER_CFG table=filter family=10 entries=144 op=xt_replace pid=425 comm="ip6tables-resto"
Jan 22 01:38:09 hostname kernel: input: HD-Audio Generic Front Headphone as /devices/pci0000:00/0000:00:14.2/sound/card0/input26
Jan 22 01:38:09 hostname kernel: EDAC amd64: F15h detected (node 0).
Jan 22 01:38:09 hostname kernel: EDAC amd64: Error: F1 not found: device 0x1601 (broken BIOS?)
Jan 22 01:38:09 hostname audit[426]: NETFILTER_CFG table=filter family=10 entries=160 op=xt_replace pid=426 comm="ip6tables-resto"
Jan 22 01:38:09 hostname systemd-udevd[258]: controlC0: Process '/usr/bin/alsactl restore 0' failed with exit code 99.
Jan 22 01:38:09 hostname systemd[1]: Condition check resulted in First Boot Wizard being skipped.
Jan 22 01:38:09 hostname systemd[1]: Condition check resulted in File System Check on Root Device being skipped.
Jan 22 01:38:09 hostname systemd[1]: Condition check resulted in Rebuild Hardware Database being skipped.
Jan 22 01:38:09 hostname systemd[1]: Condition check resulted in Repartition Root Disk being skipped.
Jan 22 01:38:09 hostname systemd[1]: Condition check resulted in Create System Users being skipped.
Jan 22 01:38:09 hostname audit[435]: NETFILTER_CFG table=filter family=10 entries=168 op=xt_replace pid=435 comm="ip6tables-resto"
Jan 22 01:38:09 hostname kernel: EDAC amd64: F15h detected (node 0).
Jan 22 01:38:09 hostname kernel: EDAC amd64: Error: F1 not found: device 0x1601 (broken BIOS?)
Jan 22 01:38:09 hostname systemd[1]: Finished CLI Netfilter Manager.
Jan 22 01:38:09 hostname audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=ufw comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jan 22 01:38:09 hostname kernel: EDAC amd64: F15h detected (node 0).
Jan 22 01:38:09 hostname kernel: EDAC amd64: Error: F1 not found: device 0x1601 (broken BIOS?)
Jan 22 01:38:10 hostname kernel: nvidia: module license 'NVIDIA' taints kernel.
Jan 22 01:38:10 hostname kernel: Disabling lock debugging due to kernel taint
Jan 22 01:38:10 hostname kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 237
Jan 22 01:38:10 hostname kernel: 
Jan 22 01:38:10 hostname kernel: nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
Jan 22 01:38:10 hostname kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module  460.32.03  Sun Dec 27 19:00:34 UTC 2020
Jan 22 01:38:10 hostname systemd-udevd[267]: controlC1: Process '/usr/bin/alsactl restore 1' failed with exit code 99.
Jan 22 01:38:10 hostname systemd[1]: Condition check resulted in First Boot Wizard being skipped.
Jan 22 01:38:10 hostname systemd[1]: Condition check resulted in File System Check on Root Device being skipped.
Jan 22 01:38:10 hostname systemd[1]: Condition check resulted in Rebuild Hardware Database being skipped.
Jan 22 01:38:10 hostname systemd[1]: Condition check resulted in Repartition Root Disk being skipped.
Jan 22 01:38:10 hostname systemd[1]: Condition check resulted in Create System Users being skipped.
Jan 22 01:38:10 hostname kernel: random: crng init done
Jan 22 01:38:10 hostname kernel: random: 7 urandom warning(s) missed due to ratelimiting
Jan 22 01:38:10 hostname systemd[1]: Finished Load/Save Random Seed.
Jan 22 01:38:10 hostname audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-random-seed comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jan 22 01:38:10 hostname systemd[1]: Condition check resulted in First Boot Complete being skipped.
Jan 22 01:38:10 hostname kernel: nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  460.32.03  Sun Dec 27 18:51:11 UTC 2020
Jan 22 01:38:10 hostname kernel: [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
Jan 22 01:38:10 hostname kernel: [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 0
Jan 22 01:38:10 hostname systemd[1]: Created slice system-systemd\x2dbacklight.slice.
Jan 22 01:38:10 hostname systemd[1]: Starting Load/Save Screen Backlight Brightness of backlight:acpi_video0...
Jan 22 01:38:10 hostname systemd[1]: Finished Load/Save Screen Backlight Brightness of backlight:acpi_video0.
Jan 22 01:38:10 hostname audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-backlight@backlight:acpi_video0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? termi>
Jan 22 01:38:18 hostname kernel: ata2: softreset failed (1st FIS failed)
Jan 22 01:38:28 hostname kernel: ata2: softreset failed (1st FIS failed)

请注意时间戳。这种回到过去的跳跃发生了好几次。这是正常行为还是已经表明存在问题?我了解到它们应该始终按正确的时间顺序排列。当我关闭并重新打开日记时,日记条目的排序是正确的,关于它们的时间戳。

这个失败系列的第一个启动条目要长得多(可能是由于软重置)。除了上述内容之外,我还注意到以下条目:

    Jan 22 01:38:08 hostname kernel: ACPI BIOS Warning (bug): Optional FADT field Pm2ControlBlock has valid Length but zero Address: 0x0000000000000000/0x1 (20200925/tbfadt-615)
    ...
    Jan 22 01:38:08 hostname kernel: ata2.00: exception Emask 0x0 SAct 0x30000 SErr 0x0 action 0x6
    Jan 22 01:38:08 hostname kernel: ata2.00: irq_stat 0x40000008
    Jan 22 01:38:08 hostname kernel: ata2.00: failed command: READ FPDMA QUEUED
    Jan 22 01:38:08 hostname kernel: ata2.00: cmd 60/78:80:88:00:00/00:00:00:00:00/40 tag 16 ncq dma 61440 in
                                             res 43/84:78:f0:00:00/00:00:00:00:00/00 Emask 0x410 (ATA bus error) <F>
    Jan 22 01:38:08 hostname kernel: ata2.00: status: { DRDY SENSE ERR }
    Jan 22 01:38:08 hostname kernel: ata2.00: error: { ICRC ABRT }
    Jan 22 01:38:08 hostname kernel: ata2: hard resetting link
    Jan 22 01:38:08 hostname kernel: ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
    Jan 22 01:38:08 hostname kernel: ata2.00: NCQ Send/Recv Log not supported
    Jan 22 01:38:08 hostname kernel: ata2.00: NCQ Send/Recv Log not supported
    Jan 22 01:38:08 hostname kernel: ata2.00: configured for UDMA/133
    Jan 22 01:38:08 hostname kernel: sd 1:0:0:0: [sdb] tag#16 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=1s
    Jan 22 01:38:08 hostname kernel: sd 1:0:0:0: [sdb] tag#16 Sense Key : Aborted Command [current] 
    Jan 22 01:38:08 hostname kernel: sd 1:0:0:0: [sdb] tag#16 Add. Sense: Scsi parity error
    Jan 22 01:38:08 hostname kernel: sd 1:0:0:0: [sdb] tag#16 CDB: Read(10) 28 00 00 00 00 88 00 00 78 00
    Jan 22 01:38:08 hostname kernel: blk_update_request: I/O error, dev sdb, sector 136 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0

我怀疑电源线是罪魁祸首,因为我记得之前遇到过启动问题,我可以通过重新连接电源线来解决。这里有没有人可以帮助我更好地理解这些日志。

我对涉及 BIOS 的条目特别好奇。在不了解情况的情况下,你能从中提取什么信息?

答案1

您拥有现代多核处理器并且您的发行版使用systemd.因此,在启动时,许多事情会并行发生,有时没有固定的顺序。如果某些日志消息使用不同的路由(本机systemd日志记录与内核的审核子系统与 syslog 系统调用),则它们可能会稍微出现顺序混乱。

我将按基本顺序浏览这些消息,但将一些类似的消息分组在一起。

Jan 22 01:42:53 hostname kernel: Linux version 5.10.9-arch1-1 (linux@archlinux) (gcc (GCC) 10.2.0, GNU ld (GNU Binutils) 2.35.1) #1 SMP PREEMPT Tue, 19 Jan 2021 22:06:06 +0000
Jan 22 01:42:53 hostname kernel: Command line: BOOT_IMAGE=/vmlinuz-linux root=UUID=d825234f-4397-494f-9c61-e719a008ecbd rw loglevel=3 quiet

这些通常是 Linux 内核在开始执行后输出的第一行。此时,系统时钟仅使用系统固件初始化的时间值,该时间值通常源自电池供电的时钟芯片。请注意loglevel=3 quiet内核选项:这些选项将使许多早期启动消息静音,因此在此之后和下一个消息之前可能会发生很多事情。

Jan 22 01:38:09 hostname kernel: input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:02.1/0000:01:00.1/sound/card1/input15
Jan 22 01:38:09 hostname kernel: input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:02.1/0000:01:00.1/sound/card1/input16
Jan 22 01:38:09 hostname kernel: input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:02.1/0000:01:00.1/sound/card1/input17
Jan 22 01:38:09 hostname kernel: input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:02.1/0000:01:00.1/sound/card1/input18
Jan 22 01:38:09 hostname kernel: input: HDA NVidia HDMI/DP,pcm=10 as /devices/pci0000:00/0000:00:02.1/0000:01:00.1/sound/card1/input19
Jan 22 01:38:09 hostname kernel: input: HDA NVidia HDMI/DP,pcm=11 as /devices/pci0000:00/0000:00:02.1/0000:01:00.1/sound/card1/input20
Jan 22 01:38:09 hostname kernel: input: HDA NVidia HDMI/DP,pcm=12 as /devices/pci0000:00/0000:00:02.1/0000:01:00.1/sound/card1/input21
Jan 22 01:38:09 hostname kernel: input: HD-Audio Generic Front Mic as /devices/pci0000:00/0000:00:14.2/sound/card0/input22
Jan 22 01:38:09 hostname kernel: input: HD-Audio Generic Rear Mic as /devices/pci0000:00/0000:00:14.2/sound/card0/input23
Jan 22 01:38:09 hostname kernel: input: HD-Audio Generic Line as /devices/pci0000:00/0000:00:14.2/sound/card0/input24
Jan 22 01:38:09 hostname kernel: input: HD-Audio Generic Line Out as /devices/pci0000:00/0000:00:14.2/sound/card0/input25
Jan 22 01:38:09 hostname kernel: input: HD-Audio Generic Front Headphone as /devices/pci0000:00/0000:00:14.2/sound/card0/input26

这些消息来自输入子系统,将各种音频连接器上的插头检测电路注册为输入。这意味着此时实际的声音驱动程序已经加载。声音芯片通常不被认为是“必要的硬件”,因此它们的驱动程序通常不会内置到内核中,甚至并不总是包含在 initramfs 中,因此此时我们可能已经在引导过程中走了很远了。

引导过程可能已经激活了网络接口并从 NTP 服务器获取了更准确的时间信息,这可能解释了明显的时间向后跳跃。或者,如果没有可用的电池支持时钟并且尚无法访问 NTP 服务器,则引导过程可能会将上次磁盘写入操作的时间作为当前时间的最佳猜测。

Jan 22 01:38:09 hostname audit[425]: NETFILTER_CFG table=filter family=10 entries=144 op=xt_replace pid=425 comm="ip6tables-resto"
Jan 22 01:38:09 hostname audit[426]: NETFILTER_CFG table=filter family=10 entries=160 op=xt_replace pid=426 comm="ip6tables-resto"
Jan 22 01:38:09 hostname audit[435]: NETFILTER_CFG table=filter family=10 entries=168 op=xt_replace pid=435 comm="ip6tables-resto"

某些因素(可能ip6tables-restore)导致添加 IPv6 netfilter 规则。

Jan 22 01:38:09 hostname kernel: EDAC amd64: F15h detected (node 0).
Jan 22 01:38:09 hostname kernel: EDAC amd64: Error: F1 not found: device 0x1601 (broken BIOS?)

这些消息来自 EDAC(错误检测和纠正)子系统。它在仅具有 ECC 纠错内存的系统(即服务器和可能的高级工作站)上最有用。您的处理器似乎具有与 ECC 内存配合使用的必要功能,但您的系统固件显然没有 EDAC 子系统将内存/总线错误映射到物理组件(例如内存模块插槽)所需的信息。也许您的主板不支持使用 ECC 内存?如果这是真的,那么 EDAC 子系统可能对您没有用处,您可以将 EDAC 模块列入黑名单以完全跳过加载它们。

Jan 22 01:38:09 hostname systemd-udevd[258]: controlC0: Process '/usr/bin/alsactl restore 0' failed with exit code 99.
Jan 22 01:38:10 hostname systemd-udevd[267]: controlC1: Process '/usr/bin/alsactl restore 1' failed with exit code 99.

由于某种原因,主板集成声音芯片和 GPU 的 HDMI/DP 连接器恢复声卡音量设置的尝试均失败。也许是因为alsactl store从未用于持久存储当前设置?仅当您使用原始 ALSA 和/或文本模式时,这可能才重要:大多数 GUI 桌面环境和/或 Pulseaudio 通常会通过用户特定的设置覆盖 ALSA 音量设置。

Jan 22 01:38:09 hostname systemd[1]: Condition check resulted in First Boot Wizard being skipped.
Jan 22 01:38:09 hostname systemd[1]: Condition check resulted in File System Check on Root Device being skipped.
Jan 22 01:38:09 hostname systemd[1]: Condition check resulted in Rebuild Hardware Database being skipped.
Jan 22 01:38:09 hostname systemd[1]: Condition check resulted in Repartition Root Disk being skipped.
Jan 22 01:38:09 hostname systemd[1]: Condition check resulted in Create System Users being skipped.

systemd正在跳过一些有条件的引导项,因为不满足必要的条件。

Jan 22 01:38:10 hostname kernel: nvidia: module license 'NVIDIA' taints kernel.
Jan 22 01:38:10 hostname kernel: Disabling lock debugging due to kernel taint
Jan 22 01:38:10 hostname kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 237
Jan 22 01:38:10 hostname kernel: 
Jan 22 01:38:10 hostname kernel: nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
Jan 22 01:38:10 hostname kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module  460.32.03  Sun Dec 27 19:00:34 UTC 2020

正在加载专有的 NVIDIA GPU 驱动程序。

Jan 22 01:38:10 hostname audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-random-seed comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'

这看起来像是根文件系统已挂载,并且审计子系统确认了“真正的”systemd 启动。在此之前,引导过程由 initramfs 中的“mini-systemd”处理。这也许可以解释为什么有些消息似乎是重复的:initramfs mini-systemd 进行了一次这些检查,现在真正的 systemd 再次进行了这些检查。

Jan 22 01:38:10 hostname kernel: random: crng init done
Jan 22 01:38:10 hostname kernel: random: 7 urandom warning(s) missed due to ratelimiting
Jan 22 01:38:10 hostname systemd[1]: Finished Load/Save Random Seed.

systemd已恢复上次启动或有序关闭时的随机种子。

Jan 22 01:38:10 hostname kernel: nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  460.32.03  Sun Dec 27 18:51:11 UTC 2020
Jan 22 01:38:10 hostname kernel: [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
Jan 22 01:38:10 hostname kernel: [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 0

专有 NVIDIA GPU 驱动程序的更多组件正在启动。

Jan 22 01:38:10 hostname systemd[1]: Created slice system-systemd\x2dbacklight.slice.
Jan 22 01:38:10 hostname systemd[1]: Starting Load/Save Screen Backlight Brightness of backlight:acpi_video0...
Jan 22 01:38:10 hostname systemd[1]: Finished Load/Save Screen Backlight Brightness of backlight:acpi_video0.
Jan 22 01:38:10 hostname audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-backlight@backlight:acpi_video0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? termi>

固件 ACPI 表包括背光控制接口,并且systemd正在启动一项服务,该服务会记住从一次启动到下一次启动的当前背光设置。

Jan 22 01:38:18 hostname kernel: ata2: softreset failed (1st FIS failed)
Jan 22 01:38:28 hostname kernel: ata2: softreset failed (1st FIS failed)

这表明第二个 SATA 设备存在问题。您的第二个片段似乎有更多关于此的信息。

从第二个片段来看:

Jan 22 01:38:08 hostname kernel: ACPI BIOS Warning (bug): Optional FADT field Pm2ControlBlock has valid Length but zero Address: 0x0000000000000000/0x1 (20200925/tbfadt-615)

您的 ACPI 系统固件的电源管理信息不完整。 BIOS 更新可能会解决此问题,但由于这只是一个警告,因此内核可以解决该问题。

Jan 22 01:38:08 hostname kernel: ata2.00: exception Emask 0x0 SAct 0x30000 SErr 0x0 action 0x6
Jan 22 01:38:08 hostname kernel: ata2.00: irq_stat 0x40000008
Jan 22 01:38:08 hostname kernel: ata2.00: failed command: READ FPDMA QUEUED
Jan 22 01:38:08 hostname kernel: ata2.00: cmd 60/78:80:88:00:00/00:00:00:00:00/40 tag 16 ncq dma 61440 in
                                         res 43/84:78:f0:00:00/00:00:00:00:00/00 Emask 0x410 (ATA bus error) <F>
Jan 22 01:38:08 hostname kernel: ata2.00: status: { DRDY SENSE ERR }
Jan 22 01:38:08 hostname kernel: ata2.00: error: { ICRC ABRT }
Jan 22 01:38:08 hostname kernel: ata2: hard resetting link
Jan 22 01:38:08 hostname kernel: ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jan 22 01:38:08 hostname kernel: ata2.00: NCQ Send/Recv Log not supported
Jan 22 01:38:08 hostname kernel: ata2.00: NCQ Send/Recv Log not supported
Jan 22 01:38:08 hostname kernel: ata2.00: configured for UDMA/133
Jan 22 01:38:08 hostname kernel: sd 1:0:0:0: [sdb] tag#16 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=1s
Jan 22 01:38:08 hostname kernel: sd 1:0:0:0: [sdb] tag#16 Sense Key : Aborted Command [current] 
Jan 22 01:38:08 hostname kernel: sd 1:0:0:0: [sdb] tag#16 Add. Sense: Scsi parity error
Jan 22 01:38:08 hostname kernel: sd 1:0:0:0: [sdb] tag#16 CDB: Read(10) 28 00 00 00 00 88 00 00 78 00
Jan 22 01:38:08 hostname kernel: blk_update_request: I/O error, dev sdb, sector 136 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0

这表明您的磁盘存在数据传输错误/dev/sdb。这可能是由 SATA 电缆或连接器损坏引起的,因此您可能首先尝试将磁盘移至其他 SATA 端口或先更换电缆。如果这些方法没有帮助,则可能需要更换磁盘。如果故障似乎反复出现,请尽快对该磁盘进行额外备份。

相关内容