Linux boot problem

I have a Dell OptiPlex 7040 with an NVMe M.2 boot volume. Fast, when it works. I recently rebooted and it didn't come back up. I updated the BIOS because the logs told me it was bad:

# BAD BIOS from `journalctl -xb`
Jul 06 18:30:24 server_f.project33.ca kernel: DMAR: [Firmware Bug]: No firmware reserved region can cover this RMRR [0x00000000dd800000-0x00000000dfffffff], contact BIOS vendor for fixes
Jul 06 18:30:24 server_f.project33.ca kernel: DMAR: [Firmware Bug]: Your BIOS is broken; bad RMRR [0x00000000dd800000-0x00000000dfffffff]
                                        BIOS vendor: Dell Inc.; Ver: 1.4.9; Product Version:
Jul 06 18:30:24 server_f.project33.ca kernel: DMAR-IR: IOAPIC id 2 under DRHD base 0xfed91000 IOMMU 1
Jul 06 18:30:24 server_f.project33.ca kernel: DMAR-IR: HPET id 0 under DRHD base 0xfed91000
Jul 06 18:30:24 server_f.project33.ca kernel: DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
Jul 06 18:30:24 server_f.project33.ca kernel: DMAR-IR: Enabled IRQ remapping in x2apic mode
Jul 06 18:30:24 server_f.project33.ca kernel: x86/cpu: SGX disabled by BIOS.
Jul 06 18:30:25 server_f.project33.ca kernel: sd 0:0:0:0: [sdb] No Caching mode page found
Jul 06 18:30:25 server_f.project33.ca kernel: sd 0:0:0:0: [sdb] Assuming drive cache: write through
Jul 06 18:30:26 server_f.project33.ca systemd[1]: Failed to mount /boot.
Jul 06 18:30:26 server_f.project33.ca systemd[1]: Failed to start Crash recovery kernel arming.
Jul 06 18:30:26 server_f.project33.ca kernel: device-mapper: core: Cannot calculate initial queue limits
Jul 06 18:30:26 server_f.project33.ca systemd[1]: Failed to start LVM event activation on device 8:2

New firmware, and the same errors again. Something related to LVM event activation on device 8:2.

## New BIOS OptiPlex_7040_1.23.0.exe
## Startup:

...
DSI mode with an ungated DDI clock, gate it
[ 2.878127] i915 0000:00:02.0: [drm] [ENCODER:124:DDI E/PHY E] is disabled/in
DSI mode with an ungated DDI clock, gate it
[FAILED] Failed to start LVM event activation on device 8:2.
See "systemctl status lvm2-pvscan@8:2.service" for details.
[ 2.879550] i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/skl_d
mc_ver1_27.bin (v1.27)
[ 2.885725] [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on minor 0
You are in emergency mode.  After logging in, type "journalctl -xb" to view
system logs, "systemctl reboot" to reboot, "exit"
to boot into default mode.
[ 2.887900] ACPI: Video Device [GFX0] (multi-head: yes  rom: no  post: no)
[ 2.888208] input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/L
NXVIDEO:00/input/input16
[ 2.888335] snd_hda_intel 0000:00:1f.3: bound 0000:00:02.0 (ops i915_audio_co
mponent_bind_ops [i915])
Give root password for maintenance
(or press Control-D to continue):
[ 2.899201] intel_rapl_common: Found RAPL domain package
[ 2.899219] intel_rapl_common: Found RAPL domain core
[ 2.899221] intel_rapl_common: Found RAPL domain uncore
[ 2.899222] intel_rapl_common: Found RAPL domain dram
...
^D
[root@server_f ~]# systemctl status lvm2-pvscan@8:2.service
Unit lvm2-pvscan@8:2.service could not be found.
[root@server_f ~]#

I have no idea how to fix this. I don't remember making any changes to users or anything significant, just scripts for port sniffing.

How do I fix this?

The OS is a fresh install and was running fine. Seeing some feedback get past the BIOS stage and the OS start loading, I think this is more of an OS thing, i.e. AlmaLinux 8.6. Cheers.

Update: these are the results after the BIOS update, highlighted in red in the post-POST boot output:

Jul 07 16:11:54 server_f.local kernel: x86/cpu: SGX disabled by BIOS.
Jul 07 16:11:56 server_f.project33.ca systemd[1]: Failed to mount /boot.
Jul 07 16:11:56 server_f.project33.ca systemd[1]: Failed to start Crash recovery kernel arming.
Jul 07 16:11:56 server_f.project33.ca kernel: device-mapper: core: Cannot calculate initial queue limits
Jul 07 16:11:56 server_f.project33.ca systemd[1]: Failed to start LVM event activation on device 8:2.
Jul 07 16:12:06 server_f.project33.ca systemd[1]: Failed to mount /boot.
Jul 07 16:12:06 server_f.project33.ca systemd[1]: Failed to start Crash recovery kernel arming.

Boot messages:

[FAILED] Failed to start LVM event activation on device 8:2.
See "systemctl status lvm2-pvscan@8:2.service" for details.
[ OK  ] Mounted /home

[root@server_f ~]# systemctl status lvm2-pvscan@8:2.service
● lvm2-pvscan@8:2.service - LVM event activation on device 8:2
   Loaded: loaded (/usr/lib/systemd/system/lvm2-pvscan@.service; static; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2022-07-07 16:27:46 EDT; 57s ago
     Docs: man:pvscan(8)
 Main PID: 889 (code=exited, status=5)

Jul 07 16:27:46 server_f.project33.ca lvm[889]:  pvscan[889] VG rl not using quick activation.
Jul 07 16:27:46 server_f.project33.ca lvm[889]:  WARNING: Device /dev/sda2 has size of 486297600 sectors which is smaller than corresponding PV size of 998115328 sectors. Was device resized?
Jul 07 16:27:46 server_f.project33.ca lvm[889]:  WARNING: One or more devices used as PVs in VG rl have changed sizes.
Jul 07 16:27:46 server_f.project33.ca lvm[889]:  device-mapper: reload ioctl on (253:4) failed: Invalid argument
Jul 07 16:27:46 server_f.project33.ca lvm[889]:  device-mapper: reload ioctl on (253:4) failed: Invalid argument
Jul 07 16:27:46 server_f.project33.ca lvm[889]:  1 logical volume(s) in volume group "rl" now active
Jul 07 16:27:46 server_f.project33.ca lvm[889]:  pvscan[889] rl: autoactivation failed.
Jul 07 16:27:46 server_f.project33.ca systemd[1]: lvm2-pvscan@8:2.service: Main process exited, code=exited, status=5/NOTINSTALLED
Jul 07 16:27:46 server_f.project33.ca systemd[1]: lvm2-pvscan@8:2.service: Failed with result 'exit-code'.
Jul 07 16:27:46 server_f.project33.ca systemd[1]: Failed to start LVM event activation on device 8:2.

[root@server_f ~]# lsblk
NAME                        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                           8:0    0   447G  0 disk
├─sda1                        8:1    0     1G  0 part
└─sda2                        8:2    0 231.9G  0 part
  └─rl-swap                 253:3    0  15.7G  0 lvm
nvme0n1                     259:0    0 232.9G  0 disk
├─nvme0n1p1                 259:1    0     1G  0 part
└─nvme0n1p2                 259:2    0 231.9G  0 part
  ├─almalinux_server_f-root 253:0    0    70G  0 lvm  /
  ├─almalinux_server_f-swap 253:1    0  15.7G  0 lvm  [SWAP]
  └─almalinux_server_f-home 253:2    0 146.2G  0 lvm  /home

It looks like sda2 has a problem. So I pulled it and formatted it in another box; same problem: Failed to mount /boot. Then I took the drive out of the box entirely, so only the NVMe boot volume was left; same problem. /boot is not being mounted for some reason.

[root@fuf ~]# dmesg | grep -i mount
[0.019538] Mount-cache hash table entries: 65536 (order: 7, 524288 bytes, vmalloc)
[0.019726] Mountpoint-cache hash table entries: 65536 (order: 7, 524288 bytes, vmalloc)
[1.825407] XFS (dm-0): Mounting V5 Filesystem
[1.834948] XFS (dm-0): Ending clean mount
[2.322309] XFS (nvme0n1p1): Mounting V5 Filesystem
[2.343868] XFS (nvme0n1p1): Corruption warning: Metadata has LSN (1:3869) ahead of current LSN (1:3835). Please unmount and run xfs_repair (>= v4.3) to resolve.
[2.344044] XFS (nvme0n1p1): log mount/recovery failed: error -22
[2.344227] XFS (nvme0n1p1): log mount failed
[2.679073] XFS (dm-2): Mounting V5 Filesystem
[2.698961] XFS (dm-2): Ending clean mount
[6.938996] XFS (nvme0n1p1): Mounting V5 Filesystem
[6.960104] XFS (nvme0n1p1): Corruption warning: Metadata has LSN (1:3869) ahead of current LSN (1:3835). Please unmount and run xfs_repair (>= v4.3) to resolve.
[6.960142] XFS (nvme0n1p1): log mount/recovery failed: error -22
[6.960346] XFS (nvme0n1p1): log mount failed

Status as of Friday morning: bootable, with a new storage SSD installed. Current fstab:

/dev/mapper/almalinux_server_f-root /                       xfs     defaults        0 1
# UUID=83cfc468-ecce-4188-aef4-e53cea90655a /boot                   xfs     defaults        0 0
/dev/mapper/almalinux_server_f-home /home                   xfs     defaults        0 0
/dev/mapper/almalinux_server_f-swap none                    swap    defaults        0 0

# A backup drive since added
UUID=f6db13da-ef71-4252-aab4-4f51f90ce6f7   /mnt/backups    ext4    defaults    0   2

/boot is not mounted:

 lsblk
NAME                   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                      8:0    0   477G  0 disk
└─sda1                   8:1    0   477G  0 part /mnt/backups
nvme0n1                259:0    0 232.9G  0 disk
├─nvme0n1p1            259:1    0     1G  0 part
└─nvme0n1p2            259:2    0 231.9G  0 part
  ├─almalinux_fuf-root 253:0    0    70G  0 lvm  /
  ├─almalinux_fuf-swap 253:1    0  15.7G  0 lvm  [SWAP]
  └─almalinux_fuf-home 253:2    0 146.2G  0 lvm  /home

Looks like the log was corrupted, so it needed the forced (-L) repair:

[2022_Jul_8 06:25:18 rich@fuf ~] sudo xfs_repair -L /dev/nvme0n1p1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (1:3869) is ahead of log (1:2).
Format log to cycle 4.
done
[2022_Jul_8 06:25:27 rich@fuf ~] sudo mount -a
[2022_Jul_8 06:25:43 rich@fuf ~] lsblk
NAME                   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                      8:0    0   477G  0 disk
└─sda1                   8:1    0   477G  0 part /mnt/backups
nvme0n1                259:0    0 232.9G  0 disk
├─nvme0n1p1            259:1    0     1G  0 part /boot
└─nvme0n1p2            259:2    0 231.9G  0 part
  ├─almalinux_fuf-root 253:0    0    70G  0 lvm  /
  ├─almalinux_fuf-swap 253:1    0  15.7G  0 lvm  [SWAP]
  └─almalinux_fuf-home 253:2    0 146.2G  0 lvm  /home

/dev/nvme0n1: PTUUID="df549f07" PTTYPE="dos"
/dev/nvme0n1p1: UUID="83cfc468-ecce-4188-aef4-e53cea90655a" BLOCK_SIZE="512" TYPE="xfs" PARTUUID="df549f07-01"
/dev/nvme0n1p2: UUID="i75kzm-ywmo-kblc-qVub-OAXc-oKPN-hmMtne" TYPE="LVM2_member" PARTUUID="df549f07-02"
/dev/sda1: UUID="f6db13da-ef71-4252-aab4-4f51f90ce6f7" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="cb238e81-b4d2-ba41-8a4c-19f01ea2cfd5"
/dev/mapper/almalinux_fuf-root: UUID="d0aab1dc-6d0a-4a36-b6ff-65853f73490f" BLOCK_SIZE="512" TYPE="xfs"
/dev/mapper/almalinux_fuf-swap: UUID="fb98b19f-a542-416b-8708-a397f2e5ca3b" TYPE="swap"
/dev/mapper/almalinux_fuf-home: UUID="f74e1ac0-2e82-430a-a588-169d4f487cf5" BLOCK_SIZE="512" TYPE="xfs"

Answer 1

These [Firmware Bug] messages are usually more like open letters from the kernel developers to the system firmware developers: "Please follow the specifications, so we don't have to keep inventing dirty workarounds." Unless the problems appeared immediately after a BIOS update, these are unlikely to be the root cause of your problem.

This, however, might be:

Jul 06 18:30:26 server_f.project33.ca systemd[1]: Failed to mount /boot.

It looks like your /boot filesystem has a problem, and that is breaking the normal system boot process. Since /boot is needed only by the bootloader and for kernel updates, you could temporarily comment out /boot in /etc/fstab and see whether that gets the system up into something resembling its normal state. If it does, troubleshooting the /boot problem will be much easier.
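With the /boot line disabled, as in the fstab you posted:

# UUID=83cfc468-ecce-4188-aef4-e53cea90655a /boot                   xfs     defaults        0 0

systemd no longer tries (and fails) to mount /boot during boot, so startup can proceed far enough to troubleshoot from a normal shell.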

Your NVMe system disk may have some corruption on it, or may be starting to die. Unfortunately, when SSDs begin to fail, what happens is not as predictable as with traditional HDDs: failing SSDs sometimes just vanish completely, without any real warning signs.

The fact that your NVMe SSD still works at all seems encouraging, but in your position I would be seriously worried about any data on it that is not backed up to other media. If the system contains anything really important, I would recommend finding a way to attach that NVMe SSD to another computer as a second disk and backing up everything you can still reach, as soon as possible, before doing anything else with it.

To assess the health of the NVMe SSD, you could try running smartctl -x /dev/nvme0 as root and look at what is displayed after the === START OF SMART DATA SECTION === header.

If that command is not available, nvme smart-log /dev/nvme0 should provide essentially the same data, and nvme error-log /dev/nvme0 may provide more details on any recently detected errors, if there are any.
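A minimal sketch of all three checks (smartctl ships in the smartmontools package and nvme in nvme-cli; the package names are an assumption for AlmaLinux):

smartctl -x /dev/nvme0       # full health report; read the SMART DATA section
nvme smart-log /dev/nvme0    # essentially the same data via nvme-cli
nvme error-log /dev/nvme0    # details on recently logged errors, if any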


So, temporarily commenting out /boot in /etc/fstab seems to have brought the system up. (Seeing your /etc/fstab as it exists now would be very helpful here: it would minimize the need for guesswork.)

The message Failed to start LVM event activation on device 8:2 refers to the block device with major number 8 and minor number 2, i.e. /dev/sda2. That seems to be a separate problem, and since you have since removed that disk from the system, the message should no longer appear.
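If you want to confirm the major:minor-to-device mapping yourself, the numbers appear in both ls -l and lsblk output, e.g.:

ls -l /dev/sda2               # the "8, 2" before the date is major, minor
lsblk -o NAME,MAJ:MIN,TYPE    # MAJ:MIN column for every block device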

Note the error message: XFS (nvme0n1p1): Corruption warning: Metadata has LSN (1:3869) ahead of current LSN (1:3835). Please unmount and run xfs_repair (>= V4.3) to resolve.

This seems to indicate that, according to your /etc/fstab, your /boot is/was directly on the partition /dev/nvme0n1p1, not on an LVM logical volume.

Assuming nvme0n1p1 is your /boot, now that you have the system running with /boot unmounted, you can do what this message suggests: first run xfs_repair -V to verify that your repair tool is version 4.3.0 or newer, and if so, run xfs_repair /dev/nvme0n1p1.
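As a sketch, that check-then-repair sequence is:

xfs_repair -V               # must report version 4.3.0 or newer
xfs_repair /dev/nvme0n1p1   # repair attempt; the filesystem must be unmounted (it already is here)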

It may tell you:

ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. 
Mount the filesystem to replay the log, and unmount it before re-running xfs_repair. 
If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this.

If so, try mounting /boot, just in case xfs_repair did manage to fix something before printing that message. If you cannot mount the filesystem (and the earlier attempts certainly did not succeed), then do what the message says and run xfs_repair -L /dev/nvme0n1p1.

In the blkid output, TYPE="LVM2_member" refers to an LVM physical volume (PV for short). It is not a filesystem, so it cannot be mounted, but it can be a container for one or more filesystems, or parts of them. Your /dev/nvme0n1p2 partition is expected to show up as TYPE="LVM2_member": it contains the root filesystem, the swap area and the /home filesystem, but apparently not /boot.

Filesystem UUIDs do not change by themselves: to change one, you would have to either reformat the partition or LVM logical volume with mkfs (actually losing all the existing data in it), or assign a new UUID to the filesystem with a filesystem-specific tool. So the UUID for /boot in /etc/fstab should not need changing, unless /etc/fstab itself got damaged, or you previously did something to the /boot filesystem that you have not told us about.
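To rule out drift, you can compare the filesystem's actual UUID against what /etc/fstab expects:

blkid /dev/nvme0n1p1      # prints the filesystem's current UUID
grep boot /etc/fstab      # the UUID= in this line should match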


If /dev/nvme0n1p1 had also shown up as TYPE="LVM2_member", that would mean you had overwritten your /boot filesystem with a pvcreate /dev/nvme0n1p1 command. If true, that would certainly explain the corruption.

In any case, if the xfs_repair process described above cannot fix the filesystem so that it can be mounted, the last resort is rebuilding the /boot filesystem from scratch.

This process will obviously leave the system unbootable until it completes successfully, so do not reboot in the middle of it. First reformat the broken /boot filesystem with mkfs.xfs /dev/nvme0n1p1 (no typos, this is destructive!), then look up its new UUID with lsblk -o +UUID /dev/nvme0n1p1, re-enable the /boot entry in /etc/fstab and change its UUID to the new one, and then mount /boot.
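A sketch of that sequence (destructive, so triple-check the device name; it matches your lsblk output):

mkfs.xfs -f /dev/nvme0n1p1       # wipes /boot and creates a fresh XFS filesystem
lsblk -o +UUID /dev/nvme0n1p1    # note the new filesystem UUID
# edit /etc/fstab: un-comment the /boot line and replace the old UUID= value
mount /boot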

After that, use your package manager to reinstall any currently installed kernel packages: the package management tool should have a specific option that effectively tells it "yes, your database says this package is already installed, but rewrite its files into place anyway, to replace any missing files and overwrite possibly corrupted ones."
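On AlmaLinux 8 the package manager is dnf, and the option described above is its reinstall subcommand; a minimal sketch (the exact set of kernel packages on your system is an assumption):

dnf reinstall kernel kernel-core kernel-modules    # rewrites the files under /boot and /lib/modules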

Once that's done, reinstall the bootloader, e.g. with grub2-install /dev/nvme0n1. Verify that /boot/grub2/grub.cfg exists and includes the kernel version(s) you installed; if necessary, run grub2-mkconfig > /boot/grub2/grub.cfg to rebuild the configuration. At this point, your system should be bootable again.
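Since your blkid output shows PTTYPE="dos" (an MBR partition table, i.e. legacy BIOS boot), a sketch of those two steps would be:

grub2-install /dev/nvme0n1                # reinstall GRUB into the disk's MBR
grub2-mkconfig -o /boot/grub2/grub.cfg    # rebuild the menu from the installed kernels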
