Yesterday I added a new 3 TB drive to the RAID 5 array at work and let it rebuild overnight. Today I found the following errors in the journal:

Jan 16 07:49:42 iHugo kernel: INFO: task md0_resync:854 blocked for more than 120 seconds.
Jan 16 07:49:42 iHugo kernel: task:md0_resync      state:D stack:    0 pid:  854 ppid:     2 flags:0x00004000
Jan 16 07:49:42 iHugo kernel: INFO: task jbd2/md0p1-8:1006 blocked for more than 120 seconds.
Jan 16 07:49:42 iHugo kernel: task:jbd2/md0p1-8    state:D stack:    0 pid: 1006 ppid:     2 flags:0x00004000
Jan 16 07:51:43 iHugo kernel: INFO: task md0_resync:854 blocked for more than 241 seconds.
Jan 16 07:51:43 iHugo kernel: task:md0_resync      state:D stack:    0 pid:  854 ppid:     2 flags:0x00004000
Jan 16 07:51:43 iHugo kernel: INFO: task jbd2/md0p1-8:1006 blocked for more than 241 seconds.
Jan 16 07:51:43 iHugo kernel: task:jbd2/md0p1-8    state:D stack:    0 pid: 1006 ppid:     2 flags:0x00004000
Jan 16 07:53:44 iHugo kernel: INFO: task md0_resync:854 blocked for more than 362 seconds.
Jan 16 07:53:44 iHugo kernel: task:md0_resync      state:D stack:    0 pid:  854 ppid:     2 flags:0x00004000
Jan 16 07:53:44 iHugo kernel: INFO: task jbd2/md0p1-8:1006 blocked for more than 362 seconds.
Jan 16 07:53:44 iHugo kernel: task:jbd2/md0p1-8    state:D stack:    0 pid: 1006 ppid:     2 flags:0x00004000
Jan 16 07:55:45 iHugo kernel: INFO: task md0_resync:854 blocked for more than 483 seconds.
Jan 16 07:55:45 iHugo kernel: task:md0_resync      state:D stack:    0 pid:  854 ppid:     2 flags:0x00004000
Jan 16 07:55:45 iHugo kernel: INFO: task jbd2/md0p1-8:1006 blocked for more than 483 seconds.
Jan 16 07:55:45 iHugo kernel: task:jbd2/md0p1-8    state:D stack:    0 pid: 1006 ppid:     2 flags:0x00004000
Jan 16 07:57:45 iHugo kernel: INFO: task md0_resync:854 blocked for more than 604 seconds.
Jan 16 07:57:45 iHugo kernel: task:md0_resync      state:D stack:    0 pid:  854 ppid:     2 flags:0x00004000
Jan 16 07:57:45 iHugo kernel: INFO: task jbd2/md0p1-8:1006 blocked for more than 604 seconds.
Jan 16 07:57:45 iHugo kernel: task:jbd2/md0p1-8    state:D stack:    0 pid: 1006 ppid:     2 flags:0x00004000

I then tried rebooting the server. After the reboot, the RAID did not come back up:

Jan 16 09:17:26 iHugo blkdeactivate[82348]:   [MD]: deactivating part device md0p1...
Jan 16 09:17:26 iHugo blkdeactivate[82359]: cat: /sys/block/md0p1/md/sync_action: No such file or directory
Jan 16 09:35:58 iHugo kernel: md/raid:md0: not clean -- starting background reconstruction
Jan 16 09:35:58 iHugo kernel: md/raid:md0: device sdd operational as raid disk 1
Jan 16 09:35:58 iHugo kernel: md/raid:md0: device sdf1 operational as raid disk 3
Jan 16 09:35:58 iHugo kernel: md/raid:md0: device sdb operational as raid disk 0
Jan 16 09:35:58 iHugo kernel: md/raid:md0: force stripe size 512 for reshape
Jan 16 09:35:58 iHugo kernel: md/raid:md0: cannot start dirty degraded array.
Jan 16 09:35:58 iHugo kernel: md/raid:md0: failed to run raid set.

Here are some details:

mdstat:

root@iHugo:~# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
md0 : inactive sdf1[4] sdd[1] sde[3] sdb[0]
8790276327 blocks super 1.2

unused devices: <none>

mdadm:

root@iHugo:~# mdadm -D /dev/md0
mdadm: Unknown keyword INACTIVE-ARRAY
/dev/md0:
           Version : 1.2
     Creation Time : Thu Jan 13 18:57:19 2022
        Raid Level : raid5
     Used Dev Size : 1953378304 (1862.89 GiB 2000.26 GB)
      Raid Devices : 4
     Total Devices : 4
       Persistence : Superblock is persistent

       Update Time : Sun Jan 16 07:47:20 2022
             State : active, FAILED, Not Started
    Active Devices : 3
   Working Devices : 4
    Failed Devices : 0
     Spare Devices : 1

            Layout : left-symmetric
        Chunk Size : 512K

Consistency Policy : unknown

     Delta Devices : 1, (3->4)

              Name : iHugo:0  (local to host iHugo)
              UUID : dc5e662f:4f32bd91:95ee7139:7ef94601
            Events : 59708

    Number   Major   Minor   RaidDevice State
       -       0        0        0      removed
       -       0        0        1      removed
       -       0        0        2      removed
       -       0        0        3      removed

       -       8       64        2      spare rebuilding   /dev/sde
       -       8       48        1      sync   /dev/sdd
       -       8       16        0      sync   /dev/sdb
       -       8       81        3      sync   /dev/sdf1

fdisk -l

root@iHugo:~# fdisk -l
Disk /dev/sdb: 1.82 TiB, 2000398934016 bytes, 3907029168 sectors
Disk model: WDC WD20EFRX-68E
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/sda: 232.89 GiB, 250059350016 bytes, 488397168 sectors
Disk model: Samsung SSD 840
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 1B771FD2-DAB6-428D-88A6-0A1F1D35671E

Device         Start       End   Sectors   Size Type
/dev/sda1       2048   1050623   1048576   512M EFI System
/dev/sda2    1050624 486395903 485345280 231.4G Linux filesystem
/dev/sda3  486395904 488396799   2000896   977M Linux swap


Disk /dev/sdf: 2.73 TiB, 3000592982016 bytes, 5860533168 sectors
Disk model: ST3000DM001-1CH1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: DCF1340A-3DA8-42EE-B7B1-7F439E571148

Device     Start        End    Sectors  Size Type
/dev/sdf1   2048 5860532223 5860530176  2.7T Linux filesystem


Disk /dev/sdc: 931.51 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: SAMSUNG HD103UJ
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: EFBD098F-4397-4EC9-B242-1712149A75C9

Device     Start        End    Sectors   Size Type
/dev/sdc1   2048 1953525134 1953523087 931.5G Linux filesystem


Disk /dev/sde: 1.82 TiB, 2000394706432 bytes, 3907020911 sectors
Disk model: WDC WD20EARS-00J
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/sdd: 1.82 TiB, 2000398934016 bytes, 3907029168 sectors
Disk model: WDC WD20EARS-00M
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

blkid:

root@iHugo:~# blkid
/dev/sdb: UUID="dc5e662f-4f32-bd91-95ee-71397ef94601" UUID_SUB="2a63392a-2f36-5e40-509a-8a968c132b66" LABEL="iHugo:0" TYPE="linux_raid_member"
/dev/sda1: UUID="CC0A-CBAA" BLOCK_SIZE="512" TYPE="vfat" PARTUUID="a8c63d30-5ddd-4a1f-b9d5-faed36434457"
/dev/sda2: UUID="7fe4974d-d6ae-4c09-a8b7-8cb46ab978b8" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="39f49ee3-8061-4da8-9802-a77beaec158a"
/dev/sda3: UUID="e668eb26-e954-48d9-9fba-f92f6437c49f" TYPE="swap" PARTUUID="95f3627d-1e7f-4fa8-a39b-d91b2b2b7012"
/dev/sdf1: UUID="dc5e662f-4f32-bd91-95ee-71397ef94601" UUID_SUB="336f8167-c619-4553-8ab2-0b4516106ae1" LABEL="iHugo:0" TYPE="linux_raid_member" PARTLABEL="primary" PARTUUID="26c1353a-0796-48ba-92db-d4fdae4f7f98"
/dev/sdc1: LABEL="Leer" BLOCK_SIZE="512" UUID="01D437C7F0FED880" TYPE="ntfs" PARTUUID="975d1223-be02-471d-a291-e3433048e0ee"
/dev/sde: UUID="dc5e662f-4f32-bd91-95ee-71397ef94601" UUID_SUB="b66d96aa-5e60-83f2-8a8c-9f8c3d1caf65" LABEL="iHugo:0" TYPE="linux_raid_member"
/dev/sdd: UUID="dc5e662f-4f32-bd91-95ee-71397ef94601" UUID_SUB="6d33f5c9-e81f-3965-da7b-37a2366ed1d1" LABEL="iHugo:0" TYPE="linux_raid_member"

Thinking out loud

The system recognizes the drives, and mdadm can see some kind of array, but it cannot start it. Why is the array logged as dirty and degraded, yet shown as inactive in /sys/block/md0/md/array_state?
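Before forcing anything further, it may help to see why the kernel considers the array dirty. A diagnostic sketch (not a fix), assuming the device names from the blkid output above are still current after the reboot:

```shell
# Each member's superblock records its own event count and view of the array;
# a member whose Events value lags the others is what makes the array "dirty".
for d in /dev/sdb /dev/sdd /dev/sde /dev/sdf1; do
    [ -b "$d" ] || continue                    # skip members that are absent
    echo "== $d =="
    mdadm --examine "$d" | grep -E 'Events|Array State|Device Role|Reshape'
done
# The kernel's current view of the (stopped) array, if the sysfs node exists:
state=$(cat /sys/block/md0/md/array_state 2>/dev/null || echo "unavailable")
echo "array_state: $state"
```

If the Events counts disagree, that explains both the "dirty" flag and the refusal to auto-start; `--examine` output also shows whether the interrupted reshape (3→4 devices) is recorded in each superblock.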

What I have tried so far

Restarting it with mdadm --run:

root@iHugo:~# mdadm --run /dev/md0
mdadm: Unknown keyword INACTIVE-ARRAY
mdadm: failed to start array /dev/md/iHugo:0: Input/output error

Forcing a reassembly:

root@iHugo:~# mdadm --assemble --force /dev/md0 /dev/sdb /dev/sdd /dev/sde /dev/sdf1
mdadm: Unknown keyword INACTIVE-ARRAY
mdadm: failed to RUN_ARRAY /dev/md0: Input/output error
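A side observation: every mdadm invocation above is prefixed with "mdadm: Unknown keyword INACTIVE-ARRAY", which points at /etc/mdadm/mdadm.conf rather than at the array itself. A plausible cause (an assumption worth checking): the output of `mdadm --detail --scan`, run while the array was inactive, was appended to the config file; such output begins with `INACTIVE-ARRAY`, a keyword the config parser rejects. The demo file path below is hypothetical; the real check belongs in /etc/mdadm/mdadm.conf:

```shell
# Reproduce the parser complaint on a throwaway copy (assumption: the real
# offending line lives in /etc/mdadm/mdadm.conf -- check there first).
cat > /tmp/mdadm.conf.demo <<'EOF'
ARRAY /dev/md0 metadata=1.2 name=iHugo:0 UUID=dc5e662f:4f32bd91:95ee7139:7ef94601
INACTIVE-ARRAY /dev/md0 metadata=1.2 name=iHugo:0 UUID=dc5e662f:4f32bd91:95ee7139:7ef94601
EOF
# Find (then delete or comment out) any such line:
grep -n '^INACTIVE-ARRAY' /tmp/mdadm.conf.demo
```

Removing that line will not fix "cannot start dirty degraded array", but it silences the noise on every command and rules out config-parsing side effects.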

Edit 1

Output of dmesg -t --level=alert,crit,err,warn:
secureboot: Secure boot could not be determined (mode 0)
x86/cpu: VMX (outside TXT) disabled by BIOS
MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
 #3
pmd_set_huge: Cannot satisfy [mem 0xf8000000-0xf8200000] with a huge-page mapping due to MTRR override.
ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
pci 0000:00:02.0: BIOS left Intel GPU interrupts enabled; disabling
ACPI Warning: SystemIO range 0x0000000000000428-0x000000000000042F conflicts with OpRegion 0x0000000000000400-0x000000000000047F (\PMIO) (20200925/utaddress-204)
ACPI Warning: SystemIO range 0x0000000000000540-0x000000000000054F conflicts with OpRegion 0x0000000000000500-0x0000000000000563 (\GPIO) (20200925/utaddress-204)
ACPI Warning: SystemIO range 0x0000000000000530-0x000000000000053F conflicts with OpRegion 0x0000000000000500-0x0000000000000563 (\GPIO) (20200925/utaddress-204)
ACPI Warning: SystemIO range 0x0000000000000500-0x000000000000052F conflicts with OpRegion 0x0000000000000500-0x0000000000000563 (\GPIO) (20200925/utaddress-204)
lpc_ich: Resource conflict(s) found affecting gpio_ich
r8169 0000:03:00.0: can't disable ASPM; OS doesn't have ASPM control
md/raid:md0: cannot start dirty degraded array.
md/raid:md0: failed to run raid set.
md: pers->run() failed ...
at24 0-0050: supply vcc not found, using dummy regulator
r8169 0000:03:00.0: firmware: failed to load rtl_nic/rtl8168f-1.fw (-2)
firmware_class: See https://wiki.debian.org/Firmware for information about missing firmware
r8169 0000:03:00.0: Direct firmware load for rtl_nic/rtl8168f-1.fw failed with error -2
r8169 0000:03:00.0: Unable to load firmware rtl_nic/rtl8168f-1.fw (-2)
FAT-fs (sda1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
OCFS2 User DLM kernel interface loaded
md/raid:md0: cannot start dirty degraded array.
md/raid:md0: failed to run raid set.
md: pers->run() failed ...
[the same three md/raid lines repeat for each further assembly attempt]
