mdadm raid6 fails to start: I/O error, state active, FAILED, Not Started

I am trying to bring up a raid6 array, but it fails to start.

A brief history of the array: it was originally built from 6 disks (8 TB each).

mdadm --create --verbose /dev/md1 --level=6 --raid-devices=6 /dev/sdb1 /dev/sde1 /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1

One disk was then added to grow the array:

mdadm -v --grow --raid-devices=7 /dev/md1

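The partition was then resized in gparted.

Another 2 disks were added to grow the array, but the partition has not been resized yet. The array used to start automatically at boot, but now it fails to start: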

mdadm: failed to start array /dev/md1: Input/output error

Here is some other relevant output:

s:~$ mdadm --detail /dev/md1
/dev/md1:
           Version : 1.2
     Creation Time : Wed Aug 25 16:25:06 2021
        Raid Level : raid6
     Used Dev Size : 18446744073709551615
      Raid Devices : 9
     Total Devices : 8
       Persistence : Superblock is persistent

       Update Time : Wed Oct  6 16:45:06 2021
             State : active, FAILED, Not Started
    Active Devices : 8
   Working Devices : 8
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 512K

Consistency Policy : unknown

              Name : Octavius:1  (local to host Octavius)
              UUID : 80bd1af7:20800c35:be64a577:8b62e937
            Events : 198308

    Number   Major   Minor   RaidDevice State
       -       0        0        0      removed
       -       0        0        1      removed
       -       0        0        2      removed
       -       0        0        3      removed
       -       0        0        4      removed
       -       0        0        5      removed
       -       0        0        6      removed
       -       0        0        7      removed
       -       0        0        8      removed

       -       8      177        5      sync   /dev/sdl1
       -       8      161        4      sync   /dev/sdk1
       -       8      145        3      sync   /dev/sdj1
       -       8      129        2      sync   /dev/sdi1
       -       8       97        1      sync   /dev/sdg1
       -       8       49        7      sync   /dev/sdd1
       -       8       33        0      sync   /dev/sdc1
       -       8       17        6      sync   /dev/sdb1

/dev/sda1 should be a member of this array, but it is missing. I don't know why all of the removed devices are listed.

s:~$ sudo mdadm --examine /dev/sda1
/dev/sda1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 80bd1af7:20800c35:be64a577:8b62e937
           Name : Octavius:1  (local to host Octavius)
  Creation Time : Wed Aug 25 16:25:06 2021
     Raid Level : raid6
   Raid Devices : 9

 Avail Dev Size : 15627798528 (7451.92 GiB 8001.43 GB)
     Array Size : 54697251840 (52163.36 GiB 56009.99 GB)
  Used Dev Size : 15627786240 (7451.91 GiB 8001.43 GB)
    Data Offset : 251904 sectors
   Super Offset : 8 sectors
   Unused Space : before=251824 sectors, after=12288 sectors
          State : active
    Device UUID : 9bddd5dd:790156b1:7b8e38d3:37558974

Internal Bitmap : 8 sectors from superblock
    Update Time : Wed Oct  6 16:45:06 2021
  Bad Block Log : 512 entries available at offset 40 sectors
       Checksum : b23ecdf9 - correct
         Events : 0

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : spare
   Array State : AAAAAAAA. ('A' == active, '.' == missing, 'R' == replacing)
s:~$ sudo mdadm --examine /dev/sdb1
/dev/sdb1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 80bd1af7:20800c35:be64a577:8b62e937
           Name : Octavius:1  (local to host Octavius)
  Creation Time : Wed Aug 25 16:25:06 2021
     Raid Level : raid6
   Raid Devices : 9

 Avail Dev Size : 15627798528 (7451.92 GiB 8001.43 GB)
     Array Size : 54697251840 (52163.36 GiB 56009.99 GB)
  Used Dev Size : 15627786240 (7451.91 GiB 8001.43 GB)
    Data Offset : 251904 sectors
   Super Offset : 8 sectors
   Unused Space : before=251824 sectors, after=12288 sectors
          State : active
    Device UUID : ffa868e4:ee48f113:bd015c5c:7f92f378

Internal Bitmap : 8 sectors from superblock
    Update Time : Wed Oct  6 16:45:06 2021
  Bad Block Log : 512 entries available at offset 40 sectors
       Checksum : a8fd97fc - correct
         Events : 198308

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 6
   Array State : AAAAAAAA. ('A' == active, '.' == missing, 'R' == replacing)

The output for all the other devices in the array matches that of /dev/sdb1.
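
Comparing the members by hand is error-prone, so the check can be condensed to one line per device. The following is only a minimal sketch: `summarize_examine` is a hypothetical helper name, and it assumes the `--examine` layout shown above, where each device section starts with a `/dev/...:` line and `Events` appears before `Device Role`.

```shell
#!/bin/sh
# Hypothetical helper: condense concatenated `mdadm --examine` output
# (read from stdin) to "device events role", one line per member.
summarize_examine() {
    awk '
        # Section header, e.g. "/dev/sda1:"
        /^\/dev\/[^ ]*:/   { dev = $1; sub(/:$/, "", dev) }
        # Event counter, e.g. "         Events : 198308"
        /^ *Events :/      { events = $NF }
        # Role, e.g. "   Device Role : Active device 6"
        /^ *Device Role :/ { sub(/^.*Device Role : */, ""); print dev, events, $0 }
    '
}

# Usage (the device glob is an assumption; adjust it to the real members):
#   sudo mdadm --examine /dev/sd[a-l]1 | summarize_examine
```

With the two listings above, this would print `/dev/sda1 0 spare` and `/dev/sdb1 198308 Active device 6`, which makes the odd one out easy to spot.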

Any help or suggestions would be great; I can provide any other output that might help.

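Answer 1

So I think I have figured out the problem, although I am not sure how it happened.

I had missed the fact that all of the drives were marked as spare: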

:~$ sudo mdadm --stop /dev/md1
mdadm: stopped /dev/md1
:~$ cat /proc/mdstat 
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md1 : inactive sdi1[2](S) sda1[9](S) sdc1[0](S) sdl1[5](S) sdb1[6](S) sdk1[4](S) sdj1[3](S) sdg1[1](S) sdd1[8](S)
      70325093376 blocks super 1.2
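
The same condition can be spotted without reading the line by eye. This is a rough sketch, not part of the original diagnosis: `inactive_all_spare` is a hypothetical helper that scans a file in `/proc/mdstat` format and prints every array listed as inactive whose members all carry the `(S)` flag.

```shell
#!/bin/sh
# Hypothetical helper: print the name of each md array that is listed as
# "inactive" with every member flagged (S).
# $1 is an optional file in /proc/mdstat format (defaults to /proc/mdstat).
inactive_all_spare() {
    awk '
        $2 == ":" && $3 == "inactive" {
            all_spare = (NF > 3)              # need at least one member
            for (i = 4; i <= NF; i++)
                if ($i !~ /\(S\)$/) all_spare = 0
            if (all_spare) print $1
        }
    ' "${1:-/proc/mdstat}"
}

# Usage:
#   inactive_all_spare
```

For the mdstat output above, the helper would print `md1`.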

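So I decided to bite the bullet (the risk of losing a small amount of non-backed-up data was small), stop the array, and force-assemble it. The array is now rebuilding, and the data appears to be fine.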

:~$ sudo mdadm --assemble --force /dev/md1 /dev/sda1 /dev/sdi1 /dev/sdc1 /dev/sdl1 /dev/sdb1 /dev/sdk1 /dev/sdj1 /dev/sdg1 /dev/sdd1 --verbose
mdadm: looking for devices for /dev/md1
mdadm: /dev/sda1 is identified as a member of /dev/md1, slot -1.
mdadm: /dev/sdi1 is identified as a member of /dev/md1, slot 2.
mdadm: /dev/sdc1 is identified as a member of /dev/md1, slot 0.
mdadm: /dev/sdl1 is identified as a member of /dev/md1, slot 5.
mdadm: /dev/sdb1 is identified as a member of /dev/md1, slot 6.
mdadm: /dev/sdk1 is identified as a member of /dev/md1, slot 4.
mdadm: /dev/sdj1 is identified as a member of /dev/md1, slot 3.
mdadm: /dev/sdg1 is identified as a member of /dev/md1, slot 1.
mdadm: /dev/sdd1 is identified as a member of /dev/md1, slot 7.
mdadm: Marking array /dev/md1 as 'clean'
mdadm: added /dev/sdg1 to /dev/md1 as 1
mdadm: added /dev/sdi1 to /dev/md1 as 2
mdadm: added /dev/sdj1 to /dev/md1 as 3
mdadm: added /dev/sdk1 to /dev/md1 as 4
mdadm: added /dev/sdl1 to /dev/md1 as 5
mdadm: added /dev/sdb1 to /dev/md1 as 6
mdadm: added /dev/sdd1 to /dev/md1 as 7
mdadm: no uptodate device for slot 8 of /dev/md1
mdadm: added /dev/sda1 to /dev/md1 as -1
mdadm: added /dev/sdc1 to /dev/md1 as 0
mdadm: /dev/md1 has been started with 8 drives (out of 9) and 1 spare.
:~$ cat /proc/mdstat 
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md1 : active raid6 sdc1[0] sda1[9] sdd1[8] sdb1[6] sdl1[5] sdk1[4] sdj1[3] sdi1[2] sdg1[1]
      54697251840 blocks super 1.2 level 6, 512k chunk, algorithm 2 [9/8] [UUUUUUUU_]
      [>....................]  recovery =  0.0% (1019952/7813893120) finish=1276.6min speed=101997K/sec
      bitmap: 10/59 pages [40KB], 65536KB chunk
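
With an estimated finish time of over 1200 minutes, it is handy to poll the rebuild rather than re-read the whole file. A small sketch, assuming the progress line format shown above; `recovery_progress` is a hypothetical helper name, and matching `resync` and `reshape` in addition to `recovery` is an assumption that those lines share the same layout.

```shell
#!/bin/sh
# Hypothetical helper: print "array percent eta" for each array that has
# a progress line in a file of /proc/mdstat format.
# $1 is an optional file (defaults to /proc/mdstat).
recovery_progress() {
    awk '
        # Array header, e.g. "md1 : active raid6 ..."
        $2 == ":" && $1 ~ /^md/ { array = $1 }
        # Progress line, e.g. "[>....]  recovery =  0.0% (...) finish=1276.6min ..."
        /(recovery|resync|reshape) *=/ {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /%$/)       pct = $i
                if ($i ~ /^finish=/) { eta = $i; sub(/^finish=/, "", eta) }
            }
            print array, pct, eta
        }
    ' "${1:-/proc/mdstat}"
}

# Usage: poll once a minute until interrupted.
#   while sleep 60; do recovery_progress; done
```

For the output above, this would print `md1 0.0% 1276.6min`.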

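I had seen this suggested as a fix for other, similar problems, but the state of my array (active, FAILED, Not Started) differed from every example I could find, and I was initially uncomfortable with the force option.

Hopefully this helps someone in the future...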