mdadm and RAID-5 recovery

2024-5-28

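I am having trouble with a RAID-5 array managed by mdadm on Debian.

First, I lost one drive (completely lost, not even recognized by the BIOS), so I replaced it with a new one; the rebuild started, but it was interrupted by a read error on a second disk (which was then removed from the array):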

raid5:md0: read error not correctable (sector 1398118536 on sdd)

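I think this disk will die within the next few days, but I would like to re-add it and run the array degraded so I can make some backups (only a few sectors are bad, and I hope to save as much data as possible before it fails).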

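Here are my disks (in RAID order):

sdc - OK
sdd - (the one with the read error, removed from the array during the rebuild)
sde - (the one that died and was replaced by a new drive, but the rebuild was interrupted => I'm not sure about its data integrity)
sdf - OK

The problem is that I cannot re-add sdd to the array with the following command: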

# mdadm --assemble /dev/md0 /dev/sdc1 /dev/sdd1 /dev/sdf1 --force --run
mdadm: failed to RUN_ARRAY /dev/md0: Input/output error
mdadm: Not enough devices to start the array.
# mdadm -D /dev/md0 
/dev/md0:
        Version : 0.90
  Creation Time : Tue Aug 24 14:20:39 2010
     Raid Level : raid5
  Used Dev Size : 1465039488 (1397.17 GiB 1500.20 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sun Oct 23 01:57:22 2011
          State : active, FAILED, Not Started
 Active Devices : 2
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 128K

           UUID : 01017848:84926c43:1751c931:a76e1cde (local to host tryphon)
         Events : 0.131544

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       1       0        0        1      removed
       2       0        0        2      removed
       3       8       81        3      active sync   /dev/sdf1

       4       8       49        -      spare   /dev/sdd1

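As you can see, sdd is recognized as a spare rather than as an in-sync member in the RAID device #1 slot.

And I don't know how to tell mdadm that sdd is RAID device #1.

If anyone has any ideas, that would be great!

Thanks.

PS: In case it helps, here is the output of mdadm --examine for the disks: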

# mdadm -E /dev/sd[cdef]1
/dev/sdc1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 01017848:84926c43:1751c931:a76e1cde (local to host tryphon)
  Creation Time : Tue Aug 24 14:20:39 2010
     Raid Level : raid5
  Used Dev Size : 1465039488 (1397.17 GiB 1500.20 GB)
     Array Size : 4395118464 (4191.51 GiB 4500.60 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 0

    Update Time : Sun Oct 23 01:57:22 2011
          State : clean
 Active Devices : 2
Working Devices : 3
 Failed Devices : 2
  Spare Devices : 1
       Checksum : dfeeeace - correct
         Events : 131544

         Layout : left-symmetric
     Chunk Size : 128K

      Number   Major   Minor   RaidDevice State
this     0       8       33        0      active sync   /dev/sdc1

   0     0       8       33        0      active sync   /dev/sdc1
   1     1       0        0        1      faulty removed
   2     2       0        0        2      faulty removed
   3     3       8       81        3      active sync   /dev/sdf1
   4     4       8       49        4      spare   /dev/sdd1
/dev/sdd1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 01017848:84926c43:1751c931:a76e1cde (local to host tryphon)
  Creation Time : Tue Aug 24 14:20:39 2010
     Raid Level : raid5
  Used Dev Size : 1465039488 (1397.17 GiB 1500.20 GB)
     Array Size : 4395118464 (4191.51 GiB 4500.60 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 0

    Update Time : Sun Oct 23 01:57:22 2011
          State : clean
 Active Devices : 2
Working Devices : 3
 Failed Devices : 2
  Spare Devices : 1
       Checksum : dfeeeae0 - correct
         Events : 131544

         Layout : left-symmetric
     Chunk Size : 128K

      Number   Major   Minor   RaidDevice State
this     4       8       49        4      spare   /dev/sdd1

   0     0       8       33        0      active sync   /dev/sdc1
   1     1       0        0        1      faulty removed
   2     2       0        0        2      faulty removed
   3     3       8       81        3      active sync   /dev/sdf1
   4     4       8       49        4      spare   /dev/sdd1
/dev/sde1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 01017848:84926c43:1751c931:a76e1cde (local to host tryphon)
  Creation Time : Tue Aug 24 14:20:39 2010
     Raid Level : raid5
  Used Dev Size : 1465039488 (1397.17 GiB 1500.20 GB)
     Array Size : 4395118464 (4191.51 GiB 4500.60 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Sat Oct 22 22:11:52 2011
          State : clean
 Active Devices : 2
Working Devices : 3
 Failed Devices : 2
  Spare Devices : 1
       Checksum : dfeeb657 - correct
         Events : 131534

         Layout : left-symmetric
     Chunk Size : 128K

      Number   Major   Minor   RaidDevice State
this     4       8       65        4      spare   /dev/sde1

   0     0       8       33        0      active sync   /dev/sdc1
   1     1       0        0        1      faulty removed
   2     2       0        0        2      faulty removed
   3     3       8       81        3      active sync   /dev/sdf1
   4     4       8       65        4      spare   /dev/sde1
/dev/sdf1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 01017848:84926c43:1751c931:a76e1cde (local to host tryphon)
  Creation Time : Tue Aug 24 14:20:39 2010
     Raid Level : raid5
  Used Dev Size : 1465039488 (1397.17 GiB 1500.20 GB)
     Array Size : 4395118464 (4191.51 GiB 4500.60 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 0

    Update Time : Sun Oct 23 01:57:22 2011
          State : clean
 Active Devices : 2
Working Devices : 3
 Failed Devices : 2
  Spare Devices : 1
       Checksum : dfeeeb04 - correct
         Events : 131544

         Layout : left-symmetric
     Chunk Size : 128K

      Number   Major   Minor   RaidDevice State
this     3       8       81        3      active sync   /dev/sdf1

   0     0       8       33        0      active sync   /dev/sdc1
   1     1       0        0        1      faulty removed
   2     2       0        0        2      faulty removed
   3     3       8       81        3      active sync   /dev/sdf1
   4     4       8       49        4      spare   /dev/sdd1

Answer 1

The first thing you need is a non-RAID copy of sdd. Use dd_rescue, for example. Do not use the original disk in this RAID while recovering.
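As a rough sketch of that copy step: the script below uses GNU ddrescue, which is a different tool from dd_rescue but serves the same purpose. The target disk /dev/sdg and the mapfile path are hypothetical names chosen for illustration, and the script only prints the commands unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch: image the failing disk onto a spare before touching the array.
# SRC is the failing member from the question; DST and MAP are hypothetical.
SRC=${SRC:-/dev/sdd}
DST=${DST:-/dev/sdg}          # hypothetical spare disk, at least as large as SRC
MAP=${MAP:-/root/sdd.map}     # hypothetical mapfile, lets the copy be resumed

# Print each command unless RUN=1 is set, so nothing is overwritten by accident.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

rescue_copy() {
    # Pass 1: copy everything that reads cleanly and skip the slow scraping phase.
    run ddrescue -f -n "$SRC" "$DST" "$MAP" &&
    # Pass 2: return to the bad areas and retry them up to three times.
    run ddrescue -f -r3 "$SRC" "$DST" "$MAP"
}

rescue_copy
```

Because the whole disk is copied, the partition table comes along with it, so the copy of sdd1 should show up as the first partition of the target once the kernel has re-read the partition table.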

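Once you have this copy, use it to start the array without sde, putting the missing keyword in its place. Even if doing this directly with --force fails, there are two ways to get there: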

1) You can recreate the RAID with --assume-clean. (Please do not forget this option: with it, only the superblocks are updated, not the parity.)

2) You can -A-ssemble the array.

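In both cases you must provide exactly the same configuration options (layout, chunk size, disk order, etc.) as the broken RAID had. In fact, I would suggest starting with -A-ssembling, since it does not even update the superblocks while still giving you access to the data. Only once you are sure it assembled correctly should you make it persistent by recreating with --assume-clean.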
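The two approaches could look roughly like this, with the metadata version, level, chunk size, layout and device order taken from the mdadm -D and -E output in the question. /dev/sdg1 is a hypothetical name for the partition on the rescued copy of sdd, and the script only prints the commands unless RUN=1 is set. Treat it as a sketch to check against your own output, not a recipe.

```shell
#!/bin/sh
# Sketch: bring the array up degraded from sdc1, the copy of sdd1, and sdf1.
MD=/dev/md0
COPY=${COPY:-/dev/sdg1}   # hypothetical: partition on the rescued copy of sdd

# Print each command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# First attempt: forced assemble of the three usable members.
assemble_degraded() {
    run mdadm --assemble --force --run "$MD" /dev/sdc1 "$COPY" /dev/sdf1
}

# Fallback: rewrite the superblocks only. Slot order is sdc1, copy of sdd1,
# missing (the slot of sde1), sdf1, as listed in the question.
recreate_degraded() {
    run mdadm --create "$MD" --assume-clean --metadata=0.90 \
        --level=5 --raid-devices=4 --chunk=128 --layout=left-symmetric \
        /dev/sdc1 "$COPY" missing /dev/sdf1
}

assemble_degraded
recreate_degraded
```

After either step it seems prudent to mount the filesystem read-only first and look at a few files before writing anything to the array.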

Once the RAID is running with three disks, just add sde in place of the missing one.
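A minimal sketch of that last step, assuming the degraded array is already running as /dev/md0 and the replacement disk is still /dev/sde1 as in the question. As in the earlier sketches, the commands are only printed unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch: add the replacement disk to the running degraded array.
MD=/dev/md0
NEW=${NEW:-/dev/sde1}     # the replacement disk from the question

# Print each command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

rebuild_onto_new_disk() {
    # Adding the disk lets md start rebuilding into the empty slot.
    run mdadm "$MD" --add "$NEW" &&
    # /proc/mdstat shows the recovery progress.
    run cat /proc/mdstat
}

rebuild_onto_new_disk
```

It is probably worth finishing the backups before starting this rebuild, since it reads every sector of the remaining members.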
