mdadm RAID5 won't assemble

Yesterday I wanted to do some maintenance on my server. I pressed the power button once to shut it down, which has always worked fine.

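After 10 minutes the server still hadn't shut down, so I gave up and forced it off with the power button. (Before forcing the shutdown I tried to SSH into it, but the SSH service had already stopped.)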

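After finishing the maintenance and restarting the server, I noticed that the RAID5 made up of seven 2 TB disks no longer worked. It had been split into two arrays, one with 5 disks and one with 2 disks, and all disks were inactive and marked (S) (spare).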

I tried mdadm --assemble --scan --run -f, but it didn't help:

mdadm: Merging with already-assembled /dev/md/128
mdadm: failed to add /dev/sdc1 to /dev/md/128: Invalid argument
mdadm: failed to add /dev/sde1 to /dev/md/128: Invalid argument
mdadm: failed to RUN_ARRAY /dev/md/128: Input/output error
mdadm: No arrays found in config file or automatically

It seems to have assembled the array halfway:

cat /proc/mdstat 
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md128 : inactive sda1[0] sdg1[6] sdf1[5] sdd1[7] sdb1[1]
      9766891962 blocks super 1.2

unused devices: <none>

I also tried to reassemble it manually with mdadm --assemble --run /dev/md0 /dev/sd[abcdefg]1 --verbose:

mdadm: looking for devices for /dev/md0
mdadm: /dev/sda1 is identified as a member of /dev/md0, slot 0.
mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 4.
mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 3.
mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 6.
mdadm: /dev/sdg1 is identified as a member of /dev/md0, slot 5.
mdadm: added /dev/sdb1 to /dev/md0 as 1
mdadm: failed to add /dev/sdc1 to /dev/md0: Invalid argument
mdadm: failed to add /dev/sde1 to /dev/md0: Invalid argument
mdadm: added /dev/sdd1 to /dev/md0 as 4
mdadm: added /dev/sdg1 to /dev/md0 as 5
mdadm: added /dev/sdf1 to /dev/md0 as 6
mdadm: added /dev/sda1 to /dev/md0 as 0
mdadm: failed to RUN_ARRAY /dev/md0: Input/output error

Checking all disks with mdadm --examine /dev/sd[abcdefg]1 gives this output (posted on hastebin.com). To me everything looks fine.

These are the disks in use, according to lsblk:

NAME                      MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                         8:0    0   1,8T  0 disk 
└─sda1                      8:1    0   1,8T  0 part 
sdb                         8:16   0   1,8T  0 disk 
└─sdb1                      8:17   0   1,8T  0 part 
sdc                         8:32   0   1,8T  0 disk 
└─sdc1                      8:33   0   1,8T  0 part 
sdd                         8:48   0   1,8T  0 disk 
└─sdd1                      8:49   0   1,8T  0 part 
sde                         8:64   1   1,8T  0 disk 
└─sde1                      8:65   1   1,8T  0 part 
sdf                         8:80   1   1,8T  0 disk 
└─sdf1                      8:81   1   1,8T  0 part 
sdg                         8:96   1   1,8T  0 disk 
└─sdg1                      8:97   1   1,8T  0 part 

The drives are not the best, but they work. The SMART output for all drives from sda to sdg is also on hastebin.com.

Given that my RAID5 disks are producing errors, I assume all the data is lost. ...

Edit 1:

dmesg -T returns:

[Sa Okt  7 15:41:08 2017] md/raid:md128: device sda1 operational as raid disk 0
[Sa Okt  7 15:41:08 2017] md/raid:md128: device sdf1 operational as raid disk 6
[Sa Okt  7 15:41:08 2017] md/raid:md128: device sdb1 operational as raid disk 1
[Sa Okt  7 15:41:08 2017] md/raid:md128: device sdd1 operational as raid disk 4
[Sa Okt  7 15:41:08 2017] md/raid:md128: device sdg1 operational as raid disk 5
[Sa Okt  7 15:41:08 2017] md/raid:md128: not enough operational devices (2/7 failed)
[Sa Okt  7 15:41:08 2017] md/raid:md128: failed to run raid set.
[Sa Okt  7 15:41:08 2017] md: pers->run() failed ...
[Sa Okt  7 15:41:12 2017] md: md127 stopped.
[Sa Okt  7 15:41:15 2017] md: md128 stopped.
[Sa Okt  7 15:41:20 2017] md: md0 stopped.
[Sa Okt  7 15:41:20 2017] md: sdc1 does not have a valid v1.2 superblock, not importing!
[Sa Okt  7 15:41:20 2017] md: md_import_device returned -22
[Sa Okt  7 15:41:20 2017] md: sde1 does not have a valid v1.2 superblock, not importing!
[Sa Okt  7 15:41:20 2017] md: md_import_device returned -22

How do I fix the superblock?

Am I doing something wrong here?

Why do I get:

mdadm: failed to add [...] to [...]: Invalid argument

Which argument is invalid here?

How can I debug this further?

Answer 1

Warning: this answer addresses a telltale symptom, but it turned out the real solution was different from what I suggested.

Still, this may be what happened. The problem could be this:

Unused Space : before=262056 sectors, after=177 sectors
Unused Space : before=262056 sectors, after=177 sectors
Unused Space : before=262056 sectors, after=18446744073709289480 sectors
Unused Space : before=262056 sectors, after=177 sectors
Unused Space : before=262056 sectors, after=18446744073709289480 sectors
Unused Space : before=262056 sectors, after=177 sectors
Unused Space : before=262056 sectors, after=177 sectors
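
The two odd "after" values belong to the third and fifth entries, which would be sdc1 and sde1 if the output was pasted in sda to sdg order (consistent with the dmesg complaints). 18446744073709289480 is 2^64 - 262136, which is exactly what a small negative number looks like when printed as an unsigned 64-bit integer. Below is a minimal Python sketch that reinterprets such values; the regex and function names are my own and not part of mdadm.

```python
"""Flag 'Unused Space' lines from mdadm --examine whose 'after' value wrapped around."""
import re

# Matches lines such as "Unused Space : before=262056 sectors, after=177 sectors".
UNUSED_RE = re.compile(r"Unused Space : before=(\d+) sectors, after=(\d+) sectors")


def to_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as a two's-complement signed one."""
    return value - (1 << 64) if value >= (1 << 63) else value


def check_unused_space(examine_text: str) -> list:
    """Return (entry_index, before, signed_after) for each entry whose 'after' is negative."""
    suspicious = []
    for idx, match in enumerate(UNUSED_RE.finditer(examine_text)):
        before = int(match.group(1))
        after = to_signed64(int(match.group(2)))
        if after < 0:
            suspicious.append((idx, before, after))
    return suspicious


if __name__ == "__main__":
    # The seven 'after' values quoted above, in the order they were posted.
    afters = [177, 177, 18446744073709289480, 177, 18446744073709289480, 177, 177]
    sample = "".join(
        f"   Unused Space : before=262056 sectors, after={a} sectors\n" for a in afters
    )
    for idx, before, after in check_unused_space(sample):
        # Sectors are 512 bytes; convert to MiB for readability.
        print(f"entry {idx}: after={after} sectors ({after * 512 / 2**20:.2f} MiB)")
```

Run on the quoted lines, it reports entries 2 and 4 (zero-based) with after=-262136 sectors, i.e. the recorded data area would overrun the space mdadm thinks is available by roughly 128 MiB. I would not read more into the exact figure without checking how mdadm computes it.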

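I can't offer a pleasant way to correct this. You should back up the MD metadata of sdc1 (and likewise sde1), then look up the on-disk format and fix it with a hex editor.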

Maybe you can use dd to copy the relevant part from one of the other disks. You "just" need to find out where those bytes are.
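
To find where those bytes are: for metadata version 1.2 the superblock starts 4 KiB (8 sectors) into the member partition, and its layout is struct mdp_superblock_1 from the kernel header linux/raid/md_p.h. The following Python sketch saves a copy of that region and decodes a few fields so a broken member can be compared with a good one. The field offsets are my reading of that header, and backup_superblock is a hypothetical helper; it only reads the device.

```python
"""Back up and decode an md v1.2 superblock without modifying the device."""
import struct

SB_MAGIC = 0xA92B4EFC
SB_V12_OFFSET = 4096   # v1.2 metadata sits 4 KiB from the start of the member device
SB_READ_LEN = 4096     # generous: fixed part is 256 bytes plus 2 bytes per max_dev

# Byte offsets from struct mdp_superblock_1 (linux/raid/md_p.h); all little-endian.
FIELDS = {
    "magic": (0, "<I"),
    "major_version": (4, "<I"),
    "level": (72, "<I"),
    "size": (80, "<Q"),           # used size of each component, in sectors
    "raid_disks": (92, "<I"),
    "data_offset": (128, "<Q"),   # sectors
    "data_size": (136, "<Q"),     # sectors
    "super_offset": (144, "<Q"),  # sectors
    "dev_number": (160, "<I"),
    "events": (200, "<Q"),
    "sb_csum": (216, "<I"),
    "max_dev": (220, "<I"),
}


def parse_superblock(raw: bytes) -> dict:
    """Decode the selected fields; raise ValueError if the magic number is missing."""
    if len(raw) < 256:
        raise ValueError("need at least the 256-byte fixed part of the superblock")
    sb = {name: struct.unpack_from(fmt, raw, off)[0] for name, (off, fmt) in FIELDS.items()}
    if sb["magic"] != SB_MAGIC:
        raise ValueError(f"bad magic {sb['magic']:#x}, not an md v1.x superblock")
    return sb


def backup_superblock(device: str, backup_path: str) -> dict:
    """Copy the superblock region to backup_path first, then decode it."""
    with open(device, "rb") as dev:
        dev.seek(SB_V12_OFFSET)
        raw = dev.read(SB_READ_LEN)
    # Write the backup before parsing, so even an unreadable superblock is preserved.
    with open(backup_path, "wb") as out:
        out.write(raw)
    return parse_superblock(raw)
```

As root, something like backup_superblock("/dev/sdc1", "sdc1-sb.bin") next to backup_superblock("/dev/sda1", "sda1-sb.bin") (file names are only examples) would show whether data_offset, data_size or size differ between a rejected member and a healthy one.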

What's a bit funny is this:

   Checksum : 85f67f98 - correct
   Checksum : 6a4fb921 - correct
   Checksum : 92db2c10 - correct
   Checksum : ad5c81b8 - correct
   Checksum : a657023 - correct
   Checksum : 6880d6c7 - correct
   Checksum : c0c31cf - correct

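So correcting the metadata will probably break the checksum. I don't know whether that is a real problem, but at this point it may make sense to ask a new question.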
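
If the checksum does matter, it can be recomputed after editing a backup copy. Below is a minimal Python sketch of the v1.x checksum as I read the kernel's calc_sb_1_csum() in drivers/md/md.c: little-endian 32-bit words summed over the first 256 + 2 × max_dev bytes with the sb_csum field treated as zero, then the high 32 bits folded into the low 32. The offsets 216 (sb_csum) and 220 (max_dev) are taken from struct mdp_superblock_1, and patch_field is a hypothetical helper that only works on an in-memory copy.

```python
"""Recompute the md v1.x superblock checksum after editing a backup copy."""
import struct

SB_CSUM_OFFSET = 216   # offset of sb_csum in struct mdp_superblock_1
MAX_DEV_OFFSET = 220   # offset of max_dev


def sb1_checksum(raw: bytes) -> int:
    """Sum LE 32-bit words over 256 + 2*max_dev bytes, counting sb_csum as zero."""
    max_dev = struct.unpack_from("<I", raw, MAX_DEV_OFFSET)[0]
    size = 256 + 2 * max_dev
    if len(raw) < size:
        raise ValueError(f"superblock needs {size} bytes, got {len(raw)}")
    buf = bytearray(raw[:size])
    struct.pack_into("<I", buf, SB_CSUM_OFFSET, 0)
    total, pos = 0, 0
    while size - pos >= 4:
        total += struct.unpack_from("<I", buf, pos)[0]
        pos += 4
    if size - pos == 2:  # an odd max_dev leaves one trailing 16-bit dev_roles entry
        total += struct.unpack_from("<H", buf, pos)[0]
    # Fold the carry bits back into the low word and keep 32 bits.
    return ((total & 0xFFFFFFFF) + (total >> 32)) & 0xFFFFFFFF


def patch_field(raw: bytes, offset: int, fmt: str, value: int) -> bytes:
    """Return a copy with one field changed and sb_csum recomputed."""
    buf = bytearray(raw)
    struct.pack_into(fmt, buf, offset, value)
    struct.pack_into("<I", buf, SB_CSUM_OFFSET, sb1_checksum(bytes(buf)))
    return bytes(buf)
```

A reasonable sanity check before trusting it: sb1_checksum() on an unmodified backup from a healthy member should reproduce the value mdadm --examine shows as "correct". Anything patched this way should be tried on an overlay or a copy, never written straight back to the disks.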
