RAID 5 inactive at boot - error: found two disks with the index 1 for RAID md/0

My OS is Ubuntu 12.04.2 LTS, with kernel 3.2.0-49-generic #75-Ubuntu SMP Tue Jun 18 17:39:32 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux.

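I have a RAID 5 array made of 3 hard drives, and it suddenly started coming up inactive at boot. Because /home is mounted on it, the system fails to boot and asks for manual intervention. I found similar reports on forums, but in most of them one of the drives turned out to be faulty, which is not the case here.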

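Stopping the array (mdadm --stop /dev/md0) and starting it again (mdadm --assemble --scan /dev/md0) shows no errors (no complaints, no rebuild), and the array can then be mounted manually without problems. So why doesn't it start at boot?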

I checked smartctl for all the drives in the array (sda, sdb, sdc) and found no errors (no Current_Pending_Sector, UDMA_CRC_Error_Count or Offline_Uncorrectable). Both short and long self-tests have been run.
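As an illustration of the SMART check described above, here is a minimal sketch that pulls just those three attributes out of `smartctl -A` output. The function name `smart_raw_values` is made up for this example, and it assumes the usual 10-column attribute table in which the second column is the attribute name and the tenth is the raw value.

```shell
#!/bin/sh
# Print name=raw_value for the three SMART attributes named in the question.
# Reads `smartctl -A` output on stdin.
# Assumption: column 2 is the attribute name, column 10 the raw value.
smart_raw_values() {
    awk '$2 == "Current_Pending_Sector" ||
         $2 == "Offline_Uncorrectable"  ||
         $2 == "UDMA_CRC_Error_Count" { print $2 "=" $10 }'
}

# Usage (needs root and smartmontools):
#   for d in /dev/sda /dev/sdb /dev/sdc; do
#       echo "$d"; smartctl -A "$d" | smart_raw_values
#   done
```

A non-zero raw value for any of these would be a reason to look at the drive more closely. All zeros, as reported here, points away from a hardware fault.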

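One thing I did notice, which I believe is the main cause of the problem, is that grub-probe returns this error: "error: found two disks with the index 1 for RAID md/0."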

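Running the same command with -v (verbose output), I can see two "grub-probe: info: Found array md/0 (mdraid1x)." lines after hd0 and hd1 are probed, which map to sda and sdb respectively. So does sdc have no RAID metadata that grub can read? People who hit this problem suggested updating the RAID metadata from 0.90 to 1.x, but my array already uses 1.2.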

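I tried manually failing the sdc drive twice (the first time just removing and re-adding it, the second time also running mdadm --zero-superblock /dev/sdc) and forcing the array to rebuild, but the error did not go away, so I'm stuck. Does anyone know what the problem might be and how to fix it?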

Below are the commands I used to diagnose the problem, with their output:

/proc/mdstat after boot

# cat /proc/mdstat 
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md0 : inactive sdc1[3](S) sda1[4](S) sdb[5](S)
      5860540617 blocks super 1.2

unused devices: <none>
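In the boot-time listing above, the third member is printed as sdb[5], while the listing after manual reassembly later in this post shows sdb1[5]. A small filter can make that kind of difference easier to spot. This is only a sketch: `inactive_members` is a made-up name, and the partition test assumes sdX-style device names, where a trailing digit means a partition.

```shell
#!/bin/sh
# List members of inactive md arrays from /proc/mdstat-style text on stdin,
# tagging each as a partition or a whole disk.
# Assumption: sdX-style names, where a trailing digit means "partition".
inactive_members() {
    awk '$2 == ":" && $3 == "inactive" {
        for (i = 4; i <= NF; i++) {
            dev = $i
            sub(/\[.*/, "", dev)          # drop the "[3](S)" style suffix
            kind = (dev ~ /[0-9]$/) ? "partition" : "whole-disk"
            print $1, dev, kind
        }
    }'
}

# Usage: inactive_members < /proc/mdstat
```

A whole-disk member in an array that was built from partitions may indicate that an md superblock is also visible on the bare disk. `mdadm --examine` on both the disk and its partition would show whether that is the case.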

/etc/mdadm/mdadm.conf

# cat /etc/mdadm/mdadm.conf 
# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default (built-in), scan all partitions (/proc/partitions) and all
# containers for MD superblocks. alternatively, specify devices to scan, using
# wildcards if desired.
#DEVICE partitions containers

# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR root

# definitions of existing MD arrays
ARRAY /dev/md/0 metadata=1.2 UUID=1b273efc:62f3bc36:4579f11d:15bbc75e name=ubuntu:0

# This file was auto-generated on Mon, 27 Aug 2012 17:33:16 +0300
# by mkconf $Id$

mdadm --examine --scan

# mdadm --examine --scan
ARRAY /dev/md/0 metadata=1.2 UUID=1b273efc:62f3bc36:4579f11d:15bbc75e name=ubuntu:0

mdadm --detail

# mdadm --detail --scan
mdadm: cannot open /dev/md/0: No such file or directory

# mdadm --detail --scan /dev/md0 
mdadm: md device /dev/md0 does not appear to be active.

mdadm --stop /dev/md0 && mdadm --assemble --scan /dev/md0 && mdadm --detail /dev/md0

# mdadm --stop /dev/md0 
mdadm: stopped /dev/md0

# mdadm --assemble --scan /dev/md0
mdadm: /dev/md0 has been started with 3 drives.

# cat /proc/mdstat 
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md0 : active raid5 sda1[4] sdc1[3] sdb1[5]
      3907025920 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]

unused devices: <none>

# mdadm --detail /dev/md0 
/dev/md0:
        Version : 1.2
  Creation Time : Sat Mar 24 15:31:43 2012
     Raid Level : raid5
     Array Size : 3907025920 (3726.03 GiB 4000.79 GB)
  Used Dev Size : 1953512960 (1863.02 GiB 2000.40 GB)
   Raid Devices : 3
  Total Devices : 3
    Persistence : Superblock is persistent

    Update Time : Sun Jul 21 22:53:21 2013
          State : clean 
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : ubuntu:0
           UUID : 1b273efc:62f3bc36:4579f11d:15bbc75e
         Events : 319386

    Number   Major   Minor   RaidDevice State
       4       8        1        0      active sync   /dev/sda1
       5       8       17        1      active sync   /dev/sdb1
       3       8       33        2      active sync   /dev/sdc1

mdadm --detail --scan && mdadm --examine --scan

# mdadm --detail --scan
ARRAY /dev/md0 metadata=1.2 name=ubuntu:0 UUID=1b273efc:62f3bc36:4579f11d:15bbc75e

# mdadm --examine --scan
ARRAY /dev/md/0 metadata=1.2 UUID=1b273efc:62f3bc36:4579f11d:15bbc75e name=ubuntu:0

grub-probe -v /

# grub-probe -v /
grub-probe: info: cannot open `/boot/grub/device.map'.
grub-probe: info: Scanning for dmraid_nv RAID devices on disk hd0.
grub-probe: info: the size of hd0 is 3907029168.
grub-probe: info: the size of hd0 is 3907029168.
grub-probe: info: Scanning for dmraid_nv RAID devices on disk hd1.
grub-probe: info: the size of hd1 is 3907029168.
grub-probe: info: the size of hd1 is 3907029168.
grub-probe: info: Scanning for dmraid_nv RAID devices on disk hd2.
grub-probe: info: the size of hd2 is 3907029168.
grub-probe: info: the size of hd2 is 3907029168.
grub-probe: info: Scanning for dmraid_nv RAID devices on disk hd3.
grub-probe: info: the size of hd3 is 250069680.
grub-probe: info: the size of hd3 is 250069680.
grub-probe: info: scanning hd0 for LVM.
grub-probe: info: the size of hd0 is 3907029168.
grub-probe: info: no LVM signature found.
grub-probe: info: the size of hd0 is 3907029168.
grub-probe: info: scanning hd1 for LVM.
grub-probe: info: the size of hd1 is 3907029168.
grub-probe: info: no LVM signature found.
grub-probe: info: the size of hd1 is 3907029168.
grub-probe: info: scanning hd2 for LVM.
grub-probe: info: the size of hd2 is 3907029168.
grub-probe: info: no LVM signature found.
grub-probe: info: the size of hd2 is 3907029168.
grub-probe: info: scanning hd3 for LVM.
grub-probe: info: the size of hd3 is 250069680.
grub-probe: info: no LVM signature found.
grub-probe: info: the size of hd3 is 250069680.
grub-probe: info: Scanning for mdraid09 RAID devices on disk hd0.
grub-probe: info: the size of hd0 is 3907029168.
grub-probe: info: the size of hd0 is 3907029168.
grub-probe: info: Scanning for mdraid09 RAID devices on disk hd1.
grub-probe: info: the size of hd1 is 3907029168.
grub-probe: info: the size of hd1 is 3907029168.
grub-probe: info: Scanning for mdraid09 RAID devices on disk hd2.
grub-probe: info: the size of hd2 is 3907029168.
grub-probe: info: the size of hd2 is 3907029168.
grub-probe: info: Scanning for mdraid09 RAID devices on disk hd3.
grub-probe: info: the size of hd3 is 250069680.
grub-probe: info: the size of hd3 is 250069680.
grub-probe: info: Scanning for mdraid1x RAID devices on disk hd0.
grub-probe: info: the size of hd0 is 3907029168.
grub-probe: info: the size of hd0 is 3907029168.
grub-probe: info: Scanning for mdraid1x RAID devices on disk hd1.
grub-probe: info: the size of hd1 is 3907029168.
grub-probe: info: Found array md/0 (mdraid1x).
grub-probe: info: the size of hd1 is 3907029168.
grub-probe: info: Scanning for mdraid1x RAID devices on disk hd2.
grub-probe: info: the size of hd2 is 3907029168.
grub-probe: info: the size of hd2 is 3907029168.
grub-probe: info: Scanning for mdraid1x RAID devices on disk hd3.
grub-probe: info: the size of hd3 is 250069680.
grub-probe: info: the size of hd3 is 250069680.
grub-probe: info: Scanning for mdraid09 RAID devices on disk hd0.
grub-probe: info: the size of hd0 is 3907029168.
grub-probe: info: the size of hd0 is 3907029168.
grub-probe: info: Scanning for mdraid09 RAID devices on disk hd0,msdos1.
grub-probe: info: the size of hd0 is 3907029168.
grub-probe: info: Scanning for mdraid09 RAID devices on disk hd1.
grub-probe: info: the size of hd1 is 3907029168.
grub-probe: info: the size of hd1 is 3907029168.
grub-probe: info: Scanning for mdraid09 RAID devices on disk hd1,msdos1.
grub-probe: info: the size of hd1 is 3907029168.
grub-probe: info: Scanning for mdraid09 RAID devices on disk hd2.
grub-probe: info: the size of hd2 is 3907029168.
grub-probe: info: the size of hd2 is 3907029168.
grub-probe: info: Scanning for mdraid09 RAID devices on disk hd2,msdos1.
grub-probe: info: the size of hd2 is 3907029168.
grub-probe: info: Scanning for mdraid09 RAID devices on disk hd3.
grub-probe: info: the size of hd3 is 250069680.
grub-probe: info: the size of hd3 is 250069680.
grub-probe: info: Scanning for mdraid09 RAID devices on disk hd3,msdos2.
grub-probe: info: the size of hd3 is 250069680.
grub-probe: info: Scanning for mdraid09 RAID devices on disk hd3,msdos1.
grub-probe: info: the size of hd3 is 250069680.
grub-probe: info: Scanning for mdraid1x RAID devices on disk hd0.
grub-probe: info: the size of hd0 is 3907029168.
grub-probe: info: the size of hd0 is 3907029168.
grub-probe: info: Scanning for mdraid1x RAID devices on disk hd0,msdos1.
grub-probe: info: the size of hd0 is 3907029168.
grub-probe: info: Found array md/0 (mdraid1x).
grub-probe: info: Scanning for mdraid1x RAID devices on disk hd1.
grub-probe: info: the size of hd1 is 3907029168.
grub-probe: info: the size of hd1 is 3907029168.
grub-probe: info: Scanning for mdraid1x RAID devices on disk hd1,msdos1.
grub-probe: info: the size of hd1 is 3907029168.
error: found two disks with the index 1 for RAID md/0.
grub-probe: info: Scanning for mdraid1x RAID devices on disk hd2.
grub-probe: info: the size of hd2 is 3907029168.
grub-probe: info: the size of hd2 is 3907029168.
grub-probe: info: Scanning for mdraid1x RAID devices on disk hd2,msdos1.
grub-probe: info: the size of hd2 is 3907029168.
grub-probe: info: Scanning for mdraid1x RAID devices on disk hd3.
grub-probe: info: the size of hd3 is 250069680.
grub-probe: info: the size of hd3 is 250069680.
grub-probe: info: Scanning for mdraid1x RAID devices on disk hd3,msdos2.
grub-probe: info: the size of hd3 is 250069680.
grub-probe: info: Scanning for mdraid1x RAID devices on disk hd3,msdos1.
grub-probe: info: the size of hd3 is 250069680.
grub-probe: info: scanning md/0 for LVM.
grub-probe: info: no LVM signature found.
grub-probe: info: scanning hd0 for LVM.
grub-probe: info: the size of hd0 is 3907029168.
grub-probe: info: no LVM signature found.
grub-probe: info: the size of hd0 is 3907029168.
grub-probe: info: scanning hd0,msdos1 for LVM.
grub-probe: info: the size of hd0 is 3907029168.
grub-probe: info: no LVM signature found.
grub-probe: info: scanning hd1 for LVM.
grub-probe: info: the size of hd1 is 3907029168.
grub-probe: info: no LVM signature found.
grub-probe: info: the size of hd1 is 3907029168.
grub-probe: info: scanning hd1,msdos1 for LVM.
grub-probe: info: the size of hd1 is 3907029168.
grub-probe: info: no LVM signature found.
grub-probe: info: scanning hd2 for LVM.
grub-probe: info: the size of hd2 is 3907029168.
grub-probe: info: no LVM signature found.
grub-probe: info: the size of hd2 is 3907029168.
grub-probe: info: scanning hd2,msdos1 for LVM.
grub-probe: info: the size of hd2 is 3907029168.
grub-probe: info: no LVM signature found.
grub-probe: info: scanning hd3 for LVM.
grub-probe: info: the size of hd3 is 250069680.
grub-probe: info: no LVM signature found.
grub-probe: info: the size of hd3 is 250069680.
grub-probe: info: scanning hd3,msdos2 for LVM.
grub-probe: info: the size of hd3 is 250069680.
grub-probe: info: no LVM signature found.
grub-probe: info: scanning hd3,msdos1 for LVM.
grub-probe: info: the size of hd3 is 250069680.
grub-probe: info: no LVM signature found.
grub-probe: info: /dev/sdd1 starts from 2048.
grub-probe: info: opening the device hd3.
grub-probe: info: the size of hd3 is 250069680.
grub-probe: info: Partition 0 starts from 2048.
grub-probe: info: opening hd3,msdos1.
grub-probe: info: the size of hd3 is 250069680.
ext2
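The verbose output above is long, and the relevant lines are easy to miss. The following sketch reduces it to the mdraid findings, pairing each "Found array" or "error:" line with the disk or partition that was being scanned at that point. The function name `mdraid_findings` is invented for this example, and the `2>&1` in the usage line assumes grub-probe writes its info messages to stderr.

```shell
#!/bin/sh
# Reduce `grub-probe -v` output (on stdin) to the mdraid findings:
# for every "Found array" or "error:" line, print the disk or partition
# named by the most recent "Scanning for mdraid..." line.
mdraid_findings() {
    awk '/Scanning for mdraid/ { dev = $NF; sub(/\.$/, "", dev) }
         /Found array/         { print dev ": found " $(NF-1) }
         /^error:/             { print dev ": " $0 }'
}

# Usage (as root): grub-probe -v / 2>&1 | mdraid_findings
```

Applied to the log in this post, this would report the array once on hd1 (the whole disk), once on hd0,msdos1, and the error on hd1,msdos1. With the device.map shown in this post, hd1 is /dev/sdb.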

/boot/grub/device.map

# cat /boot/grub/device.map
(hd0)   /dev/sda
(hd1)   /dev/sdb
(hd2)   /dev/sdc
(hd3)   /dev/sdd

Answer 1

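I solved the problem by removing each disk from the array (one at a time), zeroing its superblock and MBR, adding it back to the array and waiting for the rebuild to finish.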

After I did this for /dev/sdb the problem was solved: grub-probe now shows only one "grub-probe: info: Found array md/0 (mdraid1x)." line instead of two as before (see the question).
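The remove / zero / re-add cycle described in this answer could look roughly like the sketch below. It only prints the commands rather than running them, since they are destructive. The function name and its two arguments are placeholders for this example. The MBR wipe is deliberately left out: the answer does not say how it was done, and overwriting the whole first sector would also destroy the partition table.

```shell
#!/bin/sh
# Print (not run) the mdadm commands for cycling one member out of and back
# into an array. Both arguments are placeholders supplied by the caller.
# The MBR wipe mentioned in the answer is omitted: the post does not say
# how it was done.
print_cycle_commands() {
    md=$1
    member=$2
    echo "mdadm $md --fail $member"
    echo "mdadm $md --remove $member"
    echo "mdadm --zero-superblock $member"
    echo "mdadm $md --add $member"
    echo "cat /proc/mdstat   # wait for [UUU] before the next member"
}

# Usage: print_cycle_commands /dev/md0 /dev/sdb1
```

Only one member should be cycled at a time. A 3-disk RAID 5 survives the loss of a single member, so the rebuild has to finish before the next disk is touched.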

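So the cause was the opposite of what I first assumed about the index error. I thought this index had to be present on every disk of the array, which is why I wiped sdc, the disk for which grub-probe showed no "grub-probe: info: Found array md/0 (mdraid1x)." message.

In the end it looks like only one of them should have it, and if it is present on more than one drive you get the error "error: found two disks with the index 1 for RAID md/0".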
