Unable to install grub, segmentation fault, unable to identify filesystem, superfluous RAID member, found two disks with the same index (Debian 7)


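Recently a server was shipped from location A to location B, a long trip that took six months. Nothing was labeled before it was shipped, so things went wrong. Yes, I know: someone else did this, but I'm the one paying for it.

I have to save the data and I need help!

The system used to boot fine, but now it won't boot (it doesn't even get to grub rescue: the BIOS simply doesn't get anywhere, and I have tried selecting each individual member of the array).

So I booted the Debian 7 ISO from USB and went into rescue mode. So far, with little success.

The first thing I noticed was that the array was degraded and rebuilding onto a spare. This seems to be because one of the original array members is missing.

First, here are the details of the array's current state after booting into rescue mode: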

# mdadm --detail /dev/md127
/dev/md127:
        Version : 1.2
  Creation Time : Wed Nov  7 16:06:02 2012
     Raid Level : raid6
     Array Size : 26370335232 (25148.71 GiB 27003.22 GB)
  Used Dev Size : 2930037248 (2794.30 GiB 3000.36 GB)
   Raid Devices : 11
  Total Devices : 11
    Persistence : Superblock is persistent

    Update Time : Sat Sep 13 01:55:51 2014
          State : clean, degraded, recovering
 Active Devices : 10
Working Devices : 11
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 512K

 Rebuild Status : 45% complete

           Name : media:0
           UUID : b1c40379:914e5d18:dddb893b:4dc5a28f
         Events : 2216394

    Number   Major   Minor   RaidDevice State
       0       8       82        0      active sync   /dev/sdf2
       1       8       97        1      active sync   /dev/sdg1
       2       8      129        2      active sync   /dev/sdi1
       3       8       33        3      active sync   /dev/sdc1
       4       8      161        4      active sync   /dev/sdk1
      12       8      192        5      spare rebuilding   /dev/sdm
       6       8      145        6      active sync   /dev/sdj1
       7       8       49        7      active sync   /dev/sdd1
       8       8       65        8      active sync   /dev/sde1
      10       8      224        9      active sync   /dev/sdo
      11       8      208       10      active sync   /dev/sdn
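The output above shows the array as "clean, degraded, recovering" at 45%. The answer at the end recommends letting that rebuild finish before touching anything else. A minimal sketch of a check for that, assuming it is fed mdadm --detail output on stdin; md_rebuild_status is a hypothetical name, and the Resync/Reshape variants are included on the assumption that mdadm labels those states the same way:

```shell
# Hypothetical helper: reads `mdadm --detail` output on stdin.
# Prints "rebuilding <percent>" and returns 0 while a recovery/resync/reshape
# is running; prints "idle" and returns 1 otherwise.
md_rebuild_status() {
    awk -F' : ' '
        # e.g. "          State : clean, degraded, recovering"
        $1 ~ /^ *State$/ && $2 ~ /(recovering|resyncing|reshaping)/ { busy = 1 }
        # e.g. " Rebuild Status : 45% complete"
        $1 ~ /(Rebuild|Resync|Reshape) Status$/ { split($2, p, " "); pct = p[1] }
        END {
            if (busy) { print "rebuilding " pct; exit 0 }
            print "idle"
            exit 1
        }'
}
```

For example, "while mdadm --detail /dev/md127 | md_rebuild_status; do sleep 600; done" would simply wait until the rebuild is no longer reported.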

Now let's try installing grub to /dev/md127.

# grub-install /dev/md127
error: found two disks with the index 9 for RAID md/0.
error: found two disks with the index 9 for RAID md/0.
error: superfluous RAID member (10 found).
error: superfluous RAID member (10 found).
error: found two disks with the index 9 for RAID md/0.
error: found two disks with the index 9 for RAID md/0.
error: superfluous RAID member (10 found).
error: superfluous RAID member (10 found).
error: found two disks with the index 9 for RAID md/0.
error: found two disks with the index 9 for RAID md/0.
error: superfluous RAID member (10 found).
error: superfluous RAID member (10 found).
error: found two disks with the index 9 for RAID md/0.
error: found two disks with the index 9 for RAID md/0.
error: superfluous RAID member (10 found).
error: superfluous RAID member (10 found).
error: found two disks with the index 9 for RAID md/0.
error: found two disks with the index 9 for RAID md/0.
error: superfluous RAID member (10 found).
error: superfluous RAID member (10 found).
Segmentation fault

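Ouch, that's not good. What are these "found two disks with the index" and "superfluous RAID member" errors about? Well, it turns out the disks had been mixed up: several extra disks had been installed in the system because it wasn't clear whether they were RAID members.

What happens if we try installing to a single disk? /dev/sdc seems to be the first member (the lowest drive letter):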

# grub-install /dev/sdc
error: found two disks with the index 9 for RAID md/0.
error: found two disks with the index 9 for RAID md/0.
error: superfluous RAID member (10 found).
error: superfluous RAID member (10 found).
error: found two disks with the index 9 for RAID md/0.
error: found two disks with the index 9 for RAID md/0.
error: superfluous RAID member (10 found).
error: superfluous RAID member (10 found).
error: found two disks with the index 9 for RAID md/0.
error: found two disks with the index 9 for RAID md/0.
error: superfluous RAID member (10 found).
error: superfluous RAID member (10 found).
error: found two disks with the index 9 for RAID md/0.
error: found two disks with the index 9 for RAID md/0.
error: superfluous RAID member (10 found).
error: superfluous RAID member (10 found).
error: found two disks with the index 9 for RAID md/0.
error: found two disks with the index 9 for RAID md/0.
error: superfluous RAID member (10 found).
error: superfluous RAID member (10 found).
/usr/sbin/grub-setup: warn: This GPT partition label has no BIOS Boot Partition; embedding won't be possible!.
/usr/sbin/grub-setup: error: embedding is not possible, but this is required for cross-disk install.

OK, now I'm starting to get nervous. I also tried other member disks, such as the last one, sdm:

# grub-install /dev/sdm
error: found two disks with the index 9 for RAID md/0.
error: found two disks with the index 9 for RAID md/0.
error: superfluous RAID member (10 found).
error: superfluous RAID member (10 found).
error: found two disks with the index 9 for RAID md/0.
error: found two disks with the index 9 for RAID md/0.
error: superfluous RAID member (10 found).
error: superfluous RAID member (10 found).
error: found two disks with the index 9 for RAID md/0.
error: found two disks with the index 9 for RAID md/0.
error: superfluous RAID member (10 found).
error: superfluous RAID member (10 found).
error: found two disks with the index 9 for RAID md/0.
error: found two disks with the index 9 for RAID md/0.
error: superfluous RAID member (10 found).
error: superfluous RAID member (10 found).
error: found two disks with the index 9 for RAID md/0.
error: found two disks with the index 9 for RAID md/0.
error: superfluous RAID member (10 found).
error: superfluous RAID member (10 found).
/usr/sbin/grub-setup: error: unable to identify a filesystem in hd12; safety check can't be performed.

Now we get yet another error: unable to identify a filesystem. By the way, the filesystem on this mdadm array is XFS, and it is working fine (thank goodness).

# mdadm --examine /dev/sd?
/dev/sda:
   MBR Magic : aa55
Partition[0] :     15633380 sectors at        13340 (type 0c)
/dev/sdb:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : b1c40379:914e5d18:dddb893b:4dc5a28f
           Name : media:0
  Creation Time : Wed Nov  7 16:06:02 2012
     Raid Level : raid6
   Raid Devices : 10

 Avail Dev Size : 5860271024 (2794.40 GiB 3000.46 GB)
     Array Size : 23440297984 (22354.41 GiB 24002.87 GB)
  Used Dev Size : 5860074496 (2794.30 GiB 3000.36 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 8042b1e3:d9e305aa:f53be8b4:b74cc247

    Update Time : Mon Nov 18 18:05:25 2013
       Checksum : 5762ae4a - correct
         Events : 2197822

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 9
   Array State : AAAAAAAAAA ('A' == active, '.' == missing)
/dev/sdc:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
/dev/sdd:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
/dev/sde:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
/dev/sdf:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
/dev/sdg:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
/dev/sdh:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
/dev/sdi:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
/dev/sdj:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
/dev/sdk:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
mdadm: No md superblock detected on /dev/sdl.
/dev/sdm:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x2
     Array UUID : b1c40379:914e5d18:dddb893b:4dc5a28f
           Name : media:0
  Creation Time : Wed Nov  7 16:06:02 2012
     Raid Level : raid6
   Raid Devices : 11

 Avail Dev Size : 5860271024 (2794.40 GiB 3000.46 GB)
     Array Size : 26370335232 (25148.71 GiB 27003.22 GB)
  Used Dev Size : 5860074496 (2794.30 GiB 3000.36 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
Recovery Offset : 2563782648 sectors
          State : clean
    Device UUID : c436476d:6e6dbc43:de4e9c83:d697fbf7

    Update Time : Sat Sep 13 02:03:42 2014
       Checksum : db87180b - correct
         Events : 2216444

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 5
   Array State : AAAAAAAAAAA ('A' == active, '.' == missing)
/dev/sdn:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : b1c40379:914e5d18:dddb893b:4dc5a28f
           Name : media:0
  Creation Time : Wed Nov  7 16:06:02 2012
     Raid Level : raid6
   Raid Devices : 11

 Avail Dev Size : 5860271024 (2794.40 GiB 3000.46 GB)
     Array Size : 26370335232 (25148.71 GiB 27003.22 GB)
  Used Dev Size : 5860074496 (2794.30 GiB 3000.36 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 235097a3:7c8a32b8:f1c73a25:9c149239

    Update Time : Sat Sep 13 02:03:42 2014
       Checksum : d0b20c55 - correct
         Events : 2216444

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 10
   Array State : AAAAAAAAAAA ('A' == active, '.' == missing)
/dev/sdo:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : b1c40379:914e5d18:dddb893b:4dc5a28f
           Name : media:0
  Creation Time : Wed Nov  7 16:06:02 2012
     Raid Level : raid6
   Raid Devices : 11

 Avail Dev Size : 5860271024 (2794.40 GiB 3000.46 GB)
     Array Size : 26370335232 (25148.71 GiB 27003.22 GB)
  Used Dev Size : 5860074496 (2794.30 GiB 3000.36 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : f382773b:08814775:542a5a1e:d2515115

    Update Time : Sat Sep 13 02:03:42 2014
       Checksum : fa85d548 - correct
         Events : 2216444

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 9
   Array State : AAAAAAAAAAA ('A' == active, '.' == missing)

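Before posting this question on Stack Overflow, I had run the mdadm examine command above and found that both /dev/sdl and /dev/sdo showed "Active device 9". However, /dev/sdl was not in use and had an older update time. I saved its output: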

/dev/sdl:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : b1c40379:914e5d18:dddb893b:4dc5a28f
           Name : media:0
  Creation Time : Wed Nov  7 16:06:02 2012
     Raid Level : raid6
   Raid Devices : 10

 Avail Dev Size : 5860271024 (2794.40 GiB 3000.46 GB)
     Array Size : 23440297984 (22354.41 GiB 24002.87 GB)
  Used Dev Size : 5860074496 (2794.30 GiB 3000.36 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 8e3499b6:b3baae34:af56fde9:f5d7bc87

    Update Time : Fri Nov 15 15:52:20 2013
       Checksum : 29fed1f5 - correct
         Events : 2183610

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 9
   Array State : AAAAAAAAAA ('A' == active, '.' == missing)
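Spotting duplicate roles by eye in output this long is easy to get wrong. Below is a minimal sketch that takes the combined mdadm --examine /dev/sd? output on stdin and lists every "Active device N" role claimed by more than one device, along with each claimant's event count. md_duplicate_roles is a made-up name. The helper only reads text, so it works just as well on a saved copy of the output:

```shell
# Hypothetical helper: reads the combined output of `mdadm --examine /dev/sd?`
# on stdin and prints every "Active device N" role claimed by more than one
# device, with each claimant's event count.
md_duplicate_roles() {
    awk '
        /^\/dev\/[^ ]+:$/ { dev = substr($0, 1, length($0) - 1) }   # "/dev/sdb:" starts a device
        /^ *Events :/     { ev[dev] = $3 }                          # "Events : 2197822"
        /^ *Device Role : Active device / {
            role = $NF
            members[role] = members[role] " " dev
            count[role]++
        }
        END {
            for (r in count) {
                if (count[r] < 2) continue
                n = split(members[r], d, " ")
                line = "role " r ":"
                for (i = 1; i <= n; i++) line = line " " d[i] " (events " ev[d[i]] ")"
                print line
            }
        }'
}
```

Given the --examine output shown above, this would report role 9 for both /dev/sdb (events 2197822) and /dev/sdo (events 2216444), even though /dev/sdl has already been zeroed.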

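Before posting this question, I ran mdadm --zero-superblock /dev/sdl. It completed successfully and the disk no longer shows up as "Active device 9", so there is now only one "device 9" member in the mdadm --examine output.

However, grub-install still complains "found two disks with the index 9".

I really need some help. I have spent 12 hours googling trying to fix this, to no avail. And of course this data is not backed up, so saving the array is critical.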

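Edit

I noticed a third "Active device 9" and zeroed that superblock, which got rid of the two-disks-with-the-same-index problem. Then I checked very carefully, found an extra disk that was an old member, and zeroed it as well, which got rid of the superfluous disk.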
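In the --examine output above, /dev/sdb and the saved /dev/sdl both report "Raid Devices : 10", a smaller Array Size, and update times from November 2013, while the active members report 11 devices. That is consistent with leftovers from before the array went from 10 to 11 devices. A minimal sketch that flags such devices, assuming --examine output on stdin and the expected device count (e.g. 11 from mdadm --detail) as an argument. md_stale_members is a hypothetical name. It only prints; zeroing anything it reports is still a manual decision:

```shell
# Hypothetical helper: reads `mdadm --examine` output on stdin and prints each
# device whose superblock records a different "Raid Devices" count than the
# current array (passed as $1), i.e. a likely stale, pre-grow member.
md_stale_members() {
    want=$1
    awk -v want="$want" '
        /^\/dev\/[^ ]+:$/   { dev = substr($0, 1, length($0) - 1) }   # "/dev/sdb:" starts a device
        /^ *Raid Devices :/ { if ($4 != want) print dev " (Raid Devices " $4 ")" }'
}
```

Usage would be along the lines of: mdadm --examine /dev/sd? | md_stale_members 11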

Now grub-install no longer reports those errors.

However, it now reports a segmentation fault.

# grub-install --recheck /dev/md0
Segmentation fault

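So I installed an old 750GB drive that had never been part of any array and connected it directly to the motherboard SATA, bypassing the LSI 9201 controller. I used parted to delete everything, then created a bios_grub partition.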

Model: ATA ST3750330AS (scsi)
Disk /dev/sda: 750GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size    File system  Name     Flags
 1      1049kB  2097kB  1049kB               primary  bios_grub

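Then I installed grub to that device (sda) and rebooted, selecting it in the BIOS. GRUB dropped to rescue mode and reported "no such disk".

I don't know what to do next and need some help! I should also mention that after rebooting, Debian rescue shows /dev/md/0 instead of /dev/md127.

Edit 2

Still working on this. What I plan to do is give every physical disk an identical partition layout, to get rid of the segmentation fault.

So, for each disk, it looks like this: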

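DISK=/dev/sdX       # placeholder: the physical disk being redone
OLD=/dev/sdX1       # placeholder: its current md member (a partition, or the whole disk such as /dev/sdm)
TEMPLATE=/dev/sdc   # a disk that already has the bios_grub + raid layout

mdadm --manage /dev/md0 --fail "$OLD"
mdadm --manage /dev/md0 --remove "$OLD"

dd if=/dev/zero of="$DISK" bs=1M   # zeroes the whole disk: old superblock, MBR and both GPT copies (slow on 3TB)

sgdisk -R "$DISK" "$TEMPLATE"   # replicate TEMPLATE's partition table onto DISK
sgdisk -G "$DISK"               # give DISK fresh random GUIDs so they don't clash with TEMPLATE's
partprobe "$DISK"               # ask the kernel to re-read the new partition table

mdadm --manage /dev/md0 --add "${DISK}2"   # add partition 2 (raid), not the whole disk
grub-install "$DISK"                        # GRUB can now embed into partition 1 (bios_grub)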

I'm using the following partition layout:

# parted /dev/sdc print
Model: ATA WDC WD30EFRX-68A (scsi)
Disk /dev/sdc: 3001GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt

Number  Start   End     Size    File system  Name       Flags
 1      1049kB  2097kB  1049kB               bios_grub  bios_grub
 2      2097kB  3001GB  3001GB               raid       raid
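The sgdisk -R step needs one disk that already has this layout. The sketch below shows how that first disk could be partitioned with sgdisk. It assumes 512-byte logical sectors, as parted reports for /dev/sdc: 1049kB is sector 2048 and 2097kB is sector 4096. In sgdisk, type code EF02 is a BIOS boot partition and FD00 is Linux RAID. gpt_layout_cmd is a hypothetical helper, and it only prints the command so it can be reviewed before anything is written:

```shell
# Hypothetical helper: prints (does not run) an sgdisk command that recreates
# the layout shown by parted above on the given disk. Assumes 512-byte
# logical sectors, as reported for /dev/sdc.
gpt_layout_cmd() {
    disk=$1
    # 1: sectors 2048-4095 (1 MiB), type EF02 = BIOS boot partition for GRUB
    # 2: sector 4096 to the end of the disk (end "0" = default), type FD00 = Linux RAID
    printf 'sgdisk -n 1:2048:4095 -t 1:EF02 -c 1:bios_grub -n 2:4096:0 -t 2:FD00 -c 2:raid %s\n' "$disk"
}
```

The printed command should only be run against a disk that has already been failed and removed from the array.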

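This means removing one disk at a time, running the procedure above, then re-adding it and letting it sync. This is a large RAID 6 array and each sync takes almost a day, so this will be a long process. But I want everything back to normal, and I think this is my best bet for getting rid of the segmentation fault.

If anyone has suggestions, please let me know.

Edit 3

As I replace each disk, I install grub on it to make sure that succeeds. Here are the two error messages I get on the current member disks (before they are replaced), which is why I think I'm getting the segmentation fault: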

# grub-install /dev/sdl
/usr/sbin/grub-setup: warn: This GPT partition label has no BIOS Boot Partition; embedding won't be possible!.
/usr/sbin/grub-setup: error: embedding is not possible, but this is required for cross-disk install.

# grub-install /dev/sdn
/usr/sbin/grub-setup: error: unable to identify a filesystem in hd13; safety check can't be performed.

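Of course, this repeats many times across all of the old member disks in the array, but it is always one of these two errors. After I remove a disk with the steps above, set up the new partitions and re-add it to the array, grub does install correctly.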
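In the --detail output at the top, most members are partitions (/dev/sdf2, /dev/sdg1, ...), but /dev/sdm, /dev/sdn and /dev/sdo are whole disks. This has not been confirmed, but it may explain the two errors. Whole-disk members have no partition table, so GRUB has nowhere to embed, which could produce the "unable to identify a filesystem" error. Partitioned GPT members without a bios_grub partition would instead produce the "no BIOS Boot Partition" error, as /dev/sdc did earlier. Device letters shifted between boots, so any check has to use current output. A sketch that classifies members from mdadm --detail on stdin, assuming sdX-style names where a trailing digit means a partition. md_member_kinds is a hypothetical name:

```shell
# Hypothetical helper: reads `mdadm --detail` output on stdin and reports, for
# each member in the device table, whether it is a partition or a bare disk.
md_member_kinds() {
    awk '$NF ~ /^\/dev\/sd[a-z]+[0-9]*$/ {
        dev = $NF
        if (dev ~ /[0-9]$/) {              # e.g. /dev/sdf2: a partition
            disk = dev
            sub(/[0-9]+$/, "", disk)       # strip the partition number
            print dev ": partition on " disk
        } else {                           # e.g. /dev/sdm: the bare disk
            print dev ": whole disk"
        }
    }'
}
```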

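Now it's just a matter of time. I'm updating this post in the hope that it helps someone else, and I hope to report success in a few days once all the disks have been replaced!

Edit 4

These steps were suggested by a friend. They didn't work, and I still need help!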


I really need help from anyone and everyone to get GRUB working on this box.

Does anyone have any other suggestions or fixes?

Edit 5

Grub bug report:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=764798

Answer 1

You should answer your own question and mark it as accepted, since you seem to be working toward a solution. Here are some suggestions:

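  • Don't try to install grub to the mdadm device; that won't work. Install it to the actual physical devices, e.g. /dev/sdn. It looks like you found this out later.
  • First let the array finish rebuilding after booting into rescue mode from CD, and only try to rescue the system further once the rebuild has finished and reports clean.
  • When you add a new disk to boot from, it is best to use the --recheck option when installing grub, so that it scans the other devices and adds them, so they can be found at boot.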
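Following the first and third suggestions, here is a sketch that installs GRUB to the physical disk behind every member instead of to the md device. It reads mdadm --detail on stdin and maps a partition such as /dev/sdf2 back to /dev/sdf. It only prints one grub-install --recheck command per disk, so the list can be reviewed before running. sdX-style names are assumed, and md_grub_install_cmds is a hypothetical name. Judging by the errors in Edit 3, each disk probably needs a bios_grub partition before its command will succeed:

```shell
# Hypothetical helper: reads `mdadm --detail` output on stdin and prints one
# grub-install command per physical disk that holds an array member.
md_grub_install_cmds() {
    awk '$NF ~ /^\/dev\/sd[a-z]+[0-9]*$/ {
        disk = $NF
        sub(/[0-9]+$/, "", disk)          # /dev/sdf2 -> /dev/sdf
        if (!(disk in seen)) {            # one command per physical disk
            seen[disk] = 1
            print "grub-install --recheck " disk
        }
    }'
}
```

Once the printed list has been reviewed, piping it to sh would run it, e.g. mdadm --detail /dev/md0 | md_grub_install_cmds | sh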

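A while ago I wrote up how to set up mdadm raid10 so that the system can boot from every disk in the raid; it may give you some useful pointers:

How to create a bootable redundant Debian system with a 3 or 4 (or more) disk software raid10?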
