RAID 5 broken after replacing a disk

My server emailed me that one of the disks could not read a certain block, so I decided to replace the disk before it failed completely. I added a new disk and replaced the failing one:

sudo mdadm --manage /dev/md0 --add /dev/sdg1
sudo mdadm --manage /dev/md0 --replace /dev/sdb1 --with /dev/sdg1
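
While the copy onto the new disk is running, the progress of the replacement can be followed with the usual status commands, for example:

cat /proc/mdstat                # shows the rebuild/replacement progress
sudo mdadm --detail /dev/md0    # per-device states and rebuild status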

After the sync finished I wanted to remove the failed /dev/sdb1 from the array with:

sudo mdadm --manage /dev/md0 --remove /dev/sdb1
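
Before the removal it is worth confirming that the replacement has actually finished, since mdadm will only remove a device that is failed or a spare; something along these lines should do it:

sudo mdadm --detail /dev/md0                     # sdb1 should show as faulty once the replace has completed
sudo mdadm --manage /dev/md0 --fail /dev/sdb1    # only needed if it is still listed as active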

But when I went to pull the disk out of the case, I first pulled out two other disks by mistake and immediately put them back in. After that I checked whether my RAID was still working, and it was not. I tried rebooting, hoping it would fix itself. This had never been a problem in the past, but then again I had never replaced a disk before either.

When that didn't work, I looked up what to do and tried re-adding the disk, but that didn't help, and assembling didn't work either:

sudo mdadm --assemble --scan

It only detected 2 disks, so I tried specifying the device names explicitly,

sudo mdadm -v -A /dev/md0 /dev/sda1 /dev/sdf1 /dev/sdc1 /dev/sdd1

but it told me that all the disks were busy:

sudo mdadm -v -A /dev/md0 /dev/sda1 /dev/sdf1 /dev/sdc1 /dev/sdd1 
mdadm: looking for devices for /dev/md0
mdadm: /dev/sda1 is busy - skipping
mdadm: /dev/sdf1 is busy - skipping
mdadm: /dev/sdc1 is busy - skipping
mdadm: /dev/sdd1 is busy - skipping
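
The "busy" messages usually mean the kernel has already grabbed the partitions into a half-assembled, inactive array (the mdstat output further down shows exactly that), so the existing md0 has to be stopped before the members can be reused, for example:

sudo mdadm --stop /dev/md0
sudo mdadm -v -A /dev/md0 /dev/sda1 /dev/sdf1 /dev/sdc1 /dev/sdd1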

(sdg1 became sdf1 after the reboot.)

mdstat seems to detect the disks correctly (I plugged sdb1 back in hoping it would help, and tried both with and without it):

cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md0 : inactive sdd1[3](S) sdb1[1](S) sdc1[2](S) sda1[0](S) sdf1[4](S)
      14650670080 blocks super 1.2
       
unused devices: <none>

If I query only the disks /dev/sda1 and /dev/sdf1, they show the same array state, AA..:

sudo mdadm --query --examine /dev/sda1 
/dev/sda1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 7c3e9d4e:6bad2afa:85cd55b4:43e43f56
           Name : lianli:0  (local to host lianli)
  Creation Time : Sat Oct 29 18:52:27 2016
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 5860268032 (2794.39 GiB 3000.46 GB)
     Array Size : 8790402048 (8383.18 GiB 9001.37 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : 3e912563:b10b74d0:a49faf2d:e14db558

Internal Bitmap : 8 sectors from superblock
    Update Time : Sat Jan  9 10:06:33 2021
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : c7d96490 - correct
         Events : 303045

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AA.. ('A' == active, '.' == missing, 'R' == replacing)
sudo mdadm --query --examine /dev/sdd1 
/dev/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 7c3e9d4e:6bad2afa:85cd55b4:43e43f56
           Name : lianli:0  (local to host lianli)
  Creation Time : Sat Oct 29 18:52:27 2016
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 5860268032 (2794.39 GiB 3000.46 GB)
     Array Size : 8790402048 (8383.18 GiB 9001.37 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : bf303286:5889dc0c:a6a1824a:4fe1ae03

Internal Bitmap : 8 sectors from superblock
    Update Time : Sat Jan  9 10:05:58 2021
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : ef1f16fd - correct
         Events : 303036

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : AA.A ('A' == active, '.' == missing, 'R' == replacing)
sudo mdadm --query --examine /dev/sdc1 
/dev/sdc1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 7c3e9d4e:6bad2afa:85cd55b4:43e43f56
           Name : lianli:0  (local to host lianli)
  Creation Time : Sat Oct 29 18:52:27 2016
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 5860268032 (2794.39 GiB 3000.46 GB)
     Array Size : 8790402048 (8383.18 GiB 9001.37 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : b29aba8f:f92c2b65:d155a3a8:40f41859

Internal Bitmap : 8 sectors from superblock
    Update Time : Sat Jan  9 10:04:33 2021
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 47feb45 - correct
         Events : 303013

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
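
The differing Events counters above (303045, 303036, 303013) are the key symptom here. A quick way to compare them across all members is to grep the --examine output, for example:

sudo mdadm --examine /dev/sd[abcdf]1 | grep -E '/dev/sd|Events|Array State'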

I will keep trying, but for now I am out of ideas, and this is also the first time I have replaced a disk in a RAID array. I hope someone can help me.

At least I still have a backup, but I would rather not wipe the disks only to find out that the backup doesn't work either...

Update: After adding all the disks I get:

sudo mdadm -v -A /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sdf1
mdadm: looking for devices for /dev/md0
mdadm: /dev/sda1 is identified as a member of /dev/md0, slot 0.
mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 3.
mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 1.
mdadm: added /dev/sdf1 to /dev/md0 as 1
mdadm: added /dev/sdc1 to /dev/md0 as 2 (possibly out of date)
mdadm: added /dev/sdd1 to /dev/md0 as 3 (possibly out of date)
mdadm: added /dev/sda1 to /dev/md0 as 0
mdadm: /dev/md0 assembled from 2 drives - not enough to start the array.

Answer 1

I found a solution:

After some more research, and prompted by the "(possibly out of date)" messages I got in verbose mode (sudo mdadm -v -A /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sdf1), I found this page: https://raid.wiki.kernel.org/index.php/RAID_Recovery

In the section "Trying to assemble using --force" they describe using force if the event count difference is below 50. Mine was much smaller, so I tried it; the RAID array assembled again and detected that one of the disks was still out of date, but I expect it to resync that one from the information on the other disks. I may have lost some data, but I have learned to wait until the array is fully back in sync before pulling any drives, in case I remove the wrong disk from the array...

The commands I used to get my array working again were:

sudo mdadm --stop /dev/md0
sudo mdadm -v -A --force /dev/md0 /dev/sda1 /dev/sdc1 /dev/sdd1 /dev/sdf1
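
After forcing the assembly it is worth checking that the array actually came up, and watching the resync before touching any more disks, for example:

cat /proc/mdstat                # the array should be active again, possibly degraded or resyncing
sudo mdadm --detail /dev/md0    # shows which member is still missing or rebuilding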

Update: One of the drives was probably not included, so the force only brought the array back into a working (degraded) state. The device with the largest event count difference had to be added back afterwards with --re-add:

sudo mdadm --manage /dev/md0 --re-add /dev/sdc1
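
The recovery triggered by the --re-add can be followed in /proc/mdstat, and once everything is clean it may be worth checking the ARRAY line in mdadm.conf so the array assembles cleanly on the next boot, for example:

watch cat /proc/mdstat        # wait for the recovery to reach 100%
sudo mdadm --detail --scan    # compare against the ARRAY line in /etc/mdadm/mdadm.conf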

Now my array is back in sync and I can try once more to remove the faulty disk.
