My server emailed me that one of its disks could no longer read a block, so I decided to replace it before it failed completely. I added a new disk and replaced the failing one:
sudo mdadm --manage /dev/md0 --add /dev/sdg1
sudo mdadm --manage /dev/md0 --replace /dev/sdb1 --with /dev/sdg1
Once it had synced, I wanted to get rid of the failed /dev/sdb1 and removed it from the array with:
sudo mdadm --manage /dev/md0 --remove /dev/sdb1
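Before physically removing the old member, it is worth confirming the replacement has fully synced. A minimal sketch; the sample line below is hypothetical (on a live system you would inspect `mdadm --detail /dev/md0` directly):

```shell
# Hypothetical sample of the State line from `mdadm --detail /dev/md0`;
# on a real system, replace the echo with the live command output.
sample='          State : clean'
if echo "$sample" | grep -q 'State : clean'; then
    echo 'array is clean - safe to remove the old member'
fi
```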
But when I went to pull the disk out of the case, I first pulled two other disks by mistake and pushed them straight back in. Afterwards I checked whether my RAID was still working, and it was not. I tried rebooting, hoping it would fix itself. In the past this was never a problem, but then again I had never replaced a disk in the array before.
When that didn't work, I looked up what to do and tried re-adding the disks, but that didn't help, and assembling didn't work either:
sudo mdadm --assemble --scan
It only detected 2 disks, so I tried naming the disks explicitly:
sudo mdadm -v -A /dev/md0 /dev/sda1 /dev/sdf1 /dev/sdc1 /dev/sdd1
but it told me all the disks were busy:
sudo mdadm -v -A /dev/md0 /dev/sda1 /dev/sdf1 /dev/sdc1 /dev/sdd1
mdadm: looking for devices for /dev/md0
mdadm: /dev/sda1 is busy - skipping
mdadm: /dev/sdf1 is busy - skipping
mdadm: /dev/sdc1 is busy - skipping
mdadm: /dev/sdd1 is busy - skipping
(After the reboot, the new disk sdg1 came up as sdf1.)
mdstat seems to detect the disks correctly (I plugged sdb1 back in hoping it would help, and tried both with and without it):
cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : inactive sdd1[3](S) sdb1[1](S) sdc1[2](S) sda1[0](S) sdf1[4](S)
14650670080 blocks super 1.2
unused devices: <none>
If I query the individual disks, /dev/sda1 and /dev/sdf1 both show the same array state, AA..:
sudo mdadm --query --examine /dev/sda1
/dev/sda1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 7c3e9d4e:6bad2afa:85cd55b4:43e43f56
Name : lianli:0 (local to host lianli)
Creation Time : Sat Oct 29 18:52:27 2016
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 5860268032 (2794.39 GiB 3000.46 GB)
Array Size : 8790402048 (8383.18 GiB 9001.37 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=0 sectors
State : clean
Device UUID : 3e912563:b10b74d0:a49faf2d:e14db558
Internal Bitmap : 8 sectors from superblock
Update Time : Sat Jan 9 10:06:33 2021
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : c7d96490 - correct
Events : 303045
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 0
Array State : AA.. ('A' == active, '.' == missing, 'R' == replacing)
sudo mdadm --query --examine /dev/sdd1
/dev/sdd1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 7c3e9d4e:6bad2afa:85cd55b4:43e43f56
Name : lianli:0 (local to host lianli)
Creation Time : Sat Oct 29 18:52:27 2016
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 5860268032 (2794.39 GiB 3000.46 GB)
Array Size : 8790402048 (8383.18 GiB 9001.37 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=0 sectors
State : clean
Device UUID : bf303286:5889dc0c:a6a1824a:4fe1ae03
Internal Bitmap : 8 sectors from superblock
Update Time : Sat Jan 9 10:05:58 2021
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : ef1f16fd - correct
Events : 303036
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 3
Array State : AA.A ('A' == active, '.' == missing, 'R' == replacing)
sudo mdadm --query --examine /dev/sdc1
/dev/sdc1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 7c3e9d4e:6bad2afa:85cd55b4:43e43f56
Name : lianli:0 (local to host lianli)
Creation Time : Sat Oct 29 18:52:27 2016
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 5860268032 (2794.39 GiB 3000.46 GB)
Array Size : 8790402048 (8383.18 GiB 9001.37 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=0 sectors
State : clean
Device UUID : b29aba8f:f92c2b65:d155a3a8:40f41859
Internal Bitmap : 8 sectors from superblock
Update Time : Sat Jan 9 10:04:33 2021
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 47feb45 - correct
Events : 303013
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
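The Events counters above tell the story: the superblocks disagree about how current each member is. A sketch of the differences, using the values copied from the `--examine` output shown above:

```shell
# Events counters as reported by `mdadm --examine` above.
newest=303045   # /dev/sda1 (and /dev/sdf1), array state AA..
sdd1=303036     # /dev/sdd1, array state AA.A
sdc1=303013     # /dev/sdc1, array state AAAA
echo "sdd1 is $((newest - sdd1)) events behind"
echo "sdc1 is $((newest - sdc1)) events behind"
```

Both differences are small (9 and 32 events), which becomes relevant for the `--force` assembly described in the answer below.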
I'll keep trying, but for now I'm out of ideas, and this is also the first time I've replaced a disk in a RAID array. Hopefully someone can help me.
At least I still have a backup, but I'd rather not wipe the drives only to discover that the backup doesn't work either...
Update: after adding all the disks, I get:
sudo mdadm -v -A /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sdf1
mdadm: looking for devices for /dev/md0
mdadm: /dev/sda1 is identified as a member of /dev/md0, slot 0.
mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 3.
mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 1.
mdadm: added /dev/sdf1 to /dev/md0 as 1
mdadm: added /dev/sdc1 to /dev/md0 as 2 (possibly out of date)
mdadm: added /dev/sdd1 to /dev/md0 as 3 (possibly out of date)
mdadm: added /dev/sda1 to /dev/md0 as 0
mdadm: /dev/md0 assembled from 2 drives - not enough to start the array.
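A brief aside on why two drives are "not enough": RAID5 carries one device's worth of parity, so it can start with at most one member missing. A minimal arithmetic sketch for the 4-device array from the `--examine` output:

```shell
raid_devices=4                       # "Raid Devices : 4" from --examine
min_to_start=$((raid_devices - 1))   # RAID5 tolerates one missing member
echo "need at least $min_to_start in-sync members to start"
```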
Answer 1
I found a solution:
After some more research, and prompted by the "possibly out of date" messages from assembling in verbose mode (sudo mdadm -v -A /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sdf1), I found this page: https://raid.wiki.kernel.org/index.php/RAID_Recovery
In the section "Trying to assemble using --force" it describes forcing the assembly if the event count difference is below 50. My difference was far lower, so I tried it; the RAID array assembled again and detected that one of the disks was still out of date, but I hope it will resync it from the information on the other disks. So I may have lost some data, but I have learned that if I remove the wrong disk from an array, I should wait until the array is back in sync before pulling anything out of the chassis...
The commands I used to get my array working again were:
sudo mdadm --stop /dev/md0
sudo mdadm -v -A --force /dev/md0 /dev/sda1 /dev/sdc1 /dev/sdd1 /dev/sdf1
Update:
One drive was probably not added, so the --force only brought the array back to a working (degraded) state. The device with the largest event-count difference had to be re-added afterwards with --re-add:
sudo mdadm --manage /dev/md0 --re-add /dev/sdc1
Now that my array is resyncing, I can try removing the faulty hard disk again.
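Once the re-add kicks off a rebuild, progress can be watched in /proc/mdstat. A sketch with a sample recovery line (the numbers are made up for illustration; on a live system you would read /proc/mdstat itself):

```shell
# Hypothetical /proc/mdstat recovery line; on a real system use:
#   grep recovery /proc/mdstat
sample='      [=>...................]  recovery =  7.9% (232556160/2930134016) finish=351.5min speed=127900K/sec'
echo "$sample" | grep -oE '[0-9]+\.[0-9]+%'   # extract the completion percentage
```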