Running Fedora 32 connected to a 4-port e-SATA enclosure. One of the drives has apparently failed; the log shows the following message:
smartd[1169]: Device: /dev/sdd [SAT], FAILED SMART self-check. BACK UP DATA NOW!
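Before touching the array, it is worth confirming the drive's SMART status directly. A minimal check, assuming smartmontools is installed:
smartctl -H /dev/sdd    # overall health verdict (PASSED/FAILED)
smartctl -a /dev/sdd    # full attribute dump, incl. reallocated/pending sector counts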
Here is the mdadm output:
mdadm -D /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Fri Mar 13 16:46:35 2020
Raid Level : raid10
Array Size : 2930010928 (2794.28 GiB 3000.33 GB)
Used Dev Size : 1465005464 (1397.14 GiB 1500.17 GB)
Raid Devices : 4
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Mon Jun 8 17:33:23 2020
State : clean, degraded
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Layout : near=2
Chunk Size : 8K
Consistency Policy : resync
Name : ourserver:0 (local to host ourserver)
UUID : 88b9fcb6:52d0f235:849bd9d6:c079cfc8
Events : 898705
Number Major Minor RaidDevice State
0 8 1 0 active sync set-A /dev/sda1
- 0 0 1 removed
3 8 49 2 active sync set-A /dev/sdd1
- 0 0 3 removed
What I don't understand is: what happened to the other 2 drives in our RAID10?
lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 1.4T 0 disk
└─sda1 8:1 0 1.4T 0 part
└─md0 9:0 0 2.7T 0 raid10
sdb 8:16 0 1.4T 0 disk
└─sdb1 8:17 0 1.4T 0 part
sdc 8:32 0 1.8T 0 disk
└─sdc1 8:33 0 1.8T 0 part
sdd 8:48 0 1.4T 0 disk
└─sdd1 8:49 0 1.4T 0 part
└─md0 9:0 0 2.7T 0 raid10
And:
blkid
/dev/sda1: UUID="88b9fcb6-52d0-f235-849b-d9d6c079cfc8" UUID_SUB="7df3d233-060a-aac3-04eb-9f3a65a9119e" LABEL="ourserver:0" TYPE="linux_raid_member" PARTUUID="0001b5c0-01"
/dev/sdb1: UUID="88b9fcb6-52d0-f235-849b-d9d6c079cfc8" UUID_SUB="64e3cedc-90db-e299-d786-7d096896f28f" LABEL="ourserver:0" TYPE="linux_raid_member" PARTUUID="00ff416d-01"
/dev/sdc1: UUID="88b9fcb6-52d0-f235-849b-d9d6c079cfc8" UUID_SUB="6d0134e3-1358-acfd-9c86-2967aec370c2" LABEL="ourserver:0" TYPE="linux_raid_member" PARTUUID="7da9b00e-01"
/dev/sdd1: UUID="88b9fcb6-52d0-f235-849b-d9d6c079cfc8" UUID_SUB="b1dd6f8b-a8e4-efa7-72b7-f987e71edeb2" LABEL="ourserver:0" TYPE="linux_raid_member" PARTUUID="b3de33a7-b2ea-f24e-903f-bae80136d543"
cat /proc/mdstat
Personalities : [raid10]
md0 : active raid10 sda1[0] sdd1[3]
2930010928 blocks super 1.2 8K chunks 2 near-copies [4/2] [U_U_]
unused devices: <none>
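In the mdstat line, [4/2] means 4 configured slots with only 2 active, and [U_U_] shows that slots 1 and 3 are missing. The same state can be read from sysfs; a quick check, assuming the standard md sysfs layout:
cat /sys/block/md0/md/array_state   # e.g. "clean"
cat /sys/block/md0/md/degraded      # 1 while the array is degraded
cat /sys/block/md0/md/raid_disks    # configured slot count (4 here)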
Originally I used these two commands to create the RAID10:
mdadm -E /dev/sda1 /dev/sdb1 /dev/sdd1 /dev/sdg1
mdadm --grow /dev/md0 --level=10 --backup-file=/home/backup-md0 --raid-devices=4 --add /dev/sdb1 /dev/sdd1 /dev/sdg1
After a few reboots, the /dev/sdX naming (where X is the drive letter) changed. I currently have no mdadm.conf file, and I ran mdadm --assemble --force /dev/md0 /dev/sd[abcd]1 to at least get the data back. That is how I ended up with /dev/sdb and /dev/sdc no longer showing as RAID10 members, with no md0 under /dev/sdb1 or /dev/sdc1 (see the lsblk output above). How can I at least get the other 2 drives, /dev/sdb and /dev/sdc, back into the RAID10, and then fail /dev/sdd until I get a replacement? Or is there a better way?
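To keep device-name shuffling from mattering at assembly time, a persistent config can be generated from the running array. A minimal sketch; the exact ARRAY line will differ per system:
# Record the array by UUID, so /dev/sdX renumbering across reboots is harmless
mdadm --detail --scan >> /etc/mdadm.conf
cat /etc/mdadm.conf
# Expected form: ARRAY /dev/md0 metadata=1.2 UUID=88b9fcb6:52d0f235:849bd9d6:c079cfc8 ...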
You can see from fdisk -l that the 2 drives are still partitioned as part of the RAID10:
Disk /dev/sda: 1.37 TiB, 1500301910016 bytes, 2930277168 sectors
Disk model: ST31500341AS
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x0001b5c0
Device Boot Start End Sectors Size Id Type
/dev/sda1 2048 2930277167 2930275120 1.4T fd Linux raid autodetect
Disk /dev/sdb: 1.37 TiB, 1500301910016 bytes, 2930277168 sectors
Disk model: ST31500341AS
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x00ff416d
Device Boot Start End Sectors Size Id Type
/dev/sdb1 2048 2930277167 2930275120 1.4T fd Linux raid autodetect
Disk /dev/sdc: 1.84 TiB, 2000398934016 bytes, 3907029168 sectors
Disk model: ST2000DM001-1ER1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x7da9b00e
Device Boot Start End Sectors Size Id Type
/dev/sdc1 2048 3907029167 3907027120 1.8T fd Linux raid autodetect
Disk /dev/sdd: 1.37 TiB, 1500301910016 bytes, 2930277168 sectors
Disk model: ST31500341AS
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: DC9A2601-CFE8-4ADD-85CD-FCBEBFCD8FAF
Device Start End Sectors Size Type
/dev/sdd1 34 2930277134 2930277101 1.4T Linux RAID
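When the replacement drive arrives, it will need a matching RAID partition before it can be added. One possible way, assuming the new disk appears as a hypothetical /dev/sde and an MBR layout like /dev/sda's is wanted:
# Copy /dev/sda's partition table to the new disk (dos/MBR label)
sfdisk -d /dev/sda | sfdisk /dev/sde
fdisk -l /dev/sde   # verify the new "fd" Linux raid autodetect partition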
Examining all 4 drives shows that they are all marked active:
mdadm --examine /dev/sda1
/dev/sda1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 88b9fcb6:52d0f235:849bd9d6:c079cfc8
Name : ourserver:0 (local to host ourserver)
Creation Time : Fri Mar 13 16:46:35 2020
Raid Level : raid10
Raid Devices : 4
Avail Dev Size : 2930010944 (1397.14 GiB 1500.17 GB)
Array Size : 2930010928 (2794.28 GiB 3000.33 GB)
Used Dev Size : 2930010928 (1397.14 GiB 1500.17 GB)
Data Offset : 264176 sectors
Super Offset : 8 sectors
Unused Space : before=264096 sectors, after=16 sectors
State : clean
Device UUID : 7df3d233:060aaac3:04eb9f3a:65a9119e
Update Time : Mon Jun 8 17:33:23 2020
Bad Block Log : 512 entries available at offset 16 sectors
Checksum : 6ad0f3f7 - correct
Events : 898705
Layout : near=2
Chunk Size : 8K
Device Role : Active device 0
Array State : A.A. ('A' == active, '.' == missing, 'R' == replacing)
mdadm --examine /dev/sdb1
/dev/sdb1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 88b9fcb6:52d0f235:849bd9d6:c079cfc8
Name : ourserver:0 (local to host ourserver)
Creation Time : Fri Mar 13 16:46:35 2020
Raid Level : raid10
Raid Devices : 4
Avail Dev Size : 2930010944 (1397.14 GiB 1500.17 GB)
Array Size : 2930010928 (2794.28 GiB 3000.33 GB)
Used Dev Size : 2930010928 (1397.14 GiB 1500.17 GB)
Data Offset : 264176 sectors
Super Offset : 8 sectors
Unused Space : before=263896 sectors, after=16 sectors
State : clean
Device UUID : 64e3cedc:90dbe299:d7867d09:6896f28f
Update Time : Wed Mar 18 11:50:09 2020
Bad Block Log : 512 entries available at offset 264 sectors
Checksum : aa48b164 - correct
Events : 37929
Layout : near=2
Chunk Size : 8K
Device Role : Active device 3
Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
mdadm --examine /dev/sdc1
/dev/sdc1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 88b9fcb6:52d0f235:849bd9d6:c079cfc8
Name : ourserver:0 (local to host ourserver)
Creation Time : Fri Mar 13 16:46:35 2020
Raid Level : raid10
Raid Devices : 4
Avail Dev Size : 3906762944 (1862.89 GiB 2000.26 GB)
Array Size : 2930010928 (2794.28 GiB 3000.33 GB)
Used Dev Size : 2930010928 (1397.14 GiB 1500.17 GB)
Data Offset : 264176 sectors
Super Offset : 8 sectors
Unused Space : before=263896 sectors, after=976752016 sectors
State : active
Device UUID : 6d0134e3:1358acfd:9c862967:aec370c2
Update Time : Sun May 10 16:22:39 2020
Bad Block Log : 512 entries available at offset 264 sectors
Checksum : df218e12 - correct
Events : 97380
Layout : near=2
Chunk Size : 8K
Device Role : Active device 1
Array State : AAA. ('A' == active, '.' == missing, 'R' == replacing)
mdadm --examine /dev/sdd1
/dev/sdd1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 88b9fcb6:52d0f235:849bd9d6:c079cfc8
Name : ourserver:0 (local to host ourserver)
Creation Time : Fri Mar 13 16:46:35 2020
Raid Level : raid10
Raid Devices : 4
Avail Dev Size : 2930012925 (1397.14 GiB 1500.17 GB)
Array Size : 2930010928 (2794.28 GiB 3000.33 GB)
Used Dev Size : 2930010928 (1397.14 GiB 1500.17 GB)
Data Offset : 264176 sectors
Super Offset : 8 sectors
Unused Space : before=263896 sectors, after=1997 sectors
State : clean
Device UUID : b1dd6f8b:a8e4efa7:72b7f987:e71edeb2
Update Time : Mon Jun 8 17:33:23 2020
Bad Block Log : 512 entries available at offset 264 sectors
Checksum : 8da0376 - correct
Events : 898705
Layout : near=2
Chunk Size : 8K
Device Role : Active device 2
Array State : A.A. ('A' == active, '.' == missing, 'R' == replacing)
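The key disagreement between the members is the event counter: sda1 and sdd1 are at 898705, while sdb1 (37929) and sdc1 (97380) dropped out months ago and are badly stale. A quick way to compare just those fields, using the same device names as above:
for d in /dev/sd[abcd]1; do
    echo "== $d =="
    mdadm --examine "$d" | grep -E 'Events|Update Time|Device Role'
done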
Can I try the --force and --assemble options mentioned by this user, or should I try the --replace option mentioned here?
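For reference, --replace performs a hot replacement while the old member is still readable, and it needs a spare present in the array. A sketch, assuming a hypothetical new member /dev/sde1 has already been partitioned:
# Add the new partition as a spare, then migrate data off the failing member
mdadm /dev/md0 --add /dev/sde1
mdadm /dev/md0 --replace /dev/sdd1 --with /dev/sde1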
EDIT: Now, after a resync, I see this:
mdadm --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Fri Mar 13 16:46:35 2020
Raid Level : raid10
Array Size : 2930010928 (2794.28 GiB 3000.33 GB)
Used Dev Size : 1465005464 (1397.14 GiB 1500.17 GB)
Raid Devices : 4
Total Devices : 4
Persistence : Superblock is persistent
Update Time : Tue Jun 9 15:51:31 2020
State : clean, degraded
Active Devices : 3
Working Devices : 4
Failed Devices : 0
Spare Devices : 1
Layout : near=2
Chunk Size : 8K
Consistency Policy : resync
Name : ourserver:0 (local to host ourserver)
UUID : 88b9fcb6:52d0f235:849bd9d6:c079cfc8
Events : 1083817
Number Major Minor RaidDevice State
0 8 81 0 active sync set-A /dev/sdf1
4 8 33 1 active sync set-B /dev/sdc1
3 8 17 2 active sync set-A /dev/sdb1
- 0 0 3 removed
5 8 1 - spare /dev/sda1
cat /proc/mdstat
Personalities : [raid10]
md0 : active raid10 sda1[5](S)(R) sdf1[0] sdb1[3] sdc1[4]
2930010928 blocks super 1.2 8K chunks 2 near-copies [4/3] [UUU_]
unused devices: <none>
And now I see this in the logs:
Jun 9 15:51:31 ourserver kernel: md: recovery of RAID array md0
Jun 9 15:51:31 ourserver kernel: md/raid10:md0: insufficient working devices for recovery.
Jun 9 15:51:31 ourserver kernel: md: md0: recovery interrupted.
Jun 9 15:51:31 ourserver kernel: md: super_written gets error=10
Jun 9 15:53:23 ourserver kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Trying to fail /dev/sdb gives:
mdadm --manage /dev/md0 --fail /dev/sdb1
mdadm: set device faulty failed for /dev/sdb1: Device or resource busy
How can I promote the spare drive and fail /dev/sdb?
Answer 1
You are effectively running without redundancy, and on a disk that is about to fail. Before doing anything else, back up your data! If you have many files to copy, I suggest first taking a block-level copy of the failing disk via ddrescue /dev/sdd /dev/anotherdisk, where /dev/anotherdisk is an additional disk (even a USB one).
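A slightly fuller ddrescue invocation than the one-liner above, with a map file so an interrupted copy can resume; /dev/anotherdisk and the map path are placeholders:
ddrescue -f /dev/sdd /dev/anotherdisk /root/sdd-rescue.map      # first pass
ddrescue -f -r3 /dev/sdd /dev/anotherdisk /root/sdd-rescue.map  # retry bad areas 3 times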
After you have both file-level and block-level backups, you can try to salvage the array by issuing:
mdadm /dev/md0 --add /dev/sdb1 /dev/sdc1
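If the re-add is accepted, the rebuild can be followed until the array is back to [4/4], for example:
watch -n 5 cat /proc/mdstat                       # refreshes while recovery runs
mdadm --detail /dev/md0 | grep -E 'State|Rebuild'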
However, strongly consider re-creating the array from scratch, because you are using a very small chunk size (8K), which will severely hurt performance (a good default chunk size is 512K).
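For reference only, and strictly after a verified backup: such a re-creation could look like the sketch below. Device names are illustrative, and --create destroys the existing array metadata.
# DESTRUCTIVE: wipes the old array. 512K chunks, near-2 layout as before.
mdadm --create /dev/md0 --level=10 --layout=n2 --chunk=512 \
      --raid-devices=4 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1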
Update: I just noticed that you have further damaged the array by force-assembling it, turning sda into a spare. Moreover, a stray disk, sdf, has appeared. By force-assembling the array with such stale disks, you may have lost any remaining chance of recovering it. I strongly suggest you contact a professional data-recovery service.