I need help with my raid5, which crashed because of bad sectors. Previously I could assemble it with
# mdadm --assemble --force -v /dev/md0 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1
but it crashed again while I was making a backup, and now I can no longer reassemble it because two disks are out of date:
# mdadm --assemble --force -v /dev/md0 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1
mdadm: looking for devices for /dev/md0
mdadm: /dev/sde1 is identified as a member of /dev/md0, slot -1.
mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 0.
mdadm: /dev/sdg1 is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sdh1 is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sdi1 is identified as a member of /dev/md0, slot 3.
mdadm: added /dev/sdg1 to /dev/md0 as 1
mdadm: added /dev/sdh1 to /dev/md0 as 2 (possibly out of date)
mdadm: added /dev/sdi1 to /dev/md0 as 3 (possibly out of date)
mdadm: failed to add /dev/sde1 to /dev/md0: Device or resource busy
mdadm: added /dev/sdf1 to /dev/md0 as 0
mdadm: /dev/md0 assembled from 2 drives - not enough to start the array.
As you can see, only two devices, sdf1 and sdg1, have the latest Update Time (plus the spare sde1, which had not finished rebuilding).
# mdadm --examine /dev/sd[efghi]1 | egrep 'dev|Update|Role|State|Chunk Size'
/dev/sde1:
State : clean
Update Time : Sun May 10 04:15:59 2015
Chunk Size : 512K
Device Role : spare
Array State : AA.. ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdf1:
State : clean
Update Time : Sun May 10 04:15:59 2015
Chunk Size : 512K
Device Role : Active device 0
Array State : AA.. ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdg1:
State : clean
Update Time : Sun May 10 04:15:59 2015
Chunk Size : 512K
Device Role : Active device 1
Array State : AA.. ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdh1:
State : clean
Update Time : Sat May 9 23:10:06 2015
Chunk Size : 512K
Device Role : Active device 2
Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdi1:
State : active
Update Time : Sat Dec 7 12:43:00 2013
Chunk Size : 512K
Device Role : Active device 3
Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
I have not changed any data on the raid since the other two devices, sdh1 and sdi1, were last updated, so I don't need to resync everything; I only need to back up the most recent files, which means I just need to mount it read-only one last time.
Is there any way to do this? Maybe I can force it to ignore the out-of-date devices? I wonder why --force no longer works...
Full information on the raid5 devices:
# mdadm --examine /dev/sd[efghi]1
/dev/sde1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : a87dfb70:2ecd03f9:ee62b434:fc637218
Name : m08002-lin:data2gb
Creation Time : Mon Sep 2 12:48:02 2013
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
Array Size : 5860145664 (5588.67 GiB 6000.79 GB)
Used Dev Size : 3906763776 (1862.89 GiB 2000.26 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262064 sectors, after=1024 sectors
State : clean
Device UUID : 624d3873:7970ba27:da0f511a:45367bdd
Update Time : Sun May 10 04:15:59 2015
Checksum : 599a5235 - correct
Events : 108804
Layout : left-symmetric
Chunk Size : 512K
Device Role : spare
Array State : AA.. ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdf1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : a87dfb70:2ecd03f9:ee62b434:fc637218
Name : m08002-lin:data2gb
Creation Time : Mon Sep 2 12:48:02 2013
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
Array Size : 5860145664 (5588.67 GiB 6000.79 GB)
Used Dev Size : 3906763776 (1862.89 GiB 2000.26 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262064 sectors, after=1024 sectors
State : clean
Device UUID : 4827a499:12980366:0de13b87:541a9b5e
Update Time : Sun May 10 04:15:59 2015
Checksum : ac5a08f2 - correct
Events : 108804
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 0
Array State : AA.. ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdg1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : a87dfb70:2ecd03f9:ee62b434:fc637218
Name : m08002-lin:data2gb
Creation Time : Mon Sep 2 12:48:02 2013
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
Array Size : 5860145664 (5588.67 GiB 6000.79 GB)
Used Dev Size : 3906763776 (1862.89 GiB 2000.26 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262064 sectors, after=1024 sectors
State : clean
Device UUID : 8c959d62:3b9c1eac:6f8d7d92:13454ab4
Update Time : Sun May 10 04:15:59 2015
Checksum : 1c5f5282 - correct
Events : 108804
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 1
Array State : AA.. ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdh1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : a87dfb70:2ecd03f9:ee62b434:fc637218
Name : m08002-lin:data2gb
Creation Time : Mon Sep 2 12:48:02 2013
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
Array Size : 5860145664 (5588.67 GiB 6000.79 GB)
Used Dev Size : 3906763776 (1862.89 GiB 2000.26 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262064 sectors, after=1024 sectors
State : clean
Device UUID : df6b9eab:ea3c6e3a:47858e6d:1eb0783d
Update Time : Sat May 9 23:10:06 2015
Checksum : 57f1e4b2 - correct
Events : 108796
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdi1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : a87dfb70:2ecd03f9:ee62b434:fc637218
Name : m08002-lin:data2gb
Creation Time : Mon Sep 2 12:48:02 2013
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
Array Size : 5860145664 (5588.67 GiB 6000.79 GB)
Used Dev Size : 3906763776 (1862.89 GiB 2000.26 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262064 sectors, after=1024 sectors
State : active
Device UUID : fbc64ec7:a97a36c1:69cc3812:37878af1
Update Time : Sat Dec 7 12:43:00 2013
Checksum : 507acca4 - correct
Events : 83904
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 3
Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
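For reference, the Events counters buried in the output above can be pulled out with a short awk pipeline. This is only a sketch: the heredoc reproduces the values shown in this post, while on a live system you would pipe the real `mdadm --examine /dev/sd[efghi]1` output instead.

```shell
# Compare per-device Events counters from `mdadm --examine` output.
# A member whose counter lags far behind the others is what mdadm
# flags as "possibly out of date" during assembly.
mdadm_output=$(cat <<'EOF'
/dev/sde1:
         Events : 108804
/dev/sdf1:
         Events : 108804
/dev/sdg1:
         Events : 108804
/dev/sdh1:
         Events : 108796
/dev/sdi1:
         Events : 83904
EOF
)
echo "$mdadm_output" | awk '
/^\/dev\// { dev = $1 }          # remember the current device header
/Events/   { print dev, $3 }'    # print device alongside its counter
```

With these numbers, sdh1 is only 8 events behind (close enough for `--force` to fix up), while sdi1 is roughly 25,000 events behind, which is why the kernel refuses it.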
Answer 1
I got access to my array and backed up my files!!
First I removed the spare device, then I examined the remaining ones. Two devices had bad sectors, and I knew that the raid would crash again during a resync as soon as it tried to read or write a bad sector. So I decided to wipe the raid and create a degraded raid that would not resync but would still give access to all the previous data. I zeroed all the superblocks and recreated the raid, leaving out one broken device:
mdadm --stop /dev/md127 (or /dev/md0)
mdadm --zero-superblock /dev/sd[efgh]1
mdadm --create /dev/md127 --level=5 --raid-devices=4 --assume-clean /dev/sde1 /dev/sdf1 /dev/sdg1 missing
What matters is the original order of the raid devices: create the new raid in that same order, and use the --assume-clean parameter! You can get the original order with
mdadm --examine /dev/sd[efghi]1
and looking at the Device Role field.
After recreating the raid with --assume-clean, I could mount md127 and access all my data directly, without having to do anything else.
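The "look at Device Role" step above can be sketched as a small pipeline that sorts members by their role number. The heredoc reproduces the roles from the `--examine` output in the question; on a live system you would pipe the real `mdadm --examine` output instead.

```shell
# Recover the original member order from `mdadm --examine` output by
# sorting on the "Device Role" number. Spares carry no role number and
# are filtered out automatically.
examine=$(cat <<'EOF'
/dev/sde1:
   Device Role : spare
/dev/sdf1:
   Device Role : Active device 0
/dev/sdg1:
   Device Role : Active device 1
/dev/sdh1:
   Device Role : Active device 2
/dev/sdi1:
   Device Role : Active device 3
EOF
)
echo "$examine" | awk '
/^\/dev\// { dev = $1; sub(/:$/, "", dev) }   # strip trailing colon
/Device Role : Active device/ { print $NF, dev }' | sort -n
```

The output lists one `slot device` pair per line, in slot order, which is exactly the order the devices must be given to `mdadm --create`.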
Answer 2
I ran into the same problem, and --zero-superblock plus --create produced complete garbage, because it auto-guessed the wrong partition size and the LUKS partition on top of it became unreadable.
The following answer saved me: https://serverfault.com/a/1147308/85383
The trick is to set the md_mod.start_dirty_degraded kernel parameter to 1. This apparently still loses some of the blocks that were being synced when the array went down, but apart from that it worked for me. It was an ext3 partition anyway, so fairly resilient.
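As a sketch, the parameter can be set either at boot or, assuming md is built as the md_mod module, at runtime via sysfs (the sysfs path only exists when the module is loaded):

```shell
# At boot: append to the kernel command line (e.g. in your GRUB config):
#   md_mod.start_dirty_degraded=1

# At runtime, if md_mod is loaded as a module:
echo 1 > /sys/module/md_mod/parameters/start_dirty_degraded
```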