I used the instructions in @Paul's answer (https://superuser.com/users/89018/paul) to shrink a RAID by removing disks, but I think I may have made a serious mistake. Here are the details…
I have been upgrading the 4TB drives in my DS1813+ to Seagate IronWolf 10TB drives one at a time. I had one drive left to upgrade, but instead of upgrading that drive, rebuilding the array, and then performing Paul's procedure, I thought I could simply have the shrink process remove the last 4TB drive from the array so I could fail it afterwards; unfortunately that is not how it works, and I fear it may now be too late for my 22TB of data. Here is my PuTTY session:
ash-4.3# pvdisplay -C
  PV        VG   Fmt  Attr PSize  PFree
  /dev/md2  vg1  lvm2 a--  25.44t 50.62g
ash-4.3# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid5 sdf3[13] sdh3[7] sdb3[9] sdg3[6] sde3[12] sdd3[11] sdc3[10] sda3[8]
27316073792 blocks super 1.2 level 5, 64k chunk, algorithm 2 [8/8] [UUUUUUUU]
md1 : active raid1 sdf2[5] sda2[1] sdb2[7] sdc2[2] sdd2[3] sde2[4] sdg2[6] sdh2[0]
2097088 blocks [8/8] [UUUUUUUU]
md0 : active raid1 sdf1[5] sda1[1] sdb1[7] sdc1[2] sdd1[3] sde1[4] sdg1[6] sdh1[0]
2490176 blocks [8/8] [UUUUUUUU]
unused devices: <none>
ash-4.3# exit
exit
Rob@Apophos-DS:~$ df -h
Filesystem         Size  Used Avail Use% Mounted on
/dev/md0           2.3G  940M  1.3G  43% /
none               2.0G  4.0K  2.0G   1% /dev
/tmp               2.0G  656K  2.0G   1% /tmp
/run               2.0G  9.8M  2.0G   1% /run
/dev/shm           2.0G  4.0K  2.0G   1% /dev/shm
none               4.0K     0  4.0K   0% /sys/fs/cgroup
cgmfs              100K     0  100K   0% /run/cgmanager/fs
/dev/vg1/volume_3  493G  749M  492G   1% /volume3
/dev/vg1/volume_1  3.4T  2.3T  1.1T  69% /volume1
/dev/vg1/volume_2   22T   19T  2.4T  89% /volume2
Rob@Apophos-DS:~$ pvdisplay -C
WARNING: Running as a non-root user. Functionality may be unavailable.
/var/lock/lvm/P_global:aux: open failed: Permission denied
Unable to obtain global lock.
Rob@Apophos-DS:~$ sudo su
Password:
ash-4.3# pvdisplay -C
  PV        VG   Fmt  Attr PSize  PFree
  /dev/md2  vg1  lvm2 a--  25.44t 50.62g
ash-4.3# mdadm --grow -n5 /dev/md2
mdadm: max_devs [384] of [/dev/md2]
mdadm: this change will reduce the size of the array.
use --grow --array-size first to truncate array.
e.g. mdadm --grow /dev/md2 --array-size 15609185024
ash-4.3# mdadm --grow /dev/md2 --array-size 15609185024
ash-4.3# pvdisplay -C
  PV        VG   Fmt  Attr PSize  PFree
  /dev/md2  vg1  lvm2 a--  25.44t 50.62g
ash-4.3# mdadm --grow -n6 /dev/md2
mdadm: max_devs [384] of [/dev/md2]
mdadm: Need to backup 2240K of critical section..
mdadm: /dev/md2: Cannot grow - need backup-file
ash-4.3# mdadm --grow -n5 /dev/md2
mdadm: max_devs [384] of [/dev/md2]
mdadm: Need to backup 1792K of critical section..
mdadm: /dev/md2: Cannot grow - need backup-file
ash-4.3# mdadm --grow -n5 /dev/md2 --backup-file /root/mdadm.md0.backup
mdadm: max_devs [384] of [/dev/md2]
mdadm: Need to backup 1792K of critical section..
ash-4.3# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid5 sdf3[13] sdh3[7] sdb3[9] sdg3[6] sde3[12] sdd3[11] sdc3[10] sda3[8]
15609185024 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
[>....................] reshape = 0.0% (216708/3902296256) finish=3000.8min speed=21670K/sec
md1 : active raid1 sdf2[5] sda2[1] sdb2[7] sdc2[2] sdd2[3] sde2[4] sdg2[6] sdh2[0]
2097088 blocks [8/8] [UUUUUUUU]
md0 : active raid1 sdf1[5] sda1[1] sdb1[7] sdc1[2] sdd1[3] sde1[4] sdg1[6] sdh1[0]
2490176 blocks [8/8] [UUUUUUUU]
unused devices: <none>
ash-4.3# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid5 sdf3[13] sdh3[7] sdb3[9] sdg3[6] sde3[12] sdd3[11] sdc3[10] sda3[8]
15609185024 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
[>....................] reshape = 0.0% (693820/3902296256) finish=3230.3min speed=20129K/sec
md1 : active raid1 sdf2[5] sda2[1] sdb2[7] sdc2[2] sdd2[3] sde2[4] sdg2[6] sdh2[0]
2097088 blocks [8/8] [UUUUUUUU]
md0 : active raid1 sdf1[5] sda1[1] sdb1[7] sdc1[2] sdd1[3] sde1[4] sdg1[6] sdh1[0]
2490176 blocks [8/8] [UUUUUUUU]
unused devices: <none>
ash-4.3# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid5 sdf3[13] sdh3[7] sdb3[9] sdg3[6] sde3[12] sdd3[11] sdc3[10] sda3[8]
15609185024 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
[>....................] reshape = 0.0% (1130368/3902296256) finish=6500.6min speed=10001K/sec
md1 : active raid1 sdf2[5] sda2[1] sdb2[7] sdc2[2] sdd2[3] sde2[4] sdg2[6] sdh2[0]
2097088 blocks [8/8] [UUUUUUUU]
md0 : active raid1 sdf1[5] sda1[1] sdb1[7] sdc1[2] sdd1[3] sde1[4] sdg1[6] sdh1[0]
2490176 blocks [8/8] [UUUUUUUU]
unused devices: <none>
ash-4.3# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid5 sdf3[13] sdh3[7] sdb3[9] sdg3[6] sde3[12] sdd3[11] sdc3[10] sda3[8]
15609185024 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
[>....................] reshape = 0.0% (1442368/3902296256) finish=6667.7min speed=9750K/sec
md1 : active raid1 sdf2[5] sda2[1] sdb2[7] sdc2[2] sdd2[3] sde2[4] sdg2[6] sdh2[0]
2097088 blocks [8/8] [UUUUUUUU]
md0 : active raid1 sdf1[5] sda1[1] sdb1[7] sdc1[2] sdd1[3] sde1[4] sdg1[6] sdh1[0]
2490176 blocks [8/8] [UUUUUUUU]
unused devices: <none>
ash-4.3# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid5 sdf3[13] sdh3[7] sdb3[9] sdg3[6] sde3[12] sdd3[11] sdc3[10] sda3[8]
15609185024 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
[>....................] reshape = 0.4% (18826624/3902296256) finish=6706.8min speed=9650K/sec
md1 : active raid1 sdf2[5] sda2[1] sdb2[7] sdc2[2] sdd2[3] sde2[4] sdg2[6] sdh2[0]
2097088 blocks [8/8] [UUUUUUUU]
md0 : active raid1 sdf1[5] sda1[1] sdb1[7] sdc1[2] sdd1[3] sde1[4] sdg1[6] sdh1[0]
2490176 blocks [8/8] [UUUUUUUU]
unused devices: <none>
ash-4.3#
Broadcast message from root@Apophos-DS
(unknown) at 22:16 ...
The system is going down for reboot NOW!
login as: Rob
[email protected]'s password:
Could not chdir to home directory /var/services/homes/Rob: No such file or directory
Rob@Apophos-DS:/$ sudo su
Password:
ash-4.3# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md1 : active raid1 sdh2[7] sdg2[6] sdf2[5] sde2[4] sdd2[3] sdc2[2] sdb2[1] sda2[0]
2097088 blocks [8/8] [UUUUUUUU]
[=====>...............] resync = 26.8% (563584/2097088) finish=2.4min speed=10314K/sec
md2 : active raid5 sdh3[7] sdb3[9] sdf3[13] sdg3[6] sde3[12] sdd3[11] sdc3[10] sda3[8]
15609185024 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
[>....................] reshape = 0.5% (19578240/3902296256) finish=10384.2min speed=6231K/sec
md0 : active raid1 sda1[1] sdb1[7] sdc1[2] sdd1[3] sde1[4] sdf1[5] sdg1[6] sdh1[0]
2490176 blocks [8/8] [UUUUUUUU]
unused devices: <none>
Now, with the backstory and the readouts from my PuTTY session, I'm hoping someone can tell me how to fix my mess. I think my mistake (besides starting this without sufficient foresight, consideration, and a full understanding of the process itself) was twofold: I did not fail the last remaining 4TB drive first, so the software did its math based on the smallest drive size of 4TB (probably without accounting for the ~70TB across the other 7 drives), and possibly the varying -n# of my mdadm --grow commands:
ash-4.3# mdadm --grow -n5 /dev/md2
mdadm: max_devs [384] of [/dev/md2]
mdadm: this change will reduce the size of the array.
use --grow --array-size first to truncate array.
e.g. mdadm --grow /dev/md2 --array-size 15609185024
ash-4.3# mdadm --grow /dev/md2 --array-size 15609185024
ash-4.3# pvdisplay -C
  PV        VG   Fmt  Attr PSize  PFree
  /dev/md2  vg1  lvm2 a--  25.44t 50.62g
ash-4.3# mdadm --grow -n6 /dev/md2
mdadm: max_devs [384] of [/dev/md2]
mdadm: Need to backup 2240K of critical section..
mdadm: /dev/md2: Cannot grow - need backup-file
ash-4.3# mdadm --grow -n5 /dev/md2
mdadm: max_devs [384] of [/dev/md2]
mdadm: Need to backup 1792K of critical section..
mdadm: /dev/md2: Cannot grow - need backup-file
ash-4.3# mdadm --grow -n5 /dev/md2 --backup-file /root/mdadm.md0.backup
mdadm: max_devs [384] of [/dev/md2]
mdadm: Need to backup 1792K of critical section..
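Incidentally, the --array-size value mdadm suggested lines up exactly with that "smallest drive" math: a 5-device RAID 5 keeps 4 data members, each capped at the 4TB drive's used size of 3902296256 blocks. A quick sanity check (plain arithmetic, nothing from the NAS itself):

```shell
# 5-device RAID 5 = 4 data members, each limited to the 4TB drive's
# used dev size of 3902296256 KiB blocks:
echo $((3902296256 * 4))   # matches the suggested --array-size 15609185024
```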
Here is the current output of cat /proc/mdstat - I notice that /dev/md2 only shows 5 U's while the other mds have 8, which scares me, since they are all volumes sitting on the same RAID group of 8 disks:
ash-4.3# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md1 : active raid1 sdh2[7] sdg2[6] sdf2[5] sde2[4] sdd2[3] sdc2[2] sdb2[1] sda2[0]
2097088 blocks [8/8] [UUUUUUUU]
md2 : active raid5 sdh3[7] sdb3[9] sdf3[13] sdg3[6] sde3[12] sdd3[11] sdc3[10] sda3[8]
15609185024 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
[>....................] reshape = 1.2% (48599680/3902296256) finish=6495.2min speed=9888K/sec
md0 : active raid1 sda1[1] sdb1[7] sdc1[2] sdd1[3] sde1[4] sdf1[5] sdg1[6] sdh1[0]
2490176 blocks [8/8] [UUUUUUUU]
unused devices: <none>
At a minimum I need to be able to save /dev/vg1/volume_1. Since I have not touched that volume I hope this is possible, but right now I don't know, because all 3 volumes are listed as "Crashed" in DSM. I'm hoping (but not hopeful) that everything will be fine once the consistency check completes.
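One thing I plan to look at (a hypothetical check on my part; I haven't run it yet): whether volume_1's extents all sit below the new end of md2. Assuming LVM's default 4 MiB physical extent size, the truncation boundary in extents would be:

```shell
# End of the truncated md2 (15609185024 KiB) expressed in 4 MiB LVM
# physical extents, assuming the default extent size:
echo $((15609185024 / 4096))
```

If the DSM build of lvs supports segment fields, `lvs --segments -o lv_name,seg_start_pe,seg_size vg1` should show whether all of volume_1's segments start below that extent number, i.e. within the surviving part of the array.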
Anyone who knows mdadm, I desperately need your help! Paul, if you're out there, I need you! I know I screwed up, and there's a good chance I've lost everything, but if you can suggest anything that might save me, please help!
UPDATE (Dec 5, 2017): No change, other than the reshape continuing - it is now at 17.77%. DSM still shows all volumes as "Crashed (Checking parity consistency 17.77%)", and the Disk Group shows "Verifying hard disks in the background (Checking parity consistency 17.77%)". Here is an image of the Disk Group:
I think the critical step I missed was running

mdadm /dev/md2 --fail /dev/sdf3 --remove /dev/sdf3

or physically pulling the drive - either would have failed the last remaining 4TB drive and removed it from the array, leaving me with a degraded 7 x 10TB RAID 5. My question now is: should I wait for the array to finish reshaping before removing the 4TB drive, or should I fail/remove it immediately? My spidey sense says removing a drive mid-rebuild/reshape ends badly, because that's what I've always been taught, but I don't know whether that necessarily holds here, since mdadm is trying to squeeze the contents of 7 drives into 5 based solely on the size of the remaining 4TB drive.
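To put rough numbers on that squeeze (figures taken from my df -h output above, so treat this as a back-of-envelope sketch):

```shell
# Data currently on the volumes (19T on volume2 + 2.3T on volume1, per df -h)
# versus the truncated array's capacity (15609185024 KiB blocks, in TiB):
awk 'BEGIN { printf "%.1f TiB used vs %.1f TiB new capacity\n",
             19 + 2.3, 15609185024 / 1024 / 1024 / 1024 }'
# prints: 21.3 TiB used vs 14.5 TiB new capacity
```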
Also, in case it helps, here is the output of mdadm -D /dev/md2:
/dev/md2:
        Version : 1.2
  Creation Time : Wed Mar 5 22:45:07 2014
     Raid Level : raid5
     Array Size : 15609185024 (14886.08 GiB 15983.81 GB)
  Used Dev Size : 3902296256 (3721.52 GiB 3995.95 GB)
   Raid Devices : 5
  Total Devices : 8
    Persistence : Superblock is persistent

    Update Time : Tue Dec 5 17:46:27 2017
          State : clean, recovering
 Active Devices : 8
Working Devices : 8
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

 Reshape Status : 18% complete
  Delta Devices : -3, (8->5)

           Name : DS:2 (local to host DS)
           UUID : UUID
         Events : 153828

    Number   Major   Minor   RaidDevice State
       7       8      115        0      active sync   /dev/sdh3
       8       8        3        1      active sync   /dev/sda3
      10       8       35        2      active sync   /dev/sdc3
      11       8       51        3      active sync   /dev/sdd3
      12       8       67        4      active sync   /dev/sde3
       6       8       99        5      active sync   /dev/sdg3
       9       8       19        7      active sync   /dev/sdb3
      13       8       83        6      active sync   /dev/sdf3
What scares me is that the array size reads as 16TB when the total data on the array exceeds 20TB. I have no idea what to do at this point. Any thoughts or experience would be greatly appreciated!
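For what it's worth, the 16TB reading looks consistent with the truncation rather than a display glitch; converting md2's block count myself (simple unit arithmetic, nothing Synology-specific):

```shell
# /proc/mdstat reports md2 in 1 KiB blocks; in decimal GB that is:
awk 'BEGIN { printf "%.2f GB\n", 15609185024 * 1024 / 1e9 }'
# prints: 15983.81 GB -- i.e. the ~16TB shown, versus >20TB of data on it
```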