I have been wrestling with this problem for a while now.
I have a logical volume made up of 3 disks: 1.5TB, 2TB and 3TB. The 1.5TB drive is failing, with lots of I/O errors and bad sectors. I started a pvmove to migrate the extents still on the failing drive over to the 3TB drive (which has enough free space). It moved 99% of the extents, but the last percent seems to be unreadable: the reads fail and pvmove exits.
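For reference, the initial migration was just the stock whole-PV invocation, something like this (device names as in my setup):

# move every allocated extent off the failing PV onto the new one
pvmove /dev/sdd1 /dev/sdf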
Here is the current state:
root@server:~# pvdisplay
/dev/sdd: read failed after 0 of 4096 at 0: Input/output error
/dev/sdd: read failed after 0 of 4096 at 1500301819904: Input/output error
/dev/sdd: read failed after 0 of 4096 at 1500301901824: Input/output error
/dev/sdd: read failed after 0 of 4096 at 4096: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 1500300771328: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 1500300853248: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 0: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 4096: Input/output error
Couldn't find device with uuid hFhfbQ-4cuW-CSlE-qhfO-GNl8-Jvt7-4nZTWK.
--- Physical volume ---
PV Name /dev/sda # old, working drive
VG Name lvm_group1
PV Size 1.82 TiB / not usable 1.09 MiB
Allocatable yes (but full)
PE Size 4.00 MiB
Total PE 476932
Free PE 0
Allocated PE 476932
PV UUID FEoDYU-Lhjf-FdI1-Ei5p-koue-PIma-TGvs9A
--- Physical volume ---
PV Name /dev/sdd1 # old failing drive
VG Name lvm_group1
PV Size 1.36 TiB / not usable 2.40 MiB
Allocatable NO
PE Size 4.00 MiB
Total PE 357699
Free PE 357600
Allocated PE 99
PV UUID hFhfbQ-4cuW-CSlE-qhfO-GNl8-Jvt7-4nZTWK
--- Physical volume ---
PV Name /dev/sdf # new drive
VG Name lvm_group1
PV Size 2.73 TiB / not usable 4.46 MiB
Allocatable yes
PE Size 4.00 MiB
Total PE 715396
Free PE 357746
Allocated PE 357650
PV UUID qs4BVK-PAPv-I1DG-x5wJ-dRNq-vhBE-wQeJL6
Here is what pvmove says when I try to move the remaining extents (specifying the stuck PE range explicitly):
root@server:~# pvmove /dev/sdd1:335950-336500 /dev/sdf --verbose
Finding volume group "lvm_group1"
Archiving volume group "lvm_group1" metadata (seqno 93).
Creating logical volume pvmove0
Moving 50 extents of logical volume lvm_group1/cryptex
Found volume group "lvm_group1"
activation/volume_list configuration setting not defined: Checking only host tags for lvm_group1/cryptex
Updating volume group metadata
Found volume group "lvm_group1"
Found volume group "lvm_group1"
Creating lvm_group1-pvmove0
Loading lvm_group1-pvmove0 table (253:2)
Loading lvm_group1-cryptex table (253:0)
Suspending lvm_group1-cryptex (253:0) with device flush
Suspending lvm_group1-pvmove0 (253:2) with device flush
Found volume group "lvm_group1"
activation/volume_list configuration setting not defined: Checking only host tags for lvm_group1/pvmove0
Resuming lvm_group1-pvmove0 (253:2)
Found volume group "lvm_group1"
Loading lvm_group1-pvmove0 table (253:2)
Suppressed lvm_group1-pvmove0 identical table reload.
Resuming lvm_group1-cryptex (253:0)
Creating volume group backup "/etc/lvm/backup/lvm_group1" (seqno 94).
Checking progress before waiting every 15 seconds
/dev/sdd1: Moved: 4.0%
/dev/sdd1: read failed after 0 of 4096 at 0: Input/output error
No physical volume label read from /dev/sdd1
Physical volume /dev/sdd1 not found
ABORTING: Can't reread PV /dev/sdd1
ABORTING: Can't reread VG for /dev/sdd1
Only 99 extents are left on the failing drive. I can live with losing that data; I just want to pull this drive out and throw it away without losing the data on the other drives.
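For reference, the segments still pinned to the failing PV can be listed with the standard LVM reporting commands (nothing exotic):

# per-extent mapping of the failing PV to logical volumes
pvdisplay --maps /dev/sdd1
# which physical devices back each LV in the group
lvs -a -o +devices lvm_group1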
So I tried pvremove:
root@server:~# pvremove /dev/sdd1
/dev/sdd1: read failed after 0 of 4096 at 1500300771328: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 1500300853248: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 0: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 4096: Input/output error
No physical volume label read from /dev/sdd1
Physical Volume /dev/sdd1 not found
Then vgreduce:
root@server:~# vgreduce lvm_group1 --removemissing
/dev/sdd: read failed after 0 of 4096 at 0: Input/output error
/dev/sdd: read failed after 0 of 4096 at 1500301819904: Input/output error
/dev/sdd: read failed after 0 of 4096 at 1500301901824: Input/output error
/dev/sdd: read failed after 0 of 4096 at 4096: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 1500300771328: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 1500300853248: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 0: Input/output error
/dev/sdd1: read failed after 0 of 4096 at 4096: Input/output error
Couldn't find device with uuid hFhfbQ-4cuW-CSlE-qhfO-GNl8-Jvt7-4nZTWK.
WARNING: Partial LV cryptex needs to be repaired or removed.
WARNING: Partial LV pvmove0 needs to be repaired or removed.
There are still partial LVs in VG lvm_group1.
To remove them unconditionally use: vgreduce --removemissing --force.
Proceeding to remove empty missing PVs.
pvdisplay still shows the failing drive...
Any ideas?
Answer 1
In the end I solved this by manually editing /etc/lvm/backup/lvm_group1.
If anyone else runs into this problem, here are the steps I followed:
- I removed the dead drive from the server
- I ran
vgreduce lvm_group1 --removemissing --force
- I deleted the dead drive's entry from the backup config file
- I added another stripe on the "good" drive to stand in for the unreadable extents that were on the dead one (see the sketch after this list)
- I ran
vgcfgrestore -f edited_config_file.cfg lvm_group1
- Rebooted
- Voilà! The volume is visible and mountable.
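To make the config edit concrete, here is a rough sketch of what the relevant parts of the edited file looked like. The structure follows the standard LVM text-metadata format, but the extent numbers are illustrative, not my real values; pv1 was the dead /dev/sdd1, pv2 is /dev/sdf:

lvm_group1 {
    id = "..."
    seqno = 95
    extent_size = 8192                  # 4 MiB PEs, in 512-byte sectors

    physical_volumes {
        pv0 {
            id = "FEoDYU-Lhjf-FdI1-Ei5p-koue-PIma-TGvs9A"
            device = "/dev/sda"
        }
        # the pv1 block for the dead /dev/sdd1 was deleted entirely
        pv2 {
            id = "qs4BVK-PAPv-I1DG-x5wJ-dRNq-vhBE-wQeJL6"
            device = "/dev/sdf"
        }
    }

    logical_volumes {
        cryptex {
            # ... earlier segments unchanged ...
            segment3 {                  # the segment that pointed at pv1
                start_extent = 834582   # illustrative
                extent_count = 99       # the unreadable extents
                type = "striped"
                stripe_count = 1        # linear
                stripes = [
                    "pv2", 357650       # was "pv1", <pe>; now free PEs on /dev/sdf
                ]
            }
        }
    }
}

After vgcfgrestore, the reboot forces the kernel to pick up the restored mapping; deactivating and reactivating the volume group (vgchange -an lvm_group1, then vgchange -ay lvm_group1) would probably achieve the same without a reboot.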
It took me 4 days of learning the ins and outs of LVM to solve this...
Looks good so far. No errors. Happy camping.
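If you want to double-check the result before trusting it with new data, the standard reports are enough:

# confirm the dead PV is gone and the LV maps only to live devices
pvs
lvs -a -o +devices lvm_group1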
Answer 2
If you can temporarily stop LVM (and close the underlying LUKS container, if one is in use), an alternative solution is to copy as much of the PV (or of the underlying LUKS container) as possible onto a good disk with GNU ddrescue, and to remove the old disk before starting LVM again.
While I like Sniku's LVM solution, ddrescue may be able to recover more data than pvmove could.
(The reason for stopping LVM is that LVM has multipath support, and as soon as it sees a pair of PVs with the same UUID it will balance write operations between them. Additionally, both LVM and LUKS should be stopped to make sure that all recently written data is visible on the underlying device. Rebooting the system and not supplying the LUKS passphrase is the easiest way to ensure this.)
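A rough sketch of that approach, assuming the rescue target is a spare partition at least as large as the source (shown here as a hypothetical /dev/sde1; the mapfile path is also arbitrary):

# make sure nothing is using the VG on either copy of the PV
vgchange -an lvm_group1

# the first pass copies everything readable, skipping bad areas quickly;
# -r3 then retries the bad areas three more times; -f is required when
# writing to a block device; the mapfile allows resuming an interrupted run
ddrescue -f -r3 /dev/sdd1 /dev/sde1 /root/sdd1-rescue.map

# physically remove the old disk BEFORE reactivating LVM, so it never
# sees two PVs with the same UUID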