Error rebuilding Linux RAID-1

I have a Linux machine acting as a home NAS, with two 1 TB hard drives in a Linux RAID-1. Recently one of the two drives died, so I bought a new one (a 1 TB WD Blue) and installed it.

The rebuild starts and then stops at 7.8% with an error saying that /dev/sdd (the good drive) has bad blocks and the process cannot continue. I have tried removing and re-adding the new drive, but the process always stops at the same point. The good news is that I can still access the data mounted on /storage (xfs fs). Below is more information about the problem:
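
For reference, the remove/re-add cycle I tried looks roughly like this (a sketch only; /dev/md3 and /dev/sdc1 are the array and new partition shown further down, and the exact commands I ran may have differed slightly):

sudo mdadm --manage /dev/md3 --fail /dev/sdc1     # only needed if the new member is not already marked failed
sudo mdadm --manage /dev/md3 --remove /dev/sdc1   # drop the new partition from the array
sudo mdadm --manage /dev/md3 --add /dev/sdc1      # re-add it; recovery then restarts from the beginning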

The good (source) disk:

sudo fdisk -l /dev/sdd

WARNING: GPT (GUID Partition Table) detected on '/dev/sdd'! The util fdisk doesn't support GPT. Use GNU Parted.

Disk /dev/sdd: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1              63  1953525167   976762552+  da  Non-FS data

The new (target) disk:

sudo fdisk -l /dev/sdc

Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes
81 heads, 63 sectors/track, 382818 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x5c5d0188

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1            2048  1953525167   976761560   da  Non-FS data

The RAID-1 array:

cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md3 : active raid1 sdc1[3] sdd1[2]
      976761382 blocks super 1.2 [2/1] [U_]
      [=>...................]  recovery =  7.7% (75738048/976761382) finish=601104.0min speed=24K/sec

dmesg (this message is repeated many times):

[35085.217154] ata10.00: exception Emask 0x0 SAct 0x2 SErr 0x0 action 0x0
[35085.217160] ata10.00: irq_stat 0x40000008
[35085.217163] ata10.00: failed command: READ FPDMA QUEUED
[35085.217170] ata10.00: cmd 60/08:08:37:52:43/00:00:6d:00:00/40 tag 1 ncq 4096 in
[35085.217170]          res 41/40:00:3c:52:43/00:00:6d:00:00/40 Emask 0x409 (media error) <F>
[35085.217173] ata10.00: status: { DRDY ERR }
[35085.217175] ata10.00: error: { UNC }
[35085.221619] ata10.00: configured for UDMA/133
[35085.221636] sd 9:0:0:0: [sdd] Unhandled sense code
[35085.221639] sd 9:0:0:0: [sdd]
[35085.221641] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[35085.221643] sd 9:0:0:0: [sdd]
[35085.221645] Sense Key : Medium Error [current] [descriptor]
[35085.221649] Descriptor sense data with sense descriptors (in hex):
[35085.221651]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[35085.221661]         6d 43 52 3c
[35085.221666] sd 9:0:0:0: [sdd]
[35085.221669] Add. Sense: Unrecovered read error - auto reallocate failed
[35085.221671] sd 9:0:0:0: [sdd] CDB:
[35085.221673] Read(10): 28 00 6d 43 52 37 00 00 08 00
[35085.221682] end_request: I/O error, dev sdd, sector 1833128508
[35085.221706] ata10: EH complete

mdadm details:

sudo mdadm --detail /dev/md3
/dev/md3:
        Version : 1.2
  Creation Time : Fri Apr 13 19:10:18 2012
     Raid Level : raid1
     Array Size : 976761382 (931.51 GiB 1000.20 GB)
  Used Dev Size : 976761382 (931.51 GiB 1000.20 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Wed Sep  4 08:57:46 2013
          State : active, degraded, recovering
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1

 Rebuild Status : 7% complete

           Name : hypervisor:3  (local to host hypervisor)
           UUID : b758f8f1:a6a6862e:83133e3a:3b9830ea
         Events : 1257158

    Number   Major   Minor   RaidDevice State
       2       8       49        0      active sync   /dev/sdd1
       3       8       33        1      spare rebuilding   /dev/sdc1

One thing I noticed is that the source drive (/dev/sdd) has a partition that starts at sector 63, while the new disk (/dev/sdc) starts at sector 2048. Could this be related to the problem? And is there a way to tell mdadm to ignore this bad block and continue rebuilding the array?
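
For what it's worth, the drive's own view of those bad blocks can be checked with smartmontools (a sketch, assuming smartctl is installed):

sudo smartctl -A /dev/sdd          # look at Reallocated_Sector_Ct and Current_Pending_Sector
sudo smartctl -l error /dev/sdd    # recent ATA errors logged by the drive itself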

As a last resort I am considering using ddrescue (from a live CD) to clone the source drive (/dev/sdd) onto the new drive (/dev/sdc) and then use the clone as the source disk. Would that work?
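
The clone I have in mind would look roughly like this (a sketch only; the map-file name is my own choice, and the device names are assumed to stay the same under the live CD):

sudo ddrescue -f -n /dev/sdd /dev/sdc /root/sdd.map    # first pass: copy what reads cleanly, skip scraping bad areas
sudo ddrescue -f -r3 /dev/sdd /dev/sdc /root/sdd.map   # second pass: go back and retry the bad areas a few times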

I have repartitioned /dev/sdd and /dev/sdc. It now looks like this:

sudo fdisk -l -u /dev/sdc

Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes
81 heads, 63 sectors/track, 382818 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x0002c2de

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1            2048  1953525167   976761560   da  Non-FS data

sudo fdisk -l -u /dev/sdd

Disk /dev/sdd: 1000.2 GB, 1000204886016 bytes
23 heads, 12 sectors/track, 7077989 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00069b7e

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1            2048  1953525167   976761560   da  Non-FS data

Is this OK?
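
For completeness, that layout can be reproduced with something like the following (a sketch; parted is one option, the same applies to /dev/sdc, and the MBR partition type shown above as 'da' would still have to be set separately, e.g. with fdisk's t command):

sudo parted -s /dev/sdd mklabel msdos                          # new MBR label (wipes the existing table)
sudo parted -s -a optimal /dev/sdd mkpart primary 2048s 100%   # one partition starting at sector 2048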

I rebuilt the array again and then restored all the data from backup. Everything looks fine, except that after a reboot /dev/md3 was renamed to /dev/md127:

    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md127 : active raid1 sdd1[0] sdc1[2]
      976630336 blocks super 1.2 [2/2] [UU]

md1 : active raid0 sdb5[0] sda5[1]
      7809024 blocks super 1.2 512k chunks

md2 : active raid0 sdb6[0] sda6[1]
      273512448 blocks super 1.2 512k chunks

md0 : active raid1 sdb1[0] sda1[2]
      15623096 blocks super 1.2 [2/2] [UU]

cat /etc/mdadm/mdadm.conf
ARRAY /dev/md/0 metadata=1.2 UUID=5c541476:4ee0d591:615c1e5a:d58bc3f7 name=hypervisor:0
ARRAY /dev/md/1 metadata=1.2 UUID=446ba1de:407f8ef4:5bf728ff:84e223db name=hypervisor:1
ARRAY /dev/md/2 metadata=1.2 UUID=b91cba71:3377feb4:8a57c958:11cc3df0 name=hypervisor:2
ARRAY /dev/md/3 metadata=1.2 UUID=5c573747:61c40d46:f5981a8b:e818a297 name=hypervisor:3

sudo mdadm --examine --scan --verbose
ARRAY /dev/md/0 level=raid1 metadata=1.2 num-devices=2 UUID=5c541476:4ee0d591:615c1e5a:d58bc3f7 name=hypervisor:0
   devices=/dev/sdb1,/dev/sda1
ARRAY /dev/md/1 level=raid0 metadata=1.2 num-devices=2 UUID=446ba1de:407f8ef4:5bf728ff:84e223db name=hypervisor:1
   devices=/dev/sdb5,/dev/sda5
ARRAY /dev/md/2 level=raid0 metadata=1.2 num-devices=2 UUID=b91cba71:3377feb4:8a57c958:11cc3df0 name=hypervisor:2
   devices=/dev/sdb6,/dev/sda6
ARRAY /dev/md/3 level=raid1 metadata=1.2 num-devices=2 UUID=5c573747:61c40d46:f5981a8b:e818a297 name=hypervisor:3
   devices=/dev/sdd1,/dev/sdc1

cat /etc/fstab
# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
proc            /proc           proc    nodev,noexec,nosuid 0       0
# / was on /dev/md0 during installation
UUID=2e4543d3-22aa-45e1-8adb-f95cfe57a697 /               ext4    noatime,errors=remount-ro,discard   0       1
#was /dev/md3 before
UUID=13689e0b-052f-48f7-bf1f-ad857364c0d6      /storage     ext4     defaults       0       2
# /vm was on /dev/md2 during installation
UUID=9fb85fbf-31f9-43ff-9a43-3ebef9d37ee8 /vm             ext4    noatime,errors=remount-ro,discard   0       2
# swap was on /dev/md1 during installation
UUID=1815549c-9047-464e-96a0-fe836fa80cfd none            swap    sw

Any suggestions on this?
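
One thing I still plan to check (an assumption on my part, not confirmed anywhere in this thread) is whether the initramfs carries an older copy of mdadm.conf, since that copy is what names the arrays when they are assembled at boot:

sudo mdadm --detail --scan    # compare the names/UUIDs with /etc/mdadm/mdadm.conf
sudo update-initramfs -u      # Debian/Ubuntu: rebuild the initramfs so it picks up the current mdadm.conf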

Answer 1

"The good news is that I can still access the data mounted on /storage"

No, you can't; you are having trouble reading the data in those suspect blocks on /dev/sdd. You just don't see it in normal operation, either because you happen not to read those blocks or because your applications tolerate the read errors.

I find those messages being logged for /dev/sdd very worrying. If this were my kit, I would back the data up as soon as possible, preferably twice, replace this drive as well as the other one, and restore from the best backup I could get.

Also, as you have noted, you are trying to mirror a 976762552-block partition onto a 976761560-block partition, and that is not going to work; the new partition needs to be at least as large as the old one. I'm a little surprised mdadm let the rebuild proceed at all, but you haven't said which distribution you are running, so it's hard to know how old your mdadm is; perhaps it is old enough not to check for this sort of thing.
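
A quick way to see the size mismatch (a sketch; blockdev reports sizes in 512-byte sectors):

sudo blockdev --getsz /dev/sdd1    # size of the old member
sudo blockdev --getsz /dev/sdc1    # size of the new member; it must be at least as large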

Edit: yes, you should enlarge the partition as you describe. I'm not an Ubuntu fan, so I can't comment on that release. If you do complete this resync, I would replace the other drive immediately. If you have a decent backup, I would stop wasting time on resyncs, replace it now, recreate the array, and restore from the backup.
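
If you take the recreate-and-restore route, the outline is roughly this (a sketch only; the placeholder names sdX1/sdY1 stand for two known-good partitions, and the filesystem choice is yours):

sudo mdadm --stop /dev/md3                                                     # stop the degraded array
sudo mdadm --create /dev/md3 --level=1 --raid-devices=2 /dev/sdX1 /dev/sdY1    # build a fresh mirror
sudo mkfs.ext4 /dev/md3                                                        # or whatever filesystem you prefer
sudo mount /dev/md3 /storage                                                   # then restore the backup into it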

Answer 2

You could try the procedure I describe here: rebuilding a SW RAID1 with a new disk and an old disk that has bad blocks. It uses hdparm to read and write the bad sectors, which lets the disk remap them where possible.
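
As an illustration of that idea (a sketch only; --write-sector destroys the data in that sector, and the sector number here is simply the one reported in the dmesg output above):

sudo hdparm --read-sector 1833128508 /dev/sdd                                 # confirm the sector is unreadable
sudo hdparm --yes-i-know-what-i-am-doing --write-sector 1833128508 /dev/sdd   # overwrite it so the drive can remap it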

Answer 3

The sdd drive is definitely failing and has run out of internal reallocation space.

In any case, you could try updating its firmware, if an update is available.

By the way, these are GPT disks; use parted or gdisk to list and manipulate the partitions. fdisk does not support GPT and is a pretty flawed tool in general.
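
For example (nothing here beyond the device name taken from above):

sudo parted /dev/sdd print    # shows the real GPT layout
sudo gdisk -l /dev/sdd        # the same, via gdisk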
