RAID5 array degraded, partition detached from the array

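I have a problem with one of my RAID5 arrays. I have had disk failures before, and replacing the disk was never a problem, but this time the repair is proving difficult.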

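Here is the situation: I am running Ubuntu 12.04 with three 2 TB disks and two RAID5 arrays, md0 and md1. md0 works fine. The problem is md1, which is now running in degraded mode because sdc2 is no longer part of the array. The sdc disk is not dead, though, because sdc1 is a member of md0 and works fine.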

$ cat /proc/mdstat 
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md1 : active raid5 sdb2[3] sdd2[2]
      409336832 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [_UU]

md0 : active raid5 sdc1[4] sdb1[5] sdd1[3]
      3497163776 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]

unused devices: <none>
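
For readers less used to this output, the `[3/2] [_UU]` field on the md1 line is what marks the array as degraded: three member slots expected, two active, and the first slot empty. Below is a minimal Python sketch that reads those fields from the text above. The function name `array_status` and the embedded sample string are illustrative assumptions, not part of any tool.

```python
import re

# Sample copied from the /proc/mdstat output above.
MDSTAT = """\
md1 : active raid5 sdb2[3] sdd2[2]
      409336832 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [_UU]

md0 : active raid5 sdc1[4] sdb1[5] sdd1[3]
      3497163776 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
"""


def array_status(mdstat_text):
    """Return {array_name: (expected_members, active_members, slot_map)}."""
    status = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The "[3/2] [_UU]" pair appears on the line after the array header.
        counts = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if counts and current:
            status[current] = (
                int(counts.group(1)),
                int(counts.group(2)),
                counts.group(3),
            )
    return status


for name, (expected, active, slots) in sorted(array_status(MDSTAT).items()):
    state = "degraded" if active < expected else "healthy"
    print(f"{name}: {active}/{expected} members, slots {slots} -> {state}")
```

On a live system the same function could be fed the contents of /proc/mdstat instead of the sample string.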

The details of /dev/md1 are as follows:

$ sudo mdadm --detail /dev/md1
/dev/md1:
        Version : 1.2
  Creation Time : Mon Aug 22 14:17:57 2016
     Raid Level : raid5
     Array Size : 409336832 (390.37 GiB 419.16 GB)
  Used Dev Size : 204668416 (195.19 GiB 209.58 GB)
   Raid Devices : 3
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Wed Dec 28 08:17:51 2016
          State : clean, degraded 
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : serv1:1  (local to host serv1)
           UUID : bf2095af:69c02451:1f31ee06:93b92c8b
         Events : 844

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       3       8       18        1      active sync   /dev/sdb2
       2       8       50        2      active sync   /dev/sdd2

Trying to remove /dev/sdc2 from /dev/md1 gives the following:

$ sudo mdadm /dev/md1 -r /dev/sdc2
mdadm: hot remove failed for /dev/sdc2: No such device or address

This is fine, because it means /dev/sdc2 is no longer part of the array.

But trying to add /dev/sdc2 back to the array fails:

$ sudo mdadm /dev/md1 -a /dev/sdc2
mdadm: add new device failed for /dev/sdc2 as 4: Invalid argument

I noticed that each attempt to add sdc2 produces a batch of errors like the following in /var/log/syslog:

ata3.00: exception Emask 0x0 SAct 0x4000000 SErr 0x0 action 0x0
ata3.00: irq_stat 0x40000008
ata3.00: failed command: READ FPDMA QUEUED
ata3.00: cmd 60/08:d0:08:88:76/00:00:d0:00:00/40 tag 26 ncq 4096 in
         res 41/40:00:09:88:76/00:00:d0:00:00/40 Emask 0x409 (media error) <F>
ata3.00: status: { DRDY ERR }
ata3.00: error: { UNC }
ata3.00: configured for UDMA/133
ata3: EH complete
ata3.00: exception Emask 0x0 SAct 0x8000000 SErr 0x0 action 0x0
ata3.00: irq_stat 0x40000008
ata3.00: failed command: READ FPDMA QUEUED
ata3.00: cmd 60/08:d8:08:88:76/00:00:d0:00:00/40 tag 27 ncq 4096 in
         res 41/40:00:09:88:76/00:00:d0:00:00/40 Emask 0x409 (media error) <F>
ata3.00: status: { DRDY ERR }
ata3.00: error: { UNC }
ata3.00: configured for UDMA/133
ata3: EH complete
ata3.00: exception Emask 0x0 SAct 0x10000000 SErr 0x0 action 0x0
ata3.00: irq_stat 0x40000008
ata3.00: failed command: READ FPDMA QUEUED
ata3.00: cmd 60/08:e0:08:88:76/00:00:d0:00:00/40 tag 28 ncq 4096 in
         res 41/40:00:09:88:76/00:00:d0:00:00/40 Emask 0x409 (media error) <F>
ata3.00: status: { DRDY ERR }
ata3.00: error: { UNC }
ata3.00: configured for UDMA/133
sd 2:0:0:0: [sdc] Unhandled sense code
sd 2:0:0:0: [sdc]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 2:0:0:0: [sdc]  Sense Key : Medium Error [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
        72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
        d0 76 88 09 
sd 2:0:0:0: [sdc]  Add. Sense: Unrecovered read error - auto reallocate failed
sd 2:0:0:0: [sdc] CDB: Read(10): 28 00 d0 76 88 08 00 00 08 00
end_request: I/O error, dev sdc, sector 3497429001
Buffer I/O error on device sdc2, logical block 1
ata3: EH complete
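
The messages above carry enough information to estimate where the unreadable sector lies relative to the start of sdc2. The following is a rough Python sketch of that arithmetic, not a diagnosis. It assumes 512-byte logical sectors, and it assumes that "logical block 1" in the Buffer I/O error line counts 4 KiB blocks; neither is stated in the log. The helper name `decode_read10` is made up for illustration.

```python
SECTOR_SIZE = 512         # assumption: 512-byte logical sectors
BUFFER_BLOCK_SIZE = 4096  # assumption: block size behind "logical block 1"


def decode_read10(cdb_hex):
    """Return (start_lba, sector_count) from a READ(10) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) command")
    start_lba = int.from_bytes(cdb[2:6], "big")  # bytes 2..5: logical block address
    count = int.from_bytes(cdb[7:9], "big")      # bytes 7..8: transfer length
    return start_lba, count


# From "[sdc] CDB: Read(10): 28 00 d0 76 88 08 00 00 08 00"
start_lba, count = decode_read10("28 00 d0 76 88 08 00 00 08 00")

# From "end_request: I/O error, dev sdc, sector 3497429001"
failed_lba = 3497429001
# From "Buffer I/O error on device sdc2, logical block 1"
failed_block = 1

sectors_per_block = BUFFER_BLOCK_SIZE // SECTOR_SIZE
# If the failed read covered exactly logical block 1, sdc2 would start here.
partition_start = start_lba - failed_block * sectors_per_block
offset_bytes = (failed_lba - partition_start) * SECTOR_SIZE

print("read starts at LBA", start_lba, "for", count, "sectors")
print("inferred start of sdc2:", partition_start)
print("failing sector is", offset_bytes, "bytes into sdc2")
```

Under those assumptions the failing sector is within the first few KiB of sdc2, far from anything sdc1 uses. The inferred partition start could be checked against the real partition table.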

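I don't understand what I need to do. It looks as if my sdc disk is bad, yet /dev/md0 uses /dev/sdc1 without any problem. I have already tried stopping md1 and then reassembling it with sudo mdadm --assemble /dev/md1 /dev/sdb2 /dev/sdd2, but adding sdc2 always fails in the same way.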

Here is the output of sudo smartctl -a /dev/sdc2:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   188   188   051    Pre-fail  Always       -       81654
  3 Spin_Up_Time            0x0027   175   174   021    Pre-fail  Always       -       4250
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       33
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   086   086   000    Old_age   Always       -       10611
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       33
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       22
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       19
194 Temperature_Celsius     0x0022   119   106   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0
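
What stands out in this table is that the sector-related counters are all zero even though the kernel log reports unrecovered read errors. Below is a small Python sketch that pulls those counters out of the table text. The function name `parse_smart` is hypothetical, only a few rows are copied in as sample data, and the parser assumes every raw value is a plain integer, which holds for the rows shown here but not for every drive.

```python
# A few rows copied from the smartctl table above.
SMART_ROWS = """\
  1 Raw_Read_Error_Rate     0x002f   188   188   051    Pre-fail  Always       -       81654
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
"""


def parse_smart(text):
    """Return {attribute_name: {"value", "thresh", "raw"}} for each table row."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have ten columns and start with the numeric attribute ID.
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[fields[1]] = {
                "value": int(fields[3]),
                "thresh": int(fields[5]),
                "raw": int(fields[9]),
            }
    return attrs


attrs = parse_smart(SMART_ROWS)
sector_counters = ["Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"]
for name in sector_counters:
    print(name, "raw =", attrs[name]["raw"])

# Attributes whose normalized value has dropped to or below the threshold.
at_or_below = [n for n, a in attrs.items() if a["thresh"] > 0 and a["value"] <= a["thresh"]]
print("at or below threshold:", at_or_below)
```

SMART attributes describe the whole drive, so the same values would be expected when querying /dev/sdc rather than /dev/sdc2.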

Here is the output of sudo badblocks /dev/sdc2:

4
5
6
7
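
These block numbers can be turned into byte offsets inside sdc2. badblocks counts in 1024-byte blocks when no -b option is given, and md metadata version 1.2 (the version reported for md1 above) keeps its superblock 4 KiB from the start of the member device. The Python sketch below compares the two. The helper name `byte_ranges` is illustrative, and the result only shows that the ranges overlap, not that the superblock sector itself is the unreadable one.

```python
BADBLOCKS_BLOCK_SIZE = 1024      # badblocks default when -b is not given
MD_V12_SUPERBLOCK_OFFSET = 4096  # metadata 1.2: superblock 4 KiB from device start

# Block numbers reported by badblocks above.
bad_blocks = [4, 5, 6, 7]


def byte_ranges(blocks, block_size):
    """Merge consecutive block numbers into half-open (start, end) byte ranges."""
    ranges = []
    for block in sorted(blocks):
        start = block * block_size
        end = start + block_size
        if ranges and ranges[-1][1] == start:
            # Extend the previous range when this block is adjacent to it.
            ranges[-1] = (ranges[-1][0], end)
        else:
            ranges.append((start, end))
    return ranges


ranges = byte_ranges(bad_blocks, BADBLOCKS_BLOCK_SIZE)
hits_superblock = any(
    start <= MD_V12_SUPERBLOCK_OFFSET < end for start, end in ranges
)

print("unreadable byte ranges in sdc2:", ranges)
print("range covers the 1.2 superblock offset:", hits_superblock)
```

If the overlap is real, it would be consistent with mdadm failing while it tries to read or write the member's metadata, though that remains a guess until the superblock area is examined directly, for example with mdadm --examine /dev/sdc2.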
