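I have a problem with one of my RAID 5 arrays. I have had disk failures before and replacing the disk was never a problem, but this time the repair is proving difficult.
Here is the situation: I am running Ubuntu 12.04 with 3 x 2 TB disks and two RAID 5 arrays, md0 and md1. md0 works fine. The problem is with md1, which is now running in degraded mode because sdc2 is no longer part of the array. But sdc is not dead, since sdc1 is part of md0 and works fine.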
$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid5 sdb2[3] sdd2[2]
409336832 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [_UU]
md0 : active raid5 sdc1[4] sdb1[5] sdd1[3]
3497163776 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
unused devices: <none>
The details for /dev/md1 are as follows:
$ sudo mdadm --detail /dev/md1
/dev/md1:
Version : 1.2
Creation Time : Mon Aug 22 14:17:57 2016
Raid Level : raid5
Array Size : 409336832 (390.37 GiB 419.16 GB)
Used Dev Size : 204668416 (195.19 GiB 209.58 GB)
Raid Devices : 3
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Wed Dec 28 08:17:51 2016
State : clean, degraded
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Name : serv1:1 (local to host serv1)
UUID : bf2095af:69c02451:1f31ee06:93b92c8b
Events : 844
Number Major Minor RaidDevice State
0 0 0 0 removed
3 8 18 1 active sync /dev/sdb2
2 8 50 2 active sync /dev/sdd2
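Trying to remove /dev/sdc2 from /dev/md1 gives the following result:
$ sudo mdadm /dev/md1 -r /dev/sdc2
mdadm: hot remove failed for /dev/sdc2: No such device or address
That is fine, since it means /dev/sdc2 is no longer part of the array.
But trying to add /dev/sdc2 back to the array fails with this error:
$ sudo mdadm /dev/md1 -a /dev/sdc2
mdadm: add new device failed for /dev/sdc2 as 4: Invalid argument
I noticed that whenever I try to add sdc2, a batch of errors like the following appears in /var/log/syslog: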
ata3.00: exception Emask 0x0 SAct 0x4000000 SErr 0x0 action 0x0
ata3.00: irq_stat 0x40000008
ata3.00: failed command: READ FPDMA QUEUED
ata3.00: cmd 60/08:d0:08:88:76/00:00:d0:00:00/40 tag 26 ncq 4096 in
res 41/40:00:09:88:76/00:00:d0:00:00/40 Emask 0x409 (media error) <F>
ata3.00: status: { DRDY ERR }
ata3.00: error: { UNC }
ata3.00: configured for UDMA/133
ata3: EH complete
ata3.00: exception Emask 0x0 SAct 0x8000000 SErr 0x0 action 0x0
ata3.00: irq_stat 0x40000008
ata3.00: failed command: READ FPDMA QUEUED
ata3.00: cmd 60/08:d8:08:88:76/00:00:d0:00:00/40 tag 27 ncq 4096 in
res 41/40:00:09:88:76/00:00:d0:00:00/40 Emask 0x409 (media error) <F>
ata3.00: status: { DRDY ERR }
ata3.00: error: { UNC }
ata3.00: configured for UDMA/133
ata3: EH complete
ata3.00: exception Emask 0x0 SAct 0x10000000 SErr 0x0 action 0x0
ata3.00: irq_stat 0x40000008
ata3.00: failed command: READ FPDMA QUEUED
ata3.00: cmd 60/08:e0:08:88:76/00:00:d0:00:00/40 tag 28 ncq 4096 in
res 41/40:00:09:88:76/00:00:d0:00:00/40 Emask 0x409 (media error) <F>
ata3.00: status: { DRDY ERR }
ata3.00: error: { UNC }
ata3.00: configured for UDMA/133
sd 2:0:0:0: [sdc] Unhandled sense code
sd 2:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 2:0:0:0: [sdc] Sense Key : Medium Error [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
d0 76 88 09
sd 2:0:0:0: [sdc] Add. Sense: Unrecovered read error - auto reallocate failed
sd 2:0:0:0: [sdc] CDB: Read(10): 28 00 d0 76 88 08 00 00 08 00
end_request: I/O error, dev sdc, sector 3497429001
Buffer I/O error on device sdc2, logical block 1
ata3: EH complete
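To make the numbers in the log above easier to read, here is a minimal sketch that decodes them. It uses only values copied from the log (the Read(10) CDB and the failing sector). The function name `decode_read10` is hypothetical, and the 4096-byte block size used to interpret "logical block 1" is an assumption, chosen because the failed read was 8 sectors of 512 bytes.

```python
from typing import Tuple

SECTOR_SIZE = 512          # bytes per sector, as counted in the kernel log
BUFFER_BLOCK_SIZE = 4096   # ASSUMPTION: block size behind "logical block 1"


def decode_read10(cdb_hex: str) -> Tuple[int, int]:
    """Return (start LBA, sector count) for a SCSI Read(10) CDB given as hex bytes.

    Read(10) layout: byte 0 = opcode 0x28, bytes 2-5 = LBA (big-endian),
    bytes 7-8 = transfer length in sectors (big-endian).
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a Read(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")
    count = int.from_bytes(cdb[7:9], "big")
    return lba, count


# Values copied from the syslog excerpt
lba, count = decode_read10("28 00 d0 76 88 08 00 00 08 00")
failed_sector = 3497429001  # "end_request: I/O error, dev sdc, sector ..."

# Position of the failing sector inside the 8-sector read
offset_in_read = failed_sector - lba

# Under the 4096-byte assumption, logical block 1 of sdc2 starts 8 sectors
# into the partition, so the read would imply this partition start sector.
sectors_per_block = BUFFER_BLOCK_SIZE // SECTOR_SIZE
implied_partition_start = lba - 1 * sectors_per_block

print(lba, count, offset_in_read, implied_partition_start)
```

The implied start sector is only an inference from the log. It can be compared against the real value reported by `sudo fdisk -l /dev/sdc` or `cat /sys/block/sdc/sdc2/start`.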
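I don't understand what I need to do. It looks as if my sdc disk is failing, yet /dev/md0, which uses /dev/sdc1, has no problems at all. I have already tried stopping md1 and then reassembling it with:
$ sudo mdadm --assemble /dev/md1 /dev/sdb2 /dev/sdd2
But adding sdc2 always fails in the same way.
Here is what I get from smartctl:
$ sudo smartctl -a /dev/sdc2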
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 188 188 051 Pre-fail Always - 81654
3 Spin_Up_Time 0x0027 175 174 021 Pre-fail Always - 4250
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 33
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 086 086 000 Old_age Always - 10611
10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 33
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 22
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 19
194 Temperature_Celsius 0x0022 119 106 000 Old_age Always - 28
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 100 253 000 Old_age Offline - 0
And here is what I get from badblocks:
$ sudo badblocks /dev/sdc2
4
5
6
7
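The block numbers printed by badblocks can be converted into byte and sector offsets within sdc2. The sketch below assumes the badblocks default block size of 1024 bytes, since no -b option was given above. The helper name `block_span` is hypothetical.

```python
BADBLOCKS_BLOCK_SIZE = 1024  # ASSUMPTION: badblocks default, no -b was passed
SECTOR_SIZE = 512            # bytes per sector


def block_span(blocks, block_size=BADBLOCKS_BLOCK_SIZE):
    """Return (first byte, last byte, first sector, last sector) covered by blocks.

    Offsets are relative to the start of the device that badblocks scanned,
    which here is the partition /dev/sdc2.
    """
    first_byte = min(blocks) * block_size
    last_byte = (max(blocks) + 1) * block_size - 1
    return (first_byte, last_byte,
            first_byte // SECTOR_SIZE, last_byte // SECTOR_SIZE)


bad = [4, 5, 6, 7]  # copied from the badblocks output
print(block_span(bad))
```

Under that assumption the bad region covers bytes 4096 to 8191 of the partition, which is sectors 8 to 15. The mdadm documentation places the version 1.2 superblock 4 KiB from the start of the member device, which would fall inside this range. That may be related to the failed add, but I have not confirmed it.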