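I have a hardware Adaptec ASR8405 RAID controller with a 15-disk RAID 6 array on it. One of the disks failed. After I replaced it, the controller did not pick up the new disk and did not start a rebuild; instead, the logical device went into the Failed state (see below):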
----------------------------------------------------------------------
Logical device information
----------------------------------------------------------------------
Logical Device number 0
Logical Device name : LogicalDrv 0
Block Size of member drives : 512 Bytes
RAID level : 6 Reed-Solomon
Unique Identifier : A0E20532
Status of Logical Device : Failed
Additional details : Initialized with Build/Clear
Size : 74347510 MB
Parity space : 11438080 MB
Stripe-unit size : 256 KB
Interface Type : Serial ATA
Device Type : HDD
Read-cache setting : Enabled
Read-cache status : On
Write-cache setting : Enabled
Write-cache status : Off
Partitioned : No
Protected by Hot-Spare : No
Bootable : Yes
Failed stripes : No
Power settings : Disabled
--------------------------------------------------------
Logical Device segment information
--------------------------------------------------------
Segment 0 : Present (5723166MB, SATA, HDD, Enclosure:0, Slot:0) K1JG4N8D
Segment 1 : Present (5723166MB, SATA, HDD, Enclosure:0, Slot:1) K1JGHL7D
Segment 2 : Missing
Segment 3 : Present (5723166MB, SATA, HDD, Enclosure:0, Slot:3) K1JGE6ZD
Segment 4 : Present (5723166MB, SATA, HDD, Enclosure:0, Slot:4) K1JEWTND
Segment 5 : Present (5723166MB, SATA, HDD, Enclosure:0, Slot:5) K1JENR3D
Segment 6 : Present (5723166MB, SATA, HDD, Enclosure:0, Slot:6) K1JG2U0D
Segment 7 : Present (5723166MB, SATA, HDD, Enclosure:0, Slot:7) K1JG66ED
Segment 8 : Present (5723166MB, SATA, HDD, Enclosure:0, Slot:8) K1JGHJ6D
Segment 9 : Present (5723166MB, SATA, HDD, Enclosure:0, Slot:9) K1JGELLD
Segment 10 : Present (5723166MB, SATA, HDD, Enclosure:0, Slot:10) K1JG5XYD
Segment 11 : Present (5723166MB, SATA, HDD, Enclosure:0, Slot:11) K1JGSTJD
Segment 12 : Present (5723166MB, SATA, HDD, Enclosure:0, Slot:12) K1JG339D
Segment 13 : Present (5723166MB, SATA, HDD, Enclosure:0, Slot:13) K1JG16KD
Segment 14 : Present (5723166MB, SATA, HDD, Enclosure:0, Slot:14) K1JEX09D
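As you can see, the disk in Segment 2 of the logical device is reported as Missing, yet it does show up (in the Ready state) when I inspect the physical devices: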
Device #2
Device is a Hard drive
State : Ready
Block Size : 512 Bytes
Supported : Yes
Programmed Max Speed : SATA 6.0 Gb/s
Transfer Speed : SATA 12.0 Gb/s
Reported Channel,Device(T:L) : 0,6(6:0)
Reported Location : Enclosure 0, Slot 2(Connector 0)
Reported ESD(T:L) : 2,0(0:0)
Vendor : ATA
Model : HGST HUS726060AL
Firmware : T907
Serial number : K1GVY99D
World-wide name : 5000CCA255CC3FA3
Reserved Size : 4225560 KB
Used Size : 0 MB
Unused Size : 5719040 MB
Total Size : 5723166 MB
Write Cache : Enabled (write-back)
FRU : None
S.M.A.R.T. : No
S.M.A.R.T. warnings : 0
Power State : Full rpm
Supported Power States : Full rpm,Powered off,Reduced rpm
SSD : No
Temperature : 42 C/ 107 F
NCQ status : Enabled
----------------------------------------------------------------
Device Phy Information
----------------------------------------------------------------
Phy #0
PHY Identifier : 0
SAS Address : 50000D1701875C02
Attached PHY Identifier : 2
Attached SAS Address : 50000D1701875C3F
----------------------------------------------------------------
Runtime Error Counters
----------------------------------------------------------------
Hardware Error Count : 0
Medium Error Count : 0
Parity Error Count : 0
Link Failure Count : 0
Aborted Command Count : 0
SMART Warning Count : 0
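The mismatch described above (a segment reported as Missing while the drive in the same slot is Ready) can be spotted programmatically. The following is only a minimal sketch, assuming the segment lines have exactly the `Segment N : State ...` shape shown in the listing; `missing_segments` is a hypothetical helper name, not part of arcconf.

```python
import re

# Matches lines such as "Segment 2 : Missing" or
# "Segment 3 : Present (5723166MB, ...)". The layout is assumed
# to match the listing pasted in this post.
SEGMENT_RE = re.compile(r"^\s*Segment\s+(\d+)\s*:\s*(\w+)")


def missing_segments(ld_text):
    """Return the numbers of all segments whose state is not 'Present'."""
    missing = []
    for line in ld_text.splitlines():
        m = SEGMENT_RE.match(line)
        if m and m.group(2) != "Present":
            missing.append(int(m.group(1)))
    return missing


sample = """\
Segment 1 : Present (5723166MB, SATA, HDD, Enclosure:0, Slot:1) K1JGHL7D
Segment 2 : Missing
Segment 3 : Present (5723166MB, SATA, HDD, Enclosure:0, Slot:3) K1JGE6ZD
"""
print(missing_segments(sample))  # [2]
```

The function only reads text, so it can be fed saved controller output without touching the array.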
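Question 1: How do I get the logical device to recognize the disk? I tried a rescan on the LD, and a clear, verify, and initialize on the disk, but none of it helped...
Question 2: Is there any chance of fixing this and recovering the data? I do have a backup, but it is more than 40 TB of data, and restoring that much from backup would be no fun.
Question 3: If I change the LD state to OPTIMAL, is there a chance it will repair itself?
Question 4: Any other ideas on how to fix it?
Thanks a lot in advance for any hints!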
Answer 1
I fixed it with the following command:
arcconf SETSTATE 1 LOGICALDRIVE 0 OPTIMAL ADVANCED nocheck noprompt
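As soon as the logical drive state was changed, the array started rebuilding automatically. Once the rebuild finished, a verify with fix started (again automatically). When the verify completed, everything was back to normal, with no data loss.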
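To follow the rebuild and verify phases without checking by hand, the `Status of Logical Device` line can be polled. This is a rough sketch under stated assumptions: `logical_device_status` and `wait_for_status` are hypothetical helpers, `fetch_text` is any callable you supply that returns the current logical device report as text, and the final status string `Optimal` is assumed rather than taken from the output in this post (only `Failed` appears there).

```python
import re
import time

# Field name taken from the logical device listing in the question.
STATUS_RE = re.compile(
    r"^\s*Status of Logical Device\s*:\s*(.+?)\s*$", re.MULTILINE
)


def logical_device_status(ld_text):
    """Return the status string from the report text, or None if absent."""
    m = STATUS_RE.search(ld_text)
    return m.group(1) if m else None


def wait_for_status(fetch_text, wanted, interval_s=60.0, max_polls=1000):
    """Poll fetch_text() until the status equals `wanted`.

    Returns True on a match, False if max_polls is exhausted first.
    """
    for _ in range(max_polls):
        if logical_device_status(fetch_text()) == wanted:
            return True
        time.sleep(interval_s)
    return False


# Demo with canned text instead of a real controller query.
texts = iter([
    "Status of Logical Device : Failed",
    "Status of Logical Device : Optimal",  # assumed final state string
])
print(wait_for_status(lambda: next(texts), "Optimal", interval_s=0))  # True
```

Passing the fetch step in as a callable keeps the polling logic testable and leaves the choice of how to query the controller to the reader.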