I have a RAID 1 on my server, and apparently both hard drives failed at the same time.
The server support staff ran a quick check to confirm:
HDDTEST-W1F21M6K ERROR Finished (Selftest, Device: sda);
HDDTEST-W1F22Y9M ERROR Finished (Values-Check, Device: sdb);
However, there still seems to be a partition table on sdb.
Your server is currently booted into our rescue system. Please try
to backup your data if possible and contact us again if
you wish to proceed with a hard drive replacement.
I can boot the system from another drive and see the following structure:
cat /proc/mdstat
Personalities : [raid1]
md3 : active raid1 sdb4[1]
1822442815 blocks super 1.2 [2/1] [_U]
md2 : active raid1 sdb3[1]
1073740664 blocks super 1.2 [2/1] [U_]
md1 : active raid1 sdb2[1]
524276 blocks super 1.2 [1/1] [U]
md0 : active raid1 sdb1[1]
33553336 blocks super 1.2 [2/1] [_U]
What I need is to recover some important data from the /dev/md2 partition. When I try to mount md2, I get the following:
mount /dev/md2 /mnt
mount: wrong fs type, bad option, bad superblock on /dev/md2,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
Any ideas how to resolve this?
Update 1
More data:
mdadm -E /dev/sdb3
/dev/sdb3:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 39c5b7f5:c3bed499:e383ce7f:0868fc3e
Name : rescue:2 (local to host rescue)
Creation Time : Wed Feb 6 07:23:32 2013
Raid Level : raid1
Raid Devices : 2
Avail Dev Size : 2147481600 (1024.00 GiB 1099.51 GB)
Array Size : 1073740664 (1024.00 GiB 1099.51 GB)
Used Dev Size : 2147481328 (1024.00 GiB 1099.51 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 3d68ec1a:3b125641:fa4b1d34:c829f017
Update Time : Wed Aug 6 13:21:28 2014
Checksum : dad4eccc - correct
Events : 18773099
Device Role : Active device 0
Array State : A. ('A' == active, '.' == missing)
Update 2
Available volumes:
ls /dev/sd*
sda sdb sdb1 sdb2 sdb3 sdb4 sdb5
mdadm -E /dev/sda
mdadm: No md superblock detected on /dev/sda.
dmesg output after the mount /dev/md2 /mnt attempt:
[Wed Aug 6 16:11:12 2014] ata2.00: exception Emask 0x0 SAct 0x600fffff SErr 0x0 action 0x0
[Wed Aug 6 16:11:12 2014] ata2.00: irq_stat 0x40000008
[Wed Aug 6 16:11:12 2014] ata2.00: cmd 60/08:e8:70:3b:d4/00:00:43:00:00/40 tag 29 ncq 4096 in
[Wed Aug 6 16:11:12 2014] res 41/40:08:70:3b:d4/00:00:43:00:00/00 Emask 0x409 (media error) <F>
[Wed Aug 6 16:11:12 2014] ata2.00: configured for UDMA/133
[Wed Aug 6 16:11:12 2014] sd 1:0:0:0: [sdb] Unhandled sense code
[Wed Aug 6 16:11:12 2014] sd 1:0:0:0: [sdb]
[Wed Aug 6 16:11:12 2014] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Wed Aug 6 16:11:12 2014] sd 1:0:0:0: [sdb]
[Wed Aug 6 16:11:12 2014] Sense Key : Medium Error [current] [descriptor]
[Wed Aug 6 16:11:12 2014] Descriptor sense data with sense descriptors (in hex):
[Wed Aug 6 16:11:12 2014] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[Wed Aug 6 16:11:12 2014] 43 d4 3b 70
[Wed Aug 6 16:11:12 2014] sd 1:0:0:0: [sdb]
[Wed Aug 6 16:11:12 2014] Add. Sense: Unrecovered read error - auto reallocate failed
[Wed Aug 6 16:11:12 2014] sd 1:0:0:0: [sdb] CDB:
[Wed Aug 6 16:11:12 2014] Read(16): 88 00 00 00 00 00 43 d4 3b 70 00 00 00 08 00 00
[Wed Aug 6 16:11:12 2014] end_request: I/O error, dev sdb, sector 1137982320
[Wed Aug 6 16:11:12 2014] ata2: EH complete
[Wed Aug 6 16:11:15 2014] JBD2: Failed to read block at offset 1134
[Wed Aug 6 16:11:15 2014] JBD2: IO error -5 recovering block 1134 in log
[Wed Aug 6 16:11:16 2014] JBD2: recovery failed
[Wed Aug 6 16:11:16 2014] EXT4-fs (md2): error loading journal
Update 3
For sdb:
smartctl -d ata -A /dev/sdb
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.14.10] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 101 099 006 Pre-fail Always - 216425892
3 Spin_Up_Time 0x0003 095 095 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 6
5 Reallocated_Sector_Ct 0x0033 092 092 010 Pre-fail Always - 10928
7 Seek_Error_Rate 0x000f 081 060 030 Pre-fail Always - 149168536
9 Power_On_Hours 0x0032 085 085 000 Old_age Always - 13145
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 6
183 Runtime_Bad_Block 0x0032 099 099 000 Old_age Always - 1
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 064 064 000 Old_age Always - 36
188 Command_Timeout 0x0032 100 098 000 Old_age Always - 12885098499
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 067 052 045 Old_age Always - 33 (Min/Max 26/36)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 4
193 Load_Cycle_Count 0x0032 092 092 000 Old_age Always - 17084
194 Temperature_Celsius 0x0022 033 048 000 Old_age Always - 33 (0 22 0 0)
197 Current_Pending_Sector 0x0012 097 097 000 Old_age Always - 504
198 Offline_Uncorrectable 0x0010 097 097 000 Old_age Offline - 504
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 128896263532923
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 10152724077
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 40689314539
For sda:
smartctl -d ata -A /dev/sda
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.14.10] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
Error SMART Values Read failed: Input/output error
Smartctl: SMART Read Values failed.
=== START OF READ SMART DATA SECTION ===
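Answer 1
Well, it looks like /dev/sda is dead, and you can't get data off it, at least not without tricks. /dev/sdb, on the other hand, seems to have a lot of bad sectors. That is a bad sign, but you should still be able to get your data.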
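The "lot of bad sectors" reading rests on a handful of counters in the smartctl -A table from the question. A minimal sketch for pulling them out, assuming the column layout shown there (attribute ID in column 1, name in column 2, raw value in column 10); the function name bad_sector_counters is made up for illustration:

```shell
#!/bin/sh
# Print the raw values of the SMART attributes that count bad sectors:
#   5 = reallocated, 187 = reported uncorrectable,
#   197 = pending, 198 = offline uncorrectable.
# Assumes the `smartctl -A` table layout: ID in $1, name in $2, raw value in $10.
bad_sector_counters() {
    awk '$1 == 5 || $1 == 187 || $1 == 197 || $1 == 198 { print $2 "=" $10 }'
}

# Usage on the rescue system (needs root):
#   smartctl -d ata -A /dev/sdb | bad_sector_counters
```

Fed the sdb table above, this reports 10928 reallocated, 36 reported uncorrectable, and 504 pending sectors.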
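Depending on how important the data is and how much you trust your backups, you'll want to image the disk first, or at least the sectors that can still be read. Tools for this include GNU ddrescue and a few similar programs.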
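One way the imaging step could look with GNU ddrescue, sketched as two passes: a fast pass that copies everything readable and skips over bad areas, then a retry pass over what is left. The paths under /backup are placeholders for storage on a different, healthy disk with more than 1 TiB free, and imaging /dev/md2 rather than the raw /dev/sdb3 is an assumption made so the image contains the filesystem directly. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: image a failing device with GNU ddrescue in two passes.
# SRC, IMG and MAP are assumptions; adjust them before running for real.
SRC=${SRC:-/dev/md2}
IMG=${IMG:-/backup/md2.img}
MAP=${MAP:-/backup/md2.map}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

rescue_image() {
    # Pass 1: copy what reads cleanly and skip the slow scraping phase (-n).
    run ddrescue -n "$SRC" "$IMG" "$MAP"
    # Pass 2: return to the bad areas and retry them up to 3 times (-r3).
    # The map file lets ddrescue resume instead of starting over.
    run ddrescue -r3 "$SRC" "$IMG" "$MAP"
}

rescue_image
```

Keeping the map file next to the image matters: it records which areas were read, so an interrupted run can be resumed with the same command.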
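Then fsck it. For example, run fsck /dev/md2 to do it on the live system. You can try -p first to automatically fix only the least risky errors, or -y to tell it to fix everything (even the risky repairs). With no options, it will prompt you for each fix.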
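A cautious order for those options might look like the sketch below: a read-only check first (-n answers "no" to every prompt, so nothing is written), then -p, and -y only if errors remain. It calls e2fsck directly, which is the checker fsck hands off to for ext4, the filesystem type shown in the dmesg output. TARGET is an assumption: pointing it at a working copy of the rescued image instead of /dev/md2 keeps the original untouched. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: escalate from a read-only check to automatic repair.
# TARGET is an assumption: ideally a copy of the rescued image, not the array.
TARGET=${TARGET:-/backup/md2-work.img}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

check_and_repair() {
    # 1. Read-only pass: report problems, change nothing.
    run e2fsck -n "$TARGET"
    # 2. "Preen": fix only what can be fixed safely without asking.
    run e2fsck -p "$TARGET"
    status=$?
    # Exit codes below 4 mean clean or successfully corrected.
    [ "$status" -lt 4 ] && return 0
    # 3. Last resort: answer "yes" to every repair prompt.
    run e2fsck -y "$TARGET"
}

check_and_repair
```

In dry-run mode only the first two commands are printed, because the printed -p step always "succeeds"; on a real run the -y step is reached only when -p leaves errors uncorrected.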
After that, you should be able to mount /dev/md2 and get your data, or at least what's left of it.
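For the final step, a read-only mount avoids writing anything more to a disk that is already failing. The sketch below assumes DEST is a directory on a different, healthy disk; /backup/recovered is a placeholder. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: mount read-only and copy the data off.
# DEV, MNT and DEST are assumptions; DEST must be on a different, healthy disk.
DEV=${DEV:-/dev/md2}
MNT=${MNT:-/mnt}
DEST=${DEST:-/backup/recovered}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

copy_out() {
    # Read-only mount. If the journal still cannot be loaded, ext4 also
    # accepts "-o ro,noload", which skips journal replay; the filesystem
    # may then look slightly stale or inconsistent.
    run mount -o ro "$DEV" "$MNT" || return 1
    run mkdir -p "$DEST"
    # -a preserves permissions, ownership and timestamps;
    # "$MNT/." copies the contents, hidden files included.
    run cp -a "$MNT/." "$DEST/"
    run umount "$MNT"
}

copy_out
```

If cp stops on unreadable files, copying the most important directories first is a reasonable fallback.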
I'd ask your hosting company to hold on to both failed disks for a while after replacing them, until you're sure you have all your data.