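I don't want to tackle this by trial and error, because I know that is the best way to lose data.
I have a server with 4 × 2 TB disks in RAID 5 (yes, I know that is unwise), running Ubuntu 14.04.
Most of my data is in /home on the RAID 5, and / is on RAID 1.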
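I booted the server in rescue mode, but I cannot figure out:
- whether the problem is a software or a hardware one,
- whether there is a way to remount the RAID and recover the data.
I have read "Recovering a failed software RAID" (raid.wiki.kernel.org) carefully, but since I am not very confident in my own diagnosis, I would like some informed advice on what is going on and, if anything can be done, on how to proceed…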
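The only thing I have tried is mounting the md devices that were not already mounted. That worked for md2 (mount /dev/md2 /mnt/), but as I said, I cannot mount md0 or md3 (/dev/md3: can't read superblock).
Here is what I have checked so far: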
parted -l
root@rescue:/mnt# parted -l
Model: ATA ST2000DM001-1CH1 (scsi)
Disk /dev/sda: 2000GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:
Number Start End Size File system Name Flags
1 20.5kB 1049kB 1029kB primary bios_grub
2 2097kB 10.5GB 10.5GB ext4 primary raid
3 10.5GB 2000GB 1989GB primary raid
4 2000GB 2000GB 536MB linux-swap(v1) primary
Model: ATA ST2000DM001-1CH1 (scsi)
Disk /dev/sdb: 2000GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:
Number Start End Size File system Name Flags
1 20.5kB 1049kB 1029kB primary bios_grub
2 2097kB 10.5GB 10.5GB ext4 primary raid
3 10.5GB 2000GB 1989GB primary raid
4 2000GB 2000GB 536MB linux-swap(v1) primary
Error: /dev/sdc: unrecognised disk label
Model: ATA ST2000DM001-1CH1 (scsi)
Disk /dev/sdc: 2000GB
Sector size (logical/physical): 512B/4096B
Partition Table: unknown
Disk Flags:
Model: ATA ST2000DM001-1CH1 (scsi)
Disk /dev/sdd: 2000GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:
Number Start End Size File system Name Flags
1 20.5kB 1049kB 1029kB primary bios_grub
2 2097kB 10.5GB 10.5GB ext4 primary raid
3 10.5GB 2000GB 1989GB primary raid
4 2000GB 2000GB 536MB linux-swap(v1) primary
Model: Linux Software RAID Array (md)
Disk /dev/md2: 10.5GB
Sector size (logical/physical): 512B/4096B
Partition Table: loop
Disk Flags:
Number Start End Size File system Flags
1 0.00B 10.5GB 10.5GB ext4
Error: /dev/md127: unrecognised disk label
Model: Linux Software RAID Array (md)
Disk /dev/md127: 10.5GB
Sector size (logical/physical): 512B/4096B
Partition Table: unknown
Disk Flags:
smartctl
root@rescue:~# smartctl -a -d ata /dev/sdc
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.10.23-xxxx-std-ipv6-64-rescue] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
Smartctl: Device Read Identity Failed: Input/output error
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
root@rescue:~# smartctl -a -d ata /dev/sdd
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.10.23-xxxx-std-ipv6-64-rescue] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.14 (AF)
Device Model: ST2000DM001-1CH164
Serial Number: W1E1KX59
LU WWN Device Id: 5 000c50 05c821593
Firmware Version: CC43
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Tue Dec 30 16:04:49 2014 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 121) The previous self-test completed having
the read element of the test failed.
Total time to complete Offline
data collection: ( 584) seconds.
Offline data collection
capabilities: (0x73) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 230) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x3085) SCT Status supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 111 099 006 Pre-fail Always - 120551532
3 Spin_Up_Time 0x0003 095 095 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 32
5 Reallocated_Sector_Ct 0x0033 097 097 036 Pre-fail Always - 4008
7 Seek_Error_Rate 0x000f 077 060 030 Pre-fail Always - 4351310995
9 Power_On_Hours 0x0032 079 079 000 Old_age Always - 18725
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 32
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 089 089 000 Old_age Always - 11
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0 0 0
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 068 056 045 Old_age Always - 32 (Min/Max 26/35)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 31
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 46
194 Temperature_Celsius 0x0022 032 044 000 Old_age Always - 32 (0 16 0 0)
197 Current_Pending_Sector 0x0012 082 082 000 Old_age Always - 3056
198 Offline_Uncorrectable 0x0010 082 082 000 Old_age Offline - 3056
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 18708h+109m+27.415s
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 24242600022
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 112149279703
SMART Error Log Version: 1
ATA Error Count: 11 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 11 occurred at disk power-on lifetime: 18520 hours (771 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 08 ff ff ff 4f 00 09:32:13.900 READ FPDMA QUEUED
ef 10 02 00 00 00 a0 00 09:32:13.898 SET FEATURES [Reserved for Serial ATA]
27 00 00 00 00 00 e0 00 09:32:13.898 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 00 09:32:13.898 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 00 09:32:13.898 SET FEATURES [Set transfer mode]
Error 10 occurred at disk power-on lifetime: 18520 hours (771 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 08 ff ff ff 4f 00 09:32:10.764 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 09:32:10.763 READ FPDMA QUEUED
60 00 38 ff ff ff 4f 00 09:32:10.763 READ FPDMA QUEUED
ef 10 02 00 00 00 a0 00 09:32:10.763 SET FEATURES [Reserved for Serial ATA]
27 00 00 00 00 00 e0 00 09:32:10.763 READ NATIVE MAX ADDRESS EXT
Error 9 occurred at disk power-on lifetime: 18520 hours (771 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 38 ff ff ff 4f 00 09:32:09.084 READ FPDMA QUEUED
61 00 08 00 88 38 41 00 09:32:07.445 WRITE FPDMA QUEUED
60 00 08 ff ff ff 4f 00 09:32:07.416 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 09:32:07.416 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 09:32:07.416 READ FPDMA QUEUED
Error 8 occurred at disk power-on lifetime: 18520 hours (771 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: WP at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 00 40 ff ff ff 4f 00 09:32:04.118 WRITE FPDMA QUEUED
61 00 08 70 88 38 41 00 09:32:04.117 WRITE FPDMA QUEUED
61 00 40 ff ff ff 4f 00 09:32:04.117 WRITE FPDMA QUEUED
60 00 40 ff ff ff 4f 00 09:32:04.117 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 09:32:04.117 READ FPDMA QUEUED
Error 7 occurred at disk power-on lifetime: 17319 hours (721 days + 15 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 50 a9 59 02 Error: UNC at LBA = 0x0259a950 = 39430480
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 00 00 aa 59 42 00 01:31:02.054 READ FPDMA QUEUED
60 00 00 00 a6 59 42 00 01:31:02.054 READ FPDMA QUEUED
60 00 00 00 92 36 42 00 01:30:55.032 READ FPDMA QUEUED
60 00 00 00 86 36 42 00 01:30:51.600 READ FPDMA QUEUED
60 00 00 00 82 36 42 00 01:30:51.593 READ FPDMA QUEUED
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 90% 18706 3848494344
# 2 Short offline Completed without error 00% 3481 -
# 3 Short offline Completed without error 00% 3472 -
# 4 Short offline Completed without error 00% 3472 -
# 5 Short offline Completed without error 00% 13 -
# 6 Short offline Completed without error 00% 5 -
# 7 Short offline Completed without error 00% 5 -
# 8 Short offline Completed without error 00% 0 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
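For a quick read of the attribute table above, the counters that usually matter most for a failing disk are 5 (Reallocated_Sector_Ct), 187 (Reported_Uncorrect), 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable). A minimal sketch for pulling them out, assuming the table layout that smartctl prints above; the function name smart_key_attrs is made up for illustration:

```shell
#!/bin/sh
# Print "ID NAME RAW_VALUE" for the attributes that usually point at media failure.
# Reads smartctl attribute output on stdin; smart_key_attrs is a hypothetical helper name.
smart_key_attrs() {
    # The "0x" check on the FLAG column keeps lines from other tables
    # (error log, self-test log) from matching by accident.
    awk '$3 ~ /^0x/ && ($1 == 5 || $1 == 187 || $1 == 197 || $1 == 198) { print $1, $2, $10 }'
}

# Usage (a read-only query of the disk):
#   smartctl -A /dev/sdd | smart_key_attrs
```

On /dev/sdd above these read 4008, 11, 3056 and 3056, so none of them is zero.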
ls /dev
ls /dev
MAKEDEV md0 ptya0 ptyc5 ptyea ptyqf ptyt4 ptyv9 ptyxe ram11 sg1 tty34 ttyS1 ttyc2 ttye7 ttyqc ttyt1 ttyv6 ttyxb urandom
aer_inject md127 ptya1 ptyc6 ptyeb ptyr0 ptyt5 ptyva ptyxf ram12 sg2 tty35 ttyS2 ttyc3 ttye8 ttyqd ttyt2 ttyv7 ttyxc vcs
autofs md2 ptya2 ptyc7 ptyec ptyr1 ptyt6 ptyvb ptyy0 ram13 sg3 tty36 ttyS3 ttyc4 ttye9 ttyqe ttyt3 ttyv8 ttyxd vcs1
block md3 ptya3 ptyc8 ptyed ptyr2 ptyt7 ptyvc ptyy1 ram14 shm tty37 ttya0 ttyc5 ttyea ttyqf ttyt4 ttyv9 ttyxe vcs2
[…]
cat /proc/mdstat
cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
md127 : active raid1 sdc2[2]
10238912 blocks [4/1] [__U_]
md2 : active raid1 sdd2[3] sda2[0] sdb2[1]
10238912 blocks [4/3] [UU_U]
mdadm --detail
root@rescue:~# mdadm --detail /dev/md2
/dev/md2:
Version : 0.90
Creation Time : Tue Sep 2 16:46:34 2014
Raid Level : raid1
Array Size : 10238912 (9.76 GiB 10.48 GB)
Used Dev Size : 10238912 (9.76 GiB 10.48 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 2
Persistence : Superblock is persistent
Update Time : Sat Dec 27 17:31:03 2014
State : clean, degraded
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
UUID : 5a33c710:006f668d:a4d2adc2:26fd5302 (local to host rescue.ovh.net)
Events : 0.503145
Number Major Minor RaidDevice State
0 8 2 0 active sync /dev/sda2
1 8 18 1 active sync /dev/sdb2
2 0 0 2 removed
3 8 50 3 active sync /dev/sdd2
root@rescue:~# mdadm --detail /dev/md127
/dev/md127:
Version : 0.90
Creation Time : Tue Sep 2 16:46:34 2014
Raid Level : raid1
Array Size : 10238912 (9.76 GiB 10.48 GB)
Used Dev Size : 10238912 (9.76 GiB 10.48 GB)
Raid Devices : 4
Total Devices : 1
Preferred Minor : 127
Persistence : Superblock is persistent
Update Time : Sat Dec 27 17:31:16 2014
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
Number Major Minor RaidDevice State
0 0 0 0 removed
1 0 0 1 removed
2 8 34 2 active sync /dev/sdc2
3 0 0 3 removed
And finally, mdadm --examine sd*
root@rescue:~# mdadm --examine /dev/sd*
/dev/sda:
MBR Magic : aa55
Partition[0] : 3907029167 sectors at 1 (type ee)
mdadm: No md superblock detected on /dev/sda1.
/dev/sda2:
Magic : a92b4efc
Version : 0.90.00
UUID : 5a33c710:006f668d:a4d2adc2:26fd5302 (local to host rescue.ovh.net)
Creation Time : Tue Sep 2 16:46:34 2014
Raid Level : raid1
Used Dev Size : 10238912 (9.76 GiB 10.48 GB)
Array Size : 10238912 (9.76 GiB 10.48 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 2
Update Time : Sat Dec 27 18:20:56 2014
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 1
Spare Devices : 0
Checksum : 78eed7b8 - correct
Events : 503147
Number Major Minor RaidDevice State
this 0 8 2 0 active sync /dev/sda2
0 0 8 2 0 active sync /dev/sda2
1 1 8 18 1 active sync /dev/sdb2
2 2 0 0 2 faulty removed
3 3 8 50 3 active sync /dev/sdd2
/dev/sda3:
Magic : a92b4efc
Version : 0.90.00
UUID : 4a417350:7192f812:a4d2adc2:26fd5302 (local to host rescue.ovh.net)
Creation Time : Tue Sep 2 16:46:35 2014
Raid Level : raid5
Used Dev Size : 1942745600 (1852.75 GiB 1989.37 GB)
Array Size : 5828236800 (5558.24 GiB 5968.11 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 3
Update Time : Mon Dec 22 10:33:05 2014
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 2
Spare Devices : 0
Checksum : 4d44c428 - correct
Events : 109608
Layout : left-symmetric
Chunk Size : 512K
Number Major Minor RaidDevice State
this 0 8 3 0 active sync /dev/sda3
0 0 8 3 0 active sync /dev/sda3
1 1 8 19 1 active sync /dev/sdb3
2 2 0 0 2 faulty removed
3 3 0 0 3 faulty removed
mdadm: No md superblock detected on /dev/sda4.
/dev/sdb:
MBR Magic : aa55
Partition[0] : 3907029167 sectors at 1 (type ee)
mdadm: No md superblock detected on /dev/sdb1.
/dev/sdb2:
Magic : a92b4efc
Version : 0.90.00
UUID : 5a33c710:006f668d:a4d2adc2:26fd5302 (local to host rescue.ovh.net)
Creation Time : Tue Sep 2 16:46:34 2014
Raid Level : raid1
Used Dev Size : 10238912 (9.76 GiB 10.48 GB)
Array Size : 10238912 (9.76 GiB 10.48 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 2
Update Time : Sat Dec 27 18:20:56 2014
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 1
Spare Devices : 0
Checksum : 78eed7ca - correct
Events : 503147
Number Major Minor RaidDevice State
this 1 8 18 1 active sync /dev/sdb2
0 0 8 2 0 active sync /dev/sda2
1 1 8 18 1 active sync /dev/sdb2
2 2 0 0 2 faulty removed
3 3 8 50 3 active sync /dev/sdd2
/dev/sdb3:
Magic : a92b4efc
Version : 0.90.00
UUID : 4a417350:7192f812:a4d2adc2:26fd5302 (local to host rescue.ovh.net)
Creation Time : Tue Sep 2 16:46:35 2014
Raid Level : raid5
Used Dev Size : 1942745600 (1852.75 GiB 1989.37 GB)
Array Size : 5828236800 (5558.24 GiB 5968.11 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 3
Update Time : Mon Dec 22 10:33:05 2014
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 2
Spare Devices : 0
Checksum : 4d44c43a - correct
Events : 109608
Layout : left-symmetric
Chunk Size : 512K
Number Major Minor RaidDevice State
this 1 8 19 1 active sync /dev/sdb3
0 0 8 3 0 active sync /dev/sda3
1 1 8 19 1 active sync /dev/sdb3
2 2 0 0 2 faulty removed
3 3 0 0 3 faulty removed
mdadm: No md superblock detected on /dev/sdb4.
mdadm: No md superblock detected on /dev/sdc.
mdadm: No md superblock detected on /dev/sdc1.
mdadm: No md superblock detected on /dev/sdc2.
mdadm: No md superblock detected on /dev/sdc3.
mdadm: No md superblock detected on /dev/sdc4.
/dev/sdd:
MBR Magic : aa55
Partition[0] : 3907029167 sectors at 1 (type ee)
mdadm: No md superblock detected on /dev/sdd1.
/dev/sdd2:
Magic : a92b4efc
Version : 0.90.00
UUID : 5a33c710:006f668d:a4d2adc2:26fd5302 (local to host rescue.ovh.net)
Creation Time : Tue Sep 2 16:46:34 2014
Raid Level : raid1
Used Dev Size : 10238912 (9.76 GiB 10.48 GB)
Array Size : 10238912 (9.76 GiB 10.48 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 2
Update Time : Sat Dec 27 18:20:56 2014
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 1
Spare Devices : 0
Checksum : 78eed7ee - correct
Events : 503147
Number Major Minor RaidDevice State
this 3 8 50 3 active sync /dev/sdd2
0 0 8 2 0 active sync /dev/sda2
1 1 8 18 1 active sync /dev/sdb2
2 2 0 0 2 faulty removed
3 3 8 50 3 active sync /dev/sdd2
/dev/sdd3:
Magic : a92b4efc
Version : 0.90.00
UUID : 4a417350:7192f812:a4d2adc2:26fd5302 (local to host rescue.ovh.net)
Creation Time : Tue Sep 2 16:46:35 2014
Raid Level : raid5
Used Dev Size : 1942745600 (1852.75 GiB 1989.37 GB)
Array Size : 5828236800 (5558.24 GiB 5968.11 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 3
Update Time : Mon Dec 22 01:55:55 2014
State : active
Active Devices : 3
Working Devices : 3
Failed Devices : 1
Spare Devices : 0
Checksum : 4d429eeb - correct
Events : 109599
Layout : left-symmetric
Chunk Size : 512K
Number Major Minor RaidDevice State
this 3 8 51 3 active sync /dev/sdd3
0 0 8 3 0 active sync /dev/sda3
1 1 8 19 1 active sync /dev/sdb3
2 2 0 0 2 faulty removed
3 3 8 51 3 active sync /dev/sdd3
mdadm: No md superblock detected on /dev/sdd4.
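The Events counter of each member is worth comparing in output like the above, since a member with a lower count was dropped from the array earlier and holds older data. A minimal sketch, assuming the 0.90-format mdadm --examine layout shown above; member_events is a hypothetical helper name:

```shell
#!/bin/sh
# Print "DEVICE EVENTS" for every member found in `mdadm --examine` output on stdin.
member_events() {
    awk '
        # A header line such as "/dev/sda3:" starts a new device section.
        /^\/dev\/[^ ]+:$/ { dev = substr($1, 1, length($1) - 1) }
        # "Events : 109608" carries the counter for the current device.
        $1 == "Events" && $2 == ":" { print dev, $3 }
    '
}

# Usage (read-only):
#   mdadm --examine /dev/sd[abd]3 | member_events
```

In the output above, sda3 and sdb3 report 109608 while sdd3 reports 109599.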
EDIT
In a last attempt, I unmounted /dev/md2 and stopped it with mdadm.
After that I was able to assemble /dev/md3:
mdadm --assemble --force /dev/md3 /dev/sd[abd]3
mdadm: forcing event count in /dev/sdd3(3) from 109599 upto 109608
mdadm: clearing FAULTY flag for device 2 in /dev/md3 for /dev/sdd3
mdadm: Marking array /dev/md3 as 'clean'
mdadm: /dev/md3 has been started with 3 drives (out of 4).
The syslog at that point:
md/raid:md3: device sda3 operational as raid disk 0
md/raid:md3: device sdd3 operational as raid disk 3
md/raid:md3: device sdb3 operational as raid disk 1
md/raid:md3: allocated 4338kB
md/raid:md3: raid level 5 active with 3 out of 4 devices, algorithm 2
RAID conf printout:
--- level:5 rd:4 wd:3
disk 0, o:1, dev:sda3
disk 1, o:1, dev:sdb3
disk 3, o:1, dev:sdd3
md3: detected capacity change from 0 to 5968114483200
md3: unknown partition table
The RAID looked fine:
root@rescue:/mnt# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
md3 : active raid5 sda3[0] sdd3[3] sdb3[1]
5828236800 blocks level 5, 512k chunk, algorithm 2 [4/3] [UU_U]
[…]
But I cannot mount it:
root@rescue:/mnt# mount /dev/md3 /mnt/home
mount: wrong fs type, bad option, bad superblock on /dev/md3,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
While doing this, many errors showed up in the syslog:
ata4.00: exception Emask 0x0 SAct 0xfe SErr 0x0 action 0x0
ata4.00: irq_stat 0x40000008
ata4.00: failed command: READ FPDMA QUEUED
ata4.00: cmd 60/18:08:18:34:63/00:00:e5:00:00/40 tag 1 ncq 12288 in
res 41/40:18:18:34:63/00:00:e5:00:00/00 Emask 0x409 (media error) <F>
ata4.00: status: { DRDY ERR }
ata4.00: error: { UNC }
ata4.00: configured for UDMA/133
sd 3:0:0:0: [sdd] Unhandled sense code
sd 3:0:0:0: [sdd]
Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 3:0:0:0: [sdd]
Sense Key : Medium Error [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
e5 63 34 18
sd 3:0:0:0: [sdd]
Add. Sense: Unrecovered read error - auto reallocate failed
sd 3:0:0:0: [sdd] CDB:
Read(10): 28 00 e5 63 34 18 00 00 18 00
blk_update_request: 46 callbacks suppressed
end_request: I/O error, dev sdd, sector 3848483864
md/raid:md3: read error not correctable (sector 3828001816 on sdd3).
md/raid:md3: Disk failure on sdd3, disabling device.
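I tried to correct them with hdparm, but there are too many of them, and a new batch shows up every time.
Apparently, as the syslog says (md/raid:md3: Disk failure on sdd3, disabling device.), the array's state changes to FAILED as soon as I try to mount md3.
It looks like I have lost this battle…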
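The two sector numbers in the syslog above can be cross-checked with a little arithmetic: the kernel reports sector 3848483864 on sdd and md reports sector 3828001816 on sdd3, so the difference should be where partition 3 starts. A sketch, assuming 512-byte logical sectors (as parted reports) and 0.90 metadata, which keeps the superblock at the end of the member so that data starts at the partition's first sector:

```shell
#!/bin/sh
disk_sector=3848483864      # "end_request: I/O error, dev sdd, sector ..."
member_sector=3828001816    # "read error not correctable (sector ... on sdd3)"

# Offset of sdd3 within sdd, in sectors and in bytes.
part_start=$((disk_sector - member_sector))
echo "sdd3 starts at sector $part_start ($((part_start * 512)) bytes)"

# RAID 5 usable size is (members - 1) x per-member size; values are in KiB.
echo "expected array size: $((3 * 1942745600)) KiB"
```

The offset comes out at about 10.5 GB, which is where parted shows partition 3 beginning, and the array size agrees with the 5828236800 reported by mdadm. The failing sector is also close to the LBA_of_first_error (3848494344) of the extended self-test on sdd, which suggests a genuine media defect on that disk rather than a software problem.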
Answer 1
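If I understand correctly, your /dev/md3 is a RAID 5 that should consist of /dev/sda3, /dev/sdb3, /dev/sdc3 (which no longer exists) and /dev/sdd3.
So what do you get from mdadm --detail /dev/md3?
And why does /dev/sdc still appear to have a broken partition table?
Even with the partitions on /dev/sdc still missing, it may be possible to recover the data, but I would try the following rescue procedure: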
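1) Boot from a live CD, and do not mount any partition or RAID disk.
2) Make a raw image copy of every disk. Yes, you will need more than 8 TB of free space somewhere, perhaps on a network drive. If the disks are physically fine, you can make the copies with dd. If a disk is physically damaged, you will probably have to use a ddrescue program instead.
3) Make a working copy of each original raw image. Yes, you will need another 8 TB of free space.
4) Use a virtual machine such as qemu or VirtualBox. First boot the virtual machine from a good live CD suited to data rescue. SystemRescueCd may be a good choice.
5) Inside the virtual machine, attach the working copies of the raw disk images and try to repair them. One place to start might be adding a partition table to the working copy of the /dev/sdc image. The partition table of /dev/sdc probably looked the same as that of /dev/sdd.
6) When you think the problem is fixed, boot the virtual machine from the working disk images.
7) Once the virtual machine has shown that the disk images are repaired, copy the repaired images back to the physical disks. If a physical disk is damaged, you may need to replace it first.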
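For step 2, a minimal sketch of imaging a failing disk with GNU ddrescue. The destination directory /mnt/backup and the helper names run and image_disk are assumptions made for illustration. The script only prints the commands unless DRY_RUN=0 is set, so they can be reviewed before anything touches the disks.

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# image_disk SRC DESTDIR: copy SRC to DESTDIR/<name>.img. The map file records
# which areas were read, so an interrupted copy can be resumed.
image_disk() {
    src=$1
    dest=$2/$(basename "$src")
    # Pass 1: copy everything that reads easily, skipping the slow scraping of bad areas.
    run ddrescue -d -n "$src" "$dest.img" "$dest.map"
    # Pass 2: return to the bad areas and retry them up to 3 times.
    run ddrescue -d -r3 "$src" "$dest.img" "$dest.map"
}

image_disk /dev/sdd /mnt/backup
```

Splitting the copy into two passes is a common choice for a disk with many pending sectors: the easily readable data is secured first, before the drive is stressed by retries.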
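If at some stage you find that your attempts to repair the broken RAID are only making things worse, overwrite your working raw disk images with the original raw disk images and start over.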
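One way to get a disposable working copy without a second 8 TB is a copy-on-write overlay. This is a swapped-in technique, not what the steps above describe: qemu-img can create a qcow2 file backed by the untouched raw image, so every repair attempt lands in the overlay and starting over means recreating it. A sketch with hypothetical paths and helper names, which prints the commands unless DRY_RUN=0 is set:

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# new_overlay RAW_IMAGE: (re)create RAW_IMAGE.work.qcow2 on top of the raw image.
# Running it again discards all changes made so far and starts from a clean state.
new_overlay() {
    raw=$1
    run rm -f "$raw.work.qcow2"
    run qemu-img create -f qcow2 -F raw -b "$raw" "$raw.work.qcow2"
}

# One overlay per imaged disk; the paths are assumptions.
for disk in sda sdb sdd; do
    new_overlay "/mnt/backup/$disk.img"
done
```

The overlays, not the raw images, are then what gets attached to the virtual machine. The raw images must stay unmodified for as long as any overlay refers to them.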