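Machine: Linux CentOS 5.4 with 2 HDDs in a RAID 5 (yes, the 3rd disk is missing).
Situation:
- everything was running fine (with the 3rd disk missing)
- then the power went off (the system shut itself down when the battery ran out)
- the machine did not come back up
Messages on the screen: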
Memory for crash kernel (0x0 to 0x0) notwithin permissible range
PCI: BIOS Bug: MCFC area at e0000000 is not E820-reserved
PCI: Not using MMCONFIG.
Red Hat nash version 5.1.19.6 starting
insmod: error inserting '/lib/raid456.ko': -1 File exists
md: md2: raid array is not clean -- starting background reconstruction
raid5: cannot start dirty degraded array for md2
raid5: failed to run raid set md2
md: pers->run() failed ...
md: md2: raid array is not clean -- starting background reconstruction
raid5: cannot start dirty degraded array for md2
raid5: failed to run raid set md2
md: pers->run() failed ...
EXT3-fs: unable to read superblock
mount: error mounting /dev/root on /sysroot as ext3: Invalid argument
setuproot: moving /dev failed: No such file or directory
setuproot: error mounting /proc: No such file or directory
setuproot: mount failed: No such file or directory
Kernel panic - not syncing: Attempted to kill init!
So I installed sysresccd on a USB memory stick and booted from it. Then I ran these tests:
smartctl -t short /dev/sda
smartctl -X /dev/sda
smartctl -l selftest /dev/sda
Same for sdb. The results:
sda: test=Short offline, status="Completed without error", remaining=00%, lifetime=19230, firsterror=-
sdb: test=Short offline, status="Completed: read failure", remaining=90%, lifetime=19256, firsterror=67031516
Details for sdb:
root@sysresccd /root % smartctl -A /dev/sdb
smartctl 5.42 2011-10-20 r3458 [i686-linux-3.0.21-std250-i586] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 180 180 021 Pre-fail Always - 5975
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 33
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 074 074 000 Old_age Always - 19256
10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 32
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 27
193 Load_Cycle_Count 0x0032 183 183 000 Old_age Always - 51128
194 Temperature_Celsius 0x0022 111 093 000 Old_age Always - 39
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 17
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 100 253 000 Old_age Offline - 0
A Current_Pending_Sector count of 17 could be a problem.
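To keep an eye on that counter, the raw value of a single attribute can be pulled out of the `smartctl -A` table. A minimal sketch: the function name `smart_raw` is hypothetical, and it assumes the ten-column layout shown above, where RAW_VALUE is the tenth field.

```shell
# smart_raw NAME: read `smartctl -A` output on stdin and print the
# RAW_VALUE column of the attribute called NAME.
# Assumes the 10-column attribute table shown above.
smart_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Usage (needs root and a real disk):
#   smartctl -A /dev/sdb | smart_raw Current_Pending_Sector
```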
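Then the further steps:
1. Bought 3x 2 TB HDDs.
2. Booted from the memory stick.
3. Copied the 2 old 1.5 TB disks, one after the other, onto 2 of the new disks:
   dd if=/dev/sda of=/dev/sdc bs=32M
   dd if=/dev/sdb of=/dev/sdc bs=32M
4. Removed the 2 old disks (so things don't get worse).
5. Connected the 3 new disks. Rebooted.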
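A plain dd stops at the first read error, so it is worth checking its exit status and comparing source and copy before trusting the result. A minimal sketch, assuming GNU dd and cmp; the helper name `copy_and_verify` is hypothetical. Note that the compare reads the whole source a second time, which takes hours on a 1.5 TB disk and puts extra load on a drive that is already failing.

```shell
# copy_and_verify SRC DST: copy SRC to DST with dd, then compare them.
# Returns non-zero if dd reports an error or the copy differs.
copy_and_verify() {
    src=$1
    dst=$2
    # dd exits non-zero on a read error such as "Input/output error".
    dd if="$src" of="$dst" bs=32M || return 1
    # Size of the source: block devices need blockdev, files can use wc.
    if [ -b "$src" ]; then
        size=$(blockdev --getsize64 "$src")
    else
        size=$(( $(wc -c < "$src") ))
    fi
    # Compare only the first $size bytes, since DST may be a larger disk.
    cmp -n "$size" "$src" "$dst"
}

# Usage (as root, device names are placeholders):
#   copy_and_verify /dev/sdX /dev/sdY || echo "copy is not trustworthy"
```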
The output was:
Memory for crash kernel (0x0 to 0x0) notwithin permissible range
PCI: BIOS Bug: MCFC area at e0000000 is not E820-reserved
PCI: Not using MMCONFIG.
Red Hat nash version 5.1.19.6 starting
insmod: error inserting '/lib/raid456.ko': -1 File exists
md: invalid raid superblock magic on sdb3
md: md2: raid array is not clean -- starting background reconstruction
raid5: not enough operational devices for md2 (2/3 failed)
raid5: failed to run raid set md2
md: pers->run() failed ...
md: md2: raid array is not clean -- starting background reconstruction
raid5: not enough operational devices for md2 (2/3 failed)
raid5: failed to run raid set md2
md: pers->run() failed ...
EXT3-fs: unable to read superblock
mount: error mounting /dev/root on /sysroot as ext3: Invalid argument
setuproot: moving /dev failed: No such file or directory
setuproot: error mounting /proc: No such file or directory
setuproot: error mounting /sys: No such file or directory
setuproot: mount failed: No such file or directory
Kernel panic - not syncing: Attempted to kill init!
So I booted from the memory stick with sysresccd, this time with the new disks attached. Here is some information:
fdisk -l
shows the two full disks exactly as it did for the old disks:
Device Boot Start End Blocks Id System
/dev/sda1 * 63 610469 305203+ fd Linux raid autodetect
/dev/sda2 610470 8803619 4096575 fd Linux raid autodetect
/dev/sda3 8803620 2930272064 1460734222+ fd Linux raid autodetect
/dev/sdb1 * 63 610469 305203+ fd Linux raid autodetect
/dev/sdb2 610470 8803619 4096575 fd Linux raid autodetect
/dev/sdb3 8803620 2930272064 1460734222+ fd Linux raid autodetect
sdc doesn't contain a valid partition table (this is the empty third disk)
smartctl -t short /dev/sda
smartctl -X /dev/sda
smartctl -l selftest /dev/sda
sda: test=Short offline, status="Completed without error", remaining=00%, lifetime=19230, firsterror=-
sdb: test=Short offline, status="Completed: read failure", remaining=90%, lifetime=19256, firsterror=67031516
smartctl -A /dev/sdb
offline_uncorrectable: 0
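The firsterror value is an LBA counted from the start of the disk, so the partition table above tells which partition it falls in. A small sketch of that arithmetic, assuming 512-byte sectors and that firsterror uses the same sector numbering as fdisk; the function name is made up for illustration.

```shell
# lba_in_partition LBA START END: if LBA lies inside the partition that
# spans sectors START..END, print its byte offset from the partition
# start; otherwise return non-zero. Assumes 512-byte sectors.
lba_in_partition() {
    lba=$1; start=$2; end=$3
    if [ "$lba" -ge "$start" ] && [ "$lba" -le "$end" ]; then
        echo $(( (lba - start) * 512 ))
    else
        return 1
    fi
}

# firsterror from the self-test log against the sdb3 row of fdisk -l
lba_in_partition 67031516 8803620 2930272064
```

With these numbers it prints 29812682752, so the first unreadable sector sits roughly 29.8 GB into sdb3, while sdb1 and sdb2 lie entirely before it.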
Then:
root@sysresccd /root % cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md125 : inactive sda3[0](S)
1460734144 blocks
md126 : active raid1 sda1[0] sdb1[1]
305088 blocks [2/2] [UU]
md127 : active raid1 sda2[0] sdb2[1]
4096448 blocks [2/2] [UU]
unused devices: <none>
Note: the raid5 shows up there as md125.
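An inactive array still holds on to its member devices, which matters for any later attempt to use sda3 in another array. A minimal sketch that lists inactive arrays from /proc/mdstat so they can be stopped by hand; the function name is hypothetical and the parsing assumes the `mdX : inactive ...` line format shown above.

```shell
# inactive_md_arrays [FILE]: print the names of arrays marked "inactive".
# FILE defaults to /proc/mdstat; matching lines look like
# "md125 : inactive sda3[0](S)".
inactive_md_arrays() {
    awk '$2 == ":" && $3 == "inactive" { print $1 }' "${1:-/proc/mdstat}"
}

# Usage (as root): release the member devices of each inactive array.
#   for md in $(inactive_md_arrays); do mdadm --stop "/dev/$md"; done
```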
Details for md127:
root@sysresccd /root % mdadm --detail /dev/md127
/dev/md127:
Version : 0.90
Creation Time : Sun Dec 13 18:45:15 2009
Raid Level : raid1
Array Size : 4096448 (3.91 GiB 4.19 GB)
Used Dev Size : 4096448 (3.91 GiB 4.19 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 127
Persistence : Superblock is persistent
Update Time : Thu Mar 8 00:40:45 2012
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
UUID : 939f1a92:590d4172:2414ef47:5e2b15cb
Events : 0.236
Number Major Minor RaidDevice State
0 8 2 0 active sync /dev/sda2
1 8 18 1 active sync /dev/sdb2
For md126:
root@sysresccd /root % mdadm --detail /dev/md126
/dev/md126:
Version : 0.90
Creation Time : Sun Dec 13 19:21:09 2009
Raid Level : raid1
Array Size : 305088 (297.99 MiB 312.41 MB)
Used Dev Size : 305088 (297.99 MiB 312.41 MB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 126
Persistence : Superblock is persistent
Update Time : Wed Mar 7 23:34:02 2012
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
UUID : bde56644:86d3e3a4:1128f4fe:0f47f21f
Events : 0.242
Number Major Minor RaidDevice State
0 8 1 0 active sync /dev/sda1
1 8 17 1 active sync /dev/sdb1
Details for md125:
root@sysresccd /root % mdadm --detail /dev/md125
mdadm: md device /dev/md125 does not appear to be active.
sda3:
root@sysresccd /root % mdadm --examine /dev/sda3
/dev/sda3:
Magic : a92b4efc
Version : 0.90.00
UUID : 062f3190:b9337fc1:0b38f5df:7ec7c53b
Creation Time : Sun Dec 13 18:45:15 2009
Raid Level : raid5
Used Dev Size : 1460733952 (1393.06 GiB 1495.79 GB)
Array Size : 2921467904 (2786.13 GiB 2991.58 GB)
Raid Devices : 3
Total Devices : 2
Preferred Minor : 2
Update Time : Sat Mar 3 22:48:34 2012
State : active
Active Devices : 2
Working Devices : 2
Failed Devices : 1
Spare Devices : 0
Checksum : e5ac0d6c - correct
Events : 26243911
Layout : left-symmetric
Chunk Size : 256K
Number Major Minor RaidDevice State
this 0 8 3 0 active sync /dev/sda3
0 0 8 3 0 active sync /dev/sda3
1 1 8 19 1 active sync /dev/sdb3
2 2 0 0 2 faulty removed
sdb3:
root@sysresccd /root % mdadm --examine /dev/sdb3
mdadm: No md superblock detected on /dev/sdb3.
Then:
root@sysresccd /root % mdadm --examine /dev/sd[ab]3 | egrep 'dev|Update|Role|State|Chunk Size'
mdadm: No md superblock detected on /dev/sdb3.
/dev/sda3:
Update Time : Sat Mar 3 22:48:34 2012
State : active
Chunk Size : 256K
Number Major Minor RaidDevice State
this 0 8 3 0 active sync /dev/sda3
0 0 8 3 0 active sync /dev/sda3
1 1 8 19 1 active sync /dev/sdb3
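Version 0.90 metadata is stored near the end of each member device, not at the start. That may explain why sdb1 and sdb2 assemble while sdb3 shows no superblock: a copy that breaks off early never reaches the end of the large partition. A sketch of the offset calculation, assuming 512-byte sectors; the function name is made up, and the sector count of sda3/sdb3 is derived from the fdisk output above (end - start + 1).

```shell
# md090_sb_sector SECTORS: sector offset of the 0.90 superblock on a
# member device that is SECTORS 512-byte sectors long.
# Rule: round the size down to a multiple of 64 KiB (128 sectors),
# then step back one more 64 KiB block.
md090_sb_sector() {
    echo $(( ($1 & ~127) - 128 ))
}

# sda3 and sdb3 span sectors 8803620..2930272064 in the fdisk output.
sectors=$(( 2930272064 - 8803620 + 1 ))
md090_sb_sector "$sectors"
```

This prints 2921468288, only 157 sectors before the end of the partition, so the superblock is among the very last data a sequential copy writes.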
More:
root@sysresccd /root % mdadm --verbose --examine --scan
ARRAY /dev/md2 level=raid5 num-devices=3 UUID=062f3190:b9337fc1:0b38f5df:7ec7c53b
devices=/dev/sda3
ARRAY /dev/md126 level=raid1 num-devices=2 UUID=bde56644:86d3e3a4:1128f4fe:0f47f21f
devices=/dev/sdb1,/dev/sda1
ARRAY /dev/md127 level=raid1 num-devices=2 UUID=939f1a92:590d4172:2414ef47:5e2b15cb
devices=/dev/sdb2,/dev/sda2
(Note: the array is listed here as md2, not md125.)
root@sysresccd /root % mdadm --verbose --create --assume-clean /dev/md2 --level=5 --raid-devices=3 /dev/sda3 /dev/sdb3 missing
mdadm: layout defaults to left-symmetric
mdadm: chunk size defaults to 512K
mdadm: layout defaults to left-symmetric
mdadm: layout defaults to left-symmetric
mdadm: super1.x cannot open /dev/sda3: Device or resource busy
mdadm: failed container membership check
mdadm: device /dev/sda3 not suitable for any style of array
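Two things stand out in this attempt. The "Device or resource busy" error is probably because sda3 is still claimed by the inactive md125. And the defaults mdadm announces (512K chunk, apparently super1.x metadata) do not match the existing array, which uses 0.90 metadata and a 256K chunk, so a --create with those defaults would not line up with the data on disk. A hedged sketch that derives matching flags from `mdadm --examine` output: the function name is hypothetical, it only prints flags and runs nothing, and re-creating an array remains a last resort after `mdadm --assemble --force`.

```shell
# examine_to_create_flags: read `mdadm --examine` output on stdin and
# print `mdadm --create` flags that match the recorded geometry.
# Assumes "Key : value" lines as printed by mdadm --examine.
examine_to_create_flags() {
    awk -F' : ' '
        /^ *Version/      { split($2, v, "."); meta = v[1] "." v[2] }
        /^ *Raid Level/   { level = $2; sub(/^raid/, "", level) }
        /^ *Raid Devices/ { n = $2 }
        /^ *Layout/       { layout = $2 }
        /^ *Chunk Size/   { chunk = $2; sub(/K$/, "", chunk) }
        END {
            printf "--metadata=%s --level=%s --raid-devices=%s --layout=%s --chunk=%s\n",
                   meta, level, n, layout, chunk
        }
    '
}

# Usage (as root):
#   mdadm --examine /dev/sda3 | examine_to_create_flags
```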
Update: the dd copy of disk sdb may not have succeeded. The copy of sdb looked suspicious, so I ran the following commands:
root@sysresccd /root % dd if=/dev/sda3 of=/dev/sdc3 bs=128M
11144+1 records in
11144+1 records out
1495791843840 bytes (1.5 TB) copied, 42354.9 s, 35.3 MB/s
root@sysresccd /root % dd if=/dev/sdb3 of=/dev/sdd3 bs=128M
dd: reading `/dev/sdb3': Input/output error
222+1 records in
222+1 records out
29813932032 bytes (30 GB) copied, 676.459 s, 44.1 MB/s
root@sysresccd /root %
This time I copied only the sdb3 partition, since sdb1 and sdb2 are fine. As you can see, it aborted. So I am now running:
ddrescue -S -c 20480 -f /dev/sdb3 /dev/sdd3 /tmp/log3
to copy it again, this time with ddrescue. This will take more time; so far it reports errsize=17928 kB and errors=3.
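ddrescue records its progress in the log file given as the last argument (/tmp/log3 here), so an interrupted run can be resumed by repeating the same command. As a rough sketch, the remaining bad area can also be totalled from that file. This assumes the usual layout: `#` comment lines, one status line, then `pos size status` rows in hex with `-` marking bad sectors. The function name is made up.

```shell
# ddrescue_bad_bytes LOGFILE: add up the sizes of all blocks marked "-"
# (bad sector) in a ddrescue log file and print the total in bytes.
ddrescue_bad_bytes() {
    total=0
    seen_status_line=0
    while read -r pos size status _; do
        case $pos in
            ''|'#'*) continue ;;          # skip blank and comment lines
        esac
        if [ "$seen_status_line" -eq 0 ]; then
            seen_status_line=1            # first data line is the status line
            continue
        fi
        if [ "$status" = "-" ]; then
            total=$(( total + $size ))    # sizes are hex, e.g. 0x00000200
        fi
    done < "$1"
    echo "$total"
}

# Usage:
#   ddrescue_bad_bytes /tmp/log3
```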
I will update this post when the copy has finished and I know more.
Answer 1
(Answering my own question)
ddrescue solved the problem; afterwards the raid5 array could be reassembled.
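The exact commands used are not given. A typical sequence for starting a dirty, degraded RAID 5 from its two surviving members and then adding a third disk might look like the sketch below. The device names are placeholders, `run` is a hypothetical helper that only prints each command while DRY_RUN is set, and any inactive array holding the members (md125 above) would have to be stopped first. Since `--force` can start an array whose contents are slightly inconsistent, a read-only check such as `fsck -n` before mounting is advisable.

```shell
# run CMD...: print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# reassemble_raid5 MD MEMBER1 MEMBER2 NEWDISK
reassemble_raid5() {
    md=$1; m1=$2; m2=$3; new=$4
    # --force accepts the dirty state, --run starts the array degraded.
    run mdadm --assemble --force --run "$md" "$m1" "$m2" || return 1
    # Adding the third member starts the rebuild onto it.
    run mdadm --add "$md" "$new"
}

# Dry run with placeholder device names: prints the two commands only.
DRY_RUN=1 reassemble_raid5 /dev/md2 /dev/sdX3 /dev/sdY3 /dev/sdZ3
```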