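When I reboot my system, one of my four Linux software RAID arrays loses one of its two devices. The other three arrays work fine. I am running RAID1 on kernel version 2.6.32-5-amd64. After every reboot, /dev/md2 comes up with only one device. I can add the device back manually with $ sudo mdadm /dev/md2 --add /dev/sdc1. This works fine, and mdadm confirms that the device has been re-added, as shown here: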
mdadm: re-added /dev/sdc1
After adding the device and giving the array time to resync, the output of $ cat /proc/mdstat looks like this:
Personalities : [raid1]
md3 : active raid1 sda4[0] sdb4[1]
244186840 blocks super 1.2 [2/2] [UU]
md2 : active raid1 sdc1[0] sdd1[1]
732574464 blocks [2/2] [UU]
md1 : active raid1 sda3[0] sdb3[1]
722804416 blocks [2/2] [UU]
md0 : active raid1 sda1[0] sdb1[1]
6835520 blocks [2/2] [UU]
unused devices: <none>
Then, after I reboot, the output of $ cat /proc/mdstat looks like this:
Personalities : [raid1]
md3 : active raid1 sda4[0] sdb4[1]
244186840 blocks super 1.2 [2/2] [UU]
md2 : active raid1 sdd1[1]
732574464 blocks [2/1] [_U]
md1 : active raid1 sda3[0] sdb3[1]
722804416 blocks [2/2] [UU]
md0 : active raid1 sda1[0] sdb1[1]
6835520 blocks [2/2] [UU]
unused devices: <none>
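The degraded state above is visible in the [2/1] counter (two members expected, one active). As a minimal sketch for spotting it right after boot, the following parses that counter; the function name degraded_arrays is hypothetical and not part of mdadm, and it assumes the usual /proc/mdstat layout shown in this question.

```shell
# List md arrays that are running with fewer members than configured.
# Reads /proc/mdstat by default; pass a file name to check saved output instead.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }              # header line: remember the array name
        /blocks/ && match($0, /\[[0-9]+\/[0-9]+\]/) {
            # the matched text is "[expected/active]"; strip the brackets and split
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            if (n[2] + 0 < n[1] + 0) print name  # fewer active members than expected
        }
    ' "${1:-/proc/mdstat}"
}
```

Calling degraded_arrays with no argument reads the live /proc/mdstat and prints one array name per line, or nothing when every array is complete.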
Here is the output of $ sudo cat /var/log/syslog | grep mdadm covering the reboot:
Jun 22 19:00:08 rook mdadm[1709]: RebuildFinished event detected on md device /dev/md2
Jun 22 19:00:08 rook mdadm[1709]: SpareActive event detected on md device /dev/md2, component device /dev/sdc1
Jun 22 19:00:20 rook kernel: [ 7819.446412] mdadm: sending ioctl 1261 to a partition!
Jun 22 19:00:20 rook kernel: [ 7819.446415] mdadm: sending ioctl 1261 to a partition!
Jun 22 19:00:20 rook kernel: [ 7819.446782] mdadm: sending ioctl 1261 to a partition!
Jun 22 19:00:20 rook kernel: [ 7819.446785] mdadm: sending ioctl 1261 to a partition!
Jun 22 19:00:20 rook kernel: [ 7819.515844] mdadm: sending ioctl 1261 to a partition!
Jun 22 19:00:20 rook kernel: [ 7819.515847] mdadm: sending ioctl 1261 to a partition!
Jun 22 19:00:20 rook kernel: [ 7819.606829] mdadm: sending ioctl 1261 to a partition!
Jun 22 19:00:20 rook kernel: [ 7819.606832] mdadm: sending ioctl 1261 to a partition!
Jun 22 19:03:48 rook kernel: [ 8027.855616] mdadm: sending ioctl 1261 to a partition!
Jun 22 19:03:48 rook kernel: [ 8027.855620] mdadm: sending ioctl 1261 to a partition!
Jun 22 19:03:48 rook kernel: [ 8027.855950] mdadm: sending ioctl 1261 to a partition!
Jun 22 19:03:48 rook kernel: [ 8027.855952] mdadm: sending ioctl 1261 to a partition!
Jun 22 19:03:49 rook kernel: [ 8027.962169] mdadm: sending ioctl 1261 to a partition!
Jun 22 19:03:49 rook kernel: [ 8027.962171] mdadm: sending ioctl 1261 to a partition!
Jun 22 19:03:49 rook kernel: [ 8028.054365] mdadm: sending ioctl 1261 to a partition!
Jun 22 19:03:49 rook kernel: [ 8028.054368] mdadm: sending ioctl 1261 to a partition!
Jun 22 19:10:23 rook kernel: [ 9.588662] mdadm: sending ioctl 1261 to a partition!
Jun 22 19:10:23 rook kernel: [ 9.588664] mdadm: sending ioctl 1261 to a partition!
Jun 22 19:10:23 rook kernel: [ 9.601990] mdadm: sending ioctl 1261 to a partition!
Jun 22 19:10:23 rook kernel: [ 9.601991] mdadm: sending ioctl 1261 to a partition!
Jun 22 19:10:23 rook kernel: [ 9.602693] mdadm: sending ioctl 1261 to a partition!
Jun 22 19:10:23 rook kernel: [ 9.602695] mdadm: sending ioctl 1261 to a partition!
Jun 22 19:10:23 rook kernel: [ 9.605981] mdadm: sending ioctl 1261 to a partition!
Jun 22 19:10:23 rook kernel: [ 9.605983] mdadm: sending ioctl 1261 to a partition!
Jun 22 19:10:23 rook kernel: [ 9.606138] mdadm: sending ioctl 800c0910 to a partition!
Jun 22 19:10:23 rook kernel: [ 9.606139] mdadm: sending ioctl 800c0910 to a partition!
Jun 22 19:10:48 rook mdadm[1737]: DegradedArray event detected on md device /dev/md2
Here is the mdadm.conf file:
ARRAY /dev/md0 metadata=0.90 UUID=92121d42:37f46b82:926983e9:7d8aad9b
ARRAY /dev/md1 metadata=0.90 UUID=9c1bafc3:1762d51d:c1ae3c29:66348110
ARRAY /dev/md2 metadata=0.90 UUID=98cea6ca:25b5f305:49e8ec88:e84bc7f0
ARRAY /dev/md3 metadata=1.2 name=rook:3 UUID=ca3fce37:95d49a09:badd0ddc:b63a4792
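I also ran $ sudo smartctl -t long /dev/sdc, and it did not detect any hardware problems. As long as I do not reboot, /dev/md2 seems to work fine. Does anyone have any suggestions?
Here is the output of $ sudo mdadm -E /dev/sdc1 after re-adding the device and letting it resync: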
/dev/sdc1:
Magic : a92b4efc
Version : 0.90.00
UUID : 98cea6ca:25b5f305:49e8ec88:e84bc7f0 (local to host rook)
Creation Time : Sun Jul 13 08:05:55 2008
Raid Level : raid1
Used Dev Size : 732574464 (698.64 GiB 750.16 GB)
Array Size : 732574464 (698.64 GiB 750.16 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 2
Update Time : Mon Jun 24 07:42:49 2013
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Checksum : 5fd6cc13 - correct
Events : 180998
      Number   Major   Minor   RaidDevice State
this     0       8       33        0      active sync   /dev/sdc1
   0     0       8       33        0      active sync   /dev/sdc1
   1     1       8       49        1      active sync   /dev/sdd1
Here is the output of $ sudo mdadm -D /dev/md2 after re-adding the device and letting it resync:
/dev/md2:
Version : 0.90
Creation Time : Sun Jul 13 08:05:55 2008
Raid Level : raid1
Array Size : 732574464 (698.64 GiB 750.16 GB)
Used Dev Size : 732574464 (698.64 GiB 750.16 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 2
Persistence : Superblock is persistent
Update Time : Mon Jun 24 07:42:49 2013
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
UUID : 98cea6ca:25b5f305:49e8ec88:e84bc7f0 (local to host rook)
Events : 0.180998
    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       1       8       49        1      active sync   /dev/sdd1
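The outputs above were taken after the resync, when both members agree. It may be more telling to compare the event counters of the two members right after a reboot, before re-adding anything, since a member with an older counter is normally treated as stale at assembly time. A minimal sketch, assuming the mdadm -E layout shown above; the helper name events_of is hypothetical.

```shell
# Print the event counter found in `mdadm -E` output supplied on stdin.
events_of() {
    # split each line on the colon and surrounding spaces; stop at the first Events line
    awk -F ' *: *' '/^ *Events *:/ { print $2; exit }'
}

# Usage (needs root and the real member devices):
#   sudo mdadm -E /dev/sdc1 | events_of
#   sudo mdadm -E /dev/sdd1 | events_of
```

If the two numbers differ after a reboot, that would suggest /dev/sdc1 missed the last superblock updates before shutdown rather than being absent at boot.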
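Answer 1
Do you have partition type "fd" set on /dev/sdc1? It needs to be set for the partition to be autodetected at boot, but you can still add the partition manually when its type is 83.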
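One way to check this without opening fdisk interactively is to read the partition table dump from sfdisk -d. The sketch below is only an illustration: partition_types is a hypothetical helper name, and it assumes the dump writes the type either as "Id=" (older sfdisk) or as "type=" (newer util-linux).

```shell
# Print "<partition> <type>" for each partition line in an `sfdisk -d` dump on stdin.
# Older sfdisk writes the type as "Id=fd", newer util-linux as "type=fd".
partition_types() {
    sed -nE 's/^([^ ]+) *:.*(Id|type)= *([^, ]+).*/\1 \3/p'
}

# Usage (needs root):
#   sudo sfdisk -d /dev/sdc | partition_types
#   sudo sfdisk -d /dev/sdd | partition_types
```

If /dev/sdd1 reports fd and /dev/sdc1 reports 83, that difference would match the behaviour described in the question.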
Answer 2
Try checking the disk with smartctl.
Run a short test:
smartctl --test=short /dev/your_disk
Then check the results:
smartctl -a /dev/your_disk
There is also a long test (it takes a long time):
smartctl --test=long /dev/your_disk
After the reboot, it looks as if sdc is not connected:
md2 : active raid1 sdc1[0] sdd1[1]
732574464 blocks [2/2] [UU]
md2 : active raid1 sdd1[1]
732574464 blocks [2/1] [_U]
Is there a hardware problem (the port, etc.)?