I have a 3-disk RAID5 mdadm array on Ubuntu Server 14.04.3 LTS, with a total capacity of 4 TB.
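After a kernel panic caused by a replaced device unrelated to the array, the array comes up degraded ([UU_]) after every reboot. The temporary workaround I found is to run mdadm --add /dev/md0 /dev/sdd1, which starts a rebuild that completes successfully: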
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 sdd1[4] sdb1[3] sdc1[1]
3906763776 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
unused devices: <none>
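For checking the array state after a reboot without reading the output by eye, the status field can be parsed by a script. This is only a minimal Python sketch: the helper names `md_status` and `is_degraded` are hypothetical, the degraded sample line is constructed from the [UU_] state mentioned above rather than copied from real output, and only the `[n/m] [U_...]` status format is handled.

```python
import re

def md_status(mdstat_text):
    """Return (expected, active, flags) from the first '[n/m] [UU_]' status found.

    Hypothetical helper: it only understands the status format that
    appears in the /proc/mdstat output quoted in this post.
    """
    match = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", mdstat_text)
    if match is None:
        raise ValueError("no md status found")
    return int(match.group(1)), int(match.group(2)), match.group(3)

def is_degraded(mdstat_text):
    """True when fewer members are active than the array expects."""
    expected, active, _flags = md_status(mdstat_text)
    return active < expected

# Status line copied from the healthy output in this post.
healthy = "3906763776 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]"
# Assumed degraded variant, built from the [UU_] state described in the text.
degraded = "3906763776 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]"

print(md_status(healthy), is_degraded(healthy))
print(md_status(degraded), is_degraded(degraded))
```

On the real machine the text would come from reading /proc/mdstat instead of the sample strings.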
However, I have to do this after every reboot, and I noticed that the disk numbers look wrong: 4, 3 and 1 instead of 2, 1 and 0.
root@Bt-Networks-Server:~# mdadm --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Fri Aug 1 00:53:53 2014
Raid Level : raid5
Array Size : 3906763776 (3725.78 GiB 4000.53 GB)
Used Dev Size : 1953381888 (1862.89 GiB 2000.26 GB)
Raid Devices : 3
Total Devices : 3
Persistence : Superblock is persistent
Update Time : Mon Oct 26 17:40:43 2015
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Name : Bt-Networks-Server:0 (local to host Bt-Networks-Server)
UUID : 4e860a9e:0b433a00:54d2c991:78ca3d15
Events : 83137
Number Major Minor RaidDevice State
4 8 49 0 active sync /dev/sdd1
1 8 33 1 active sync /dev/sdc1
3 8 17 2 active sync /dev/sdb1
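A note on reading the table above: it has two separate columns, Number and RaidDevice. The RaidDevice column does run 0, 1, 2, while the Number column holds the 4, 1, 3 that also appear in brackets in /proc/mdstat. As far as I understand, RaidDevice is the slot within the array, so the odd-looking values may be harmless on their own. The Python sketch below pulls both columns out of the rows; the function name is hypothetical and the parsing assumes exactly the row layout shown above.

```python
def parse_detail_rows(detail_text):
    """Map device path -> (number, raid_device) from `mdadm --detail` rows.

    Assumes rows shaped like the ones in this post:
        4 8 49 0 active sync /dev/sdd1
    i.e. Number, Major, Minor, RaidDevice, state words, device path.
    """
    rows = {}
    for line in detail_text.splitlines():
        fields = line.split()
        # A device row starts with four integers and ends with a /dev path.
        if (len(fields) >= 6
                and all(f.isdigit() for f in fields[:4])
                and fields[-1].startswith("/dev/")):
            number, _major, _minor, raid_device = (int(f) for f in fields[:4])
            rows[fields[-1]] = (number, raid_device)
    return rows

sample = """\
Number Major Minor RaidDevice State
4 8 49 0 active sync /dev/sdd1
1 8 33 1 active sync /dev/sdc1
3 8 17 2 active sync /dev/sdb1
"""

rows = parse_detail_rows(sample)
slots = sorted(raid_device for _number, raid_device in rows.values())
print(rows)
print(slots)
```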
I also found the following in dmesg about a non-fresh disk being kicked:
[ 2.430966] md: raid1 personality registered for level 1
[ 2.500019] raid6: sse2x1 3900 MB/s
[ 2.568110] raid6: sse2x2 4957 MB/s
[ 2.582322] md: bind<sdc1>
[ 2.583992] md: bind<sdd1>
[ 2.608030] usb 6-2: new low-speed USB device number 2 using uhci_hcd
[ 2.619248] md: bind<sdb1>
[ 2.620098] md: kicking non-fresh sdd1 from array!
[ 2.620103] md: unbind<sdd1>
[ 2.636013] raid6: sse2x4 6926 MB/s
[ 2.636015] raid6: using algorithm sse2x4 (6926 MB/s)
[ 2.636017] raid6: using ssse3x2 recovery algorithm
[ 2.637624] xor: measuring software checksum speed
[ 2.664021] usb 7-1: new low-speed USB device number 2 using uhci_hcd
[ 2.676012] prefetch64-sse: 10026.000 MB/sec
[ 2.716011] generic_sse: 8868.000 MB/sec
[ 2.716013] xor: using function: prefetch64-sse (10026.000 MB/sec)
[ 2.717321] async_tx: api initialized (async)
[ 2.725129] md: raid6 personality registered for level 6
[ 2.725131] md: raid5 personality registered for level 5
[ 2.725133] md: raid4 personality registered for level 4
[ 2.728509] md: export_rdev(sdd1)
[ 2.729556] md/raid:md0: device sdb1 operational as raid disk 2
[ 2.729559] md/raid:md0: device sdc1 operational as raid disk 1
[ 2.729927] md/raid:md0: allocated 0kB
[ 2.729976] md/raid:md0: raid level 5 active with 2 out of 3 devices, algorithm 2
[ 2.729983] RAID conf printout:
[ 2.729984] --- level:5 rd:3 wd:2
[ 2.729986] disk 1, o:1, dev:sdc1
[ 2.729988] disk 2, o:1, dev:sdb1
[ 2.730030] md0: detected capacity change from 0 to 4000526106624
[ 2.731863] md: raid10 personality registered for level 10
[ 2.755618] md0: unknown partition table
[ 2.812332] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
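The key line in this log is "kicking non-fresh sdd1 from array!". As far as I know, md prints it when a member's superblock looks older than those of the other members. To check each boot for the same message, a small Python sketch like the following could scan the dmesg text. The names `KICK_RE` and `kicked_devices` are hypothetical, and the pattern is taken from the message wording shown above.

```python
import re

# Pattern taken from the exact message wording in the dmesg output in this post.
KICK_RE = re.compile(r"md: kicking non-fresh (\S+) from array!")

def kicked_devices(dmesg_text):
    """Return the device names that md reported as non-fresh, in log order."""
    return KICK_RE.findall(dmesg_text)

sample = """\
[ 2.619248] md: bind<sdb1>
[ 2.620098] md: kicking non-fresh sdd1 from array!
[ 2.620103] md: unbind<sdd1>
"""

print(kicked_devices(sample))
```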
I have already refreshed mdadm.conf with the output of:
root@Bt-Networks-Server:~# mdadm --detail --scan
ARRAY /dev/md/0 metadata=1.2 name=Bt-Networks-Server:0 UUID=4e860a9e:0b433a00:54d2c991:78ca3d15
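I saved that to the config file and ran update-initramfs -u.
Is there any solution that avoids having to re-add the disk and rebuild/resync the array after every reboot?
Thanks!
EDIT:
Contents of /etc/mdadm/mdadm.conf: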
root@Bt-Networks-Server:~# cat /etc/mdadm/mdadm.conf
# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#
# by default (built-in), scan all partitions (/proc/partitions) and all
# containers for MD superblocks. alternatively, specify devices to scan, using
# wildcards if desired.
#DEVICE partitions containers
# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes
# automatically tag new arrays as belonging to the local system
HOMEHOST <system>
# instruct the monitoring daemon where to send mail alerts
MAILADDR root
# definitions of existing MD arrays
# This file was auto-generated on Thu, 31 Jul 2014 23:42:00 -0300
# by mkconf $Id$
#ARRAY /dev/md/Bt-Networks-Server:0 metadata=1.2 name=Bt-Networks-Server:0 UUID=4e860a9e:0b433a00:54d2c991:78ca3d15
ARRAY /dev/md/0 metadata=1.2 UUID=4e860a9e:0b433a00:54d2c991:78ca3d15 name=Bt-Networks-Server:0
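To confirm that the active ARRAY line really matches what mdadm --detail --scan reports, the UUIDs can be compared mechanically. The following Python sketch uses the two lines from this post as input. The helper name `array_uuids` is hypothetical, and the sketch assumes the keyword is spelled ARRAY in upper case, as it is in this file.

```python
def array_uuids(config_text):
    """Collect UUID= values from uncommented ARRAY lines.

    Assumes the keyword is written as 'ARRAY' (as in this file); commented
    lines such as '#ARRAY ...' are skipped.
    """
    uuids = []
    for line in config_text.splitlines():
        stripped = line.strip()
        if not stripped.startswith("ARRAY"):
            continue
        for token in stripped.split():
            if token.upper().startswith("UUID="):
                uuids.append(token.split("=", 1)[1].lower())
    return uuids

conf = """\
#ARRAY /dev/md/Bt-Networks-Server:0 metadata=1.2 name=Bt-Networks-Server:0 UUID=4e860a9e:0b433a00:54d2c991:78ca3d15
ARRAY /dev/md/0 metadata=1.2 UUID=4e860a9e:0b433a00:54d2c991:78ca3d15 name=Bt-Networks-Server:0
"""
scan = "ARRAY /dev/md/0 metadata=1.2 name=Bt-Networks-Server:0 UUID=4e860a9e:0b433a00:54d2c991:78ca3d15"

conf_uuids = array_uuids(conf)
scan_uuids = array_uuids(scan)
print(conf_uuids == scan_uuids)
```

With the lines above the two lists are identical, so the config file at least names the same array that is currently running.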
Searching through dmesg, I found the log entries related to the recovery:
[ 185.105099] md: export_rdev(sdd1)
[ 185.220543] md: bind<sdd1>
[ 185.320114] RAID conf printout:
[ 185.320118] --- level:5 rd:3 wd:2
[ 185.320121] disk 0, o:1, dev:sdd1
[ 185.320123] disk 1, o:1, dev:sdc1
[ 185.320124] disk 2, o:1, dev:sdb1
[ 185.320272] md: recovery of RAID array md0
[ 185.320276] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[ 185.320278] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[ 185.320281] md: using 128k window, over a total of 1953381888k.
[ 1009.812057] EXT4-fs (md0): recovery complete
[ 1009.896520] EXT4-fs (md0): mounted filesystem with ordered data mode. Opts: (null)
[ 1109.136229] perf interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
[19295.440128] md: md0: recovery done.
[19295.607089] RAID conf printout:
[19295.607096] --- level:5 rd:3 wd:3
[19295.607099] disk 0, o:1, dev:sdd1
[19295.607101] disk 1, o:1, dev:sdc1
[19295.607103] disk 2, o:1, dev:sdb1
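To put a number on what each reboot costs, the recovery time can be derived from the timestamps above: the rebuild runs from about 185 s to about 19295 s after boot, a little over five hours. The Python sketch below does that arithmetic from three of the quoted lines. The helper name `find_timestamp` is hypothetical, and the sketch assumes the "k" in the log means KiB.

```python
import re

# Matches lines such as "[ 185.320272] md: recovery of RAID array md0".
TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")

def find_timestamp(dmesg_text, needle):
    """Return the seconds-since-boot stamp of the first line containing needle."""
    for line in dmesg_text.splitlines():
        match = TS_RE.match(line)
        if match and needle in match.group(2):
            return float(match.group(1))
    raise ValueError(f"{needle!r} not found")

sample = """\
[ 185.320272] md: recovery of RAID array md0
[ 185.320281] md: using 128k window, over a total of 1953381888k.
[19295.440128] md: md0: recovery done.
"""

start = find_timestamp(sample, "recovery of RAID array")
end = find_timestamp(sample, "recovery done")
elapsed_s = end - start

total_kib = 1953381888  # from the "over a total of ...k" line, assumed to be KiB
rate_mib_s = total_kib / elapsed_s / 1024

print(f"elapsed: {elapsed_s / 3600:.1f} h, average rate: {rate_mib_s:.0f} MiB/s")
```

The average works out to something on the order of 100 MiB/s, below the 200000 KB/sec ceiling the kernel reports.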
I also found entries for a successful periodic array check:
[501643.369779] md: data-check of RAID array md0
[501643.369784] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[501643.369786] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
[501643.369791] md: using 128k window, over a total of 1953381888k.
[518452.072029] md: md0: data-check done.