I'm seeing the following errors in dmesg for my RAID array. How can I find out which drive in the RAID has gone bad?
[Fri Aug 26 19:31:13 2022] EXT4-fs warning (device md0): ext4_end_bio:349: I/O error 10 writing to inode 100728932 starting block 1514702321)
[Fri Aug 26 19:31:13 2022] buffer_io_error: 80 callbacks suppressed
[Fri Aug 26 19:31:13 2022] Buffer I/O error on device md0, logical block 1514702124
[Fri Aug 26 19:31:13 2022] Buffer I/O error on device md0, logical block 1514702125
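I also ran a health check with smartctl on every disk in the RAID, and all of them came back healthy, so I think I could remount the disk once and it should work. However, I would like a long-term way to identify the underlying bad disk, or any commands that would help me verify what triggered the problem.
Answer 1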
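I wrote a script to check my RAID because I had been having problems with it. It turned out that I needed locking SATA cables: mine had worked loose, which made my drives look as though they were failing.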
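A loose cable of that kind tends to show up in the kernel log as repeated link resets on one port, even while SMART still reports the disk as healthy. As a rough sketch only, the function below counts such messages per ATA port in a saved log file. The function name `count_link_events` is hypothetical, and the message patterns are an assumption about what the kernel prints on a flaky link; compare them against your own dmesg output before relying on the counts.

```bash
#!/bin/bash
# Sketch: count ATA link events per port in a kernel log file.
# Assumption: the patterns below match what your kernel logs for a flaky link.
count_link_events() {
    local logfile="$1"
    grep -Eo 'ata[0-9]+(\.[0-9]+)?: (hard resetting link|SATA link down|link is slow to respond)' "$logfile" \
        | awk '{ sub(/[.:].*/, "", $1); count[$1]++ }   # "ata3:" or "ata3.00:" -> "ata3"
               END { for (port in count) print port, count[port] }' \
        | sort
}
```

It could be called as `count_link_events /var/log/kern.log`, or on a file produced with `dmesg > dmesg.txt`. A port that keeps resetting is a hint to check the cable on that port before replacing the disk.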
Anyway, here is the script I run, followed by an example of it in action:
Script:
#!/bin/bash
# Check for root
if [ "$EUID" -ne 0 ]; then
    echo "Please run $0 as root"
    echo ""
    echo "example:"
    echo "sudo $0"
    exit 1
fi
# Check for smartmontools (a missing command exits with 127, not 1,
# so test for the binary directly instead of inspecting $?)
if ! command -v smartctl > /dev/null 2>&1; then
    echo "smartmontools is not installed. Please install it with the following command:"
    echo ""
    echo "sudo apt install smartmontools"
    exit 1
fi
# Show details for every active md array (already running as root, so no sudo)
awk '/: active/ {print $1}' /proc/mdstat | while read -r drv
do
    mdadm -D "/dev/$drv"
done
echo ""
# Create drive array
drives=( $(smartctl --scan | awk '{print $1}') )
# Loop through array and check each drive
for drive in "${drives[@]}"
do
    model=$(smartctl -a "$drive" | grep -i "device model:" | awk '{print substr($0,index($0,$3))}')
    serial=$(smartctl -a "$drive" | grep -i "serial number:" | awk '{print $NF}')
    result=$(smartctl -H "$drive" | awk '/overall-health/ {print $NF}')
    echo -n "$drive Model: $model Serial: $serial SMART: $result"
    # Strip the /dev/ prefix, e.g. /dev/sda -> sda
    name=$(echo "$drive" | cut -d/ -f3)
    echo -n " Errors: "
    # Count kernel log error lines that name this disk (e.g. "dev sda, sector ...")
    grep -i error /var/log/kern.log 2>/dev/null | grep -c "$name,"
done
Example:
terrance@Intrepid:~$ sudo ./drive_check.bsh
/dev/md0:
Version : 1.2
Creation Time : Wed Dec 27 18:06:03 2017
Raid Level : raid1
Array Size : 484323328 (461.89 GiB 495.95 GB)
Used Dev Size : 484323328 (461.89 GiB 495.95 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Thu Sep 1 08:12:59 2022
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Consistency Policy : bitmap
Name : Intrepid:root (local to host Intrepid)
UUID : f9b257fc:d64f97c7:95581e88:004e3a4b
Events : 71486
Number   Major   Minor   RaidDevice State
   2       8      161        0      active sync   /dev/sdk1
   1       8        1        1      active sync   /dev/sda1
/dev/md2:
Version : 1.2
Creation Time : Wed Dec 27 18:18:25 2017
Raid Level : raid1
Array Size : 3927040 (3.75 GiB 4.02 GB)
Used Dev Size : 3927040 (3.75 GiB 4.02 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Sun Aug 7 10:59:07 2022
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Consistency Policy : resync
Name : Intrepid:swap (local to host Intrepid)
UUID : 2cdfcb03:e5e0c30f:d68d4e20:37b50e41
Events : 191
Number   Major   Minor   RaidDevice State
   2       8      165        0      active sync   /dev/sdk5
   1       8        5        1      active sync   /dev/sda5
/dev/md1:
Version : 1.2
Creation Time : Tue Feb 3 01:16:55 2015
Raid Level : raid5
Array Size : 15627542528 (14.55 TiB 16.00 TB)
Used Dev Size : 3906885632 (3.64 TiB 4.00 TB)
Raid Devices : 5
Total Devices : 5
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Thu Sep 1 07:39:42 2022
State : clean
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Consistency Policy : bitmap
Name : Intrepid:1 (local to host Intrepid)
UUID : 3bb988cb:d5270497:36e75f46:67a9bc65
Events : 1155019
Number   Major   Minor   RaidDevice State
   0       8       81        0      active sync   /dev/sdf1
   1       8       97        1      active sync   /dev/sdg1
   2       8      113        2      active sync   /dev/sdh1
   3       8      129        3      active sync   /dev/sdi1
   5       8      145        4      active sync   /dev/sdj1
/dev/sda Model: MAXTOR STM3500630A Serial: 9QG9152W SMART: PASSED Errors: 0
/dev/sdf Model: WDC WD40EFRX-68WT0N0 Serial: WD-WCC4EJPD3EXP SMART: PASSED Errors: 0
/dev/sdg Model: WDC WD40EFRX-68WT0N0 Serial: WD-WCC4E5UZUKPY SMART: PASSED Errors: 0
/dev/sdh Model: WDC WD40EFRX-68WT0N0 Serial: WD-WCC4E3XCP660 SMART: PASSED Errors: 0
/dev/sdi Model: WDC WD40EFRX-68WT0N0 Serial: WD-WCC4E7ZRRN8U SMART: PASSED Errors: 0
/dev/sdj Model: WDC WD40EFRX-68WT0N0 Serial: WD-WCC4EJXKY26C SMART: PASSED Errors: 0
/dev/sdk Model: ST3500418AS Serial: 6VM1HTNN SMART: PASSED Errors: 0
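Since the errors in the question are reported against md0 rather than against a physical disk, it can help to see at a glance which disks sit under each array. The following is a minimal sketch, not part of the script above: the function name `md_members` is hypothetical, and it assumes the usual `/proc/mdstat` layout in which member devices appear as `name[N]` on the array's "active" line.

```bash
#!/bin/bash
# Sketch: list the member devices of each active md array.
# Reads /proc/mdstat by default; a different file can be passed for testing.
md_members() {
    local mdstat="${1:-/proc/mdstat}"
    awk '$2 == ":" && $3 == "active" {
             printf "%s:", $1
             for (i = 4; i <= NF; i++) {
                 if ($i ~ /\[[0-9]+\]/) {      # only fields shaped like sda1[1]
                     dev = $i
                     sub(/\[.*/, "", dev)      # drop the "[N]" suffix and any "(F)" flag
                     printf " /dev/%s", dev
                 }
             }
             printf "\n"
         }' "$mdstat"
}
```

Running `md_members` with no argument prints one line per array, which can then be matched against the per-disk SMART and error-count lines in the output above.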