How do I identify a bad disk in a RAID setup on Linux?

I am seeing the following errors for my RAID array in dmesg. How can I find out which drive in the array has gone bad?

[Fri Aug 26 19:31:13 2022] EXT4-fs warning (device md0): ext4_end_bio:349: I/O error 10 writing to inode 100728932 starting block 1514702321)
[Fri Aug 26 19:31:13 2022] buffer_io_error: 80 callbacks suppressed
[Fri Aug 26 19:31:13 2022] Buffer I/O error on device md0, logical block 1514702124
[Fri Aug 26 19:31:13 2022] Buffer I/O error on device md0, logical block 1514702125

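I also ran a health check on every disk in the RAID with smartctl, and all of them passed. I suspect that reseating the disks once would get things working again, but I would like a long-term way to identify the underlying bad disk, or any commands that can help me verify what triggered the problem.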

Answer 1

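I have had problems with my own RAID, so I wrote a script to check it. It turned out that my SATA cables had worked loose, which made the drives look as if they were failing, so I needed locking SATA cables.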
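Cable problems like this often leave the SMART overall-health result at PASSED, but on many ATA drives they increment SMART attribute 199 (UDMA_CRC_Error_Count). The following is a minimal sketch, not part of the script below, that pulls that counter out of smartctl -A output. The function names crc_errors and report_crc are made up for illustration, and the sketch assumes the usual ten-column ATA attribute table with the raw value in the last column.

```shell
#!/bin/sh
# crc_errors: read `smartctl -A` output on stdin and print the raw value of
# attribute 199 (UDMA_CRC_Error_Count). Prints nothing if the drive does not
# report that attribute. Assumes the raw value is the 10th column.
crc_errors() {
    awk '$1 == 199 {print $10}'
}

# report_crc (hypothetical helper): print the CRC counter for every drive
# that `smartctl --scan` finds. Must be run as root.
report_crc() {
    smartctl --scan | awk '{print $1}' | while read -r drive
    do
        printf '%s CRC errors: %s\n' "$drive" "$(smartctl -A "$drive" | crc_errors)"
    done
}
```

A counter that keeps climbing between runs points at the cable or port rather than the disk itself; a value that stays constant may just be left over from an earlier problem.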

Anyway, here is the script I run, followed by an example of it in action:

Script:

#!/bin/bash

# Check for root
if [ "$EUID" -ne 0 ]; then
  echo "Please run $0 as root"
  echo ""
  echo "example:"
  echo "sudo $0"
  exit 1
fi

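# Check for smartmontools (a missing command exits with 127, not 1,
# so test for the binary itself instead of the exit code of "smartctl -h")
if ! command -v smartctl > /dev/null 2>&1; then
    echo "smartmontools is not installed.  Please install it with the following command:"
    echo ""
    echo "sudo apt install smartmontools"
    exit 1
fi

# Show the details of every active md array (already running as root)
awk '/: active/ {print $1}' /proc/mdstat | while read -r drv
do
    mdadm -D "/dev/$drv"
done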

echo ""

# Create drive array
drives=( $(smartctl --scan | awk '{print $1}') )

# Loop through array and check each drive
for drive in "${drives[@]}"
do
    model=$(smartctl -a "$drive" | grep -i "device model:" | awk '{print substr($0,index($0,$3))}')
    serial=$(smartctl -a "$drive" | grep -i "serial number:" | awk '{print $NF}')
    result=$(smartctl -H "$drive" | awk '/overall-health/ {print $NF}')
    echo -n "$drive Model: $model Serial: $serial SMART: $result"
    # Count kernel log lines that contain "error" and the device name (e.g. "sda,")
    j=$(echo "$drive" | cut -d/ -f3); echo -n " Errors: "
    grep -i error /var/log/kern.log 2>/dev/null | grep -c "$j,"
done

Example:

terrance@Intrepid:~$ sudo ./drive_check.bsh 
/dev/md0:
           Version : 1.2
     Creation Time : Wed Dec 27 18:06:03 2017
        Raid Level : raid1
        Array Size : 484323328 (461.89 GiB 495.95 GB)
     Used Dev Size : 484323328 (461.89 GiB 495.95 GB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Thu Sep  1 08:12:59 2022
             State : clean 
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : bitmap

              Name : Intrepid:root  (local to host Intrepid)
              UUID : f9b257fc:d64f97c7:95581e88:004e3a4b
            Events : 71486

    Number   Major   Minor   RaidDevice State
       2       8      161        0      active sync   /dev/sdk1
       1       8        1        1      active sync   /dev/sda1
/dev/md2:
           Version : 1.2
     Creation Time : Wed Dec 27 18:18:25 2017
        Raid Level : raid1
        Array Size : 3927040 (3.75 GiB 4.02 GB)
     Used Dev Size : 3927040 (3.75 GiB 4.02 GB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent

       Update Time : Sun Aug  7 10:59:07 2022
             State : clean 
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : resync

              Name : Intrepid:swap  (local to host Intrepid)
              UUID : 2cdfcb03:e5e0c30f:d68d4e20:37b50e41
            Events : 191

    Number   Major   Minor   RaidDevice State
       2       8      165        0      active sync   /dev/sdk5
       1       8        5        1      active sync   /dev/sda5
/dev/md1:
           Version : 1.2
     Creation Time : Tue Feb  3 01:16:55 2015
        Raid Level : raid5
        Array Size : 15627542528 (14.55 TiB 16.00 TB)
     Used Dev Size : 3906885632 (3.64 TiB 4.00 TB)
      Raid Devices : 5
     Total Devices : 5
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Thu Sep  1 07:39:42 2022
             State : clean 
    Active Devices : 5
   Working Devices : 5
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 512K

Consistency Policy : bitmap

              Name : Intrepid:1  (local to host Intrepid)
              UUID : 3bb988cb:d5270497:36e75f46:67a9bc65
            Events : 1155019

    Number   Major   Minor   RaidDevice State
       0       8       81        0      active sync   /dev/sdf1
       1       8       97        1      active sync   /dev/sdg1
       2       8      113        2      active sync   /dev/sdh1
       3       8      129        3      active sync   /dev/sdi1
       5       8      145        4      active sync   /dev/sdj1

/dev/sda Model: MAXTOR STM3500630A Serial: 9QG9152W SMART: PASSED Errors: 0
/dev/sdf Model: WDC WD40EFRX-68WT0N0 Serial: WD-WCC4EJPD3EXP SMART: PASSED Errors: 0
/dev/sdg Model: WDC WD40EFRX-68WT0N0 Serial: WD-WCC4E5UZUKPY SMART: PASSED Errors: 0
/dev/sdh Model: WDC WD40EFRX-68WT0N0 Serial: WD-WCC4E3XCP660 SMART: PASSED Errors: 0
/dev/sdi Model: WDC WD40EFRX-68WT0N0 Serial: WD-WCC4E7ZRRN8U SMART: PASSED Errors: 0
/dev/sdj Model: WDC WD40EFRX-68WT0N0 Serial: WD-WCC4EJXKY26C SMART: PASSED Errors: 0
/dev/sdk Model: ST3500418AS Serial: 6VM1HTNN SMART: PASSED Errors: 0
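Every array in the example above reports State : clean with no failed devices. When a member has been kicked out, /proc/mdstat marks it with (F) and shows an underscore in the status string (for example [U_]). The sketch below is one possible way to flag only those cases instead of reading the full mdadm -D report. The function names failed_members and degraded_arrays are hypothetical, and array names are assumed to start with "md".

```shell
#!/bin/sh
# failed_members: read /proc/mdstat-style text on stdin and print
# "<array> <member>" for every member device flagged as faulty with (F).
failed_members() {
    awk '$2 == ":" && $1 ~ /^md/ {
        for (i = 3; i <= NF; i++)
            if ($i ~ /\(F\)/) {
                dev = $i
                sub(/\[.*/, "", dev)   # strip "[1](F)" to leave the device name
                print $1, dev
            }
    }'
}

# degraded_arrays: print the name of every array whose status string
# (e.g. [U_]) contains an underscore, meaning a member slot is not up.
# An array that is still rebuilding is reported as well.
degraded_arrays() {
    awk '$2 == ":" && $1 ~ /^md/ { name = $1 }
         /\[[U_]*_[U_]*\]/      { print name }'
}

# Usage:
#   failed_members < /proc/mdstat
#   degraded_arrays < /proc/mdstat
```

Both functions only parse text, so they can be run without root and tested against a saved copy of /proc/mdstat.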
