Linux mdraid RAID 6: disks randomly drop out every few days

I have a few servers running Debian 8, each with 8x800GB SSDs configured as RAID6. All of these disks are attached to an LSI-3008 flashed to IT mode. In each server I also have a 2-disk RAID1 pair for the operating system.

Current state

# dpkg -l|grep mdad
ii  mdadm                          3.3.2-5+deb8u1              amd64        tool to administer Linux MD arrays (software RAID)

# uname -a
Linux R5U32-B 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-2 (2016-04-08) x86_64 GNU/Linux

# more /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md2 : active raid6 sde1[1](F) sdg1[3] sdf1[2] sdd1[0] sdh1[7] sdb1[6] sdj1[5] sdi1[4]
      4687678464 blocks super 1.2 level 6, 512k chunk, algorithm 2 [8/7] [U_UUUUUU]
      bitmap: 3/6 pages [12KB], 65536KB chunk

md1 : active (auto-read-only) raid1 sda5[0] sdc5[1]
      62467072 blocks super 1.2 [2/2] [UU]
        resync=PENDING

md0 : active raid1 sda2[0] sdc2[1]
      1890881536 blocks super 1.2 [2/2] [UU]
      bitmap: 2/15 pages [8KB], 65536KB chunk

unused devices: <none>

# mdadm --detail /dev/md2
/dev/md2:
        Version : 1.2
  Creation Time : Fri Jun 24 04:35:18 2016
     Raid Level : raid6
     Array Size : 4687678464 (4470.52 GiB 4800.18 GB)
  Used Dev Size : 781279744 (745.09 GiB 800.03 GB)
   Raid Devices : 8
  Total Devices : 8
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Tue Jul 19 17:36:15 2016
          State : active, degraded
 Active Devices : 7
Working Devices : 7
 Failed Devices : 1
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : R5U32-B:2  (local to host R5U32-B)
           UUID : 24299038:57327536:4db96d98:d6e914e2
         Events : 2514191

    Number   Major   Minor   RaidDevice State
       0       8       49        0      active sync   /dev/sdd1
       2       0        0        2      removed
       2       8       81        2      active sync   /dev/sdf1
       3       8       97        3      active sync   /dev/sdg1
       4       8      129        4      active sync   /dev/sdi1
       5       8      145        5      active sync   /dev/sdj1
       6       8       17        6      active sync   /dev/sdb1
       7       8      113        7      active sync   /dev/sdh1

       1       8       65        -      faulty   /dev/sde1

Problem

The RAID 6 array degrades regularly, roughly every 1-3 days. The cause is that one of the disks (any of them) is flagged as faulty, with the following errors:

# dmesg -T
[Sat Jul 16 05:38:45 2016] sd 0:0:3:0: attempting task abort! scmd(ffff8810350cbe00)
[Sat Jul 16 05:38:45 2016] sd 0:0:3:0: [sde] CDB:
[Sat Jul 16 05:38:45 2016] Synchronize Cache(10): 35 00 00 00 00 00 00 00 00 00
[Sat Jul 16 05:38:45 2016] scsi target0:0:3: handle(0x000d), sas_address(0x500304801707a443), phy(3)
[Sat Jul 16 05:38:45 2016] scsi target0:0:3: enclosure_logical_id(0x500304801707a47f), slot(3)
[Sat Jul 16 05:38:46 2016] sd 0:0:3:0: task abort: SUCCESS scmd(ffff8810350cbe00)
[Sat Jul 16 05:38:46 2016] end_request: I/O error, dev sde, sector 2064
[Sat Jul 16 05:38:46 2016] md: super_written gets error=-5, uptodate=0
[Sat Jul 16 05:38:46 2016] md/raid:md2: Disk failure on sde1, disabling device.
[Sat Jul 16 05:38:46 2016] md/raid:md2: Operation continuing on 7 devices.
[Sat Jul 16 05:38:46 2016] RAID conf printout:
[Sat Jul 16 05:38:46 2016]  --- level:6 rd:8 wd:7
[Sat Jul 16 05:38:46 2016]  disk 0, o:1, dev:sdd1
[Sat Jul 16 05:38:46 2016]  disk 1, o:0, dev:sde1
[Sat Jul 16 05:38:46 2016]  disk 2, o:1, dev:sdf1
[Sat Jul 16 05:38:46 2016]  disk 3, o:1, dev:sdg1
[Sat Jul 16 05:38:46 2016]  disk 4, o:1, dev:sdi1
[Sat Jul 16 05:38:46 2016]  disk 5, o:1, dev:sdj1
[Sat Jul 16 05:38:46 2016]  disk 6, o:1, dev:sdb1
[Sat Jul 16 05:38:46 2016]  disk 7, o:1, dev:sdh1
[Sat Jul 16 05:38:46 2016] RAID conf printout:
[Sat Jul 16 05:38:46 2016]  --- level:6 rd:8 wd:7
[Sat Jul 16 05:38:46 2016]  disk 0, o:1, dev:sdd1
[Sat Jul 16 05:38:46 2016]  disk 2, o:1, dev:sdf1
[Sat Jul 16 05:38:46 2016]  disk 3, o:1, dev:sdg1
[Sat Jul 16 05:38:46 2016]  disk 4, o:1, dev:sdi1
[Sat Jul 16 05:38:46 2016]  disk 5, o:1, dev:sdj1
[Sat Jul 16 05:38:46 2016]  disk 6, o:1, dev:sdb1
[Sat Jul 16 05:38:46 2016]  disk 7, o:1, dev:sdh1
[Sat Jul 16 12:40:00 2016] sd 0:0:7:0: attempting task abort! scmd(ffff88000d76eb00)

What I have tried

I have tried the following, without any improvement (a sketch of the equivalent commands follows the list):

  • Increasing /sys/block/md2/md/stripe_cache_size from 256 to 16384
  • Increasing dev.raid.speed_limit_min from 1000 to 50000
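
For reference, a minimal sketch of how those two tunables are typically set (the exact commands are my assumption, not quoted from the original post):

# Raise the md stripe cache for md2 (number of cache entries; each entry uses one 4K page per member disk)
echo 16384 > /sys/block/md2/md/stripe_cache_size
# Raise the minimum guaranteed md resync/rebuild speed (KB/s)
sysctl -w dev.raid.speed_limit_min=50000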

Need your help

Are these errors caused by the mdadm configuration, the kernel, or the controller?

Update 20160802

Following the suggestions from ppetraki and others:

  • Use raw disks instead of partitions

    This did not solve the problem.

  • Reduce the chunk size

    The chunk size was changed to 128KB and then to 64KB, but the RAID volume still degraded within a few days. dmesg shows errors similar to the ones before. I forgot to try reducing the chunk size to 32KB.

  • Reduce the RAID to 6 disks

    I tried destroying the existing RAID, zeroing the superblock on each disk, and then creating a RAID6 from 6 disks (raw disks) with a 64KB chunk. Reducing the number of disks in the RAID seems to make the array last longer, roughly 4-7 days before it degrades.

  • Update the driver

I have just updated the driver to Linux_Driver_RHEL6-7_SLES11-12_P12 (http://www.avagotech.com/products/server-storage/host-bus-adapters/sas-9300-8e). The disk errors still appear, as shown below:

[Tue Aug  2 17:57:48 2016] sd 0:0:6:0: attempting task abort! scmd(ffff880fc0dd1980)
[Tue Aug  2 17:57:48 2016] sd 0:0:6:0: [sdg] CDB:
[Tue Aug  2 17:57:48 2016] Synchronize Cache(10): 35 00 00 00 00 00 00 00 00 00
[Tue Aug  2 17:57:48 2016] scsi target0:0:6: handle(0x0010), sas_address(0x50030480173ee946), phy(6)
[Tue Aug  2 17:57:48 2016] scsi target0:0:6: enclosure_logical_id(0x50030480173ee97f), slot(6)
[Tue Aug  2 17:57:49 2016] sd 0:0:6:0: task abort: SUCCESS scmd(ffff880fc0dd1980)
[Tue Aug  2 17:57:49 2016] end_request: I/O error, dev sdg, sector 0

Just a few minutes ago my array degraded again. This time /dev/sdf and /dev/sdg showed the "attempting task abort! scmd" error:

[Tue Aug  2 21:26:02 2016]  
[Tue Aug  2 21:26:02 2016] sd 0:0:5:0: [sdf] CDB:
[Tue Aug  2 21:26:02 2016] Synchronize Cache(10): 35 00 00 00 00 00 00 00 00 00
[Tue Aug  2 21:26:02 2016] scsi target0:0:5: handle(0x000f), sas_address(0x50030480173ee945), phy(5)
[Tue Aug  2 21:26:02 2016] scsi target0:0:5: enclosure logical id(0x50030480173ee97f), slot(5)
[Tue Aug  2 21:26:02 2016] scsi target0:0:5: enclosure level(0x0000), connector name(     ^A)
[Tue Aug  2 21:26:03 2016] sd 0:0:5:0: task abort: SUCCESS scmd(ffff88103beb5240)
[Tue Aug  2 21:26:03 2016] sd 0:0:5:0: attempting task abort! scmd(ffff88107934e080)
[Tue Aug  2 21:26:03 2016] sd 0:0:5:0: [sdf] CDB:
[Tue Aug  2 21:26:03 2016] Read(10): 28 00 04 75 3b f8 00 00 08 00
[Tue Aug  2 21:26:03 2016] scsi target0:0:5: handle(0x000f), sas_address(0x50030480173ee945), phy(5)
[Tue Aug  2 21:26:03 2016] scsi target0:0:5: enclosure logical id(0x50030480173ee97f), slot(5)
[Tue Aug  2 21:26:03 2016] scsi target0:0:5: enclosure level(0x0000), connector name(     ^A)
[Tue Aug  2 21:26:03 2016] sd 0:0:5:0: task abort: SUCCESS scmd(ffff88107934e080)
[Tue Aug  2 21:26:04 2016] sd 0:0:5:0: [sdf] CDB:
[Tue Aug  2 21:26:04 2016] Read(10): 28 00 04 75 3b f8 00 00 08 00
[Tue Aug  2 21:26:04 2016] mpt3sas_cm0:         sas_address(0x50030480173ee945), phy(5)
[Tue Aug  2 21:26:04 2016] mpt3sas_cm0:         enclosure logical id(0x50030480173ee97f), slot(5)
[Tue Aug  2 21:26:04 2016] mpt3sas_cm0:         enclosure level(0x0000), connector name(     ^A)
[Tue Aug  2 21:26:04 2016] mpt3sas_cm0:         handle(0x000f), ioc_status(success)(0x0000), smid(35)
[Tue Aug  2 21:26:04 2016] mpt3sas_cm0:         request_len(4096), underflow(4096), resid(-4096)
[Tue Aug  2 21:26:04 2016] mpt3sas_cm0:         tag(65535), transfer_count(8192), sc->result(0x00000000)
[Tue Aug  2 21:26:04 2016] mpt3sas_cm0:         scsi_status(check condition)(0x02), scsi_state(autosense valid )(0x01)
[Tue Aug  2 21:26:04 2016] mpt3sas_cm0:         [sense_key,asc,ascq]: [0x06,0x29,0x00], count(18)
[Tue Aug  2 22:14:51 2016] sd 0:0:6:0: attempting task abort! scmd(ffff880931d8c840)
[Tue Aug  2 22:14:51 2016] sd 0:0:6:0: [sdg] CDB:
[Tue Aug  2 22:14:51 2016] Synchronize Cache(10): 35 00 00 00 00 00 00 00 00 00
[Tue Aug  2 22:14:51 2016] scsi target0:0:6: handle(0x0010), sas_address(0x50030480173ee946), phy(6)
[Tue Aug  2 22:14:51 2016] scsi target0:0:6: enclosure logical id(0x50030480173ee97f), slot(6)
[Tue Aug  2 22:14:51 2016] scsi target0:0:6: enclosure level(0x0000), connector name(     ^A)
[Tue Aug  2 22:14:51 2016] sd 0:0:6:0: task abort: SUCCESS scmd(ffff880931d8c840)
[Tue Aug  2 22:14:52 2016] sd 0:0:6:0: [sdg] CDB:
[Tue Aug  2 22:14:52 2016] Synchronize Cache(10): 35 00 00 00 00 00 00 00 00 00
[Tue Aug  2 22:14:52 2016] mpt3sas_cm0:         sas_address(0x50030480173ee946), phy(6)
[Tue Aug  2 22:14:52 2016] mpt3sas_cm0:         enclosure logical id(0x50030480173ee97f), slot(6)
[Tue Aug  2 22:14:52 2016] mpt3sas_cm0:         enclosure level(0x0000), connector name(     ^A)
[Tue Aug  2 22:14:52 2016] mpt3sas_cm0:         handle(0x0010), ioc_status(success)(0x0000), smid(85)
[Tue Aug  2 22:14:52 2016] mpt3sas_cm0:         request_len(0), underflow(0), resid(-8192)
[Tue Aug  2 22:14:52 2016] mpt3sas_cm0:         tag(65535), transfer_count(8192), sc->result(0x00000000)
[Tue Aug  2 22:14:52 2016] mpt3sas_cm0:         scsi_status(check condition)(0x02), scsi_state(autosense valid )(0x01)
[Tue Aug  2 22:14:52 2016] mpt3sas_cm0:         [sense_key,asc,ascq]: [0x06,0x29,0x00], count(18)
[Tue Aug  2 22:14:52 2016] end_request: I/O error, dev sdg, sector 16
[Tue Aug  2 22:14:52 2016] md: super_written gets error=-5, uptodate=0
[Tue Aug  2 22:14:52 2016] md/raid:md2: Disk failure on sdg, disabling device.
[Tue Aug  2 22:14:52 2016] md/raid:md2: Operation continuing on 5 devices.
[Tue Aug  2 22:14:52 2016] RAID conf printout:
[Tue Aug  2 22:14:52 2016]  --- level:6 rd:6 wd:5
[Tue Aug  2 22:14:52 2016]  disk 0, o:1, dev:sdc
[Tue Aug  2 22:14:52 2016]  disk 1, o:1, dev:sdd
[Tue Aug  2 22:14:52 2016]  disk 2, o:1, dev:sde
[Tue Aug  2 22:14:52 2016]  disk 3, o:1, dev:sdf
[Tue Aug  2 22:14:52 2016]  disk 4, o:0, dev:sdg
[Tue Aug  2 22:14:52 2016]  disk 5, o:1, dev:sdh
[Tue Aug  2 22:14:52 2016] RAID conf printout:
[Tue Aug  2 22:14:52 2016]  --- level:6 rd:6 wd:5
[Tue Aug  2 22:14:52 2016]  disk 0, o:1, dev:sdc
[Tue Aug  2 22:14:52 2016]  disk 1, o:1, dev:sdd
[Tue Aug  2 22:14:52 2016]  disk 2, o:1, dev:sde
[Tue Aug  2 22:14:52 2016]  disk 3, o:1, dev:sdf
[Tue Aug  2 22:14:52 2016]  disk 5, o:1, dev:sdh

I think the "attempting task abort! scmd" errors are what cause the array to degrade, but I do not know what is causing them.

Update 20160806

I tried setting up another server with the same specification. There is no mdadm RAID; each disk is mounted directly with an ext4 filesystem on it. After a while the kernel log showed "attempting task abort! scmd" on some of the disks. This led to errors on /dev/sdd1, which was then remounted read-only:

$ dmesg -T
[Sat Aug  6 05:21:09 2016] sd 0:0:3:0: [sdd] CDB:
[Sat Aug  6 05:21:09 2016] Read(10): 28 00 2d 29 21 00 00 00 20 00
[Sat Aug  6 05:21:09 2016] scsi target0:0:3: handle(0x000a), sas_address(0x4433221103000000), phy(3)
[Sat Aug  6 05:21:09 2016] scsi target0:0:3: enclosure_logical_id(0x500304801a5d3f01), slot(3)
[Sat Aug  6 05:21:09 2016] sd 0:0:3:0: task abort: SUCCESS scmd(ffff88006b206800)
[Sat Aug  6 05:21:09 2016] sd 0:0:3:0: attempting task abort! scmd(ffff88019a3a07c0)
[Sat Aug  6 05:21:09 2016] sd 0:0:3:0: [sdd] CDB:
[Sat Aug  6 05:21:09 2016] Read(10): 28 00 08 46 8f 80 00 00 20 00
[Sat Aug  6 05:21:09 2016] scsi target0:0:3: handle(0x000a), sas_address(0x4433221103000000), phy(3)
[Sat Aug  6 05:21:09 2016] scsi target0:0:3: enclosure_logical_id(0x500304801a5d3f01), slot(3)
[Sat Aug  6 05:21:09 2016] sd 0:0:3:0: task abort: SUCCESS scmd(ffff88019a3a07c0)
[Sat Aug  6 05:21:10 2016] sd 0:0:3:0: attempting device reset! scmd(ffff880f9a49ac80)
[Sat Aug  6 05:21:10 2016] sd 0:0:3:0: [sdd] CDB:
[Sat Aug  6 05:21:10 2016] Synchronize Cache(10): 35 00 00 00 00 00 00 00 00 00
[Sat Aug  6 05:21:10 2016] scsi target0:0:3: handle(0x000a), sas_address(0x4433221103000000), phy(3)
[Sat Aug  6 05:21:10 2016] scsi target0:0:3: enclosure_logical_id(0x500304801a5d3f01), slot(3)
[Sat Aug  6 05:21:10 2016] sd 0:0:3:0: device reset: SUCCESS scmd(ffff880f9a49ac80)
[Sat Aug  6 05:21:10 2016] mpt3sas0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)
[Sat Aug  6 05:21:10 2016] mpt3sas0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)
[Sat Aug  6 05:21:10 2016] mpt3sas0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)
[Sat Aug  6 05:21:11 2016] end_request: I/O error, dev sdd, sector 780443696
[Sat Aug  6 05:21:11 2016] Aborting journal on device sdd1-8.
[Sat Aug  6 05:21:11 2016] EXT4-fs error (device sdd1): ext4_journal_check_start:56: Detected aborted journal
[Sat Aug  6 05:21:11 2016] EXT4-fs (sdd1): Remounting filesystem read-only
[Sat Aug  6 05:40:35 2016] sd 0:0:5:0: attempting task abort! scmd(ffff88024fc08340)
[Sat Aug  6 05:40:35 2016] sd 0:0:5:0: [sdf] CDB:
[Sat Aug  6 05:40:35 2016] Synchronize Cache(10): 35 00 00 00 00 00 00 00 00 00
[Sat Aug  6 05:40:35 2016] scsi target0:0:5: handle(0x000c), sas_address(0x4433221105000000), phy(5)
[Sat Aug  6 05:40:35 2016] scsi target0:0:5: enclosure_logical_id(0x500304801a5d3f01), slot(5)
[Sat Aug  6 05:40:35 2016] sd 0:0:5:0: task abort: FAILED scmd(ffff88024fc08340)
[Sat Aug  6 05:40:35 2016] sd 0:0:5:0: attempting task abort! scmd(ffff88019a12ee00)
[Sat Aug  6 05:40:35 2016] sd 0:0:5:0: [sdf] CDB:
[Sat Aug  6 05:40:35 2016] Read(10): 28 00 27 c8 b4 e0 00 00 20 00
[Sat Aug  6 05:40:35 2016] scsi target0:0:5: handle(0x000c), sas_address(0x4433221105000000), phy(5)
[Sat Aug  6 05:40:35 2016] scsi target0:0:5: enclosure_logical_id(0x500304801a5d3f01), slot(5)
[Sat Aug  6 05:40:35 2016] sd 0:0:5:0: task abort: SUCCESS scmd(ffff88019a12ee00)
[Sat Aug  6 05:40:35 2016] sd 0:0:5:0: attempting task abort! scmd(ffff88203eaddac0)

Update 20160930

After upgrading the controller firmware to the latest version (currently 12.00.02), the problem disappeared.
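
For anyone checking the same thing, a sketch of two common ways to read the running HBA firmware version (the host number and the availability of the Avago utility are assumptions):

# Firmware version as exposed by the mpt3sas driver in sysfs (host number may differ)
cat /sys/class/scsi_host/host0/version_fw
# Or via the Avago/LSI SAS3 flash utility, if it is installed
sas3flash -listall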

Conclusion

The problem is solved.

Answer 1

That is a pretty big stripe: 8-2=6 * 512K = 3MiB, and not an even number either. Grow the array to 10 disks (8 data + 2 parity) or bring it down to 4 data + 2 parity, for a total stripe size of 256K or 64K per drive respectively. It may simply be that the cache is unhappy with unaligned writes. You could try putting all the drives into write-through mode before attempting to reconfigure the array.
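
A minimal sketch of how per-drive write caching is usually switched off (device names are placeholders; hdparm is for SATA drives, sdparm for SAS drives):

# SATA: disable the drive's volatile write cache (write-through behaviour)
hdparm -W 0 /dev/sdb
# SAS/SCSI: clear the Write Cache Enable (WCE) bit in the caching mode page
sdparm --clear=WCE /dev/sdb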

Update, 20 July 2016.

At this point I am convinced the problem is your RAID configuration. A 3MiB stripe is just odd; even though it is a multiple of the partition offset [1] (1MiB), it is simply a sub-optimal stripe size for any RAID, SSD or otherwise. It probably generates a lot of unaligned writes, forcing your SSDs to free more pages than they have available, constantly pushing them into the garbage collector and shortening their life. The drives cannot obtain free pages for writing quickly enough, so when you eventually flush the cache to disk (a synchronous write), it actually fails. You do not have a crash-consistent array, i.e. your data is not safe.

That is my theory based on the available information and the time I can invest. You now have a "growth opportunity" to become a storage expert in front of you ;)

Start over. Do not use partitions. Set that system aside and build an array with a total stripe size of 128K (be a little conservative to start with). In a RAID 6 configuration with N drives in total, only N-2 drives receive data at any one time; the remaining two hold parity information. So if N=6, a 128K stripe requires a 32K chunk. You should now be able to see why 8 is an odd number of drives for running RAID 6.
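
A sketch of what such a rebuild could look like (the device names and the 6-drive selection are assumptions taken from the earlier attempts; zeroing the superblocks irreversibly destroys the old array):

# Stop the old array and wipe the md metadata from every member disk
mdadm --stop /dev/md2
mdadm --zero-superblock /dev/sdb /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
# Create a 6-drive RAID 6 on whole disks with a 32K chunk (4 data disks x 32K = 128K stripe)
mdadm --create /dev/md2 --level=6 --raid-devices=6 --chunk=32 \
      /dev/sdb /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh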

Then run fio [2] against the "raw disk" in direct mode and hammer it until you are confident it is stable. Next, add the filesystem and tell it about the underlying stripe size (man mkfs.???). Run fio again, this time against files (otherwise you will destroy the filesystem), and confirm that the array stays healthy.
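
A sketch of the kind of fio run meant here (all parameters are assumptions, and writing to /dev/md2 directly will destroy whatever is on it):

# Hammer the bare md device with direct, random 32K writes for 10 minutes
fio --name=md2-stress --filename=/dev/md2 --direct=1 --ioengine=libaio \
    --rw=randwrite --bs=32k --iodepth=32 --numjobs=4 \
    --runtime=600 --time_based --group_reporting
# Later, when creating the filesystem, pass it the geometry (ext4 example:
# stride = 32K chunk / 4K block = 8, stripe-width = stride * 4 data disks = 32)
mkfs.ext4 -E stride=8,stripe-width=32 /dev/md2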

I know this is a lot of "stuff"; just start small, try to understand what it is doing, and stick with it. Tools like blktrace and iostat can help you understand how your application does its writes, which in turn tells you the best stripe/chunk size to use.
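
For reference, a sketch of how those two tools are commonly invoked (the device name is an assumption):

# Extended per-device statistics (request sizes, queue depth, latency), refreshed every second
iostat -x 1
# Trace block-layer requests on the array and decode them on the fly
blktrace -d /dev/md2 -o - | blkparse -i -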

  1. https://www.percona.com/blog/2011/06/09/aligning-io-on-a-hard-disk-raid-the-theory/

  2. https://wiki.mikejung.biz/Benchmarking#Fio_Random_Write_and_Random_Read_Command_Line_Examples (my fio cheat sheet)

Answer 2

For starters, check and post the SMART readings. I suspect your disks are failing. It looks like a timeout after trying to read/write a failed sector. It could also be a cabling problem (bad contact, broken cable, etc.). I have also seen disks with similar firmware issues. I should have more suggestions once I see the SMART data.
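
A sketch of how those readings can be pulled with smartmontools (the device name is an assumption; repeat for each member disk):

# Health status, attributes and error log for one suspect drive
smartctl -a /dev/sde
# Extended report, including SAS-specific error counters and the self-test log
smartctl -x /dev/sde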
