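I have an "Areca ARC-1883IX-12" RAID controller with firmware 1.54, running under openSUSE 42.3 with the Xen hypervisor.
I copy four large binary files within the local file system, using four instances of the copy command like this one: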
cp /arecaDriveMnt/bigfile1.dat /arecaDriveMnt/bigfile1Copy1.dat
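For anyone who wants to reproduce the load, the four concurrent copies can be sketched roughly as below. Only bigfile1.dat and its copy name appear in my command above; the other file names and the `parallel_copy` helper are placeholders made up for illustration.

```shell
#!/bin/sh
# Start one cp per source file in the background, then wait for all of them.
# Usage: parallel_copy DIR FILE...
# parallel_copy is a hypothetical helper, not an existing tool.
parallel_copy() {
    dir=$1
    shift
    for name in "$@"; do
        # bigfile1.dat -> bigfile1Copy1.dat, following the naming used above
        cp "$dir/$name" "$dir/${name%.dat}Copy1.dat" &
    done
    wait    # block until every background cp has exited
}

# Example call (file names 2 to 4 are assumed):
# parallel_copy /arecaDriveMnt bigfile1.dat bigfile2.dat bigfile3.dat bigfile4.dat
```

Running the copies as background jobs of one shell is just one way to get several writers at once; separate terminals or separate processes behave the same way as far as the controller is concerned.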
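When I generate this network disk load from several processes, I get the errors below in /var/log/messages. A few seconds after the first occurrence of the error, I/O throughput drops from ~500 MByte/s to essentially zero, and I have to reboot the machine before I can access the RAID disks again.
Edit: The error is not related to network traffic. It also occurs if I spawn enough processes that copy local data on the local disk.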
2018-04-05T14:11:39.267042+02:00 dom0 kernel: [ 3324.524188] arcmsr14: abort device command of scsi id = 6 lun = 0
2018-04-05T14:11:42.499045+02:00 dom0 kernel: [ 3327.756238] arcmsr14: abort device command of scsi id = 6 lun = 0
2018-04-05T14:11:45.731043+02:00 dom0 kernel: [ 3330.988233] arcmsr14: abort device command of scsi id = 6 lun = 0
2018-04-05T14:11:48.963033+02:00 dom0 kernel: [ 3334.220268] arcmsr14: abort device command of scsi id = 6 lun = 0
2018-04-05T14:11:52.195037+02:00 dom0 kernel: [ 3337.452336] arcmsr14: abort device command of scsi id = 6 lun = 0
2018-04-05T14:11:55.427038+02:00 dom0 kernel: [ 3340.684381] arcmsr14: abort device command of scsi id = 6 lun = 0
2018-04-05T14:11:58.659044+02:00 dom0 kernel: [ 3343.916533] arcmsr14: abort device command of scsi id = 6 lun = 0
2018-04-05T14:12:01.891054+02:00 dom0 kernel: [ 3347.148512] arcmsr: executing bus reset eh.....num_resets = 0, num_aborts = 7
2018-04-05T14:12:33.891069+02:00 dom0 kernel: [ 3379.148850] arcmsr14: wait 'abort all outstanding command' timeout
2018-04-05T14:12:33.891093+02:00 dom0 kernel: [ 3379.150370] arcmsr14: executing hw bus reset .....
2018-04-05T14:12:46.923049+02:00 dom0 kernel: [ 3392.181980] arcmsr14: wait 'get adapter firmware miscellaneous data' timeout
The value in /sys/block/sdh/device/timeout is 30.
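For reference, this is roughly how the SCSI command timeout (in seconds) can be read for a device. The `scsi_timeout` function and the `SYSFS_ROOT` override are hypothetical names used only for this sketch, and the value 90 in the comment is arbitrary; I do not know whether raising the timeout has any effect on the aborts.

```shell
#!/bin/sh
# Print the SCSI command timeout (seconds) of a block device.
# SYSFS_ROOT exists only so the function can be tried without real hardware.
scsi_timeout() {
    dev=$1
    cat "${SYSFS_ROOT:-/sys}/block/$dev/device/timeout"
}

# scsi_timeout sdh
# Changing it (as root, not persistent across reboots) would look like:
# echo 90 > /sys/block/sdh/device/timeout
```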
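I have made no configuration changes to the OS, the BIOS, or the RAID controller. The problem is present on a fresh openSUSE installation, with optimized default BIOS settings and unmodified Areca RAID settings.
I have tried the following to fix the error: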
- Updating the BIOS
- Distributing the IRQs of the Areca kernel module "arcmsr" and of "eth1" to different processors (see here)
- Disabling irqbalance.service
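The IRQ-related steps in the list above can be sketched as follows. `irqs_for` is a hypothetical helper that looks up IRQ numbers by driver name in /proc/interrupts; the CPU number in the commented example is arbitrary and not necessarily what I used.

```shell
#!/bin/sh
# List the IRQ numbers whose /proc/interrupts line mentions a given name.
# The optional second argument lets the parser run against a saved copy.
irqs_for() {
    name=$1
    file=${2:-/proc/interrupts}
    # Field 1 looks like "45:"; strip the colon and print the number.
    awk -v n="$name" 'index($0, n) { sub(/:$/, "", $1); print $1 }' "$file"
}

# Pinning each arcmsr IRQ to one CPU (needs root; CPU 2 is an arbitrary choice):
# for irq in $(irqs_for arcmsr); do
#     cat /proc/irq/$irq/smp_affinity_list       # current CPU list
#     echo 2 > /proc/irq/$irq/smp_affinity_list  # new CPU list
# done
#
# Stopping and disabling the balancer so it does not undo the pinning:
# systemctl disable --now irqbalance.service
```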
Has anyone run into a similar problem, and how did you solve it?