RAID controller disappearing, causing system slowdown?

The system is probably running on RAID 5. The controller is:

RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2108 [Liberator] (rev 05)

For about a week now, the system has been very slow. I get messages like the following on the command line (taken from syslog):

Jan 17 18:16:12 HAUPTRECHNER kernel: [  840.329151] megacli.real    D ffff880402c7fc88     0  4058   4057 0x00000000
Jan 17 18:16:12 HAUPTRECHNER kernel: [  840.329186]  [<ffffffffc001ce51>] megasas_issue_blocked_cmd+0x121/0x210 [megaraid_sas]
Jan 17 18:16:12 HAUPTRECHNER kernel: [  840.329200]  [<ffffffffc00242a4>] megasas_mgmt_fw_ioctl+0x3e4/0xae0 [megaraid_sas]
Jan 17 18:16:12 HAUPTRECHNER kernel: [  840.329210]  [<ffffffffc0024b6b>] megasas_mgmt_ioctl_fw.isra.25+0x1cb/0x230 [megaraid_sas]
Jan 17 18:16:12 HAUPTRECHNER kernel: [  840.329218]  [<ffffffffc0024e48>] megasas_mgmt_ioctl+0x28/0x40 [megaraid_sas]

Besides that, the syslog is also flooded with messages like these, and I don't know what they mean:

Jan 17 18:13:44 HAUPTRECHNER kernel: [  692.360649] megaraid_sas 0000:02:00.0: RESET_GEN2: retry=5a, hostdiag=ffffffff
Jan 17 18:13:45 HAUPTRECHNER kernel: [  692.464643] megaraid_sas 0000:02:00.0: RESET_GEN2: retry=5b, hostdiag=ffffffff
Jan 17 18:13:45 HAUPTRECHNER kernel: [  692.568659] megaraid_sas 0000:02:00.0: RESET_GEN2: retry=5c, hostdiag=ffffffff
Jan 17 18:13:45 HAUPTRECHNER kernel: [  692.672630] megaraid_sas 0000:02:00.0: RESET_GEN2: retry=5d, hostdiag=ffffffff
Jan 17 18:13:45 HAUPTRECHNER kernel: [  692.776626] megaraid_sas 0000:02:00.0: RESET_GEN2: retry=5e, hostdiag=ffffffff
Jan 17 18:13:45 HAUPTRECHNER kernel: [  692.880619] megaraid_sas 0000:02:00.0: RESET_GEN2: retry=5f, hostdiag=ffffffff
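
To get a rough picture of how often these messages appear, the syslog lines could be summarized with something like the following. This is only a minimal sketch: the regular expression is inferred from the sample lines above, the retry counter is assumed to be hexadecimal (it runs 5a ... 5f), and `summarize_resets` is a made-up helper name, not part of any tool.

```python
import re

# Pattern inferred from the sample lines above (an assumption, not a documented format).
RESET_RE = re.compile(
    r"megaraid_sas (?P<pci>[0-9a-f:.]+): RESET_GEN2: "
    r"retry=(?P<retry>[0-9a-f]+), hostdiag=(?P<hostdiag>[0-9a-f]+)"
)


def summarize_resets(lines):
    """Count RESET_GEN2 messages and collect the values they report."""
    retries = []
    hostdiags = set()
    devices = set()
    for line in lines:
        match = RESET_RE.search(line)
        if match is None:
            continue  # not a RESET_GEN2 line
        # The retry counter looks hexadecimal in the log, so parse it as base 16.
        retries.append(int(match.group("retry"), 16))
        hostdiags.add(match.group("hostdiag"))
        devices.add(match.group("pci"))
    return {
        "count": len(retries),
        "min_retry": min(retries) if retries else None,
        "max_retry": max(retries) if retries else None,
        "hostdiag_values": sorted(hostdiags),
        "devices": sorted(devices),
    }


if __name__ == "__main__":
    import sys

    # Usage: python summarize_resets.py < /var/log/syslog
    print(summarize_resets(sys.stdin))
```

Fed the six lines above, this would report a single device and a single hostdiag value, which at least shows whether the register ever reads anything other than ffffffff.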

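In this state, some commands such as fdisk do not return (after a while the kernel issues its "blocked for more than 120 seconds" message), and I get weird output from ls like this (ignore the dates):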

# ls -l 
-rw-r--r-- 1 root   root     1148 Jan  9 11:53 file
-rw-r--r-- 1 root   root     1320 Dez 13 10:28 file.1
-rw-r--r-- 1 root   root      300 Apr  1  2018 file.10.gz
-????-???- 1 ????   ????      252 Feb 12  2018 file.11.gz
-rw-r--r-- 1 root   root     2121 Jan 31  2018 file.12.gz
-rw-r--r-- 1 root   root      980 Nov 29 18:05 file.2.gz
-????-???- 1 ????   ????      252 Feb 12  2018 file.3.gz
-????-???- 1 ????   ????      252 Feb 12  2018 file.4.gz
-rw-r--r-- 1 root   root     1889 Okt 31 17:17 file.5.gz
-????-???- 1 ????   ????      252 Feb 12  2018 file.6.gz
-????-???- 1 ????   ????      252 Feb 12  2018 file.7.gz

But at other times (after two or three reboots) the system behaves normally and ls shows normal output.

The HDDs themselves look fine:

# megacli -PDList -a0 | egrep "flagged|Temperature|Firmware s|Port Number:"
Firmware state: Online, Spun Up
Connected Port Number: 1(path0) 
Drive Temperature :65C (149.00 F)
Drive has flagged a S.M.A.R.T alert : No
Firmware state: Online, Spun Up
Connected Port Number: 2(path0) 
Drive Temperature :69C (156.20 F)
Drive has flagged a S.M.A.R.T alert : No
Firmware state: Online, Spun Up
Connected Port Number: 3(path0) 
Drive Temperature :68C (154.40 F)
Drive has flagged a S.M.A.R.T alert : No
Firmware state: Online, Spun Up
Connected Port Number: 0(path0) 
Drive Temperature :60C (140.00 F)
Drive has flagged a S.M.A.R.T alert : No
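
For comparing the drives across reboots, the filtered output above could be turned into one record per drive. The following is a rough sketch under a few assumptions: each drive's block starts with its "Firmware state" line (as in the output above), `parse_pdlist` is a hypothetical helper name, and the field layout is taken only from the lines shown here, not from any MegaCLI specification.

```python
import re


def parse_pdlist(text):
    """Group the filtered megacli -PDList lines into one dict per drive."""
    drives = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Firmware state:"):
            # Assumption: this line opens each drive's block in the filtered output.
            current = {"state": line.split(":", 1)[1].strip()}
            drives.append(current)
        elif current is None:
            continue  # ignore anything before the first drive block
        elif line.startswith("Connected Port Number:"):
            current["port"] = line.split(":", 1)[1].strip()
        elif line.startswith("Drive Temperature"):
            match = re.search(r"(\d+)C", line)
            if match:
                current["temp_c"] = int(match.group(1))
        elif line.startswith("Drive has flagged"):
            # Anything other than "No" is treated as an alert.
            current["smart_alert"] = line.rsplit(":", 1)[1].strip() != "No"
    return drives


if __name__ == "__main__":
    import sys

    # Usage: megacli -PDList -a0 | python parse_pdlist.py
    for drive in parse_pdlist(sys.stdin.read()):
        print(drive)
```

Logging these records on every boot would make it easier to see whether the drive states or temperatures differ between the slow boots and the normal ones.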

Does this indicate that the RAID controller is broken and needs to be replaced?