smartctl does not run a self-test on the hard drive


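I have a SATA hard drive connected to a SAS card through the backplane of an Intel server. The drive appears to be accessible in Linux without trouble, but I noticed some odd errors in the logs. I want to find out whether they are related to startup/initialization problems or something else, so I would like to run a SMART test.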

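The device reports "overall-health self-assessment test result: PASSED", but I would like to run some SMART tests myself. I am not sure why they fail, and my Google-fu has let me down.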

Can someone explain what the following means and whether I can fix it, ideally without taking the drive offline:

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in captive mode".
Command "Execute SMART Short self-test routine immediately in captive mode" failed: Connection timed out

(This is the response to the command "smartctl -t short -C /dev/sdd".)

Answer 1

"Captive" mode does not appear to be supported (at least on Linux?), and unfortunately nothing I looked at mentions this.

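I ran into the same problem. I had assumed that a "captive" (foreground) test would get full priority and all available bandwidth, and would therefore finish sooner. That does not seem to be the case, so the smartctl man page is misleading.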

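During a captive self-test, the smartctl process waits until the drive finishes and returns. The SATA subsystem, however, treats this outstanding command as a hung drive and aborts it after the number of seconds given in /sys/block/<blockdev>/device/timeout.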
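
To see which timeout applies to a given drive, the sysfs file mentioned above can be read directly. The following is a minimal sketch; the function name and the `sysfs_root` parameter are made up for illustration (the parameter only exists so the function can be pointed at a test directory).

```python
from pathlib import Path


def scsi_command_timeout(blockdev: str, sysfs_root: str = "/sys") -> int:
    """Return the command timeout in seconds for a block device such as 'sdd'.

    Reads <sysfs_root>/block/<blockdev>/device/timeout, the file named in
    the answer. The helper itself is illustrative, not part of smartmontools.
    """
    path = Path(sysfs_root) / "block" / blockdev / "device" / "timeout"
    return int(path.read_text().strip())


if __name__ == "__main__":
    # 'sdd' is the device from the question; adjust for your system.
    print(scsi_command_timeout("sdd"))
```

A captive test that runs longer than this value is the case in which the abort described above would be expected.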

dmesg then logs the drive reset (in my case the drive hangs off an Adaptec controller):

dmesg error log

[May 7 17:28] aacraid: Host adapter abort request.
              aacraid: Outstanding commands on (0,1,3,0):
[ +28.668009] aacraid: Host adapter abort request.
              aacraid: Outstanding commands on (0,1,3,0):
[  +0.024081] aacraid: Host bus reset request. SCSI hang ?
[  +0.000006] aacraid 0000:06:00.0: outstanding cmd: midlevel-0
[  +0.000002] aacraid 0000:06:00.0: outstanding cmd: lowlevel-0
[  +0.000001] aacraid 0000:06:00.0: outstanding cmd: error handler-1
[  +0.000001] aacraid 0000:06:00.0: outstanding cmd: firmware-0
[  +0.000001] aacraid 0000:06:00.0: outstanding cmd: kernel-0
[  +0.019997] aacraid 0000:06:00.0: Controller reset type is 3
[  +0.000004] aacraid 0000:06:00.0: Issuing IOP reset
[May 7 17:29] aacraid 0000:06:00.0: IOP reset succeeded
[  +0.033805] aacraid: Comm Interface type2 enabled
[  +2.217498] udevd[558]: worker [9103] /devices/pci0000:00/0000:00:0c.0/0000:06:00.0/host0/target0:1:3/0:1:3:0/block/sdd is taking a long time
[  +6.814903] aacraid 0000:06:00.0: Scheduling bus rescan
[ +10.192816] sd 0:1:3:0: [sdd] tag#543 timing out command, waited 60s
[  +0.000007] sd 0:1:3:0: [sdd] tag#543 FAILED Result: hostbyte=DID_RESET driverbyte=DRIVER_OK cmd_age=109s
[  +0.000003] sd 0:1:3:0: [sdd] tag#543 CDB: ATA command pass through(16) 85 06 0c 00 d4 00 00 00 81 00 4f 00 c2 00 b0 00
[  +0.001052] sd 0:1:3:0: [sdd] 11721045168 512-byte logical blocks: (6.00 TB/5.46 TiB)
[  +0.000005] sd 0:1:3:0: [sdd] 4096-byte physical blocks
[  +0.003122]  sdd: sdd1 sdd2
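
The CDB line in the log can be decoded to confirm that the timed-out command really is the captive self-test. The sketch below assumes the standard ATA PASS-THROUGH (16) byte layout and the ATA SMART EXECUTE OFF-LINE IMMEDIATE subcommand values; the function and table names are invented for illustration, and the table lists only a few subcommands.

```python
# Subset of SMART EXECUTE OFF-LINE IMMEDIATE subcommands (value in LBA low).
SMART_SELF_TESTS = {
    0x01: "short self-test, off-line (background) mode",
    0x02: "extended self-test, off-line (background) mode",
    0x81: "short self-test, captive mode",
    0x82: "extended self-test, captive mode",
}


def decode_ata_passthrough16(cdb_hex: str) -> dict:
    """Decode the task-file fields of an ATA PASS-THROUGH (16) CDB."""
    b = bytes.fromhex(cdb_hex)  # whitespace between bytes is ignored
    if len(b) != 16 or b[0] != 0x85:
        raise ValueError("not an ATA PASS-THROUGH (16) CDB")
    return {
        "protocol": (b[1] >> 1) & 0x0F,  # 3 = non-data
        "features": b[4],                # 0xD4 = EXECUTE OFF-LINE IMMEDIATE
        "lba_low": b[8],                 # self-test subcommand
        "lba_mid": b[10],                # 0x4F, SMART signature
        "lba_high": b[12],               # 0xC2, SMART signature
        "command": b[14],                # 0xB0 = SMART
    }


if __name__ == "__main__":
    fields = decode_ata_passthrough16(
        "85 06 0c 00 d4 00 00 00 81 00 4f 00 c2 00 b0 00"
    )
    print(fields)
    print(SMART_SELF_TESTS.get(fields["lba_low"], "unknown subcommand"))
```

For the CDB above, the features field is 0xD4 and LBA low is 0x81, which matches the "Short self-test routine immediately in captive mode" command that smartctl reported sending.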

The drive then records the failed self-test:

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short captive       Interrupted (host reset)      50%       196         -
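
If you want to check for this condition from a script, a row of that log can be parsed as in the rough sketch below. It assumes the column layout shown above, with columns separated by two or more spaces; the regular expression and function name are illustrative and not part of smartmontools.

```python
import re

# One row of the self-test log: columns are separated by runs of 2+ spaces.
ROW = re.compile(
    r"^#\s*(?P<num>\d+)\s{2,}(?P<test>.+?)\s{2,}(?P<status>.+?)\s{2,}"
    r"(?P<remaining>\d+)%\s+(?P<hours>\d+)\s+(?P<lba>\S+)\s*$"
)


def parse_selftest_row(line: str):
    """Return the fields of one self-test log row, or None if it does not match."""
    m = ROW.match(line.strip())
    if m is None:
        return None
    row = m.groupdict()
    row["num"] = int(row["num"])
    row["remaining"] = int(row["remaining"])
    row["hours"] = int(row["hours"])
    # Flag tests that were cut short, e.g. by the host reset seen above.
    row["interrupted"] = row["status"].startswith("Interrupted")
    return row


if __name__ == "__main__":
    sample = "# 1  Short captive       Interrupted (host reset)      50%       196         -"
    print(parse_selftest_row(sample))
```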

The smartmontools ticket describing this problem has been marked "wontfix": https://www.smartmontools.org/ticket/1153

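I do not think raising the block device timeout is a real solution for lengthy self-tests, so I guess we simply cannot run captive tests. (It may be different for native SCSI drives?)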
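
One thing to try is the same test without -C, which makes smartctl ask for a background (off-line mode) self-test and return immediately instead of waiting on the drive. The sketch below only builds and runs that command line. The helper names are made up, and whether the background test actually completes behind this particular controller is an assumption that this thread does not confirm.

```python
import subprocess


def selftest_command(device: str, test: str = "short", captive: bool = False) -> list:
    """Build the smartctl argument list for a self-test.

    -t selects the test type; -C requests captive (foreground) mode.
    """
    cmd = ["smartctl", "-t", test]
    if captive:
        cmd.append("-C")
    cmd.append(device)
    return cmd


def start_background_selftest(device: str) -> int:
    """Start a non-captive short self-test; needs root and real hardware."""
    return subprocess.run(selftest_command(device), check=False).returncode


if __name__ == "__main__":
    # '/dev/sdd' is the device from the question; adjust for your system.
    print(" ".join(selftest_command("/dev/sdd")))
```

Once the test has had time to finish, `smartctl -l selftest /dev/sdd` prints the self-test log so the result can be checked.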
