

I have a 4 TB WD Red disk (WD40EFRX-68WT0N0, firmware 82.00A82) that occasionally logs uncorrectable read errors in its SMART error log, for example:

Error 43 [18] occurred at disk power-on lifetime: 13157 hours (548 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 00 02 e9 e0 40 00  Error: UNC at LBA = 0x0002e9e0 = 190944

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 01 00 00 08 00 00 00 02 ea 48 40 00 12d+15:42:14.157  READ FPDMA QUEUED
  60 00 e0 00 00 00 00 00 02 e9 68 40 00 12d+15:42:14.157  READ FPDMA QUEUED
  60 00 e0 00 08 00 00 00 02 e8 88 40 00 12d+15:42:10.216  READ FPDMA QUEUED
  60 01 00 00 00 00 00 00 02 e7 88 40 00 12d+15:42:10.215  READ FPDMA QUEUED
  60 01 00 00 08 00 00 00 02 e6 88 40 00 12d+15:42:07.629  READ FPDMA QUEUED

(The full smartctl report is here.)
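As a sanity check on the error log above, the following minimal Python sketch decodes the register bytes into LBAs and finds which of the logged reads covered the failing sector. It assumes smartctl's column layout (CR, two FEATR bytes, two COUNT bytes, three LBA_48 bytes, then LH LM LL, DV, DC) and that READ FPDMA QUEUED carries its sector count in the FEATR field. The function names `lba_from_bytes`, `parse_ncq_read`, and `commands_covering` are illustrative, not part of any tool.

```python
def lba_from_bytes(hex_bytes):
    """Combine big-endian hex byte strings (e.g. ['02', 'e9', 'e0']) into an int."""
    value = 0
    for b in hex_bytes:
        value = (value << 8) | int(b, 16)
    return value


def parse_ncq_read(line):
    """Return (start_lba, sector_count) for one READ FPDMA QUEUED log line.

    Assumed layout: CR | FEATR x2 | COUNT x2 | LBA_48 x3 | LH LM LL | DV | DC.
    For NCQ reads the sector count is assumed to sit in FEATR.
    """
    f = line.split()
    sector_count = lba_from_bytes(f[1:3])
    start_lba = lba_from_bytes(f[5:11])
    return start_lba, sector_count


def commands_covering(error_lba, command_lines):
    """Return the (start_lba, sector_count) pairs whose range contains error_lba."""
    hits = []
    for line in command_lines:
        start, count = parse_ncq_read(line)
        if start <= error_lba < start + count:
            hits.append((start, count))
    return hits


# Command lines copied from the SMART error log above.
COMMANDS = [
    "60 01 00 00 08 00 00 00 02 ea 48 40 00 12d+15:42:14.157  READ FPDMA QUEUED",
    "60 00 e0 00 00 00 00 00 02 e9 68 40 00 12d+15:42:14.157  READ FPDMA QUEUED",
    "60 00 e0 00 08 00 00 00 02 e8 88 40 00 12d+15:42:10.216  READ FPDMA QUEUED",
    "60 01 00 00 00 00 00 00 02 e7 88 40 00 12d+15:42:10.215  READ FPDMA QUEUED",
    "60 01 00 00 08 00 00 00 02 e6 88 40 00 12d+15:42:07.629  READ FPDMA QUEUED",
]

# LBA bytes from the error register line (LBA_48 followed by LH LM LL).
ERROR_LBA = lba_from_bytes(["00", "00", "00", "02", "e9", "e0"])

if __name__ == "__main__":
    print("error LBA:", ERROR_LBA)
    print("covering reads:", commands_covering(ERROR_LBA, COMMANDS))
```

Under those assumptions the failing LBA falls inside the second logged read, and the five reads form one contiguous sequential range, which is consistent with a scrub or a large sequential read hitting a single bad sector.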

When the most recent error occurred, zpool status reported the following:

$ zpool status cloudpool
  pool: cloudpool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
    attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
    using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: scrub repaired 0B in 3h57m with 0 errors on Wed Oct 17 03:53:57 2018

    NAME                                          STATE     READ WRITE CKSUM
    cloudpool                                     ONLINE       0     0     0
      mirror-0                                    ONLINE       0     0     0
        ata-ST8000VN0022-2EL112_ZA17FZXF          ONLINE       0     0     0
        ata-ST8000VN0022-2EL112_ZA17H5D3          ONLINE       0     0     0
      mirror-1                                    ONLINE       0     0     0
        ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E5NFLRU3  ONLINE       1     0     0
        ata-ST4000VN000-2AH166_WDH0KMHT           ONLINE       0     0     0
      mirror-2                                    ONLINE       0     0     0
        ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N3EHHA2E  ONLINE       0     0     0
        ata-ST3000DM001-1CH166_Z1F1HL4V           ONLINE       0     0     0

errors: No known data errors

(Previously, some zpool scrub runs reported that they had repaired some data, but this is the first time I have seen this new status.)
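To keep an eye on these counters between scrubs, the status output can be scanned for rows with a nonzero READ, WRITE, or CKSUM value. The sketch below is a minimal Python example that works on the text of `zpool status`. The function name `devices_with_errors` is hypothetical, and the parser assumes the five-column table layout shown above, ending at the first blank line.

```python
def devices_with_errors(status_text):
    """Return {name: (read, write, cksum)} for rows with any nonzero counter.

    Counters are kept as strings so that abbreviated values also count as nonzero.
    """
    flagged = {}
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:2] == ["NAME", "STATE"]:
            in_table = True          # header row: the device table starts here
        elif in_table and not fields:
            in_table = False         # a blank line ends the device table
        elif in_table and len(fields) >= 5:
            name, counters = fields[0], tuple(fields[2:5])
            if any(c != "0" for c in counters):
                flagged[name] = counters
    return flagged


# Excerpt of the status output quoted above (mirror-1 only).
SAMPLE = """\
    NAME                                          STATE     READ WRITE CKSUM
    cloudpool                                     ONLINE       0     0     0
      mirror-1                                    ONLINE       0     0     0
        ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E5NFLRU3  ONLINE       1     0     0
        ata-ST4000VN000-2AH166_WDH0KMHT           ONLINE       0     0     0

errors: No known data errors
"""

if __name__ == "__main__":
    for name, (read, write, cksum) in devices_with_errors(SAMPLE).items():
        print(f"{name}: READ={read} WRITE={write} CKSUM={cksum}")
```

In practice the text would come from running `zpool status cloudpool`, for instance via `subprocess.run`, rather than from an embedded sample.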

However, the short, conveyance, and extended SMART self-tests all completed without finding any problems.

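I also think the load/unload cycle count is suspiciously high, but this is a Red drive, not a Green one, and WD's official tool (wd5741.exe) reports that no action is needed.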


Edit: Even though I am using ECC RAM, another one of my drives is now showing a problem:

  pool: cloudpool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
    attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
    using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: scrub repaired 768K in 2h56m with 0 errors on Sun Jan 13 03:20:40 2019

    NAME                                          STATE     READ WRITE CKSUM
    cloudpool                                     ONLINE       0     0     0
      mirror-0                                    ONLINE       0     0     0
        ata-ST8000VN0022-2EL112_ZA17FZXF          ONLINE       0     0     0
        ata-ST8000VN0022-2EL112_ZA17H5D3          ONLINE       0     0     0
      mirror-1                                    ONLINE       0     0     0
        ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E5NFLRU3  ONLINE       0     0     0
        ata-ST4000VN000-2AH166_WDH0KMHT           ONLINE       0     0     0
      mirror-2                                    ONLINE       0     0     0
        ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N3EHHA2E  ONLINE       0     0     6
        ata-ST3000DM001-1CH166_Z1F1HL4V           ONLINE       0     0     0

errors: No known data errors
