Fixing bad blocks on a Linux RAID5 array - second-hand drives

TL;DR: Are the second-hand hard drives I bought safe to use for a backup copy of my NAS?


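I bought four second-hand Western Digital 3TB Red HDDs and put them in a spare microserver I own. I plan to use this setup as a secondary, off-site backup copy of my existing on-site (backup) NAS.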

I ran an "extended self-test" on each drive using smartctl. The three tests on /dev/sdb, /dev/sdc and /dev/sdd "Completed without error", but the test on /dev/sda "Completed: read failure".

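  1. Are these physical bad blocks or logical bad blocks?
  2. If they are logical, how do I fix them?
  3. If they are physical, is it safe to keep using the drive in the RAID array, or should I really spend the money on a new drive?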

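I have read everything on the "Bad block HOWTO" page, but it does not provide any information related to RAID arrays, in particular how to account for the RAID offset when trying to calculate the block number.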

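I was reading this answer to a similar question, where someone mentions that you have to account for the "RAID data offset" but does not clarify further. They also have a RAID 1 array rather than RAID 5. The equation I am referring to is: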

 b = (int)((L-S)*512/B)

where:
b = File System block number
B = File system block size in bytes
L = LBA of bad sector
S = Starting sector of partition as shown by fdisk -lu
and (int) denotes the integer part.
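
The formula above can be sketched in Python to check the arithmetic. This is only an illustration: the function name fs_block_number is made up here, and the sample values (the LBA from the self-test log, and the 264192-sector RAID data offset standing in for S) are the ones that appear further down in this post, not a verified partition start.

```python
def fs_block_number(lba: int, start_sector: int, block_size: int,
                    sector_size: int = 512) -> int:
    """b = (int)((L - S) * 512 / B), computed with integer arithmetic only."""
    if lba < start_sector:
        raise ValueError("LBA lies before the start sector")
    return (lba - start_sector) * sector_size // block_size


L = 1342886576   # LBA_of_first_error from the smartctl self-test log
S = 264192       # assumption: RAID data offset used in place of a partition start
B = 4096         # assumption: 4096-byte file system blocks

print(fs_block_number(L, S, B))
```

Integer division (//) plays the role of the (int) truncation and avoids floating-point rounding on large sector numbers.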

Also, following that example, when I run sudo fdisk -lu /dev/md0 I only get the following:

> sudo fdisk -lu /dev/md0
Disk /dev/md0: 8.19 TiB, 9001371697152 bytes, 17580804096 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 524288 bytes / 1572864 bytes

I do not see any partition table information, such as: Device Boot Start End Blocks Id System

I do not get any more information when I run sudo fdisk -lu /dev/sda either:

> sudo fdisk -lu /dev/sda
Disk /dev/sda: 2.75 TiB, 3000592982016 bytes, 5860533168 sectors
Disk model: WDC WD30EFRX-68A
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

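This means I do not know the starting sector of a partition on /dev/sda. If I take the RAID data offset value of 264192 and substitute it for S in the equation, I end up with: [(1342886576 - 264192) * 512] / 4096 = 167827798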

If I then run sudo hdparm --read-sector 167827798 /dev/sd0, I get reading sector 167827798: succeeded as the result.

So either my block/sector number is wrong, or the bad block is actually fine...
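
One possibility worth checking, sketched here under stated assumptions: 167827798 is a count of 4096-byte blocks measured from S, whereas hdparm --read-sector, as far as its man page describes it, takes a sector number on the device itself, in the 512-byte logical sectors shown by fdisk. The hypothetical helper below converts a block number back into the range of device sectors it covers, so the units can be compared.

```python
LOGICAL_SECTOR = 512  # bytes per logical sector, as reported by fdisk above


def block_to_device_sectors(block: int, start_sector: int,
                            block_size: int = 4096) -> tuple:
    """Return (first, last) device sector covered by one block counted from start_sector."""
    per_block = block_size // LOGICAL_SECTOR
    first = start_sector + block * per_block
    return first, first + per_block - 1


# Values from this post; 264192 is the RAID data offset used as S.
first, last = block_to_device_sectors(167827798, 264192)
print(first, last)
```

If that assumption holds, block 167827798 corresponds to device sectors starting at the LBA reported by the self-test, and sector 167827798 on the raw disk is an unrelated location.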

Can anyone help me further? I have included the smartctl output below.


/dev/sda

  • SMART overall: PASSED
  • Extended offline: read failure
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-31-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD30EFRX-68AX9N0
Serial Number:    WD-WMC1T3647350
LU WWN Device Id: 5 0014ee 6ae046f8e
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri May 22 10:41:12 2020 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 113) The previous self-test completed having
                    the read element of the test failed.
Total time to complete Offline 
data collection:        (39900) seconds.
Offline data collection
capabilities:            (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    ( 400) minutes.
Conveyance self-test routine
recommended polling time:    (   5) minutes.
SCT capabilities:          (0x70bd) SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       11
  3 Spin_Up_Time            0x0027   185   179   021    Pre-fail  Always       -       5741
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       41
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   042   042   000    Old_age   Always       -       42583
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       41
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       27
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       13
194 Temperature_Celsius     0x0022   121   110   000    Old_age   Always       -       29
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       4
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       1

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       10%     42562         1342886576
# 2  Extended offline    Completed: read failure       90%     42534         1342886576
# 3  Short offline       Completed without error       00%     42534         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

/dev/sdb

  • SMART overall: PASSED
  • Extended offline: Completed without error
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-31-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD30EFRX-68AX9N0
Serial Number:    WD-WMC1T3613933
LU WWN Device Id: 5 0014ee 6ae04400b
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri May 22 10:47:03 2020 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:        (40860) seconds.
Offline data collection
capabilities:            (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    ( 410) minutes.
Conveyance self-test routine
recommended polling time:    (   5) minutes.
SCT capabilities:          (0x70bd) SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       1
  3 Spin_Up_Time            0x0027   182   176   021    Pre-fail  Always       -       5858
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       58
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       1
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   085   084   000    Old_age   Always       -       11555
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       41
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       28
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       12
194 Temperature_Celsius     0x0022   120   111   000    Old_age   Always       -       30
196 Reallocated_Event_Count 0x0032   199   199   000    Old_age   Always       -       1
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
ATA Error Count: 32 (device log contains only the most recent five errors)
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 32 occurred at disk power-on lifetime: 1796 hours (74 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 42 10 51 6f ae  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ef 03 42 10 51 6f ae 00  43d+23:51:57.394  SET FEATURES [Set transfer mode]
  ef 03 0c 10 51 6f ae 00  43d+23:51:57.393  SET FEATURES [Set transfer mode]
  ec 00 01 00 00 00 a0 00  43d+23:51:57.392  IDENTIFY DEVICE
  ef 90 06 00 00 00 40 00  43d+23:51:38.870  SET FEATURES [Disable SATA feature]
  ef 90 06 00 00 00 40 00  43d+23:51:38.869  SET FEATURES [Disable SATA feature]

Error 31 occurred at disk power-on lifetime: 1796 hours (74 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 0c 10 51 6f ae  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ef 03 0c 10 51 6f ae 00  43d+23:51:57.393  SET FEATURES [Set transfer mode]
  ec 00 01 00 00 00 a0 00  43d+23:51:57.392  IDENTIFY DEVICE
  ef 90 06 00 00 00 40 00  43d+23:51:38.870  SET FEATURES [Disable SATA feature]
  ef 90 06 00 00 00 40 00  43d+23:51:38.869  SET FEATURES [Disable SATA feature]
  ef 90 06 00 00 00 40 00  43d+23:51:38.869  SET FEATURES [Disable SATA feature]

Error 30 occurred at disk power-on lifetime: 1796 hours (74 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 06 00 00 00 40  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ef 90 06 00 00 00 40 00  43d+23:51:38.870  SET FEATURES [Disable SATA feature]
  ef 90 06 00 00 00 40 00  43d+23:51:38.869  SET FEATURES [Disable SATA feature]
  ef 90 06 00 00 00 40 00  43d+23:51:38.869  SET FEATURES [Disable SATA feature]
  ef 90 06 00 00 00 40 00  43d+23:51:38.868  SET FEATURES [Disable SATA feature]
  ef 90 06 00 00 00 40 00  43d+23:51:38.868  SET FEATURES [Disable SATA feature]

Error 29 occurred at disk power-on lifetime: 1796 hours (74 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 06 00 00 00 40  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ef 90 06 00 00 00 40 00  43d+23:51:38.869  SET FEATURES [Disable SATA feature]
  ef 90 06 00 00 00 40 00  43d+23:51:38.869  SET FEATURES [Disable SATA feature]
  ef 90 06 00 00 00 40 00  43d+23:51:38.868  SET FEATURES [Disable SATA feature]
  ef 90 06 00 00 00 40 00  43d+23:51:38.868  SET FEATURES [Disable SATA feature]
  ef 90 06 00 00 00 40 00  43d+23:51:38.867  SET FEATURES [Disable SATA feature]

Error 28 occurred at disk power-on lifetime: 1796 hours (74 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 06 00 00 00 40  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ef 90 06 00 00 00 40 00  43d+23:51:38.869  SET FEATURES [Disable SATA feature]
  ef 90 06 00 00 00 40 00  43d+23:51:38.868  SET FEATURES [Disable SATA feature]
  ef 90 06 00 00 00 40 00  43d+23:51:38.868  SET FEATURES [Disable SATA feature]
  ef 90 06 00 00 00 40 00  43d+23:51:38.867  SET FEATURES [Disable SATA feature]
  ef 90 06 00 00 00 40 00  43d+23:51:38.667  SET FEATURES [Disable SATA feature]

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     11531         -
# 2  Extended offline    Aborted by host               10%     11521         -
# 3  Short offline       Completed without error       00%     11506         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

/dev/sdc

  • SMART overall: PASSED
  • Extended offline: Completed without error
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-31-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD30EFRX-68AX9N0
Serial Number:    WD-WMC1T3643940
LU WWN Device Id: 5 0014ee 60359ae37
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri May 22 10:48:40 2020 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:        (40080) seconds.
Offline data collection
capabilities:            (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    ( 402) minutes.
Conveyance self-test routine
recommended polling time:    (   5) minutes.
SCT capabilities:          (0x70bd) SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   182   177   021    Pre-fail  Always       -       5858
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       42
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   042   042   000    Old_age   Always       -       42582
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       42
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       29
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       12
194 Temperature_Celsius     0x0022   120   111   000    Old_age   Always       -       30
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     42558         -
# 2  Short offline       Completed without error       00%     42543         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

/dev/sdd

  • SMART overall: PASSED
  • Extended offline: Completed without error
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-31-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD30EFRX-68AX9N0
Serial Number:    WD-WMC1T3642983
LU WWN Device Id: 5 0014ee 658ae70cd
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri May 22 10:49:48 2020 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:        (40320) seconds.
Offline data collection
capabilities:            (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    ( 404) minutes.
Conveyance self-test routine
recommended polling time:    (   5) minutes.
SCT capabilities:          (0x70bd) SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   181   175   021    Pre-fail  Always       -       5950
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       42
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   047   047   000    Old_age   Always       -       39003
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       42
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       29
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       12
194 Temperature_Celsius     0x0022   120   112   000    Old_age   Always       -       30
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     38979         -
# 2  Short offline       Completed without error       00%     38969         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Output from sudo mdadm --examine /dev/sda

> sudo mdadm --examine /dev/sda
/dev/sda:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x9
     Array UUID : 1457ad95:8434aaa1:93949de6:2e100471
           Name : remote-nas:0  (local to host remote-nas)
  Creation Time : Tue May 19 17:05:24 2020
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 5860268976 (2794.39 GiB 3000.46 GB)
     Array Size : 8790402048 (8383.18 GiB 9001.37 GB)
  Used Dev Size : 5860268032 (2794.39 GiB 3000.46 GB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=944 sectors
          State : clean
    Device UUID : 7dc7b151:ee18b088:a54a5a7d:201420d3

Internal Bitmap : 8 sectors from superblock
    Update Time : Fri May 22 10:02:30 2020
  Bad Block Log : 512 entries available at offset 24 sectors - bad blocks present.
       Checksum : e90608f7 - correct
         Events : 20196

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
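
The question of how to account for the RAID layout can be sketched as follows. This is a rough illustration, not a verified procedure: it assumes the usual left-symmetric RAID5 placement (parity rotating from the last device towards the first, data chunks continuing on the device after parity), a whole-disk member with no partition table, and it ignores the bad block log. The function name and parameters are hypothetical; the numbers come from the mdadm --examine and smartctl output in this post.

```python
from typing import Optional


def member_lba_to_array_sector(lba: int, data_offset: int, chunk_sectors: int,
                               raid_devices: int, role: int) -> Optional[int]:
    """Map a sector on one RAID5 member to a sector on the md device.

    Assumes the left-symmetric layout. Returns None when the sector falls
    in a parity chunk, which has no address on the md device.
    """
    rel = lba - data_offset                      # sector inside the member's data area
    if rel < 0:
        raise ValueError("LBA lies before the data area")
    stripe, within = divmod(rel, chunk_sectors)
    parity_disk = (raid_devices - 1) - (stripe % raid_devices)
    if role == parity_disk:
        return None
    data_index = (role - parity_disk - 1) % raid_devices
    chunk_no = stripe * (raid_devices - 1) + data_index
    return chunk_no * chunk_sectors + within


chunk_sectors = 512 * 1024 // 512                # Chunk Size 512K in 512-byte sectors
print(member_lba_to_array_sector(1342886576, 264192, chunk_sectors, 4, 0))
```

A result other than None is a sector on /dev/md0. Since fdisk shows no partition table on /dev/md0, the file system block would then follow from the equation above with S = 0, assuming the file system sits directly on the md device.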

Answer 1

Are these physical bad blocks or logical bad blocks?

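Both. The physical part (the drive itself) is the Completed: read failure SMART test result, together with a non-zero value for reallocated/pending/uncorrectable sectors (here, Current_Pending_Sector is 4 on /dev/sda).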
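
As a small illustration of that check, the sketch below scans smartctl attribute lines for non-zero raw values on the reallocated, pending and uncorrectable counters. It assumes the ten-column attribute table format shown in the question; the function name is made up, and the sample lines are copied from the /dev/sda output.

```python
WATCHED_IDS = {5, 197, 198}  # reallocated, pending, offline-uncorrectable


def nonzero_sector_counters(smartctl_text: str) -> dict:
    """Return {attribute_name: raw_value} for watched attributes with a raw value > 0."""
    found = {}
    for line in smartctl_text.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns and start with the numeric ID.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        raw = fields[9]
        if int(fields[0]) in WATCHED_IDS and raw.isdigit() and int(raw) > 0:
            found[fields[1]] = int(raw)
    return found


sample = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       4
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
"""
print(nonzero_sector_counters(sample))
```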

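The logical part (mdadm metadata) is the bad blocks present entry in the mdadm Bad Block Log. You can also check it with mdadm --examine-badblocks (for each drive individually). If the same bad blocks are recorded on multiple drives, the md device will return soft read errors for those blocks.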

Is it safe to keep using the drive in the RAID array?

I would not trust a drive that fails SMART or loses data to unreadable sectors. Replace it, run a write-read-verify test on it, and then decide.
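
To make "write-read-verify" concrete, here is a minimal sketch of the idea, demonstrated on a scratch file rather than on a disk. It is not the tool the answer has in mind; an established utility such as badblocks in its destructive write mode is the usual choice. All names are hypothetical, and a real device test would also have to bypass the page cache, which this sketch does not do.

```python
import hashlib
import os
import tempfile


def _pattern(index: int, size: int) -> bytes:
    # Deterministic pseudo-random bytes, different for every block index.
    return hashlib.shake_256(b"wrv" + index.to_bytes(8, "big")).digest(size)


def write_pattern(path: str, blocks: int, block_size: int = 4096) -> None:
    """Overwrite the first blocks * block_size bytes of an existing file (destructive)."""
    with open(path, "r+b") as f:
        for i in range(blocks):
            f.write(_pattern(i, block_size))
        f.flush()
        os.fsync(f.fileno())


def verify_pattern(path: str, blocks: int, block_size: int = 4096) -> list:
    """Read the blocks back and return the indexes that do not match what was written."""
    bad = []
    with open(path, "rb") as f:
        for i in range(blocks):
            if f.read(block_size) != _pattern(i, block_size):
                bad.append(i)
    return bad


# Demonstration on a temporary scratch file only.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    scratch = tmp.name
try:
    write_pattern(scratch, blocks=8)
    print(verify_pattern(scratch, blocks=8))
finally:
    os.remove(scratch)
```

Using a different pattern per block means a block that is silently returned from the wrong location also shows up as a mismatch, not just one that fails to read.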

If they are logical, how do I fix them?

Use mdadm --replace on the faulty drive; ideally, the bad block log entries go away along with it.

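If entries remain after the replacement because they are identical on multiple drives (there is no redundancy for those blocks) and the md array returns soft read errors, you can force the bad block log to be wiped with mdadm --assemble --update=force-no-bbl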

Afterwards, the md array may return wrong or stale data for the blocks that were in this log, which can result in data corruption.
