Linux: Rebuilding software RAID 1 fails when adding a partition

I ran into a problem with my software RAID yesterday: one of the disks had to be replaced. I removed its partitions from the arrays with:

mdadm /dev/mdx -r /dev/sdbx

After the hosting center replaced the failed drive, I copied the partition table from sda to the new disk (sdb was the failed device):

sgdisk -R /dev/sdb /dev/sda 

and gave it new random GUIDs:

sgdisk -G /dev/sdb

Then I added all the partitions back with:

mdadm /dev/mdx -a /dev/sdbx

Everything went smoothly except for one partition, whose recovery aborted at around 60% after a few hours. This is the current state of the RAID:

cat /proc/mdstat 
Personalities : [raid1] 
md5 : active raid1 sda6[0] sdb6[2](S)
      2633910528 blocks super 1.2 [2/1] [U_]

md4 : active raid1 sda5[0] sdb5[2]
      16768896 blocks super 1.2 [2/2] [UU]

md3 : active raid1 sda4[0] sdb4[2]
      2096064 blocks super 1.2 [2/2] [UU]

md2 : active raid1 sda3[0] sdb3[2]
      268304192 blocks super 1.2 [2/2] [UU]

md1 : active raid1 sda2[0] sdb2[2]
      523968 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sda1[0] sdb1[2]
      8384448 blocks super 1.2 [2/2] [UU]

unused devices: <none>
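
As a side note, the degraded array in this output can be picked out mechanically. The following is a minimal sketch, not part of the original post: the function name `degraded_arrays` is made up for illustration, and it assumes the usual /proc/mdstat layout in which the member-status string (such as `[U_]`) is the last field of the "blocks" line.

```shell
#!/bin/sh
# List md arrays whose member-status string (e.g. [U_]) contains "_",
# meaning at least one slot has no active, in-sync member.
# degraded_arrays is a hypothetical helper name, not an mdadm command.
degraded_arrays() {
    # $1: path to an mdstat-style file (defaults to /proc/mdstat)
    awk '
        /^md[0-9]+ :/ { name = $1 }    # remember the array named on the header line
        /blocks/ && $NF ~ /^\[[U_]+\]$/ && $NF ~ /_/ { print name }
    ' "${1:-/proc/mdstat}"
}
```

Called as `degraded_arrays` on the machine itself, or as `degraded_arrays saved-mdstat.txt` on a saved copy, it prints one array name per line. Against the listing above it would report only md5, whose status is `[U_]` with sdb6 sitting there as a spare `(S)`.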

In the syslog I can see messages like these:

Jan 23 14:24:04 rescue kernel: [11163.329021] ata1.00: exception Emask 0x0 SAct 0xf00000 SErr 0x0 action 0x0
Jan 23 14:24:04 rescue kernel: [11163.376449] ata1.00: configured for UDMA/133
Jan 23 14:24:04 rescue kernel: [11163.376475] sd 0:0:0:0: [sda] Unhandled sense code
Jan 23 14:24:04 rescue kernel: [11163.376477] sd 0:0:0:0: [sda]  
Jan 23 14:24:04 rescue kernel: [11163.376479] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jan 23 14:24:04 rescue kernel: [11163.376481] sd 0:0:0:0: [sda]  
Jan 23 14:24:04 rescue kernel: [11163.376483] Sense Key : Medium Error [current] [descriptor]
Jan 23 14:24:04 rescue kernel: [11163.376486] Descriptor sense data with sense descriptors (in hex):
Jan 23 14:24:04 rescue kernel: [11163.376487]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
Jan 23 14:24:04 rescue kernel: [11163.376495]         ce 1f 0d 58 
Jan 23 14:24:04 rescue kernel: [11163.376498] sd 0:0:0:0: [sda]  
Jan 23 14:24:04 rescue kernel: [11163.376501] Add. Sense: Unrecovered read error - auto reallocate failed
Jan 23 14:24:04 rescue kernel: [11163.376503] sd 0:0:0:0: [sda] CDB: 
Jan 23 14:24:04 rescue kernel: [11163.376504] Read(16): 88 00 00 00 00 00 ce 1f 0b 80 00 00 04 00 00 00
Jan 23 14:24:04 rescue kernel: [11163.376513] end_request: I/O error, dev sda, sector 3458141528

Jan 23 14:35:22 rescue kernel: [11840.396206] ata1.00: configured for UDMA/133
Jan 23 14:35:22 rescue kernel: [11840.396212] ata1.00: device reported invalid CHS sector 0
Jan 23 14:35:22 rescue kernel: [11840.396216] ata1.00: device reported invalid CHS sector 0
Jan 23 14:35:22 rescue kernel: [11840.396220] ata1.00: device reported invalid CHS sector 0
Jan 23 14:35:22 rescue kernel: [11840.396223] ata1.00: device reported invalid CHS sector 0
Jan 23 14:35:22 rescue kernel: [11840.396230] ata1: EH complete
Jan 23 14:35:52 rescue kernel: [11870.888343] ata1.00: exception Emask 0x0 SAct 0x40000007 SErr 0x0 action 0x6 frozen
Jan 23 14:35:52 rescue kernel: [11870.945207] ata1.00: cmd 60/00:08:80:c3:58/04:00:ce:00:00/40 tag 1 ncq 524288 in
Jan 23 14:35:52 rescue kernel: [11870.945207]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 23 14:35:52 rescue kernel: [11870.982487] ata1.00: cmd 60/80:10:00:c0:58/03:00:ce:00:00/40 tag 2 ncq 458752 in
Jan 23 14:35:52 rescue kernel: [11870.982487]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 23 14:35:53 rescue kernel: [11871.019291] ata1.00: cmd 60/00:f0:80:cb:58/04:00:ce:00:00/40 tag 30 ncq 524288 in
Jan 23 14:35:53 rescue kernel: [11871.019291]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 23 14:35:53 rescue kernel: [11871.055486] ata1: hard resetting link
Jan 23 14:35:53 rescue kernel: [11871.707811] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jan 23 14:35:53 rescue kernel: [11871.708270] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20131218/psargs-359)
Jan 23 14:35:53 rescue kernel: [11871.708279] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff88041d869a88), AE_NOT_FOUND (20131218/psparse-536)
Jan 23 14:35:53 rescue kernel: [11871.709174] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20131218/psargs-359)
Jan 23 14:35:53 rescue kernel: [11871.709182] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff88041d869a88), AE_NOT_FOUND (20131218/psparse-536)

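I can mount /dev/md5 and list the files, but I cannot add the new partition to the array.

Is there any way to fix this without losing the data on the partition?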

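If not, is it possible to format just that single partition and then add the new drive again? I should have an up-to-date backup of that partition, so that would not be a problem. I would just like to avoid wiping all the partitions if possible.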

smartctl output:

/dev/sda:

smartctl -a /dev/sda
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.14.27] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     ST3000DM001-1CH166
Serial Number:    Z1F1XJHC
LU WWN Device Id: 5 000c50 04f3fc2c7
Firmware Version: CC24
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Fri Jan 23 16:16:32 2015 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Error SMART Values Read failed: scsi error aborted command
Smartctl: SMART Read Values failed.

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: UNKNOWN!
SMART Status, Attributes and Thresholds cannot be read.

SMART Error Log Version: 1
ATA Error Count: 107 (device log contains only the most recent five errors)
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 107 occurred at disk power-on lifetime: 13180 hours (549 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 ff ff ff ef 00      15:56:49.931  READ DMA EXT
  25 00 08 ff ff ff ef 00      15:56:48.680  READ DMA EXT
  ef 10 02 00 00 00 a0 00      15:56:48.644  SET FEATURES [Reserved for Serial ATA]
  27 00 00 00 00 00 e0 00      15:56:48.644  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00      15:56:48.644  IDENTIFY DEVICE

Error 106 occurred at disk power-on lifetime: 13180 hours (549 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 ff ff ff ef 00      15:56:45.363  READ DMA EXT
  25 00 08 ff ff ff ef 00      15:56:44.071  READ DMA EXT
  25 00 08 ff ff ff ef 00      15:56:42.789  READ DMA EXT
  25 00 08 ff ff ff ef 00      15:56:42.755  READ DMA EXT
  25 00 08 ff ff ff ef 00      15:56:42.722  READ DMA EXT

Error 105 occurred at disk power-on lifetime: 13180 hours (549 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 ff ff ff ef 00      15:56:15.716  READ DMA EXT
  25 00 08 ff ff ff ef 00      15:56:12.832  READ DMA EXT
  25 00 08 ff ff ff ef 00      15:56:11.540  READ DMA EXT
  25 00 08 ff ff ff ef 00      15:56:10.290  READ DMA EXT
  25 00 08 ff ff ff ef 00      15:56:09.448  READ DMA EXT

Error 104 occurred at disk power-on lifetime: 13180 hours (549 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 ff ff ff ef 00      15:56:02.563  READ DMA EXT
  25 00 08 ff ff ff ef 00      15:55:59.655  READ DMA EXT
  25 00 08 ff ff ff ef 00      15:55:58.319  READ DMA EXT
  25 00 08 ff ff ff ef 00      15:55:58.069  READ DMA EXT
  25 00 08 ff ff ff ef 00      15:55:57.838  READ DMA EXT

Error 103 occurred at disk power-on lifetime: 13180 hours (549 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 80 ff ff ff ef 00      15:55:51.995  READ DMA EXT
  25 00 08 ff ff ff ef 00      15:55:50.735  READ DMA EXT
  ef 10 02 00 00 00 a0 00      15:55:50.700  SET FEATURES [Reserved for Serial ATA]
  27 00 00 00 00 00 e0 00      15:55:50.700  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00      15:55:50.699  IDENTIFY DEVICE

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      4561         -
# 2  Extended offline    Completed without error       00%      2977         -
# 3  Extended offline    Completed without error       00%         5         -

Device does not support Selective Self Tests/Logging

/dev/sdb:

smartctl -a /dev/sdb
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.14.27] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     ST33000650NS
Serial Number:    Z295TK0G
LU WWN Device Id: 5 000c50 04f891ded
Firmware Version: 0004
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Fri Jan 23 16:15:30 2015 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:        (  600) seconds.
Offline data collection
capabilities:            (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   1) minutes.
Extended self-test routine
recommended polling time:    ( 255) minutes.
Conveyance self-test routine
recommended polling time:    (   2) minutes.
SCT capabilities:          (0x10bd) SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   078   053   044    Pre-fail  Always       -       70825960
  3 Spin_Up_Time            0x0003   093   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       11
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       1
  7 Seek_Error_Rate         0x000f   088   060   030    Pre-fail  Always       -       791126750
  9 Power_On_Hours          0x0032   092   092   000    Old_age   Always       -       7155
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       11
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   090   090   000    Old_age   Always       -       10
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       1
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   066   043   045    Old_age   Always   In_the_past 34 (5 173 37 27)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       8
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       11
194 Temperature_Celsius     0x0022   034   057   000    Old_age   Always       -       34 (0 24 0 0)
195 Hardware_ECC_Recovered  0x001a   018   007   000    Old_age   Always       -       70825960
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 18 (device log contains only the most recent five errors)
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 18 occurred at disk power-on lifetime: 5559 hours (231 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: WP at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 00 18 ff ff ff 4f 00  26d+03:52:28.560  WRITE FPDMA QUEUED
  60 00 00 ff ff ff 4f 00  26d+03:52:28.560  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00  26d+03:52:28.559  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00  26d+03:52:28.559  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00  26d+03:52:28.559  READ FPDMA QUEUED

Error 17 occurred at disk power-on lifetime: 5559 hours (231 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00  26d+03:52:13.471  READ FPDMA QUEUED
  60 00 58 d0 57 44 43 00  26d+03:52:13.471  READ FPDMA QUEUED
  61 00 02 08 90 6d 49 00  26d+03:52:13.471  WRITE FPDMA QUEUED
  ea 00 00 00 00 00 a0 00  26d+03:52:13.470  FLUSH CACHE EXT
  60 00 00 e0 42 20 4e 00  26d+03:52:13.422  READ FPDMA QUEUED

Error 16 occurred at disk power-on lifetime: 5559 hours (231 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 ff ff ff 4f 00  26d+03:51:56.176  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00  26d+03:51:56.176  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00  26d+03:51:56.175  READ FPDMA QUEUED
  60 00 00 e0 0d 20 4e 00  26d+03:51:56.116  READ FPDMA QUEUED
  60 00 00 e0 0c 20 4e 00  26d+03:51:56.114  READ FPDMA QUEUED

Error 15 occurred at disk power-on lifetime: 5559 hours (231 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 50 59 cb 43 00  26d+03:51:24.077  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00  26d+03:51:24.077  READ FPDMA QUEUED
  60 00 00 e0 c5 1c 4e 00  26d+03:51:24.076  READ FPDMA QUEUED
  ea 00 00 00 00 00 a0 00  26d+03:51:24.071  FLUSH CACHE EXT
  60 00 08 28 46 c1 43 00  26d+03:51:22.717  READ FPDMA QUEUED

Error 14 occurred at disk power-on lifetime: 5559 hours (231 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 ff ff ff 4f 00  26d+03:51:02.317  READ FPDMA QUEUED
  61 00 08 ff ff ff 4f 00  26d+03:51:02.317  WRITE FPDMA QUEUED
  ea 00 00 00 00 00 a0 00  26d+03:51:02.316  FLUSH CACHE EXT
  60 00 08 ff ff ff 4f 00  26d+03:51:02.303  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00  26d+03:51:02.300  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      7071         -
# 2  Extended offline    Completed without error       00%      7060         -
# 3  Extended offline    Completed without error       00%      5600         -
# 4  Short offline       Completed without error       00%      2489         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Answer 1

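It looks to me as though the problem is largely down to this unrecovered read error on sda. That is the only currently active half of the mirror, so if it cannot be read, sda6 cannot be cleanly copied to sdb6 to resync the mirror.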
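
The read error can be cross-checked from the kernel log in the question. The sketch below (the helper name `hex_bytes_to_lba` is made up for illustration) joins the last four bytes of the sense data, `ce 1f 0d 58`, and the LBA bytes of the Read(16) CDB, `ce 1f 0b 80`, into decimal sector numbers. It assumes the upper four bytes of each 8-byte field are zero, as they are in that log.

```shell
#!/bin/sh
# Join four big-endian hex bytes, as printed in the kernel sense dump,
# into one decimal LBA. hex_bytes_to_lba is a hypothetical helper name.
hex_bytes_to_lba() {
    printf '%d\n' "0x$1$2$3$4"
}

failed=$(hex_bytes_to_lba ce 1f 0d 58)   # information field of the sense data
start=$(hex_bytes_to_lba ce 1f 0b 80)    # starting LBA of the Read(16) request

echo "failed sector: $failed"
echo "request start: $start"
echo "offset into request: $((failed - start)) sectors"
```

The first value comes out as 3458141528, the same sector named in the `end_request: I/O error, dev sda` line, so the sense data and the block-layer error agree on where sda could not be read. The failure sits 472 sectors into a 1024-sector (0x400) read.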

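I note that roughly 8,600 power-on hours have passed since sda last passed a self-test (logged at 4561 hours, against 13180 hours in the error log), so it seems unsurprising that hardware failure has crept up on it as well. If you can still read the contents of /dev/md5, you have dodged a bullet: it means the unreadable block is not in a file. Back up the contents of that partition, then replace sda, this time with a reasonably new disk. Once everything is stable, recreate the md5 device and restore from the backup.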
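
One possible shape for that sequence is sketched below. It only prints the commands rather than running them, and everything beyond the device names from the question is an assumption: the mount point /mnt/md5, the backup target /backup/md5 (which would need to live on other storage), the use of rsync, and the helper name `recovery_plan`.

```shell
#!/bin/sh
# Print (do not run) one possible backup-and-recreate sequence.
# recovery_plan and its default paths are hypothetical; review each
# command against your own layout before running anything.
recovery_plan() {
    md=${1:-/dev/md5}         # degraded array from the question
    mnt=${2:-/mnt/md5}        # assumed mount point
    dest=${3:-/backup/md5}    # assumed backup target on separate storage
    cat <<EOF
mount -o ro $md $mnt
rsync -aHAX $mnt/ $dest/
umount $mnt
# replace sda and let the other arrays resync before continuing
mdadm --stop $md
mdadm --create $md --level=1 --raid-devices=2 /dev/sda6 /dev/sdb6
# then create a filesystem on $md and restore from $dest
EOF
}

recovery_plan
```

Mounting read-only keeps writes away from the failing disk while the copy runs. If rsync reports I/O errors on particular files, those are the ones that overlap the unreadable sectors and will need to come from an older backup.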

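Once you have this system back, make sure cron runs a smartctl self-test on both drives at least every month or two; otherwise this is exactly the kind of warning you get that things are going bad.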
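
As a rough sketch of what that could look like, the helper below (the name `smart_cron_lines` is made up) prints /etc/cron.d-style entries that start a long self-test on each drive at 03:00 on the first of every month. The schedule and the /usr/sbin/smartctl path are assumptions; adjust both for your system.

```shell
#!/bin/sh
# Emit /etc/cron.d-style lines that start a long SMART self-test on each
# given drive at 03:00 on the 1st of every month.
# smart_cron_lines is a hypothetical helper; the smartctl path is assumed.
smart_cron_lines() {
    for dev in "$@"; do
        printf '0 3 1 * * root /usr/sbin/smartctl -t long %s\n' "$dev"
    done
}

smart_cron_lines /dev/sda /dev/sdb
```

The output can be redirected into a file under /etc/cron.d. Starting a test does not report its result, so the self-test log from `smartctl -a` still has to be checked afterwards, or monitoring left to smartd.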
