mdstat mismatch cnt unsynchronized blocks

Both of our servers are affected by

mdstat mismatch cnt unsynchronized blocks

We get this error at the beginning of every month, and we have to repair the RAID with:

echo 'repair' >/sys/block/<md id>/md/sync_action
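
For reference, here is a minimal sketch of how the counter could be read before and after such a run. The `mismatch_cnt` and `sync_action` files are the kernel's md sysfs interface; the function names and the overridable `SYSFS_ROOT` variable are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Hypothetical helpers. SYSFS_ROOT is overridable only so the functions can be
# tried against a fake directory tree instead of the real /sys/block.
SYSFS_ROOT="${SYSFS_ROOT:-/sys/block}"

# Print the mismatch counter of one array, e.g. "md_mismatch_cnt md2".
md_mismatch_cnt() {
    cat "$SYSFS_ROOT/$1/md/mismatch_cnt"
}

# Request a scrub action on one array (needs root on a real system).
# "check" only counts mismatches, "repair" also rewrites them,
# "idle" stops a running check or repair.
md_sync_action() {
    case "$2" in
        check|repair|idle) echo "$2" > "$SYSFS_ROOT/$1/md/sync_action" ;;
        *) echo "unsupported action: $2" >&2; return 1 ;;
    esac
}
```

A possible sequence is `md_sync_action md2 check`, waiting until /proc/mdstat no longer shows the check in progress, then `md_mismatch_cnt md2`. After a repair, a second check is what shows whether the count has actually gone back to 0.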

If I remember correctly, the check is triggered by mdcheck_start.timer.service.
The repair takes about 5 hours, after which the array appears to be fixed, at least I think so.

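The question is: is this the right way to fix unsynchronized blocks on a RAID? What causes them? How can I tell whether this is a hardware/disk error? Thanks!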

Edit: /etc/fstab contains:

# /etc/fstab: static file system information.

# / was on /dev/md2p1 during curtin installation
/dev/disk/by-id/md-uuid-b0b68adb:353b70e8:fa806910:a78761e9-part1 / ext4 defaults 0 0

# /vol/data was on /dev/md3p1 during curtin installation
/dev/disk/by-id/md-uuid-2360fc63:991922f4:33aae17f:12f23590-part1 /vol/data ext4 defaults 0 0

# /boot was on /dev/md0p1 during curtin installation
/dev/disk/by-id/md-uuid-a76428ff:270597e7:70ed6c91:026d2441-part1 /boot ext4 defaults 0 0

UUID="5c389b41-007d-4893-b81c-5560cb2d6ff9" /vol/backup ext4 defaults 0 0

172.30.0.199:/vol/shared    /vol/shared    nfs    defaults    0 0

Output of lsblk --discard:

NAME        DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
loop0              0        4K       4G         0
loop1              0        4K       4G         0
loop2              0        4K       4G         0
loop3              0        4K       4G         0
loop4              0        4K       4G         0
loop5              0        4K       4G         0
loop6              0        4K       4G         0
loop7              0        4K       4G         0
loop8              0        4K       4G         0
sda                0        4K       2G         0
├─sda1             0        4K       2G         0
├─sda2             0        4K       2G         0
│ └─md0            0        4K       2G         0
│   └─md0p1        0        4K       2G         0
├─sda3             0        4K       2G         0
│ └─md1            0        4K       2G         0
│   └─md1p1        0        4K       2G         0
└─sda4             0        4K       2G         0
  └─md2            0        4K       2G         0
    └─md2p1        0        4K       2G         0
sdb                0        4K       2G         0
├─sdb1             0        4K       2G         0
├─sdb2             0        4K       2G         0
│ └─md0            0        4K       2G         0
│   └─md0p1        0        4K       2G         0
├─sdb3             0        4K       2G         0
│ └─md1            0        4K       2G         0
│   └─md1p1        0        4K       2G         0
└─sdb4             0        4K       2G         0
  └─md2            0        4K       2G         0
    └─md2p1        0        4K       2G         0
sdc                0        0B       0B         0
└─sdc1             0        0B       0B         0
nvme1n1            0      512B       2T         0
└─md3              0      512B       2T         0
  └─md3p1          0      512B       2T         0
nvme0n1            0      512B       2T         0
└─md3              0      512B       2T         0
  └─md3p1          0      512B       2T         0

Output of smartctl -i /dev/sd[ab]:

smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-92-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Intel S4510/S4610/S4500/S4600 Series SSDs
Device Model:     INTEL SSDSC2KG960G8
Serial Number:    BTYG024601ZC960CGN
LU WWN Device Id: 5 5cd2e4 152b3fddf
Firmware Version: XCV10120
User Capacity:    960,197,124,096 bytes [960 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Feb  2 07:43:15 2022 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Output of mdadm --detail /dev/md2:

/dev/md2:
           Version : 1.2
     Creation Time : Tue Nov 24 21:02:34 2020
        Raid Level : raid1
        Array Size : 919731200 (877.12 GiB 941.80 GB)
     Used Dev Size : 919731200 (877.12 GiB 941.80 GB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Wed Feb  2 07:43:33 2022
             State : active
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : bitmap

              Name : ubuntu-server:2
              UUID : b0b68adb:353b70e8:fa806910:a78761e9
            Events : 24281

    Number   Major   Minor   RaidDevice State
       0       8        4        0      active sync   /dev/sda4
       1       8       20        1      active sync   /dev/sdb4

Output of smartctl -A -l error /dev/sda:

smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-92-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       10469
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       8
170 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always       -       0
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
174 Unsafe_Shutdown_Count   0x0032   100   100   000    Old_age   Always       -       7
175 Power_Loss_Cap_Test     0x0033   100   100   010    Pre-fail  Always       -       2591 (8 65535)
183 SATA_Downshift_Count    0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error_Count  0x0033   100   100   090    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Drive_Temperature       0x0022   079   075   000    Old_age   Always       -       21 (Min/Max 12/27)
192 Unsafe_Shutdown_Count   0x0032   100   100   000    Old_age   Always       -       7
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       21
197 Pending_Sector_Count    0x0012   100   100   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       1006057
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       419
227 Workld_Host_Reads_Perc  0x0032   100   100   000    Old_age   Always       -       52
228 Workload_Minutes        0x0032   100   100   000    Old_age   Always       -       628023
232 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
234 Thermal_Throttle_Status 0x0032   100   100   000    Old_age   Always       -       0/0
235 Power_Loss_Cap_Test     0x0033   100   100   010    Pre-fail  Always       -       2591 (8 65535)
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       1006057
242 Host_Reads_32MiB        0x0032   100   100   000    Old_age   Always       -       1112548
243 NAND_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       1730576

SMART Error Log Version: 1
No Errors Logged

Output of smartctl -A -l error /dev/sdb:

smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-92-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       10469
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       8
170 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always       -       0
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
174 Unsafe_Shutdown_Count   0x0032   100   100   000    Old_age   Always       -       7
175 Power_Loss_Cap_Test     0x0033   100   100   010    Pre-fail  Always       -       2479 (8 65535)
183 SATA_Downshift_Count    0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error_Count  0x0033   100   100   090    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Drive_Temperature       0x0022   078   073   000    Old_age   Always       -       22 (Min/Max 12/29)
192 Unsafe_Shutdown_Count   0x0032   100   100   000    Old_age   Always       -       7
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       22
197 Pending_Sector_Count    0x0012   100   100   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       1064411
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       440
227 Workld_Host_Reads_Perc  0x0032   100   100   000    Old_age   Always       -       45
228 Workload_Minutes        0x0032   100   100   000    Old_age   Always       -       628005
232 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
234 Thermal_Throttle_Status 0x0032   100   100   000    Old_age   Always       -       0/0
235 Power_Loss_Cap_Test     0x0033   100   100   010    Pre-fail  Always       -       2479 (8 65535)
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       1064411
242 Host_Reads_32MiB        0x0032   100   100   000    Old_age   Always       -       876800
243 NAND_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       1801020

SMART Error Log Version: 1
No Errors Logged
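
To make the two SMART tables easier to compare, here is a small sketch that filters `smartctl -A` output down to a few error-related attributes. The function name is hypothetical, and the choice of attributes is an assumption, picked from the names that appear in the output above; it is not a definitive list of failure indicators.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl -A` output on stdin and prints
# "ATTRIBUTE_NAME RAW_VALUE" for a handful of error-related attributes.
# Field 2 is the attribute name and field 10 the raw value in the table layout.
smart_error_attrs() {
    awk '
        $2 ~ /^(Reallocated_Sector_Ct|Uncorrectable_Error_Cnt|End-to-End_Error_Count|Pending_Sector_Count|CRC_Error_Count)$/ {
            print $2, $10
        }
    '
}

# Example (needs root): smartctl -A /dev/sda | smart_error_attrs
```

For both /dev/sda and /dev/sdb above, all five of these raw values are 0, which matches the empty SMART error logs.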
