Access to a ZFS pool is very slow when one drive is failing

I inherited a ZFS box with a lot of problems. Checking the pool status, I found that several drives are in trouble:

ganymede $ zpool status -x
  pool: dpool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
    continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Feb 15 00:51:49 2024
    88.1M scanned out of 36.2T at 6.77M/s, (scan is slow, no estimated time)
    25.3M resilvered, 0.00% done
config:

    NAME                                      STATE     READ WRITE CKSUM
    dpool                                     DEGRADED     0     0     0
      mirror-0                                DEGRADED     0     0     0
        12151399272057691850                  UNAVAIL      0     0     0  was /dev/disk/by-id/ata-ST8000NM0055-1RM112_ZA11E6HJ-part1
        ata-ST8000NM0055-1RM112_ZA158JRW      ONLINE       0     0     0
      mirror-1                                DEGRADED     0     0     0
        ata-ST8000NM0055-1RM112_ZA15FG7E      ONLINE       0     0     0  (resilvering)
        ata-ST8000NM0055-1RM112_ZA15FGCM      DEGRADED    22     0    12  too many errors
      mirror-2                                ONLINE       0     0     0
        ata-ST8000NM0055-1RM112_ZA164M9J      ONLINE       0     0     0  (resilvering)
        ata-ST8000NM0055-1RM112_ZA164QKP      ONLINE       0     0     0
      mirror-3                                ONLINE       0     0     0
        ata-TOSHIBA_MC04ACA600A_X5J1K05JFE6C  ONLINE       0     0     0
        ata-TOSHIBA_MC04ACA600A_X5J9K004FE6C  ONLINE       0     0     0
      mirror-4                                ONLINE       0     0     0
        ata-TOSHIBA_MC04ACA600A_X5J9K005FE6C  ONLINE       0     0     0
        ata-TOSHIBA_MC04ACA600A_X5LEK019FE6C  ONLINE       0     0     0
      mirror-5                                ONLINE       0     0     0
        ata-TOSHIBA_MC04ACA600A_X5J9K007FE6C  ONLINE       0     0     0
        ata-TOSHIBA_MC04ACA600A_X5JFK001FE6C  ONLINE       0     0     0

errors: No known data errors

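I am trying to pull the data off this system (backing it up to S3) before replacing the disks. However, one drive in mirror-1 (ata-ST8000NM0055-1RM112_ZA15FGCM) is failing, and I believe it is slowing down all data operations (if I let the resilver run, it drops to K/s, and after a week it is still running).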
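To put the resilver rate in perspective, here is a rough back-of-the-envelope estimate using the figures from the scan lines of the status output (36.2T total at 6.77M/s). It assumes zpool reports binary units (TiB and MiB) and that the rate stays constant, which it clearly does not here; `resilver_days` is just an illustrative helper name.

```shell
# Estimate how many days a resilver would take at a constant rate.
# $1 = total size in TiB, $2 = rate in MiB/s (assumed binary units).
resilver_days() {
    LC_ALL=C awk -v t="$1" -v r="$2" \
        'BEGIN { printf "%.1f\n", t * 1024 * 1024 / r / 86400 }'
}

# Figures from the "scan:" section of zpool status
resilver_days 36.2 6.77
```

Even at the initial 6.77M/s this works out to roughly two months, and at K/s rates the resilver would effectively never finish.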

Looking at the dmesg output, I see a large number of errors like these:

[  464.866611] mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
[  464.866635] sd 1:0:27:0: [sdaa] tag#0 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[  464.866637] sd 1:0:27:0: [sdaa] tag#2 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  464.866653] sd 1:0:27:0: [sdaa] tag#2 Sense Key : Medium Error [current] [descriptor]
[  464.866658] sd 1:0:27:0: [sdaa] tag#0 CDB: Read(16) 88 00 00 00 00 02 78 25 d7 38 00 00 00 08 00 00
[  464.866666] sd 1:0:27:0: [sdaa] tag#2 Add. Sense: Unrecovered read error
[  464.866670] print_req_error: I/O error, dev sdaa, sector 10605680440
[  464.866677] sd 1:0:27:0: [sdaa] tag#2 CDB: Read(16) 88 00 00 00 00 02 78 25 d5 68 00 00 00 f0 00 00
[  464.866767] print_req_error: critical medium error, dev sdaa, sector 10605680096
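As a cross-check on these messages, the Read(16) CDB bytes can be decoded to see which blocks each failing command was reading. The sketch below assumes the standard Read(16) layout (bytes 2 to 9 are the starting LBA, bytes 10 to 13 the transfer length) and that the drive exposes 512-byte logical sectors, so the LBA lines up with the kernel's sector number; `decode_read16` is a hypothetical helper name.

```shell
# Decode the starting LBA and block count from a Read(16) CDB,
# given as 16 space-separated hex bytes as printed by the sd driver.
decode_read16() (
    # Split the CDB string into positional parameters (byte 0 is $1).
    set -- $1
    # Bytes 2..9 ($3..$10) hold the 64-bit starting LBA.
    lba_hex="$3$4$5$6$7$8$9${10}"
    # Bytes 10..13 ($11..$14) hold the 32-bit transfer length in blocks.
    len_hex="${11}${12}${13}${14}"
    echo "lba=$((0x$lba_hex)) blocks=$((0x$len_hex))"
)

decode_read16 '88 00 00 00 00 02 78 25 d7 38 00 00 00 08 00 00'
decode_read16 '88 00 00 00 00 02 78 25 d5 68 00 00 00 f0 00 00'
```

The first CDB decodes to LBA 10605680440, matching the sector in the I/O error line, and the second covers 240 blocks starting at 10605679976, a range that contains the sector 10605680096 reported as a critical medium error.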

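Given that every mirror in the pool has at least one good drive, is there a way to simply remove the disk that is causing the problem (I have no physical access) so that I can get the data off the server?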

I tried disabling the disk:

sync
echo 1 > /sys/block/sdaa/device/delete

But access to the data on ZFS is still very slow (for example, copying a 93 MB file to AWS S3 with awscli takes 10 minutes).

I am just trying to work out the best path forward while the system is in this state.