I inherited a ZFS box that has a lot of problems. After checking its status, I found that several drives are having issues:
ganymede $ zpool status -x
pool: dpool
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Thu Feb 15 00:51:49 2024
88.1M scanned out of 36.2T at 6.77M/s, (scan is slow, no estimated time)
25.3M resilvered, 0.00% done
config:
NAME                                      STATE     READ WRITE CKSUM
dpool                                     DEGRADED     0     0     0
  mirror-0                                DEGRADED     0     0     0
    12151399272057691850                  UNAVAIL      0     0     0  was /dev/disk/by-id/ata-ST8000NM0055-1RM112_ZA11E6HJ-part1
    ata-ST8000NM0055-1RM112_ZA158JRW      ONLINE       0     0     0
  mirror-1                                DEGRADED     0     0     0
    ata-ST8000NM0055-1RM112_ZA15FG7E      ONLINE       0     0     0  (resilvering)
    ata-ST8000NM0055-1RM112_ZA15FGCM      DEGRADED    22     0    12  too many errors
  mirror-2                                ONLINE       0     0     0
    ata-ST8000NM0055-1RM112_ZA164M9J      ONLINE       0     0     0  (resilvering)
    ata-ST8000NM0055-1RM112_ZA164QKP      ONLINE       0     0     0
  mirror-3                                ONLINE       0     0     0
    ata-TOSHIBA_MC04ACA600A_X5J1K05JFE6C  ONLINE       0     0     0
    ata-TOSHIBA_MC04ACA600A_X5J9K004FE6C  ONLINE       0     0     0
  mirror-4                                ONLINE       0     0     0
    ata-TOSHIBA_MC04ACA600A_X5J9K005FE6C  ONLINE       0     0     0
    ata-TOSHIBA_MC04ACA600A_X5LEK019FE6C  ONLINE       0     0     0
  mirror-5                                ONLINE       0     0     0
    ata-TOSHIBA_MC04ACA600A_X5J9K007FE6C  ONLINE       0     0     0
    ata-TOSHIBA_MC04ACA600A_X5JFK001FE6C  ONLINE       0     0     0
errors: No known data errors
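I am trying to pull the data off this system (backing it up to S3) before replacing the disks. However, one drive in mirror-1 (ata-ST8000NM0055-1RM112_ZA15FGCM) is failing, and I believe it is slowing down all data operations: if I let the resilver run, it drops to K/s, and after a week it is still running.
Looking at the dmesg output, I see a large number of errors like these: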
[ 464.866611] mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
[ 464.866635] sd 1:0:27:0: [sdaa] tag#0 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[ 464.866637] sd 1:0:27:0: [sdaa] tag#2 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 464.866653] sd 1:0:27:0: [sdaa] tag#2 Sense Key : Medium Error [current] [descriptor]
[ 464.866658] sd 1:0:27:0: [sdaa] tag#0 CDB: Read(16) 88 00 00 00 00 02 78 25 d7 38 00 00 00 08 00 00
[ 464.866666] sd 1:0:27:0: [sdaa] tag#2 Add. Sense: Unrecovered read error
[ 464.866670] print_req_error: I/O error, dev sdaa, sector 10605680440
[ 464.866677] sd 1:0:27:0: [sdaa] tag#2 CDB: Read(16) 88 00 00 00 00 02 78 25 d5 68 00 00 00 f0 00 00
[ 464.866767] print_req_error: critical medium error, dev sdaa, sector 10605680096
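To see how many of these errors there are and which sectors they hit, the log can be tallied per device. This is only a rough sketch: it assumes the kernel prints lines in the `print_req_error: ..., dev X, sector N` form shown above (the prefix differs between kernel versions), and `summarize_io_errors` is a made-up helper name, not an existing tool.

```shell
#!/bin/sh
# Count kernel I/O error lines per block device and list the affected sectors.
# Reads dmesg-style text on stdin, e.g.:  dmesg | summarize_io_errors
summarize_io_errors() {
    awk '
        /print_req_error:/ {
            dev = ""; sector = ""
            # Walk the fields and pick up the words following "dev" and "sector".
            for (i = 1; i <= NF; i++) {
                if ($i == "dev")    { dev = $(i + 1); sub(/,$/, "", dev) }
                if ($i == "sector") { sector = $(i + 1) }
            }
            if (dev != "") {
                count[dev]++
                sectors[dev] = sectors[dev] " " sector
            }
        }
        END {
            for (d in count)
                printf "%s errors=%d sectors:%s\n", d, count[d], sectors[d]
        }
    '
}
```

If every reported device is sdaa, that would support the idea that a single disk is responsible for the errors, though it does not by itself prove that the same disk is the cause of the slowdown.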
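Given that every mirror in the pool has at least one good drive, is there a way to simply remove the disk that is causing the problem (I do not have physical access) so that I can get the data off the server?
I tried disabling the disk: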
sync
echo 1 > /sys/block/sdaa/device/delete
But accessing data on ZFS is still very slow (for example, copying a 93 MB file to AWS S3 with awscli takes 10 minutes).
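To put rough numbers on "very slow", the figures quoted above can be turned into rates. This is a back-of-the-envelope sketch: it assumes the T and M in the `zpool status` output are binary units (TiB and MiB) and that the rate stays constant, which it clearly does not in practice. The function names are only illustrative.

```shell
#!/bin/sh
# Back-of-the-envelope throughput figures from the numbers quoted in this post.

# rate_kb_per_s SIZE_MB SECONDS -> average rate in KB/s (taking 1 MB = 1000 KB)
rate_kb_per_s() {
    awk -v mb="$1" -v s="$2" 'BEGIN { printf "%.0f\n", mb * 1000 / s }'
}

# days_to_scan SIZE_TIB RATE_MIB_PER_S -> days needed at a constant rate
days_to_scan() {
    awk -v t="$1" -v r="$2" 'BEGIN { printf "%.1f\n", t * 1024 * 1024 / r / 86400 }'
}

rate_kb_per_s 93 600      # 93 MB in 10 minutes   -> prints 155
days_to_scan 36.2 6.77    # 36.2T at 6.77M/s      -> prints 64.9
```

So the S3 copy averages about 155 KB/s, and even at the initial 6.77M/s the resilver would need roughly two months to cover 36.2T.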
I am just trying to figure out the best path forward with the system in this state.
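For reference, the sysfs delete above only removes the device from the kernel; the ZFS-level counterpart would be `zpool offline`. The sketch below assumes the pool and device names from the status output, and it only prints the commands unless `DRY_RUN=0` is set. The `run` and `offline_bad_disk` helpers are hypothetical wrappers. ZFS may refuse the offline if it considers that no other valid replica exists, which seems possible here because the other mirror-1 member is still marked as resilvering.

```shell
#!/bin/sh
# Sketch: take the failing mirror member offline at the ZFS level.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to run them.
POOL="dpool"
BAD_DISK="ata-ST8000NM0055-1RM112_ZA15FGCM"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

offline_bad_disk() {
    run zpool offline "$POOL" "$BAD_DISK"   # stop ZFS from issuing I/O to the disk
    run zpool status "$POOL"                # check the resulting pool state
}
```

Calling `offline_bad_disk` with the default settings just shows what would be executed, which makes it safe to review on a remote machine before committing to anything.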