How do I recover a faulted zpool in which one device is healthy but was temporarily offline?

I have a zpool made of four 2 TB USB disks in a raidz configuration:

[root@chef /mnt/Chef]# zpool status farcryz1
  pool: farcryz1
 state: ONLINE
 scrub: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    farcryz1    ONLINE       0     0     0
      raidz1    ONLINE       0     0     0
        da1     ONLINE       0     0     0
        da2     ONLINE       0     0     0
        da3     ONLINE       0     0     0
        da4     ONLINE       0     0     0

To test the pool, I simulated a drive failure by pulling the USB cable from one of the drives (without taking it offline first):

[root@chef /mnt/Chef]# zpool status farcryz1
  pool: farcryz1
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
    attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
    using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    farcryz1    ONLINE       0     0     0
      raidz1    ONLINE       0     0     0
        da4     ONLINE      22     4     0
        da3     ONLINE       0     0     0
        da1     ONLINE       0     0     0
        da2     ONLINE       0     0     0

errors: No known data errors

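The data is still there and the pool is still online. Great! Now let's try to recover the pool. I plugged the drive back in and ran 'zpool replace', as the status message above instructs: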

[root@chef /mnt/Chef]# zpool replace farcryz1 da4
invalid vdev specification
use '-f' to override the following errors:
/dev/da4 is part of active pool 'farcryz1'

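Hmm... that didn't work. So I tried 'zpool clear farcryz1', but that did nothing either; I still couldn't replace the device. Then I tried various combinations of onlining, offlining, clearing, replacing and scrubbing da4. Now I'm stuck here: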

[root@chef /mnt/Chef]# zpool status -v farcryz1
  pool: farcryz1
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
    invalid.  Sufficient replicas exist for the pool to continue
    functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: scrub completed after 0h2m with 0 errors on Fri Sep  9 13:43:34 2011
config:

    NAME        STATE     READ WRITE CKSUM
    farcryz1    DEGRADED     0     0     0
      raidz1    DEGRADED     0     0     0
        da4     UNAVAIL      9     0     0  experienced I/O failures
        da3     ONLINE       0     0     0
        da1     ONLINE       0     0     0
        da2     ONLINE       0     0     0

errors: No known data errors
[root@chef /mnt/Chef]# zpool replace farcryz1 da4
cannot replace da4 with da4: da4 is busy

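How do I recover from this situation? A device in my zpool was accidentally disconnected (it is not a failed device), and now it is back and ready to be resilvered.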


Edit: as requested, a tail of dmesg:

(ses3:umass-sim4:4:0:1): removing device entry
(da4:umass-sim4:4:0:0): removing device entry
ugen3.2: <Western Digital> at usbus3
umass4: <Western Digital My Book 1140, class 0/0, rev 3.00/10.03, addr 1> on usbus3
da4 at umass-sim4 bus 4 scbus6 target 0 lun 0
da4: <WD My Book 1140 1003> Fixed Direct Access SCSI-6 device 
da4: 400.000MB/s transfers
da4: 1907697MB (3906963456 512 byte sectors: 255H 63S/T 243197C)
ses3 at umass-sim4 bus 4 scbus6 target 0 lun 1
ses3: <WD SES Device 1003> Fixed Enclosure Services SCSI-6 device 
ses3: 400.000MB/s transfers
ses3: SCSI-3 SES Device
GEOM: da4: partition 1 does not start on a track boundary.
GEOM: da4: partition 1 does not end on a track boundary.
GEOM: da4: partition 1 does not start on a track boundary.
GEOM: da4: partition 1 does not end on a track boundary.
ugen3.2: <Western Digital> at usbus3 (disconnected)
umass4: at uhub3, port 1, addr 1 (disconnected)
(da4:umass-sim4:4:0:0): lost device
(da4:umass-sim4:4:0:0): removing device entry
(ses3:umass-sim4:4:0:1): lost device
(ses3:umass-sim4:4:0:1): removing device entry
ugen3.2: <Western Digital> at usbus3
umass4: <Western Digital My Book 1140, class 0/0, rev 3.00/10.03, addr 1> on usbus3
da4 at umass-sim4 bus 4 scbus6 target 0 lun 0
da4: <WD My Book 1140 1003> Fixed Direct Access SCSI-6 device 
da4: 400.000MB/s transfers
da4: 1907697MB (3906963456 512 byte sectors: 255H 63S/T 243197C)
ses3 at umass-sim4 bus 4 scbus6 target 0 lun 1
ses3: <WD SES Device 1003> Fixed Enclosure Services SCSI-6 device 
ses3: 400.000MB/s transfers
ses3: SCSI-3 SES Device

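Answer 1

Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'.

It looks like after the initial transient failure you probably only needed to run 'zpool clear' to clear the errors.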
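As a rough way to spot which device the 'zpool clear' advice applies to, the sketch below parses 'zpool status' text and prints every row that is still ONLINE but has non-zero READ, WRITE or CKSUM counters. The function name devices_with_errors is hypothetical, and it assumes the NAME/STATE/READ/WRITE/CKSUM column layout shown in the question; pool and vdev rows are printed too if their own counters are non-zero.

```shell
# Hypothetical helper: read `zpool status` output on stdin and print each
# config row that is still ONLINE but has non-zero READ/WRITE/CKSUM counters.
# Assumes the column layout: NAME STATE READ WRITE CKSUM.
devices_with_errors() {
    awk '
        # The config table starts after its NAME/STATE header line.
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
        # The "errors:" summary line ends the config table.
        /^errors:/ { in_config = 0 }
        # Print rows that are ONLINE but report any read/write/checksum errors.
        in_config && NF >= 5 && $2 == "ONLINE" &&
            ($3 != "0" || $4 != "0" || $5 != "0") { print $1 }
    '
}
```

For the first error status in the question this prints da4, after which 'zpool clear farcryz1 da4' resets its counters. A device that has already gone UNAVAIL, as in the later output, is not listed.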

If you want to pretend it is a drive replacement, you will probably need to wipe the data on the drive before trying to add it back to the pool.
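If you go the "pretend it is a new drive" route, note that ZFS keeps four copies of its vdev label on each device, two 256 KiB labels at the start and two at the end, so zeroing only the beginning of the disk is not enough. This sketch (the function name zfs_label_offsets is hypothetical) prints the byte offsets of the four labels for a device of a given size, assuming that standard layout with the size rounded down to a multiple of 256 KiB. Where 'zpool labelclear' is available (see Answer 3), prefer it over a hand-rolled wipe.

```shell
# Hypothetical helper: print the byte offsets of the four ZFS vdev labels
# on a device (or partition) of SIZE bytes. Assumes the standard layout of
# four 256 KiB labels: L0 and L1 at the start, L2 and L3 at the end of the
# device size rounded down to a whole number of labels.
LABEL_SIZE=262144   # 256 KiB

zfs_label_offsets() {
    size=$1
    # Round the device size down to a multiple of the label size.
    aligned=$(( size / LABEL_SIZE * LABEL_SIZE ))
    echo 0                                  # L0
    echo "$LABEL_SIZE"                      # L1
    echo $(( aligned - 2 * LABEL_SIZE ))    # L2
    echo $(( aligned - LABEL_SIZE ))        # L3
}
```

For the 3906963456-sector (2000365289472-byte) disk in the dmesg output, this prints 0, 262144, 2000364765184 and 2000365027328. On FreeBSD the byte size should be the third field of 'diskinfo da4'; dividing an offset by 262144 gives the seek= value for a command such as 'dd if=/dev/zero of=/dev/da4 bs=256k count=1 seek=N'. This destroys the ZFS labels on that disk, so double-check the device name first, then re-add the disk with 'zpool replace farcryz1 da4'.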

Answer 2

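What was the output of the various commands you tried? Did you try the -f switch on any of them?

Did you run 'zpool clear poolname device-name'?

In your case that is 'zpool clear farcryz1 da4', which should have started the resilvering process.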
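Following this answer, here is a minimal sketch of the clear-then-wait sequence. The helper names resilver_in_progress and wait_for_resilver are hypothetical; the only assumption about 'zpool status' is that its scrub/scan line contains the phrase "resilver in progress" while a resilver is running, and the 60-second polling interval is arbitrary.

```shell
# Hypothetical helper: succeed (exit 0) if the `zpool status` text on stdin
# says a resilver is still running.
resilver_in_progress() {
    grep -q 'resilver in progress'
}

# Hypothetical helper: poll the pool until no resilver is reported,
# then show the final detailed status.
wait_for_resilver() {
    pool=$1
    while zpool status "$pool" | resilver_in_progress; do
        sleep 60
    done
    zpool status -v "$pool"
}
```

For the pool in the question: 'zpool clear farcryz1 da4 && wait_for_resilver farcryz1'. If the device stays UNAVAIL, no resilver starts and the loop exits immediately, which is itself a sign that clearing was not enough.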

Answer 3

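If 'zpool clear' doesn't fix it, you can make ZFS forget the disk with 'zpool labelclear <partition>' (available in ZFS on Linux, http://zfsonlinux.org, since zfs-v0.6.2).

Note that even if you created the zpool on whole devices such as /dev/sda, you must specify the partition that ZFS created, e.g. /dev/sda1.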
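On ZFS on Linux, a pool created on whole disks keeps its data and labels in partition 1 of each disk, which is why this answer says to pass /dev/sda1 rather than /dev/sda. The hypothetical helper below derives that partition path from a whole-disk path, assuming the usual Linux naming rules: a "p" separator for names ending in a digit (such as NVMe devices) and a "-part" suffix for /dev/disk/by-* links.

```shell
# Hypothetical helper for ZFS on Linux: map the whole-disk path given to
# `zpool create` to the path of partition 1, where ZFS keeps its labels.
#   /dev/sda               -> /dev/sda1
#   /dev/nvme0n1           -> /dev/nvme0n1p1
#   /dev/disk/by-id/ata-X  -> /dev/disk/by-id/ata-X-part1
zfs_data_partition() {
    disk=$1
    case $disk in
        /dev/disk/by-*) echo "${disk}-part1" ;;  # udev symlinks use -partN
        *[0-9])         echo "${disk}p1" ;;      # names ending in a digit use pN
        *)              echo "${disk}1" ;;       # classic sdX-style names
    esac
}
```

For example, 'zpool labelclear -f "$(zfs_data_partition /dev/sda)"' would clear /dev/sda1. As the man page below says, the device must not be part of an active pool configuration, so only do this for a disk that ZFS should forget.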

(Thanks to DeHackEd, https://github.com/zfsonlinux/zfs/issues/2076)

From the zpool man page:

zpool labelclear [-f] device

Removes ZFS label information from the specified device. The device
must not be part of an active pool configuration.

  -f     Treat exported or foreign devices as inactive.
