ZFS reports failures on multiple new drives


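We decided to replace our aging main NAS (three 48-drive SAS expanders populated with 4TB drives) with a similar system built on 12TB drives, while reusing some newer hardware added about a year ago: one expander and a SAS card. The goal was to keep things as simple and cheap as possible without taking up any additional rack space.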

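The new hardware, a server and two expanders, arrived and was set up with Debian Buster and the ZFS packages from the buster-backports repository. The ZFS pool consists of a mirror of two U.2 SSDs (for the log), two U.2 SSDs (for cache), 4 HDD hot spares (2 per expander), and 12 RAID-Z2 vdevs (6 per expander). Everything looked good, so I started copying data from the old NAS to this one with a script that uses incremental snapshots, zfs send, and zfs receive.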
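The copy script itself is not shown, so the following is only a minimal sketch of what one incremental replication step of that kind might look like. It builds the command line rather than running it. The dataset names, the snapshot names, the host name `newnas`, and the choice to push the stream over ssh are all assumptions for illustration, not details from the actual setup.

```python
import shlex
from typing import List, Optional


def build_replication_pipeline(
    dataset: str,
    new_snap: str,
    prev_snap: Optional[str],
    dest_host: str,
    dest_dataset: str,
) -> str:
    """Return a shell pipeline that sends one snapshot to another host.

    All arguments are hypothetical placeholders; nothing here is taken
    from the original script.
    """
    send: List[str] = ["zfs", "send"]
    if prev_snap is not None:
        # Incremental stream: only changes made after prev_snap are sent.
        send += ["-i", f"{dataset}@{prev_snap}"]
    send.append(f"{dataset}@{new_snap}")
    # Assumption: the stream is piped over ssh into `zfs receive`.
    recv = ["ssh", dest_host, "zfs", "receive", dest_dataset]
    return " | ".join(shlex.join(cmd) for cmd in (send, recv))


if __name__ == "__main__":
    # First run: full stream. Later runs: incremental from the previous snapshot.
    print(build_replication_pipeline("tank/data", "run1", None, "newnas", "bigvol/data"))
    print(build_replication_pipeline("tank/data", "run2", "run1", "newnas", "bigvol/data"))
```

The first run has no previous snapshot and sends everything, which would fit the first pass taking many days while later passes only transfer the changes.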

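The first run of the script took many days but eventually completed, with no problems on either end. The second run also went fine. After the third run, the ZFS pool showed a lot of problems: across 4 of the RAID-Z2 vdevs, a large number of disks had changed state to FAULTED or UNAVAIL (with many more DEGRADED), and all 4 hot spares had automatically kicked in. The output of zpool status is below.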

# zpool status
  pool: bigvol
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
    continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Jan 28 09:55:20 2021
    160T scanned at 11.5G/s, 151T issued at 10.8G/s, 160T total
    4.99T resilvered, 94.53% done, 0 days 00:13:46 to go
config:

    NAME                                           STATE     READ WRITE CKSUM
    bigvol                                         DEGRADED     0     0     0
      raidz2-0                                     ONLINE       0     0     0
        scsi-35000c500cacd481b                     ONLINE       0     0     0
        scsi-35000c500cacceddb                     ONLINE       0     0     0
        scsi-35000c500cacd5c4b                     ONLINE       0     0     0
        scsi-35000c500cacd19cb                     ONLINE       0     0     0
        scsi-35000c500cacd0f4f                     ONLINE       0     0     0
        scsi-35000c500cacd5efb                     ONLINE       0     0     0
        scsi-35000c500cacd133f                     ONLINE       0     0     0
      raidz2-1                                     ONLINE       0     0     0
        scsi-35000c500cab6617f                     ONLINE       0     0     0
        scsi-35000c500cacd131b                     ONLINE       0     0     0
        scsi-35000c500cacd1637                     ONLINE       0     0     0
        scsi-35000c500cacd0dd3                     ONLINE       0     0     0
        scsi-35000c500cab64247                     ONLINE       0     0     0
        scsi-35000c500cacd5f4b                     ONLINE       0     0     0
        scsi-35000c500cacd206b                     ONLINE       0     0     0
      raidz2-2                                     ONLINE       0     0     0
        scsi-35000c500cacd251f                     ONLINE       0     0     0
        scsi-35000c500cacf60a7                     ONLINE       0     0     0
        scsi-35000c500cacd55cb                     ONLINE       0     0     0
        scsi-35000c500cacd3a5f                     ONLINE       0     0     0
        scsi-35000c500cacd0fa7                     ONLINE       0     0     0
        scsi-35000c500cacd4cb3                     ONLINE       0     0     0
        scsi-35000c500cacd2edf                     ONLINE       0     0     0
      raidz2-3                                     DEGRADED     0     0     0
        scsi-35000c500cacd1627                     ONLINE       0     0     0
        scsi-35000c500cacd049f                     ONLINE       0     0     0
        scsi-35000c500cacdf9d3                     ONLINE       0     0     0
        scsi-35000c500cab51563                     DEGRADED     0     0     1  too many errors  (resilvering)
        scsi-35000c500cacd1c9b                     DEGRADED     0     0     0  too many errors
        scsi-35000c500cacdf757                     FAULTED      0    10    48  too many errors  (resilvering)
        scsi-35000c500cacd291b                     FAULTED      0    11    31  too many errors  (resilvering)
      raidz2-4                                     DEGRADED     0     0     0
        spare-0                                    DEGRADED     0     0    11
          scsi-35000c500cacdb54f                   FAULTED      0    18     0  too many errors  (resilvering)
          scsi-35000c500cacdc907                   DEGRADED     0     0     0  too many errors  (resilvering)
        scsi-35000c500cacd2c77                     DEGRADED     0     0     4  too many errors
        scsi-35000c500cacdbdd3                     DEGRADED     0     0    12  too many errors  (resilvering)
        scsi-35000c500cacd0a47                     DEGRADED     0     0     7  too many errors  (resilvering)
        scsi-35000c500cacdf107                     DEGRADED     0     0     4  too many errors  (resilvering)
        scsi-35000c500cacd59fb                     DEGRADED     0   195    79  too many errors  (resilvering)
        scsi-35000c500cacd5307                     DEGRADED     0   177    30  too many errors  (resilvering)
      raidz2-5                                     DEGRADED     0     0     0
        spare-0                                    DEGRADED     0     0    15
          scsi-35000c500cacd03a3                   FAULTED      0    12     0  too many errors  (resilvering)
          scsi-35000c500cacd340b                   ONLINE       0     0     0
        scsi-35000c500cacd29d7                     FAULTED      0    21    24  too many errors  (resilvering)
        scsi-35000c500cacd23d7                     DEGRADED     0     0    11  too many errors  (resilvering)
        scsi-35000c500cacd1c27                     DEGRADED     0     0    29  too many errors  (resilvering)
        spare-4                                    DEGRADED     0     0    32
          scsi-35000c500cacd26bb                   FAULTED      0    31     0  too many errors  (resilvering)
          scsi-35000c500cacd299f                   DEGRADED     0     0     0  too many errors  (resilvering)
        scsi-35000c500cacd258b                     DEGRADED     0   207    63  too many errors  (resilvering)
        spare-6                                    DEGRADED     0     0    24
          scsi-35000c500cacdf867                   FAULTED      0    15     0  too many errors  (resilvering)
          scsi-35000c500cacd60ef                   ONLINE       0     0     0
      raidz2-6                                     DEGRADED     0     0     0
        scsi-35000c500cacd2e37                     ONLINE       0     0     0
        scsi-35000c500cacd0ecf                     ONLINE       0     0     0
        11839096008852004814                       UNAVAIL      0     0     0  was /dev/disk/by-id/scsi-35000c500cacd1f8f-part1
        scsi-35000c500cacd088b                     ONLINE       0     0     0
        scsi-35000c500cacd28df                     ONLINE       0     0     0
        scsi-35000c500cacd068b                     ONLINE       0     0     0
        scsi-35000c500cacdbd77                     ONLINE       0     0     0
      raidz2-7                                     ONLINE       0     0     0
        scsi-35000c500cacd040b                     ONLINE       0     0     0
        scsi-35000c500cacd16bb                     ONLINE       0     0     0
        scsi-35000c500cacd4d37                     ONLINE       0     0     0
        scsi-35000c500cacd1b57                     ONLINE       0     0     0
        scsi-35000c500cacd0453                     ONLINE       0     0     0
        scsi-35000c500cacd3f6b                     ONLINE       0     0     0
        scsi-35000c500cacd0297                     ONLINE       0     0     0
      raidz2-8                                     ONLINE       0     0     0
        scsi-35000c500cacd4bcb                     ONLINE       0     0     0
        scsi-35000c500cacd36cf                     ONLINE       0     0     0
        scsi-35000c500cacd1983                     ONLINE       0     0     0
        scsi-35000c500cacd3aaf                     ONLINE       0     0     0
        scsi-35000c500cacda90b                     ONLINE       0     0     0
        scsi-35000c500cacd0d53                     ONLINE       0     0     0
        scsi-35000c500cacdaa1f                     ONLINE       0     0     0
      raidz2-9                                     ONLINE       0     0     0
        scsi-35000c500cacd3f13                     ONLINE       0     0     0
        scsi-35000c500cacd3187                     ONLINE       0     0     0
        scsi-35000c500cacd59a3                     ONLINE       0     0     0
        scsi-35000c500cacd0913                     ONLINE       0     0     0
        scsi-35000c500cacdf663                     ONLINE       0     0     0
        scsi-35000c500cacd156b                     ONLINE       0     0     0
        scsi-35000c500cacd203f                     ONLINE       0     0     0
      raidz2-10                                    ONLINE       0     0     0
        scsi-35000c500cacd4c97                     ONLINE       0     0     0
        scsi-35000c500cacd58a3                     ONLINE       0     0     0
        scsi-35000c500cacd2353                     ONLINE       0     0     0
        scsi-35000c500cacd3f67                     ONLINE       0     0     0
        scsi-35000c500cacd235f                     ONLINE       0     0     0
        scsi-35000c500cacdf14f                     ONLINE       0     0     0
        scsi-35000c500cacd2583                     ONLINE       0     0     0
      raidz2-11                                    ONLINE       0     0     0
        scsi-35000c500cacd2f87                     ONLINE       0     0     0
        scsi-35000c500cacdb557                     ONLINE       0     0     0
        scsi-35000c500cacd00f3                     ONLINE       0     0     0
        scsi-35000c500cacd3ea7                     ONLINE       0     0     0
        scsi-35000c500cacd23ff                     ONLINE       0     0     0
        scsi-35000c500cacd09d3                     ONLINE       0     0     0
        scsi-35000c500cacd3adb                     ONLINE       0     0     0
    logs    
      mirror-12                                    ONLINE       0     0     0
        nvme-eui.343842304db011100025384700000001  ONLINE       0     0     0
        nvme-eui.343842304db011060025384700000001  ONLINE       0     0     0
    cache
      nvme-eui.343842304db010920025384700000001    ONLINE       0     0     0
      nvme-eui.343842304db011080025384700000001    ONLINE       0     0     0
    spares
      scsi-35000c500cacdc907                       INUSE     currently in use
      scsi-35000c500cacd299f                       INUSE     currently in use
      scsi-35000c500cacd340b                       INUSE     currently in use
      scsi-35000c500cacd60ef                       INUSE     currently in use

errors: No known data errors

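For obvious reasons, I have stopped the transfer and am waiting for the resilver to finish before replacing the FAULTED and UNAVAIL drives. What I am wondering is whether I should also replace the DEGRADED drives. Also, does anyone have an idea why this happened (other than the possibility of a bad batch of drives)? Or maybe I just need to take the pool down and replace the drives. Either way, I think the data will have to be copied again.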

Answer 1

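The problem turned out to be one or two damaged internal SAS cables in one of the two 4U JBOD enclosures. The faulty cables ran from the "primary" external SAS connectors to the backplane. Replacing them with the two cables from the unused "secondary" external connectors fixed the problem.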
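Errors that cluster in a few neighbouring vdevs, rather than being spread evenly across the pool, are the sort of pattern a shared component such as a cable or backplane path could produce. As a rough way to check for that, the sketch below tallies the non-ONLINE devices under each top-level vdev in `zpool status` text. It assumes the column layout shown in the question. The function name and regular expressions are illustrative and are not part of any ZFS tooling.

```python
import re
from collections import Counter
from typing import Dict, Optional

# A config line: device name, a state keyword, then READ/WRITE/CKSUM counters.
LINE_RE = re.compile(
    r"^\s*(\S+)\s+(ONLINE|DEGRADED|FAULTED|UNAVAIL|OFFLINE|REMOVED)\s+\d+\s+\d+\s+\d+"
)
# Top-level data vdevs such as raidz2-3 or mirror-12.
VDEV_RE = re.compile(r"^(raidz\d*|mirror)-\d+$")


def unhealthy_per_vdev(status_text: str) -> Dict[str, Counter]:
    """Count non-ONLINE devices under each raidz/mirror vdev."""
    result: Dict[str, Counter] = {}
    current: Optional[str] = None
    for line in status_text.splitlines():
        if line.strip() in ("logs", "cache", "spares"):
            current = None  # entering a new section of the config listing
            continue
        m = LINE_RE.match(line)
        if not m:
            continue  # header, free text, or INUSE spare lines
        name, state = m.group(1), m.group(2)
        if VDEV_RE.match(name):
            current = name
            continue
        if current is None or name.startswith("spare-"):
            continue  # pool root line, or a spare-N grouping node
        if state != "ONLINE":
            result.setdefault(current, Counter())[state] += 1
    return result


if __name__ == "__main__":
    import sys

    # Usage (assumed): zpool status bigvol | python3 this_script.py
    for vdev, counts in unhealthy_per_vdev(sys.stdin.read()).items():
        print(vdev, dict(counts))
```

Fed the output from the question, this should list only raidz2-3 through raidz2-6, which is consistent with a fault in one part of the cabling rather than dozens of independently bad drives. Note that hot spares that have been swapped in are counted along with the disks they replace.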
