I was checking on my running HP N54L and noticed that dmesg reports the following:
[ 81.945530] btrfs read error corrected: ino 1 off 16685977600 (dev /dev/sdb sector 2636776)
[ 82.010023] btrfs read error corrected: ino 1 off 16637734912 (dev /dev/sdb sector 2589656)
[ 85.927604] verify_parent_transid: 43 callbacks suppressed
[ 85.927615] parent transid verify failed on 16956989440 wanted 13182 found 12799
[ 85.974600] parent transid verify failed on 16585043968 wanted 13145 found 12357
[ 89.903548] repair_io_failure: 26 callbacks suppressed
[ 89.903560] btrfs read error corrected: ino 1 off 16875483136 (dev /dev/sdb sector 2821816)
[ 115.951579] parent transid verify failed on 16963846144 wanted 13184 found 12802
[ 115.976830] btrfs read error corrected: ino 1 off 16963846144 (dev /dev/sdb sector 2908128)
[ 115.988907] parent transid verify failed on 16978874368 wanted 13187 found 12815
[ 543.848294] btrfs: device fsid e8f8fc09-3aae-4fce-85ca-fcf7665b9f02 devid 2 transid 13199 /dev/sdb
[ 1120.854825] verify_parent_transid: 5 callbacks suppressed
[ 1120.854838] parent transid verify failed on 16956600320 wanted 13184 found 12799
[ 1120.891229] repair_io_failure: 6 callbacks suppressed
[ 1120.891243] btrfs read error corrected: ino 1 off 16956600320 (dev /dev/sdb sector 2901016)
[ 1124.851937] parent transid verify failed on 16977842176 wanted 13187 found 12814
[ 1124.885429] btrfs read error corrected: ino 1 off 16977842176 (dev /dev/sdb sector 2921768)
Here is my BTRFS setup: RAID10 across 4x3TB HDDs.
$ sudo btrfs filesystem df /mnt/btrfs
Data, RAID10: total=136.00GiB, used=134.70GiB
System, RAID10: total=64.00MiB, used=20.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, RAID10: total=1.00GiB, used=363.21MiB
$ sudo btrfs filesystem show /mnt/btrfs
Label: none uuid: <UUID>
Total devices 4 FS bytes used 135.05GiB
devid 1 size 2.73TiB used 68.54GiB path /dev/sda
devid 2 size 2.73TiB used 68.53GiB path /dev/sdb
devid 3 size 2.73TiB used 68.53GiB path /dev/sdc
devid 4 size 2.73TiB used 68.53GiB path /dev/sdd
I noticed that the BTRFS device stats look... odd...:
$ sudo btrfs device stats /mnt/btrfs
[/dev/sda].write_io_errs 0
[/dev/sda].read_io_errs 0
[/dev/sda].flush_io_errs 0
[/dev/sda].corruption_errs 0
[/dev/sda].generation_errs 0
[/dev/sdb].write_io_errs 207275
[/dev/sdb].read_io_errs 127287
[/dev/sdb].flush_io_errs 0
[/dev/sdb].corruption_errs 0
[/dev/sdb].generation_errs 0
[/dev/sdc].write_io_errs 0
[/dev/sdc].read_io_errs 0
[/dev/sdc].flush_io_errs 0
[/dev/sdc].corruption_errs 0
[/dev/sdc].generation_errs 0
[/dev/sdd].write_io_errs 0
[/dev/sdd].read_io_errs 0
[/dev/sdd].flush_io_errs 0
[/dev/sdd].corruption_errs 0
[/dev/sdd].generation_errs 0
I have already ordered a spare 3TB HDD just in case, but can I safely assume that /dev/sdb is the drive that is failing? I just find it a bit odd that BTRFS reports [/dev/sdb].corruption_errs 0.
Is there a generally accepted way to prove that an HDD in a BTRFS RAID array has failed?
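A minimal sketch of one way to re-check this, assuming the filesystem stays mounted at /mnt/btrfs as shown above (the exact sequence is illustrative, not an established procedure):
$ sudo btrfs device stats -z /mnt/btrfs    # print the per-device counters and reset them to zero
$ sudo btrfs scrub start /mnt/btrfs        # re-read and verify every copy of data and metadata
$ sudo btrfs scrub status /mnt/btrfs       # check scrub progress and error totals
$ sudo btrfs device stats /mnt/btrfs       # see whether errors accumulate on /dev/sdb again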
Answer 1
I have seen similar performance degradation on my home server (running RAID-6 with Btrfs on top). Three times now it has turned out to be one of the drives.
The first thing I do is run smartctl on every drive. Then, for the drives that are failing, I take note of the raw error counts:
smartctl -x /dev/sdf | fgrep Raw
Keep track of those counts. I have one drive that showed some errors at one point, but after reseating the cabling it has been stable for the past 9 months. I don't know why, but I do consider that one "not dead yet".
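A minimal sketch of one way to log those raw counters over time (the device list and the log file here are assumptions for illustration, not part of the setup described above):
# Append a timestamped snapshot of each drive's raw SMART counters to a log,
# so any increase stands out when you compare entries later.
for dev in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do
    echo "== $(date -Is) $dev" >> ~/smart-raw.log
    sudo smartctl -x "$dev" | fgrep Raw >> ~/smart-raw.log
done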
If the error counts start rising again, I pull the drive and replace it (I can afford the risk of having one of the two redundant drives in my RAID-6 offline for half a day).
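If it does come to swapping the disk on the setup in the question, a minimal sketch of the replacement, assuming the new disk shows up as /dev/sde (a hypothetical name) and the failing device is devid 2 (/dev/sdb) on /mnt/btrfs as shown above:
$ sudo btrfs replace start 2 /dev/sde /mnt/btrfs    # rebuild devid 2 onto the new disk
$ sudo btrfs replace status /mnt/btrfs              # shows progress until the replace finishes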