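Since today, whenever I try to write a moderate amount of data (in the MB range) to a btrfs volume on an external hard drive, the volume switches to read-only, which aborts the operation. The volume is very simple (no RAID, no snapshots).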
While writing, journalctl shows the following:
Jan 23 18:34:16 my-machine kernel: BTRFS: device label <...> devid 1 transid 3344 /dev/sdb1
Jan 23 18:34:16 my-machine kernel: BTRFS info (device sdb1): disk space caching is enabled
Jan 23 18:34:16 my-machine kernel: BTRFS info (device sdb1): has skinny extents
Jan 23 18:36:35 my-machine kernel: BTRFS critical (device sdb1): corrupt node: root=7 block=253655810048 slot=106, bad key order, current (18446744073709551606 128 9223372601711906816) next (18446744073709551606 128 564873670656)
Jan 23 18:38:13 my-machine kernel: BTRFS critical (device sdb1): corrupt node: root=7 block=253655810048 slot=106, bad key order, current (18446744073709551606 128 9223372601711906816) next (18446744073709551606 128 564873670656)
Jan 23 18:39:43 my-machine kernel: BTRFS critical (device sdb1): corrupt node: root=7 block=253655810048 slot=106, bad key order, current (18446744073709551606 128 9223372601711906816) next (18446744073709551606 128 564873670656)
Jan 23 18:40:58 my-machine kernel: BTRFS critical (device sdb1): corrupt node: root=7 block=253655810048 slot=106, bad key order, current (18446744073709551606 128 9223372601711906816) next (18446744073709551606 128 564873670656)
Jan 23 18:42:28 my-machine kernel: BTRFS critical (device sdb1): corrupt node: root=7 block=253655810048 slot=106, bad key order, current (18446744073709551606 128 9223372601711906816) next (18446744073709551606 128 564873670656)
Jan 23 18:42:39 my-machine kernel: BTRFS critical (device sdb1): corrupt node: root=7 block=253655810048 slot=106, bad key order, current (18446744073709551606 128 9223372601711906816) next (18446744073709551606 128 564873670656)
Jan 23 18:42:39 my-machine kernel: BTRFS critical (device sdb1): corrupt node: root=7 block=253655810048 slot=106, bad key order, current (18446744073709551606 128 9223372601711906816) next (18446744073709551606 128 564873670656)
Jan 23 18:42:39 my-machine kernel: BTRFS: error (device sdb1) in btrfs_finish_ordered_io:3074: errno=-5 IO failure
Jan 23 18:42:39 my-machine kernel: BTRFS info (device sdb1): forced readonly
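For what it's worth, the numbers in the "bad key order" message can be picked apart with a few lines of Python. This is only a sketch. Reading the triple as (objectid, type, offset), treating the first number as a negative value stored in an unsigned 64-bit field, and checking whether only the top bit of the offset looks out of place are all assumptions on my part; the log itself does not say any of this.

```python
# Values copied verbatim from the "bad key order" kernel message.
current = (18446744073709551606, 128, 9223372601711906816)
nxt = (18446744073709551606, 128, 564873670656)


def as_signed64(value):
    """Reinterpret an unsigned 64-bit integer as a signed one."""
    return value - (1 << 64) if value >= (1 << 63) else value


# Assumption: the triple is (objectid, type, offset).
objectid_signed = as_signed64(current[0])

# Is the highest bit (bit 63) set in the offset of the "current" key?
top_bit_set = bool(current[2] >> 63)

# Clear bit 63 and check whether the two keys would then be in order.
offset_without_top_bit = current[2] & ~(1 << 63)
in_order_after_clearing = offset_without_top_bit < nxt[2]

print("objectid as signed 64-bit:", objectid_signed)
print("bit 63 set in current offset:", top_bit_set)
print("offset with bit 63 cleared:", offset_without_top_bit)
print("keys in order once bit 63 is cleared:", in_order_after_clearing)
```

If the offset really is one bit away from a value that sorts correctly, that might point to a single flipped bit rather than widespread damage, but that is speculation on my side.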
At first, btrfs check gave the following output:
$ sudo btrfsck /dev/sdb1
Checking filesystem on /dev/sdb1
UUID: a69162a3-aeb3-43c0-b74d-cfd280bfa8b6
checking extents
bad block 253655810048
ERROR: errors found in extent allocation tree or chunk allocation
checking free space cache
checking fs roots
checking csums
there are no extents for csum range 563128786944-564280360960
csum exists for 563128786944-564280377344 but there is no extent record
there are no extents for csum range 566428172288-567179472896
Right section didn't have a record
there are no extents for csum range 565354430464-567179472896
Right section didn't have a record
there are no extents for csum range 564280688640-567179472896
Right section didn't have a record
there are no extents for csum range 564280639488-567179472896
csum exists for 564280639488-567179472896 but there is no extent record
ERROR: errors found in csum tree
found 1681395552256 bytes used, error(s) found
total csum bytes: 0
total tree bytes: 2406924288
total fs tree bytes: 2279718912
total extent tree bytes: 123813888
btree space waste bytes: 350386565
file data blocks allocated: 1685019078656
 referenced 1685018304512
I ran btrfs scrub, but it sometimes aborted on its own (after which I had to remount the drive):
$ sudo btrfs scrub start -B /mnt/hd
ERROR: scrubbing /mnt/hd failed for device id 1: ret=-1, errno=5 (Input/output error)
scrub canceled for a69162a3-aeb3-43c0-b74d-cfd280bfa8b6
        scrub started at Wed Jan 23 21:26:28 2019 and was aborted after 00:45:20
        total bytes scrubbed: 509.99GiB with 0 errors
With btrfs scrub resume, however, it did seem to finish:
$ sudo btrfs scrub status /mnt/hd
scrub status for a69162a3-aeb3-43c0-b74d-cfd280bfa8b6
        scrub resumed at Wed Jan 23 22:24:05 2019 and finished after 01:52:15
        total bytes scrubbed: 1.20TiB with 27163 errors
        error details: csum=27163
        corrected errors: 0, uncorrectable errors: 27163, unverified errors: 0
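Before continuing with btrfs scrub, I had also tried btrfs check --repair once. That did not seem to change much, although a subsequent run of btrfs check reported "bad block 253432905728" instead of "bad block 253655810048". Now that btrfs scrub has finished, btrfs check says: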
Checking filesystem on /dev/sdb1
UUID: a69162a3-aeb3-43c0-b74d-cfd280bfa8b6
checking extents
bad block 253432905728
ERROR: errors found in extent allocation tree or chunk allocation
checking free space cache
block group 253432430592 has wrong amount of free space, free space cache has 253722624 block group has 253689856
ERROR: free space cache has more free space than block group item, this could leads to serious corruption, please contact btrfs developers
failed to load free space cache for block group 253432430592
ERROR: errors found in free space cache
found 1681395535872 bytes used, error(s) found
total csum bytes: 0
total tree bytes: 2406203392
total fs tree bytes: 2279718912
total extent tree bytes: 123797504
btree space waste bytes: 350047303
file data blocks allocated: 1685019078656
 referenced 1685018304512
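This looks very worrying! How did this happen? Is the file system really broken? Or is the drive failing (it is not old and has not been used heavily at all; SMART does not seem to indicate any problems)?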