Btrfs, checksum corruption

I have a Btrfs filesystem set up across 3 disks, with both metadata and data in RAID1. I now have a checksum error that cannot be corrected.

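The checksum is identical on both copies and differs from the expected checksum by a single flipped bit. I therefore suspect that a bit flipped before the checksum was written to disk (the machine has no ECC RAM). I have a copy of the original file on another computer, taken before the file was written to this filesystem, but as shown below I cannot read the data back because of the I/O error, so I cannot compare the two.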

How should I proceed to repair this error?

Some details:

$ uname -a
Linux stan 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ btrfs --version
btrfs-progs v4.15.1
$ sudo btrfs fi usage /media/btrfs/
Overall:
    Device size:           7.28TiB
    Device allocated:          3.91TiB
    Device unallocated:        3.36TiB
    Device missing:          0.00B
    Used:              3.83TiB
    Free (estimated):          1.72TiB  (min: 1.72TiB)
    Data ratio:               2.00
    Metadata ratio:           2.00
    Global reserve:      512.00MiB  (used: 0.00B)

Data,RAID1: Size:1.95TiB, Used:1.91TiB
   /dev/sdb    1.95TiB
   /dev/sdc  998.00GiB
   /dev/sdd 1001.00GiB

Metadata,RAID1: Size:4.00GiB, Used:2.63GiB
   /dev/sdb    4.00GiB
   /dev/sdc    3.00GiB
   /dev/sdd    1.00GiB

System,RAID1: Size:64.00MiB, Used:304.00KiB
   /dev/sdb   64.00MiB
   /dev/sdc   64.00MiB

Unallocated:
   /dev/sdb    1.68TiB
   /dev/sdc  861.95GiB
   /dev/sdd  861.02GiB

Scrub:

$ sudo btrfs scrub status /media/btrfs/

scrub status for xxxxxx
    scrub started at Mon Aug 24 11:23:27 2020 and finished after 03:41:54
    total bytes scrubbed: 3.81TiB with 2 errors
    error details: csum=2
    corrected errors: 0, uncorrectable errors: 2, unverified errors: 0

dmesg errors after the scrub:

$ dmesg
...
[196755.786038] BTRFS warning (device sdb): checksum error at logical 3099310968832 on dev /dev/sdb, physical 1300730499072, root 5223, inode 6521311, offset 7614464, length 4096, links 1 (path: users/joachim/Bilder/Canon/270CANON/IMG_7003.CR2)
[196755.786168] BTRFS warning (device sdb): checksum error at logical 3099310968832 on dev /dev/sdb, physical 1300730499072, root 5303, inode 6521311, offset 7614464, length 4096, links 1 (path: users/joachim/Bilder/Canon/270CANON/IMG_7003.CR2)
[196755.786245] BTRFS warning (device sdb): checksum error at logical 3099310968832 on dev /dev/sdb, physical 1300730499072, root 5302, inode 6521311, offset 7614464, length 4096, links 1 (path: users/joachim/Bilder/Canon/270CANON/IMG_7003.CR2)
...
[196755.788274] BTRFS error (device sdb): bdev /dev/sdb errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
[196755.814044] BTRFS error (device sdb): unable to fixup (regular) error at logical 3099310968832 on dev /dev/sdb

Resolving the block with inspect-internal:

$ sudo btrfs inspect-internal logical-resolve -v 3099310968832 /media/btrfs/
ioctl ret=0, total_size=4096, bytes_left=3456, bytes_missing=0, cnt=78, missed=0
ioctl ret=0, bytes_left=4023, bytes_missing=0, cnt=1, missed=0
/media/btrfs//snapshots/stansafe.20200601T032501+0200/users/joachim/Bilder/Canon/270CANON/IMG_7003.CR2
ioctl ret=0, bytes_left=4023, bytes_missing=0, cnt=1, missed=0
/media/btrfs//snapshots/stansafe.20200910T032501+0200/users/joachim/Bilder/Canon/270CANON/IMG_7003.CR2
ioctl ret=0, bytes_left=4023, bytes_missing=0, cnt=1, missed=0
/media/btrfs//snapshots/stansafe.20200909T032502+0200/users/joachim/Bilder/Canon/270CANON/IMG_7003.CR2
...

Trying to verify the file:

$ sha256sum /media/btrfs//stansafe/users/joachim/Bilder/Canon/270CANON/IMG_7003.CR2
sha256sum: /media/btrfs//stansafe/users/joachim/Bilder/Canon/270CANON/IMG_7003.CR2: Input/output error

$ dmesg
...
[1642985.509498] BTRFS warning (device sdb): csum failed root 259 ino 6521311 off 7614464 csum 0x151ad4ce expected csum 0x150ad4ce mirror 1
[1642985.509942] BTRFS warning (device sdb): csum failed root 259 ino 6521311 off 7614464 csum 0x151ad4ce expected csum 0x150ad4ce mirror 2
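
As a sanity check on the one-bit claim, the two csum values from the dmesg lines above can be XORed and the set bits counted. This is only a minimal sketch in plain shell arithmetic; the helper name bit_diff is made up for illustration.

```shell
# Count how many bits differ between two checksum values.
bit_diff() {
    x=$(( $1 ^ $2 ))   # XOR leaves a 1 wherever the two values disagree
    n=0
    while [ "$x" -ne 0 ]; do
        n=$(( n + (x & 1) ))
        x=$(( x >> 1 ))
    done
    echo "$n"
}

# Computed csum vs. expected csum, copied from the dmesg output.
bit_diff 0x151ad4ce 0x150ad4ce                            # prints 1
printf '0x%08x\n' $(( 0x151ad4ce ^ 0x150ad4ce ))          # prints 0x00100000
```

A result of 1 means the stored and computed checksums differ in exactly one bit, which is consistent with (but does not prove) a bit flip in RAM before the write.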

Answer 1

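Linux kernel 5.11 introduced a mount option that ignores data checksums. This lets you copy out data whose csum is wrong. The rescue= options only work on a read-only mount, so ro has to be given as well:

mount -o ro,rescue=ignoredatacsums /dev/sdX /mnt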

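If the file is only minimally corrupted and you have some kind of parity data for it, e.g. par2, this may let you recover the file completely.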

Source: https://btrfs.readthedocs.io/en/latest/btrfs-man5.html
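
The recovery described above might look roughly like the following. This is a sketch, not a tested procedure: the function names are made up, it needs root and a 5.11+ kernel, the filesystem must not be mounted read-write anywhere else, and all paths in the usage example are assumptions based on the question.

```shell
# Copy one file out of a Btrfs filesystem with data csum verification disabled.
# Arguments: device, mount point, path relative to the mount point, output file.
rescue_copy() {
    mount -o ro,rescue=ignoredatacsums "$1" "$2" || return 1
    cp "$2/$3" "$4"
    status=$?
    umount "$2"
    return "$status"
}

# Count the bytes that differ between two files of equal length.
# cmp -l prints one line per differing byte; cmp exits non-zero on a
# difference, which is expected here and therefore ignored.
count_diff_bytes() {
    { cmp -l "$1" "$2" || true; } | wc -l | tr -d ' '
}

# Hypothetical usage (device, target and known-good paths are assumptions):
# rescue_copy /dev/sdb /mnt stansafe/users/joachim/Bilder/Canon/270CANON/IMG_7003.CR2 /tmp/IMG_7003.CR2
# count_diff_bytes /tmp/IMG_7003.CR2 /path/to/known-good/IMG_7003.CR2
```

If the count is 0, the data on disk matches the known-good copy and only the stored checksum is wrong; a non-zero count means the data block itself differs.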

Answer 2

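If you want to keep 3 copies of every file on 3 disks or 3 partitions,

use raid1c3.

That way, even if you lose 2 disks, your data still survives on the remaining one.

raid1 stores only 2 copies, no matter how many disks you use.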
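
An existing filesystem can usually be converted to raid1c3 with a balance. The sketch below wraps that in a helper whose name is made up; it assumes root, at least 3 devices, and a kernel and btrfs-progs new enough to know the raid1c3 profile (5.5 or later, as far as I know, so not the 4.15 versions shown in the question).

```shell
# Convert data and metadata of a mounted Btrfs filesystem to raid1c3.
# Argument: the mount point of the filesystem.
convert_to_raid1c3() {
    btrfs balance start -dconvert=raid1c3 -mconvert=raid1c3 "$1"
}

# Hypothetical usage, with the mount point taken from the question:
# convert_to_raid1c3 /media/btrfs
# btrfs filesystem usage /media/btrfs   # check which profiles are now in use
```

Because raid1c3 puts every chunk on three different devices, usable space is bounded by the smallest device, so it is worth checking that the data still fits before starting the balance.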
