I have two SSDs in a RAID 1 (mirror) configuration on ZFS. They are fairly old (around 10 years, I'd guess), but they were never used during those years. Here is my setup:
  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub in progress since Wed Nov 22 15:56:15 2023
        176G scanned at 454M/s, 28.3G issued at 73.2M/s, 176G total
        4.50K repaired, 16.11% done, 00:34:21 to go
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sda     ONLINE       0     0     3  (repairing)
            sdb     ONLINE       0     0     6  (repairing)
As you can see, the scrub found some checksum inconsistencies and was able to repair them. The strange thing is that it keeps finding new errors on both disks, even though I haven't written anything new to them and have run two scrubs back to back.
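For reference, this is roughly how I compare the per-device CKSUM counters between scrub runs (a small sketch of my own, assuming the `zpool status` output format shown above; the awk pattern is mine, not part of ZFS — on the live system I pipe `zpool status tank` instead of this sample):

```shell
# Sketch: pull the per-device CKSUM counters out of `zpool status` output
# so two consecutive scrub runs can be compared. The sample below is the
# device table from the output above; on the live system, pipe the real
# `zpool status tank` output instead.
status_sample='
        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sda     ONLINE       0     0     3  (repairing)
            sdb     ONLINE       0     0     6  (repairing)
'
# Field 1 is the device name, field 5 the CKSUM counter.
printf '%s\n' "$status_sample" |
  awk '$1 ~ /^sd[a-z]+$/ { print $1, $5 }'
```

Saving that output after each scrub and diffing it is how I confirmed that the error counts really do keep changing between runs.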
Looking at the output of dmesg, I don't see anything disk-related (no scary red errors). The only thing I did find is this:
[18125.949842] RIP: 0033:0x7f5eb2eeee83
[18125.949849] RSP: 002b:00007f5eb21fc6f8 EFLAGS: 00000293 ORIG_RAX: 00000000000000d9
[18125.949859] RAX: ffffffffffffffda RBX: 00007f5ea40178d0 RCX: 00007f5eb2eeee83
[18125.949865] RDX: 0000000000008000 RSI: 00007f5ea40178d0 RDI: 0000000000000009
[18125.949870] RBP: 00007f5ea40178a4 R08: 0000000000000007 R09: 00007f5ea4007650
[18125.949876] R10: 3ade3c6b4360070e R11: 0000000000000293 R12: ffffffffffffff50
[18125.949882] R13: 0000000000000000 R14: 00007f5ea40178a0 R15: 00007f5eb21fcbf0
[18125.949894] </TASK>
[18125.949898] INFO: task fish:591217 blocked for more than 120 seconds.
[18125.949906] Tainted: P OE 6.1.0-13-amd64 #1 Debian 6.1.55-1
[18125.949914] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[18125.949919] task:fish state:D stack:0 pid:591217 ppid:584012 flags:0x00000002
[18125.949930] Call Trace:
[18125.949933] <TASK>
[18125.949939] __schedule+0x351/0xa20
[18125.949954] schedule+0x5d/0xe0
[18125.949961] io_schedule+0x42/0x70
[18125.949969] cv_wait_common+0xaa/0x130 [spl]
[18125.950003] ? cpuusage_read+0x10/0x10
[18125.950014] txg_wait_synced_impl+0xcb/0x110 [zfs]
[18125.950417] txg_wait_synced+0xc/0x40 [zfs]
[18125.950812] dmu_tx_wait+0x208/0x430 [zfs]
[18125.951127] dmu_tx_assign+0x15e/0x510 [zfs]
[18125.951442] zfs_dirty_inode+0x14d/0x360 [zfs]
[18125.951863] zpl_dirty_inode+0x25/0x40 [zfs]
[18125.952277] __mark_inode_dirty+0x53/0x380
[18125.952289] touch_atime+0x1d1/0x1f0
[18125.952299] iterate_dir+0xff/0x1c0
[18125.952309] __x64_sys_getdents64+0x84/0x120
[18125.952318] ? compat_filldir+0x190/0x190
[18125.952330] do_syscall_64+0x58/0xc0
[18125.952342] ? fpregs_assert_state_consistent+0x22/0x50
[18125.952352] ? exit_to_user_mode_prepare+0x40/0x1d0
[18125.952362] ? syscall_exit_to_user_mode+0x27/0x40
[18125.952370] ? do_syscall_64+0x67/0xc0
[18125.952380] ? do_syscall_64+0x67/0xc0
[18125.952391] entry_SYSCALL_64_after_hwframe+0x64/0xce
To me this looks (very vaguely) like a dump of something ZFS-related, although the task listed as blocked is fish (my shell). Could this be the cause of my problem (I don't think so), or does it simply mean my disks are faulty and about to die?
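To check whether the drives themselves are failing, I was planning to look at their SMART data, along these lines (a sketch; it assumes the smartmontools package is installed, which it isn't by default on Debian):

```shell
# Sketch: basic SMART health checks for both mirror members.
# Assumes smartmontools is installed (apt install smartmontools)
# and that this is run as root.
for dev in /dev/sda /dev/sdb; do
  smartctl -H "$dev"   # overall health self-assessment
  smartctl -A "$dev"   # attribute table (reallocated sectors, CRC errors, ...)
done
```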
In case it helps, I'm on a Debian 12 Linux machine.
Thanks in advance for your help ;-)