I'm new to SnapRaid and am wondering what is going wrong when every scrub run looks like this:
...
Data error in parity 'parity' at position '11242042', diff bits 1048371/2097152
Data error in parity 'parity' at position '11242043', diff bits 1048278/2097152
Data error in parity 'parity' at position '11242044', diff bits 1048591/2097152
Data error in parity 'parity' at position '11242045', diff bits 1047674/2097152
Data error in parity 'parity' at position '11242046', diff bits 1049725/2097152
Data error in parity 'parity' at position '11242047', diff bits 1048050/2097152
Data error in parity 'parity' at position '11242048', diff bits 1048318/2097152
Data error in parity 'parity' at position '11242049', diff bits 1049356/2097152
Data error in parity 'parity' at position '11242050', diff bits 1049158/2097152
Data error in parity 'parity' at position '11242051', diff bits 1047212/2097152
Data error in parity 'parity' at position '11242052', diff bits 1049267/2097152
Data error in parity 'parity' at position '11242053', diff bits 1048615/2097152
...
100% completed, 6380380 MB accessed in 3:49
0 file errors
0 io errors
6084943 data errors
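It looks like all the data on the parity is somewhat corrupted. But the setup is only 1 month old, and scrubbing only started a few days ago. There are no SMART errors or anything similar, and the array is synced. My guess is that this is related to a misconfiguration or a SnapRaid issue.
It would be great to know how to investigate this further.
Answer 1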
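With help from the SnapRaid forum I found the problem: SnapRaid seems to have a major problem when it hits I/O errors during the first sync. Since then, all the blocks on the parity have been broken.
For anyone who runs into this, here is some information on how to fix it:
Check 100 blocks starting at offset 11242042 to see what kind of errors they are: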
$ snapraid -S 11242042 -B 100 check
100% completed, 105 MB accessed in 0:00
100 errors
0 unrecoverable errors
WARNING! There are errors!
OK, it looks like everything is recoverable.
Check how much data is affected:
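The offset passed to -S above comes from the positions printed in the scrub output. As a minimal sketch, assuming the scrub output was saved to a file (the name scrub.log and the function name summarize_parity_errors are hypothetical, not part of SnapRaid), the "Data error in parity" lines can be summarized with awk to get the first and last reported position and the average fraction of differing bits:

```shell
# Summarize "Data error in parity ..." lines from a saved scrub log.
# Reads the log on stdin and prints the error count, the first and last
# block position, and the mean fraction of differing bits per block.
# Assumes the line format shown in the question:
#   Data error in parity 'parity' at position '11242042', diff bits 1048371/2097152
summarize_parity_errors() {
  LC_ALL=C awk '
    /^Data error in parity/ {
      pos = $(NF - 3)                 # field looks like: 11242042 wrapped in quotes plus a comma
      gsub(/[^0-9]/, "", pos)         # keep only the digits
      pos += 0
      split($NF, bits, "/")           # differing bits / total bits in the block
      if (count == 0 || pos < first) first = pos
      if (count == 0 || pos > last)  last = pos
      if (bits[2] > 0) ratio_sum += bits[1] / bits[2]
      count++
    }
    END {
      if (count == 0) { print "errors=0"; exit }
      printf "errors=%d first=%d last=%d mean_diff=%.3f\n", count, first, last, ratio_sum / count
    }
  '
}

# Usage (scrub.log is a hypothetical file holding the saved scrub output):
#   summarize_parity_errors < scrub.log
```

A mean_diff close to 0.5, as in the log above (for example 1048371/2097152), is what two unrelated blocks of data would produce, which suggests the parity blocks are entirely wrong rather than damaged by a few flipped bits.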
$ snapraid status
[...]
No sync is in progress.
The 100% of the array is not scrubbed.
You have 164 files with zero sub-second timestamp.
Run the 'touch' command to set it to a not zero value.
No rehash is in progress or needed.
DANGER! In the array there are 7926625 errors!
They are from block 7107252 to 15055004, specifically at blocks: 7107252 [...]
OK, for me that is quite a lot... so force a resync of the parity disk starting from just before the first affected block:
$ snapraid fix -d parity -S 7000000
Or fix only the bad blocks that have already been detected:
$ snapraid fix -d parity -e
If nearly everything is broken for you, I would suggest:
$ snapraid --force-full sync
It took a few hours, but SnapRaid is now back to normal.