我在多台机器上使用 Debian/testing Linux。出于安全原因,我总是将其安装在加密的 LVM 上。通常我使用大小为 1TB 到 3TB 的 ext4 文件系统。
不幸的是,它会产生一些非常严重的副作用。
当我执行文件系统密集型操作时(例如,并行编译几个学生的 Buildroot 项目、以 tar.xz 格式存档 20 GB 的数据,或安装 Xilinx Vivado 等软件,该软件会将约 130 GB 的数据写入文件系统),系统会定期冻结约 2 分钟。当我运行 dmesg 时,我收到以下烦人的消息:
[ 8648.672075] systemd[1]: systemd-journald.service: Processes still around after final SIGKILL. Entering failed mode.
[ 8648.672083] systemd[1]: systemd-journald.service: Failed with result 'timeout'.
[ 8648.672140] systemd[1]: systemd-journald.service: Unit process 7708 (systemd-journal) remains running after unit stopped.
[ 8648.672299] systemd[1]: Failed to start Journal Service.
[ 8648.672679] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 14.
[ 8648.672911] systemd[1]: Stopped Journal Service.
[ 8648.672980] systemd[1]: systemd-journald.service: Found left-over process 7708 (systemd-journal) in control group while starting unit. Ignoring.
[ 8648.672983] systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
[ 8648.673699] systemd[1]: Starting Journal Service...
[ 8738.922289] systemd[1]: systemd-journald.service: start operation timed out. Terminating.
[ 8828.923063] systemd[1]: systemd-journald.service: State 'stop-sigterm' timed out. Killing.
[ 8828.923105] systemd[1]: systemd-journald.service: Killing process 7854 (systemd-journal) with signal SIGKILL.
[ 8828.923141] systemd[1]: systemd-journald.service: Killing process 7708 (systemd-journal) with signal SIGKILL.
[ 8919.173428] systemd[1]: systemd-journald.service: Processes still around after SIGKILL. Ignoring.
[ 9009.423787] systemd[1]: systemd-journald.service: State 'final-sigterm' timed out. Killing.
[ 9009.423831] systemd[1]: systemd-journald.service: Killing process 7854 (systemd-journal) with signal SIGKILL.
[ 9099.674142] systemd[1]: systemd-journald.service: Processes still around after final SIGKILL. Entering failed mode.
[ 9099.674173] systemd[1]: systemd-journald.service: Failed with result 'timeout'.
[ 9099.674241] systemd[1]: systemd-journald.service: Unit process 7854 (systemd-journal) remains running after unit stopped.
[ 9099.674477] systemd[1]: Failed to start Journal Service.
[ 9099.674924] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 15.
[ 9099.675102] systemd[1]: Stopped Journal Service.
[ 9099.675185] systemd[1]: systemd-journald.service: Found left-over process 7854 (systemd-journal) in control group while starting unit. Ignoring.
[ 9099.675209] systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
[ 9099.675958] systemd[1]: Starting Journal Service...
有趣的是,该问题在具有更大 RAM 的机器上更频繁地发生(我使用具有以下 RAM 容量的机器:16GB(Intel i7 CPU)、32GB(Intel i7 CPU)或 64GB(Intel Xeon E-2176M CPU))。
我发现了类似的旧问题报告,例如:
- https://unix.stackexchange.com/questions/265756/desktop-completely-freezing-on-io-operations
- 磁盘操作冻结 Debian
- https://www.linuxquestions.org/questions/linux-software-2/debian-buster-freezes-on-high-disk-io-task-blocked-for-more-than-120-seconds-4175676665/
基于上述情况,我已检查我是否使用了截止期限调度程序:
# cat /sys/block/sda/queue/scheduler
[mq-deadline] none
我也使用相当新的内核:
# uname -a
Linux WZabHP 5.10.0-8-amd64 #1 SMP Debian 5.10.46-2 (2021-07-20) x86_64 GNU/Linux
我甚至尝试在其中一台机器(32 GB,i7 CPU)上切换到 RT 内核,但无济于事。更糟糕的是 - 冻结导致文件系统损坏(这种影响重复了两次,之后我停止尝试使用 RT 内核)。
上述问题可能由什么原因造成?我应该检查/调整哪些设置来解决这个问题?
更新
我发现另一组帖子表明虚拟内存的参数可能与所描述的问题有关:
- https://www.reddit.com/r/Fedora/comments/ay7dkh/linux_large_transfers_freeze_system_high_io/
- https://unix.stackexchange.com/questions/233421/prevent-large-file-write-from-freezing-the-system
不过,调整vm.dirty_background_ratio
和 似乎vm.dirty_ratio
需要分析磁盘的速度。
更新 2
在发生死亡问题之前journald
,内核日志中会出现以下内容(这是第一个与超时相关的错误):
Jul 26 20:44:38 WZabHP kernel: [ 484.449978] INFO: task StreamTrans #6:2335 blocked for more than 120 seconds.
Jul 26 20:44:38 WZabHP kernel: [ 484.449981] Not tainted 5.10.0-8-amd64 #1 Debian 5.10.46-2
Jul 26 20:44:38 WZabHP kernel: [ 484.449981] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 20:44:38 WZabHP kernel: [ 484.449982] task:StreamTrans #6 state:D stack: 0 pid: 2335 ppid: 1945 flags:0x00004000
Jul 26 20:44:38 WZabHP kernel: [ 484.449985] Call Trace:
Jul 26 20:44:38 WZabHP kernel: [ 484.449989] __schedule+0x282/0x870
Jul 26 20:44:38 WZabHP kernel: [ 484.449991] schedule+0x46/0xb0
Jul 26 20:44:38 WZabHP kernel: [ 484.449997] jbd2_log_wait_commit+0xac/0x120 [jbd2]
Jul 26 20:44:38 WZabHP kernel: [ 484.450000] ? add_wait_queue_exclusive+0x70/0x70
Jul 26 20:44:38 WZabHP kernel: [ 484.450013] ext4_sync_file+0xd4/0x350 [ext4]
Jul 26 20:44:38 WZabHP kernel: [ 484.450016] __x64_sys_fsync+0x34/0x60
Jul 26 20:44:38 WZabHP kernel: [ 484.450017] do_syscall_64+0x33/0x80
Jul 26 20:44:38 WZabHP kernel: [ 484.450019] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 26 20:44:38 WZabHP kernel: [ 484.450021] RIP: 0033:0x7fd2cf3385eb
Jul 26 20:44:38 WZabHP kernel: [ 484.450022] RSP: 002b:00007fd2b8d528c0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jul 26 20:44:38 WZabHP kernel: [ 484.450023] RAX: ffffffffffffffda RBX: 00007fd2af8539d0 RCX: 00007fd2cf3385eb
Jul 26 20:44:38 WZabHP kernel: [ 484.450024] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 000000000000003f
Jul 26 20:44:38 WZabHP kernel: [ 484.450025] RBP: 00007fd2af506ba0 R08: 0000000000000000 R09: 00007fd2cbbe4f28
Jul 26 20:44:38 WZabHP kernel: [ 484.450025] R10: 000200050000002a R11: 0000000000000293 R12: 0000000000000000
Jul 26 20:44:38 WZabHP kernel: [ 484.450026] R13: 00000000000000c4 R14: 00007fd2caa6a6f1 R15: 00000000000000c4
Jul 26 20:44:38 WZabHP kernel: [ 484.450030] INFO: task mozStorage #2:2430 blocked for more than 120 seconds.
Jul 26 20:44:38 WZabHP kernel: [ 484.450031] Not tainted 5.10.0-8-amd64 #1 Debian 5.10.46-2
Jul 26 20:44:38 WZabHP kernel: [ 484.450031] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 20:44:38 WZabHP kernel: [ 484.450032] task:mozStorage #2 state:D stack: 0 pid: 2430 ppid: 1945 flags:0x00004000
Jul 26 20:44:38 WZabHP kernel: [ 484.450033] Call Trace:
Jul 26 20:44:38 WZabHP kernel: [ 484.450034] __schedule+0x282/0x870
Jul 26 20:44:38 WZabHP kernel: [ 484.450036] schedule+0x46/0xb0
Jul 26 20:44:38 WZabHP kernel: [ 484.450039] jbd2_log_wait_commit+0xac/0x120 [jbd2]
Jul 26 20:44:38 WZabHP kernel: [ 484.450040] ? add_wait_queue_exclusive+0x70/0x70
Jul 26 20:44:38 WZabHP kernel: [ 484.450047] ext4_sync_file+0xd4/0x350 [ext4]
Jul 26 20:44:38 WZabHP kernel: [ 484.450049] __x64_sys_fsync+0x34/0x60
Jul 26 20:44:38 WZabHP kernel: [ 484.450050] do_syscall_64+0x33/0x80
Jul 26 20:44:38 WZabHP kernel: [ 484.450051] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 26 20:44:38 WZabHP kernel: [ 484.450052] RIP: 0033:0x7fd2cf3385eb
Jul 26 20:44:38 WZabHP kernel: [ 484.450053] RSP: 002b:00007fd2b334c300 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jul 26 20:44:38 WZabHP kernel: [ 484.450054] RAX: ffffffffffffffda RBX: 00007fd2b34a6858 RCX: 00007fd2cf3385eb
Jul 26 20:44:38 WZabHP kernel: [ 484.450055] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000073
Jul 26 20:44:38 WZabHP kernel: [ 484.450055] RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000
Jul 26 20:44:38 WZabHP kernel: [ 484.450056] R10: 00007ffd1278a080 R11: 0000000000000293 R12: 00000000000001f7
Jul 26 20:44:38 WZabHP kernel: [ 484.450057] R13: 00007fd2aeffbd40 R14: 00007fd2b34a67a0 R15: 000000000000004e
Jul 26 20:48:40 WZabHP kernel: [ 726.109957] INFO: task jbd2/dm-2-8:591 blocked for more than 120 seconds.
Jul 26 20:48:40 WZabHP kernel: [ 726.109959] Not tainted 5.10.0-8-amd64 #1 Debian 5.10.46-2
Jul 26 20:48:40 WZabHP kernel: [ 726.109960] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 20:48:40 WZabHP kernel: [ 726.109961] task:jbd2/dm-2-8 state:D stack: 0 pid: 591 ppid: 2 flags:0x00004000
Jul 26 20:48:40 WZabHP kernel: [ 726.109963] Call Trace:
Jul 26 20:48:40 WZabHP kernel: [ 726.109967] __schedule+0x282/0x870
Jul 26 20:48:40 WZabHP kernel: [ 726.109969] ? out_of_line_wait_on_bit_lock+0xb0/0xb0
Jul 26 20:48:40 WZabHP kernel: [ 726.109970] schedule+0x46/0xb0
Jul 26 20:48:40 WZabHP kernel: [ 726.109971] io_schedule+0x42/0x70
Jul 26 20:48:40 WZabHP kernel: [ 726.109973] bit_wait_io+0xd/0x50
Jul 26 20:48:40 WZabHP kernel: [ 726.109974] __wait_on_bit+0x2a/0x90
Jul 26 20:48:40 WZabHP kernel: [ 726.109975] out_of_line_wait_on_bit+0x92/0xb0
Jul 26 20:48:40 WZabHP kernel: [ 726.109977] ? var_wake_function+0x20/0x20
Jul 26 20:48:40 WZabHP kernel: [ 726.109982] jbd2_journal_commit_transaction+0x16b3/0x1ad0 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [ 726.109987] kjournald2+0xab/0x270 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [ 726.109988] ? add_wait_queue_exclusive+0x70/0x70
Jul 26 20:48:40 WZabHP kernel: [ 726.109991] ? load_superblock.part.0+0xb0/0xb0 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [ 726.109993] kthread+0x11b/0x140
Jul 26 20:48:40 WZabHP kernel: [ 726.109994] ? __kthread_bind_mask+0x60/0x60
Jul 26 20:48:40 WZabHP kernel: [ 726.109996] ret_from_fork+0x22/0x30
Jul 26 20:48:40 WZabHP kernel: [ 726.109999] INFO: task journal-offline:2579 blocked for more than 120 seconds.
Jul 26 20:48:40 WZabHP kernel: [ 726.109999] Not tainted 5.10.0-8-amd64 #1 Debian 5.10.46-2
Jul 26 20:48:40 WZabHP kernel: [ 726.110000] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 20:48:40 WZabHP kernel: [ 726.110001] task:journal-offline state:D stack: 0 pid: 2579 ppid: 1 flags:0x00004324
Jul 26 20:48:40 WZabHP kernel: [ 726.110002] Call Trace:
Jul 26 20:48:40 WZabHP kernel: [ 726.110003] __schedule+0x282/0x870
Jul 26 20:48:40 WZabHP kernel: [ 726.110004] schedule+0x46/0xb0
Jul 26 20:48:40 WZabHP kernel: [ 726.110007] jbd2_log_wait_commit+0xac/0x120 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [ 726.110009] ? add_wait_queue_exclusive+0x70/0x70
Jul 26 20:48:40 WZabHP kernel: [ 726.110021] ext4_sync_file+0xd4/0x350 [ext4]
Jul 26 20:48:40 WZabHP kernel: [ 726.110024] __x64_sys_fsync+0x34/0x60
Jul 26 20:48:40 WZabHP kernel: [ 726.110025] do_syscall_64+0x33/0x80
Jul 26 20:48:40 WZabHP kernel: [ 726.110027] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 26 20:48:40 WZabHP kernel: [ 726.110029] RIP: 0033:0x7f68c72ebabb
Jul 26 20:48:40 WZabHP kernel: [ 726.110030] RSP: 002b:00007f68be229cf0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jul 26 20:48:40 WZabHP kernel: [ 726.110031] RAX: ffffffffffffffda RBX: 000056427bbd8870 RCX: 00007f68c72ebabb
Jul 26 20:48:40 WZabHP kernel: [ 726.110032] RDX: 00007f68c5a2b000 RSI: 00007f68c7649414 RDI: 000000000000001a
Jul 26 20:48:40 WZabHP kernel: [ 726.110032] RBP: 00007f68c764bd30 R08: 0000000000000000 R09: 00007f68be22a700
Jul 26 20:48:40 WZabHP kernel: [ 726.110033] R10: 0000000000000014 R11: 0000000000000293 R12: 0000000000000002
Jul 26 20:48:40 WZabHP kernel: [ 726.110034] R13: 00007ffe691a51df R14: 00007f68be229e00 R15: 000056427bbf9920
Jul 26 20:48:40 WZabHP kernel: [ 726.110035] INFO: task journal-offline:2580 blocked for more than 120 seconds.
Jul 26 20:48:40 WZabHP kernel: [ 726.110036] Not tainted 5.10.0-8-amd64 #1 Debian 5.10.46-2
Jul 26 20:48:40 WZabHP kernel: [ 726.110036] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 20:48:40 WZabHP kernel: [ 726.110037] task:journal-offline state:D stack: 0 pid: 2580 ppid: 1 flags:0x00004324
Jul 26 20:48:40 WZabHP kernel: [ 726.110038] Call Trace:
Jul 26 20:48:40 WZabHP kernel: [ 726.110039] __schedule+0x282/0x870
Jul 26 20:48:40 WZabHP kernel: [ 726.110040] schedule+0x46/0xb0
Jul 26 20:48:40 WZabHP kernel: [ 726.110043] jbd2_log_wait_commit+0xac/0x120 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [ 726.110045] ? add_wait_queue_exclusive+0x70/0x70
Jul 26 20:48:40 WZabHP kernel: [ 726.110051] ext4_sync_file+0xd4/0x350 [ext4]
Jul 26 20:48:40 WZabHP kernel: [ 726.110053] __x64_sys_fsync+0x34/0x60
Jul 26 20:48:40 WZabHP kernel: [ 726.110054] do_syscall_64+0x33/0x80
Jul 26 20:48:40 WZabHP kernel: [ 726.110056] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 26 20:48:40 WZabHP kernel: [ 726.110056] RIP: 0033:0x7f68c72ebabb
Jul 26 20:48:40 WZabHP kernel: [ 726.110057] RSP: 002b:00007f68bda28cf0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jul 26 20:48:40 WZabHP kernel: [ 726.110058] RAX: ffffffffffffffda RBX: 000056427bc095d0 RCX: 00007f68c72ebabb
Jul 26 20:48:40 WZabHP kernel: [ 726.110059] RDX: 00007f68c622b000 RSI: 00007f68c7649414 RDI: 0000000000000026
Jul 26 20:48:40 WZabHP kernel: [ 726.110059] RBP: 00007f68c764bd30 R08: 0000000000000000 R09: 00007f68bda29700
Jul 26 20:48:40 WZabHP kernel: [ 726.110060] R10: 0000000000000014 R11: 0000000000000293 R12: 0000000000000002
Jul 26 20:48:40 WZabHP kernel: [ 726.110061] R13: 00007ffe691a51df R14: 00007f68bda28e00 R15: 0000000000802000
Jul 26 20:48:40 WZabHP kernel: [ 726.110064] INFO: task NetworkManager:1090 blocked for more than 120 seconds.
Jul 26 20:48:40 WZabHP kernel: [ 726.110065] Not tainted 5.10.0-8-amd64 #1 Debian 5.10.46-2
Jul 26 20:48:40 WZabHP kernel: [ 726.110065] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 20:48:40 WZabHP kernel: [ 726.110066] task:NetworkManager state:D stack: 0 pid: 1090 ppid: 1 flags:0x00004000
Jul 26 20:48:40 WZabHP kernel: [ 726.110067] Call Trace:
Jul 26 20:48:40 WZabHP kernel: [ 726.110068] __schedule+0x282/0x870
Jul 26 20:48:40 WZabHP kernel: [ 726.110069] schedule+0x46/0xb0
Jul 26 20:48:40 WZabHP kernel: [ 726.110072] jbd2_log_wait_commit+0xac/0x120 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [ 726.110073] ? add_wait_queue_exclusive+0x70/0x70
Jul 26 20:48:40 WZabHP kernel: [ 726.110080] ext4_sync_file+0xd4/0x350 [ext4]
Jul 26 20:48:40 WZabHP kernel: [ 726.110081] __x64_sys_fsync+0x34/0x60
Jul 26 20:48:40 WZabHP kernel: [ 726.110083] do_syscall_64+0x33/0x80
Jul 26 20:48:40 WZabHP kernel: [ 726.110084] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 26 20:48:40 WZabHP kernel: [ 726.110085] RIP: 0033:0x7fd15da1c5eb
Jul 26 20:48:40 WZabHP kernel: [ 726.110085] RSP: 002b:00007ffedee571c0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jul 26 20:48:40 WZabHP kernel: [ 726.110086] RAX: ffffffffffffffda RBX: 00007fd15cd06928 RCX: 00007fd15da1c5eb
Jul 26 20:48:40 WZabHP kernel: [ 726.110087] RDX: 0000000000000184 RSI: 000055c28ee17560 RDI: 000000000000001a
Jul 26 20:48:40 WZabHP kernel: [ 726.110088] RBP: 000000000000001a R08: 0000000000000000 R09: 00007ffedee572d0
Jul 26 20:48:40 WZabHP kernel: [ 726.110088] R10: 0000000000000184 R11: 0000000000000293 R12: 00007ffedee572d0
Jul 26 20:48:40 WZabHP kernel: [ 726.110089] R13: 000055c28ed78fd0 R14: 000055c28ee176e4 R15: 0000000000000000
Jul 26 20:48:40 WZabHP kernel: [ 726.110104] INFO: task Permission:2299 blocked for more than 120 seconds.
Jul 26 20:48:40 WZabHP kernel: [ 726.110105] Not tainted 5.10.0-8-amd64 #1 Debian 5.10.46-2
Jul 26 20:48:40 WZabHP kernel: [ 726.110105] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 20:48:40 WZabHP kernel: [ 726.110106] task:Permission state:D stack: 0 pid: 2299 ppid: 1945 flags:0x00004000
Jul 26 20:48:40 WZabHP kernel: [ 726.110107] Call Trace:
Jul 26 20:48:40 WZabHP kernel: [ 726.110108] __schedule+0x282/0x870
Jul 26 20:48:40 WZabHP kernel: [ 726.110109] schedule+0x46/0xb0
Jul 26 20:48:40 WZabHP kernel: [ 726.110112] jbd2_log_wait_commit+0xac/0x120 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [ 726.110113] ? add_wait_queue_exclusive+0x70/0x70
Jul 26 20:48:40 WZabHP kernel: [ 726.110120] ext4_sync_file+0xd4/0x350 [ext4]
Jul 26 20:48:40 WZabHP kernel: [ 726.110122] __x64_sys_fsync+0x34/0x60
Jul 26 20:48:40 WZabHP kernel: [ 726.110123] do_syscall_64+0x33/0x80
Jul 26 20:48:40 WZabHP kernel: [ 726.110124] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 26 20:48:40 WZabHP kernel: [ 726.110125] RIP: 0033:0x7fd2cf3385eb
Jul 26 20:48:40 WZabHP kernel: [ 726.110125] RSP: 002b:00007fd2beffe3e0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jul 26 20:48:40 WZabHP kernel: [ 726.110126] RAX: ffffffffffffffda RBX: 00007fd2bf0582e0 RCX: 00007fd2cf3385eb
Jul 26 20:48:40 WZabHP kernel: [ 726.110127] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000089
Jul 26 20:48:40 WZabHP kernel: [ 726.110128] RBP: 0000000000000002 R08: 0000000000000000 R09: 00007fd2cec47af0
Jul 26 20:48:40 WZabHP kernel: [ 726.110128] R10: 00007ffd1278a080 R11: 0000000000000293 R12: 00000000000001f5
Jul 26 20:48:40 WZabHP kernel: [ 726.110129] R13: 0000000000000000 R14: 0000000000010400 R15: 000000000000003f
Jul 26 20:48:40 WZabHP kernel: [ 726.110131] INFO: task StreamTrans #3:2319 blocked for more than 120 seconds.
Jul 26 20:48:40 WZabHP kernel: [ 726.110132] Not tainted 5.10.0-8-amd64 #1 Debian 5.10.46-2
Jul 26 20:48:40 WZabHP kernel: [ 726.110132] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 20:48:40 WZabHP kernel: [ 726.110133] task:StreamTrans #3 state:D stack: 0 pid: 2319 ppid: 1945 flags:0x00004000
Jul 26 20:48:40 WZabHP kernel: [ 726.110134] Call Trace:
Jul 26 20:48:40 WZabHP kernel: [ 726.110135] __schedule+0x282/0x870
Jul 26 20:48:40 WZabHP kernel: [ 726.110136] schedule+0x46/0xb0
Jul 26 20:48:40 WZabHP kernel: [ 726.110139] jbd2_log_wait_commit+0xac/0x120 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [ 726.110140] ? add_wait_queue_exclusive+0x70/0x70
Jul 26 20:48:40 WZabHP kernel: [ 726.110146] ext4_sync_file+0xd4/0x350 [ext4]
Jul 26 20:48:40 WZabHP kernel: [ 726.110148] __x64_sys_fsync+0x34/0x60
Jul 26 20:48:40 WZabHP kernel: [ 726.110149] do_syscall_64+0x33/0x80
Jul 26 20:48:40 WZabHP kernel: [ 726.110150] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 26 20:48:40 WZabHP kernel: [ 726.110151] RIP: 0033:0x7fd2cf3385eb
Jul 26 20:48:40 WZabHP kernel: [ 726.110152] RSP: 002b:00007fd2bba8e8c0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jul 26 20:48:40 WZabHP kernel: [ 726.110153] RAX: ffffffffffffffda RBX: 00007fd2af8539d0 RCX: 00007fd2cf3385eb
Jul 26 20:48:40 WZabHP kernel: [ 726.110153] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 000000000000003f
Jul 26 20:48:40 WZabHP kernel: [ 726.110154] RBP: 00007fd2aecd16d0 R08: 0000000000000000 R09: 00007fd2cbbe4f28
Jul 26 20:48:40 WZabHP kernel: [ 726.110155] R10: 0002000500000049 R11: 0000000000000293 R12: 0000000000000000
Jul 26 20:48:40 WZabHP kernel: [ 726.110155] R13: 00000000000000c4 R14: 00007fd2caa6a6f1 R15: 00000000000000c4
Jul 26 20:48:40 WZabHP kernel: [ 726.110157] INFO: task QuotaManager IO:2333 blocked for more than 120 seconds.
Jul 26 20:48:40 WZabHP kernel: [ 726.110158] Not tainted 5.10.0-8-amd64 #1 Debian 5.10.46-2
Jul 26 20:48:40 WZabHP kernel: [ 726.110158] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 20:48:40 WZabHP kernel: [ 726.110159] task:QuotaManager IO state:D stack: 0 pid: 2333 ppid: 1945 flags:0x00004000
Jul 26 20:48:40 WZabHP kernel: [ 726.110160] Call Trace:
Jul 26 20:48:40 WZabHP kernel: [ 726.110161] __schedule+0x282/0x870
Jul 26 20:48:40 WZabHP kernel: [ 726.110162] schedule+0x46/0xb0
Jul 26 20:48:40 WZabHP kernel: [ 726.110165] jbd2_log_wait_commit+0xac/0x120 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [ 726.110166] ? add_wait_queue_exclusive+0x70/0x70
Jul 26 20:48:40 WZabHP kernel: [ 726.110172] ext4_sync_file+0xd4/0x350 [ext4]
Jul 26 20:48:40 WZabHP kernel: [ 726.110173] __x64_sys_fsync+0x34/0x60
Jul 26 20:48:40 WZabHP kernel: [ 726.110175] do_syscall_64+0x33/0x80
Jul 26 20:48:40 WZabHP kernel: [ 726.110176] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 26 20:48:40 WZabHP kernel: [ 726.110177] RIP: 0033:0x7fd2cf3385eb
Jul 26 20:48:40 WZabHP kernel: [ 726.110177] RSP: 002b:00007fd2b8d93020 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jul 26 20:48:40 WZabHP kernel: [ 726.110178] RAX: ffffffffffffffda RBX: 00007fd2ba6f52e0 RCX: 00007fd2cf3385eb
Jul 26 20:48:40 WZabHP kernel: [ 726.110179] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000047
Jul 26 20:48:40 WZabHP kernel: [ 726.110179] RBP: 0000000000000002 R08: 0000000000000000 R09: 00007fd2b8c1b910
Jul 26 20:48:40 WZabHP kernel: [ 726.110180] R10: 00007ffd1278a080 R11: 0000000000000293 R12: 00000000000001f5
Jul 26 20:48:40 WZabHP kernel: [ 726.110180] R13: 0000000000000000 R14: 0000000000000a00 R15: 000000000000003b
Jul 26 20:48:40 WZabHP kernel: [ 726.110184] INFO: task mozStorage #2:2430 blocked for more than 120 seconds.
Jul 26 20:48:40 WZabHP kernel: [ 726.110185] Not tainted 5.10.0-8-amd64 #1 Debian 5.10.46-2
Jul 26 20:48:40 WZabHP kernel: [ 726.110185] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 26 20:48:40 WZabHP kernel: [ 726.110186] task:mozStorage #2 state:D stack: 0 pid: 2430 ppid: 1945 flags:0x00000000
Jul 26 20:48:40 WZabHP kernel: [ 726.110187] Call Trace:
Jul 26 20:48:40 WZabHP kernel: [ 726.110188] __schedule+0x282/0x870
Jul 26 20:48:40 WZabHP kernel: [ 726.110189] schedule+0x46/0xb0
Jul 26 20:48:40 WZabHP kernel: [ 726.110192] jbd2_log_wait_commit+0xac/0x120 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [ 726.110193] ? add_wait_queue_exclusive+0x70/0x70
Jul 26 20:48:40 WZabHP kernel: [ 726.110196] __jbd2_journal_force_commit+0x5d/0xb0 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [ 726.110199] jbd2_journal_force_commit+0x1d/0x30 [jbd2]
Jul 26 20:48:40 WZabHP kernel: [ 726.110205] ext4_sync_file+0x2c4/0x350 [ext4]
Jul 26 20:48:40 WZabHP kernel: [ 726.110206] __x64_sys_fsync+0x34/0x60
Jul 26 20:48:40 WZabHP kernel: [ 726.110207] do_syscall_64+0x33/0x80
Jul 26 20:48:40 WZabHP kernel: [ 726.110209] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 26 20:48:40 WZabHP kernel: [ 726.110209] RIP: 0033:0x7fd2cf3385eb
Jul 26 20:48:40 WZabHP kernel: [ 726.110210] RSP: 002b:00007fd2b334c300 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jul 26 20:48:40 WZabHP kernel: [ 726.110211] RAX: ffffffffffffffda RBX: 00007fd2b34a6858 RCX: 00007fd2cf3385eb
Jul 26 20:48:40 WZabHP kernel: [ 726.110212] RDX: 00000000000a0000 RSI: 00007fd2b334c0f0 RDI: 0000000000000072
Jul 26 20:48:40 WZabHP kernel: [ 726.110212] RBP: 0000000000000002 R08: 0000000000000000 R09: 000000000000003e
Jul 26 20:48:40 WZabHP kernel: [ 726.110213] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
Jul 26 20:48:40 WZabHP kernel: [ 726.110213] R13: 00007fd2aeffbd40 R14: 00007fd2b34a67a0 R15: 000000000000004e
因此,问题与文件系统的低同步有关。磁盘吞吐量足够。但是,问题出现在带有加密卷的安装中。CPU 性能(尤其是在 Xeon 机器的情况下)足以足够快地执行加密。
加密所需的另一种稀缺资源是真正的随机数据。事实上,我现在可以访问的系统似乎没有安装 rng-tools5 或 rng-tools-debian。这可能是问题的根源。
答案1
这里描述的修改之一已经消除了大规模写入时系统冻结的情况。
cryptsetup --perf-no_write_workqueue refresh name_of_the_mapping
(就我而言是cryptsetup --perf-no_write_workqueue refresh sda6_crypt
:)
可以通过执行以下操作永久设置该选项:
cryptsetup --perf-no_write_workqueue --persistent refresh name_of_the_mapping
我已将问题报告为错误https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1016474,但这是否是cryptsetup包中的一个错误是值得怀疑的。
因此完整的解决方案似乎是:
- 安装
haveged
软件包 - 使用
--perf-no_write_workqueue
加密 LVM 映射中的选项。
答案2
问题似乎与系统中熵的缺乏有关(如评论中所述)。这解释了为什么问题会出现在加密 LVM 的系统中。写入数据的加密需要真正的随机字节。
我已经安装了rng-tools5
和haveged
(按照@michalng 的建议,谢谢!),问题似乎有所缓解。但是,还需要进一步彻底测试才能确认这确实是该问题的正确解决方案。