Many Linux processes stuck in 'D' state, server hangs

2024-6-2

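I am running a high-performance application on CentOS 7.4 (3.10 kernel) with the RT kernel. Occasionally some processes get stuck in the 'D' (uninterruptible sleep) state, for example systemd, runc init, and umount. Their stacks are similar, and the top three frames are identical.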

wait_rcu_gp+0x5e/0x80
synchronize_rcu.part.46+0x1f/0x40
synchronize_rcu+0x18/0x20
namespace_unlock+0x68/0x80
SyS_umount+0x25c/0x440
system_call_fastpath+0x25/0x2a

The number of processes in the D state keeps growing until the server hangs. At that point ssh no longer works, but ping still gets a reply.
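To track how the number of stuck processes grows, something like the following could be run periodically. This is only a minimal sketch: the helper names (parse_state, d_state_tasks, kernel_stack) are made up for illustration, it looks at processes only (not the individual threads under /proc/<pid>/task), and it assumes /proc/<pid>/stack is available, which normally requires root and a kernel built with stack trace support.

```python
import os


def parse_state(stat_line):
    """Return (pid, comm, state) from the text of one /proc/<pid>/stat file."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of splitting on whitespace.
    left, right = stat_line.rsplit(")", 1)
    pid_str, comm = left.split("(", 1)
    return int(pid_str), comm, right.split()[0]


def d_state_tasks(proc="/proc"):
    """Yield (pid, comm) for every process currently in state 'D'."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_state(f.read())
        except (OSError, ValueError):
            continue  # the process exited while we were scanning
        if state == "D":
            yield pid, comm


def kernel_stack(pid, proc="/proc"):
    """Return the kernel stack text for pid, or None if it cannot be read."""
    try:
        with open(os.path.join(proc, str(pid), "stack")) as f:
            return f.read()
    except OSError:
        return None


if __name__ == "__main__":
    for pid, comm in d_state_tasks():
        print(pid, comm)
        print(kernel_stack(pid) or "  (stack not readable)")
```

Logging this output with a timestamp would show which process entered the D state first, which may help narrow down what is holding up the RCU grace period.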

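I read up on RCU; it appears to be a synchronization mechanism used inside the kernel, somewhat like a special kind of lock. I searched for the related functions with "trace-cmd list -f | grep synchronize". The result was: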

get_state_synchronize_rcu
synchronize_rcu_expedited
synchronize_rcu.part.46
synchronize_rcu

This suggests that "synchronize_rcu.part.46" is a distinct function, but I cannot find it in the kernel source. Has anyone run into the same situation?
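One possible reason the name does not show up in the source, offered as an assumption rather than a confirmed diagnosis: GCC appends suffixes such as ".part.N", ".isra.N", ".constprop.N" and ".cold" to pieces of a function that it splits off or clones during optimization, so "synchronize_rcu.part.46" may simply be a compiler-generated part of synchronize_rcu. The sketch below strips those suffixes so a traced symbol can be looked up under its source-level name; base_symbol and group_by_base are hypothetical helper names.

```python
import re

# Suffixes that GCC adds to compiler-generated clones or split-off parts
# of a function. The numeric part is optional (".cold" often has none).
_CLONE_SUFFIX = re.compile(r"(\.(part|isra|constprop|cold)(\.\d+)?)+$")


def base_symbol(name):
    """Return the symbol name with any trailing GCC clone suffixes removed."""
    return _CLONE_SUFFIX.sub("", name)


def group_by_base(symbols):
    """Group symbol names by the source-level function they derive from."""
    groups = {}
    for sym in symbols:
        groups.setdefault(base_symbol(sym), []).append(sym)
    return groups


if __name__ == "__main__":
    traced = [
        "get_state_synchronize_rcu",
        "synchronize_rcu_expedited",
        "synchronize_rcu.part.46",
        "synchronize_rcu",
    ]
    for base, names in group_by_base(traced).items():
        print(base, names)
```

If that assumption holds, searching the kernel tree for synchronize_rcu itself should lead to the code that the ".part.46" symbol was generated from.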
