rcu_sched detected stalls on CPUs/tasks

2024-7-28

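I run many virtual machines in VirtualBox. These VMs use Debian 10.3 (the latest release). I am hitting the error/freeze shown below. It seems to happen on VMs that have a USB device (a Wi-Fi USB dongle) attached through VirtualBox: my SSH connection dropped and the VM froze.

I am new to this and do not know where it comes from. Is it the kernel or the distribution?

I found that this is a CPU problem. I always assign 6 CPUs to my VMs (I have a Ryzen 5 3600) and 2 or 4 GB of RAM (my host has 16 GB).

From dmesg: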

[   61.290365] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[   61.290391] rcu:     4-...!: (16 GPs behind) idle=4cc/0/0x0 softirq=1782/1782 fqs=1
[   61.290408] rcu:     (detected by 2, t=5282 jiffies, g=633, q=71)
[   61.290424] Sending NMI from CPU 2 to CPUs 4:
[   61.290471] NMI backtrace for cpu 4 skipped: idling at native_safe_halt+0xe/0x10
[   61.291424] rcu: rcu_sched kthread starved for 5244 jiffies! g633 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=4
[   61.291452] rcu: RCU grace-period kthread stack dump:
[   61.291467] rcu_sched       I    0    10      2 0x80000000
[   61.291468] Call Trace:
[   61.291475]  ? __schedule+0x2a2/0x870
[   61.291476]  schedule+0x28/0x80
[   61.291478]  schedule_timeout+0x16b/0x390
[   61.291480]  ? __next_timer_interrupt+0xc0/0xc0
[   61.291483]  rcu_gp_kthread+0x40d/0x850
[   61.291484]  ? call_rcu_sched+0x20/0x20
[   61.291486]  kthread+0x112/0x130
[   61.291487]  ? kthread_bind+0x30/0x30
[   61.291488]  ret_from_fork+0x35/0x40
[   82.349534] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[   82.349560] rcu:     0-...!: (1 GPs behind) idle=f3c/0/0x0 softirq=924/924 fqs=0
[   82.349581] rcu:     4-...!: (0 ticks this GP) idle=558/0/0x0 softirq=1782/1782 fqs=0
[   82.349599] rcu:     5-...!: (13 GPs behind) idle=204/0/0x0 softirq=864/864 fqs=0
[   82.349616] rcu:     (detected by 3, t=5259 jiffies, g=637, q=198)
[   82.349633] Sending NMI from CPU 3 to CPUs 0:
[   82.349673] NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0xe/0x10
[   82.350631] Sending NMI from CPU 3 to CPUs 4:
[   82.350656] NMI backtrace for cpu 4 skipped: idling at native_safe_halt+0xe/0x10
[   82.351628] Sending NMI from CPU 3 to CPUs 5:
[   82.351654] NMI backtrace for cpu 5 skipped: idling at native_safe_halt+0xe/0x10
[   82.352627] rcu: rcu_sched kthread starved for 5259 jiffies! g637 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=4
[   82.352652] rcu: RCU grace-period kthread stack dump:
[   82.352664] rcu_sched       I    0    10      2 0x80000000
[   82.352666] Call Trace:
[   82.352670]  ? __schedule+0x2a2/0x870
[   82.352671]  schedule+0x28/0x80
[   82.352672]  schedule_timeout+0x16b/0x390
[   82.352675]  ? __next_timer_interrupt+0xc0/0xc0
[   82.352676]  rcu_gp_kthread+0x40d/0x850
[   82.352678]  ? call_rcu_sched+0x20/0x20
[   82.352679]  kthread+0x112/0x130
[   82.352680]  ? kthread_bind+0x30/0x30
[   82.352681]  ret_from_fork+0x35/0x40
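
To make reports like the ones above easier to compare, here is a minimal Python sketch that pulls out the stalled CPUs and the stall duration from each report. The function name `parse_stalls` and the `hz=250` default are my own assumptions, not something the log states; with 250 ticks per second, 5259 jiffies comes to about 21 s, which is roughly in line with the gap between the two report timestamps (61.29 and 82.35). The regular expressions only cover the line shapes seen in this dump.

```python
import re

# Per-CPU stall line, e.g.
#   rcu:     4-...!: (16 GPs behind) idle=4cc/0/0x0 softirq=1782/1782 fqs=1
CPU_LINE = re.compile(r"rcu:\s+(?P<cpu>\d+)-\S+: \((?P<detail>[^)]*)\)")

# Summary line that closes a report, e.g.
#   rcu:     (detected by 2, t=5282 jiffies, g=633, q=71)
SUMMARY_LINE = re.compile(r"\(detected by (?P<by>\d+), t=(?P<jiffies>\d+) jiffies")


def parse_stalls(log_text, hz=250):
    """Return one dict per stall report found in log_text.

    hz (timer ticks per second) is an assumption; the log does not state it.
    """
    reports = []
    stalled = []  # (cpu, detail) pairs collected for the report in progress
    for line in log_text.splitlines():
        m = CPU_LINE.search(line)
        if m:
            stalled.append((int(m.group("cpu")), m.group("detail")))
            continue
        m = SUMMARY_LINE.search(line)
        if m:
            jiffies = int(m.group("jiffies"))
            reports.append({
                "stalled_cpus": stalled,
                "detected_by": int(m.group("by")),
                "jiffies": jiffies,
                "seconds": jiffies / hz,
            })
            stalled = []
    return reports


# Sample taken from the second report in the dmesg output above.
SAMPLE = """\
[   82.349534] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[   82.349560] rcu:     0-...!: (1 GPs behind) idle=f3c/0/0x0 softirq=924/924 fqs=0
[   82.349581] rcu:     4-...!: (0 ticks this GP) idle=558/0/0x0 softirq=1782/1782 fqs=0
[   82.349599] rcu:     5-...!: (13 GPs behind) idle=204/0/0x0 softirq=864/864 fqs=0
[   82.349616] rcu:     (detected by 3, t=5259 jiffies, g=637, q=198)
"""

if __name__ == "__main__":
    for report in parse_stalls(SAMPLE):
        print(report)
```

The same function should also work on the /var/log/syslog lines, since it searches for the `rcu:` part anywhere in each line rather than anchoring on the timestamp prefix.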

From /var/log/syslog:

May 15 11:09:31 165-netshaper-deb-1030 kernel: [   61.290365] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
May 15 11:09:31 165-netshaper-deb-1030 kernel: [   61.290391] rcu:      4-...!: (16 GPs behind) idle=4cc/0/0x0 softirq=1782/1782 fqs=1
May 15 11:09:31 165-netshaper-deb-1030 kernel: [   61.290408] rcu:      (detected by 2, t=5282 jiffies, g=633, q=71)
May 15 11:09:31 165-netshaper-deb-1030 kernel: [   61.290424] Sending NMI from CPU 2 to CPUs 4:
May 15 11:09:31 165-netshaper-deb-1030 kernel: [   61.290471] NMI backtrace for cpu 4 skipped: idling at native_safe_halt+0xe/0x10
May 15 11:09:31 165-netshaper-deb-1030 kernel: [   61.291424] rcu: rcu_sched kthread starved for 5244 jiffies! g633 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=4
May 15 11:09:31 165-netshaper-deb-1030 kernel: [   61.291452] rcu: RCU grace-period kthread stack dump:
May 15 11:09:31 165-netshaper-deb-1030 kernel: [   61.291467] rcu_sched       I    0    10      2 0x80000000
May 15 11:09:31 165-netshaper-deb-1030 kernel: [   61.291468] Call Trace:
May 15 11:09:31 165-netshaper-deb-1030 kernel: [   61.291475]  ? __schedule+0x2a2/0x870
May 15 11:09:31 165-netshaper-deb-1030 kernel: [   61.291476]  schedule+0x28/0x80
May 15 11:09:31 165-netshaper-deb-1030 kernel: [   61.291478]  schedule_timeout+0x16b/0x390
May 15 11:09:31 165-netshaper-deb-1030 kernel: [   61.291480]  ? __next_timer_interrupt+0xc0/0xc0
May 15 11:09:31 165-netshaper-deb-1030 kernel: [   61.291483]  rcu_gp_kthread+0x40d/0x850
May 15 11:09:31 165-netshaper-deb-1030 kernel: [   61.291484]  ? call_rcu_sched+0x20/0x20
May 15 11:09:31 165-netshaper-deb-1030 kernel: [   61.291486]  kthread+0x112/0x130
May 15 11:09:31 165-netshaper-deb-1030 kernel: [   61.291487]  ? kthread_bind+0x30/0x30
May 15 11:09:31 165-netshaper-deb-1030 kernel: [   61.291488]  ret_from_fork+0x35/0x40

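Can anyone help me? I do not know where this problem comes from or how to fix it.

[Edit] I have just completely reinstalled Windows 10 Pro on my computer (from an ISO file) and then installed VirtualBox, but my VMs still hit the CPU problem. No USB device is attached to the VM. I am now using Debian 10.4 here.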

May 19 12:02:24 102-ansible-deploy-deb-1040 kernel: [   53.265632] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
May 19 12:02:24 102-ansible-deploy-deb-1040 kernel: [   53.265657] rcu:         5-...!: (8 GPs behind) idle=6f4/0/0x0 softirq=701/701 fqs=1
May 19 12:02:24 102-ansible-deploy-deb-1040 kernel: [   53.265674] rcu:         (detected by 3, t=5261 jiffies, g=525, q=71)
May 19 12:02:24 102-ansible-deploy-deb-1040 kernel: [   53.265690] Sending NMI from CPU 3 to CPUs 5:
May 19 12:02:24 102-ansible-deploy-deb-1040 kernel: [   53.265716] NMI backtrace for cpu 5 skipped: idling at native_safe_halt+0xe/0x10
May 19 12:02:24 102-ansible-deploy-deb-1040 kernel: [   53.266688] rcu: rcu_sched kthread starved for 5208 jiffies! g525 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=5
May 19 12:02:24 102-ansible-deploy-deb-1040 kernel: [   53.266711] rcu: RCU grace-period kthread stack dump:
May 19 12:02:24 102-ansible-deploy-deb-1040 kernel: [   53.266723] rcu_sched       I    0    10      2 0x80000000
May 19 12:02:24 102-ansible-deploy-deb-1040 kernel: [   53.266725] Call Trace:
May 19 12:02:24 102-ansible-deploy-deb-1040 kernel: [   53.266729]  ? __schedule+0x2a2/0x870
May 19 12:02:24 102-ansible-deploy-deb-1040 kernel: [   53.266731]  schedule+0x28/0x80
May 19 12:02:24 102-ansible-deploy-deb-1040 kernel: [   53.266732]  schedule_timeout+0x16b/0x390
May 19 12:02:24 102-ansible-deploy-deb-1040 kernel: [   53.266734]  ? __next_timer_interrupt+0xc0/0xc0
May 19 12:02:24 102-ansible-deploy-deb-1040 kernel: [   53.266735]  rcu_gp_kthread+0x40d/0x850
May 19 12:02:24 102-ansible-deploy-deb-1040 kernel: [   53.266737]  ? call_rcu_sched+0x20/0x20
May 19 12:02:24 102-ansible-deploy-deb-1040 kernel: [   53.266738]  kthread+0x112/0x130
May 19 12:02:24 102-ansible-deploy-deb-1040 kernel: [   53.266739]  ? kthread_bind+0x30/0x30
May 19 12:02:24 102-ansible-deploy-deb-1040 kernel: [   53.266740]  ret_from_fork+0x35/0x40
May 19 12:02:48 102-ansible-deploy-deb-1040 systemd-timesyncd[293]: Synchronized to time server for the first time 51.159.6.183:123 (2.debian.pool.ntp.org).
May 19 12:03:23 102-ansible-deploy-deb-1040 kernel: [   96.971956] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
May 19 12:03:23 102-ansible-deploy-deb-1040 kernel: [   96.971984] rcu:         5-...!: (26 GPs behind) idle=cd8/0/0x0 softirq=717/717 fqs=1
May 19 12:03:23 102-ansible-deploy-deb-1040 kernel: [   96.972005] rcu:         (detected by 2, t=5252 jiffies, g=737, q=72)
May 19 12:03:23 102-ansible-deploy-deb-1040 kernel: [   96.972024] Sending NMI from CPU 2 to CPUs 5:
May 19 12:03:23 102-ansible-deploy-deb-1040 kernel: [   96.972052] NMI backtrace for cpu 5 skipped: idling at native_safe_halt+0xe/0x10
May 19 12:03:23 102-ansible-deploy-deb-1040 kernel: [   96.973023] rcu: rcu_sched kthread starved for 5174 jiffies! g737 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=5
May 19 12:03:23 102-ansible-deploy-deb-1040 kernel: [   96.973047] rcu: RCU grace-period kthread stack dump:
May 19 12:03:23 102-ansible-deploy-deb-1040 kernel: [   96.973059] rcu_sched       I    0    10      2 0x80000000
May 19 12:03:23 102-ansible-deploy-deb-1040 kernel: [   96.973060] Call Trace:
May 19 12:03:23 102-ansible-deploy-deb-1040 kernel: [   96.973065]  ? __schedule+0x2a2/0x870
May 19 12:03:23 102-ansible-deploy-deb-1040 kernel: [   96.973066]  schedule+0x28/0x80
May 19 12:03:23 102-ansible-deploy-deb-1040 kernel: [   96.973067]  schedule_timeout+0x16b/0x390
May 19 12:03:23 102-ansible-deploy-deb-1040 kernel: [   96.973070]  ? __next_timer_interrupt+0xc0/0xc0
May 19 12:03:23 102-ansible-deploy-deb-1040 kernel: [   96.973071]  rcu_gp_kthread+0x40d/0x850
May 19 12:03:23 102-ansible-deploy-deb-1040 kernel: [   96.973073]  ? call_rcu_sched+0x20/0x20
May 19 12:03:23 102-ansible-deploy-deb-1040 kernel: [   96.973074]  kthread+0x112/0x130
May 19 12:03:23 102-ansible-deploy-deb-1040 kernel: [   96.973075]  ? kthread_bind+0x30/0x30
May 19 12:03:23 102-ansible-deploy-deb-1040 kernel: [   96.973076]  ret_from_fork+0x35/0x40

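In the VBox settings, under System > Processor, I tried enabling PAE/NX and VT-x/AMD-V, but nothing changed. I will try Ubuntu to see whether the problem persists.

[Edit]

It looks like the problem does not occur on Ubuntu.

[Edited 2020/05/26]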
