What does the "self-detected stall on CPU" syslog message on Ubuntu 16 mean?

I am running Ubuntu 16.04.6. Occasionally I see messages like the following in syslog:

Apr 24 05:19:46 vrni-platform kernel: [358660.688715] INFO: rcu_sched self-detected stall on CPU
Apr 24 05:21:04 vrni-platform kernel: [358660.922686] INFO: rcu_sched detected stalls on CPUs/tasks:
Apr 24 05:21:04 vrni-platform kernel: [358660.923361]   0-...: (42 ticks this GP) idle=df7/140000000000002/0 softirq=52382057/52382057 fqs=3
Apr 24 05:21:16 vrni-platform kernel: [358660.923361]   (detected by 1, t=15015 jiffies, g=17833286, c=17833285, q=62)
Apr 24 05:21:16 vrni-platform kernel: [358660.923371] Task dump for CPU 0:
Apr 24 05:21:16 vrni-platform kernel: [358660.923373] java            R  running task        0 14071  13936 0x00000088
Apr 24 05:21:16 vrni-platform kernel: [358660.923427]  ffffffff818624d1 50b27af1c55e3d3d 00007f1985ce6000 ffff880100847f58
Apr 24 05:21:16 vrni-platform kernel: [358660.923430]  0000000000000006 ffff8800359c2800 ffff88010134d400 ffff880100847f28
Apr 24 05:21:19 vrni-platform kernel: [358660.923432]  ffffffff8106eeb1 ffff880100847ef8 0000000000000002 ffff8800359c2868
Apr 24 05:21:19 vrni-platform kernel: [358660.923435] Call Trace:
Apr 24 05:21:19 vrni-platform kernel: [358660.923566]  [<ffffffff818624d1>] ? __schedule+0x341/0x810
Apr 24 05:21:19 vrni-platform kernel: [358660.923622]  [<ffffffff8106eeb1>] ? __do_page_fault+0x1c1/0x410
Apr 24 05:21:19 vrni-platform kernel: [358660.923625]  [<ffffffff8106f122>] ? do_page_fault+0x22/0x30
Apr 24 05:21:29 vrni-platform kernel: [358660.923628] rcu_sched kthread starved for 14996 jiffies! g17833286 c17833285 f0x0 s3 ->state=0x0
Apr 24 05:21:29 vrni-platform kernel: [358661.336282]
Apr 24 05:21:29 vrni-platform kernel: [358661.336413]   0-...: (42 ticks this GP) idle=df7/140000000000002/0 softirq=52382057/52382057 fqs=3
Apr 24 05:21:33 vrni-platform kernel: [358661.511285]    (t=15015 jiffies g=17833286 c=17833285 q=65)
Apr 24 05:21:33 vrni-platform kernel: [358661.511441] rcu_sched kthread starved for 14996 jiffies! g17833286 c17833285 f0x2 s3 ->state=0x0
Apr 24 05:21:33 vrni-platform kernel: [358661.738834] Task dump for CPU 0:
Apr 24 05:21:35 vrni-platform kernel: [358661.738839] java            R  running task        0 14071  13936 0x00000088
Apr 24 05:21:35 vrni-platform kernel: [358661.738844]  ffff88010134d400 50b27af1c55e3d3d ffff88043fc03ab8 ffffffff810b5d29
Apr 24 05:21:35 vrni-platform kernel: [358661.738847]  0000000000000000 ffffffff81e577c0 ffff88043fc03ad0 ffffffff810b8557
Apr 24 05:21:42 vrni-platform kernel: [358661.738860]  0000000000000001 ffff88043fc03b00 ffffffff810ed48e ffff88043fc17040
Apr 24 05:21:42 vrni-platform kernel: [358661.738863] Call Trace:
Apr 24 05:21:42 vrni-platform kernel: [358661.738868]  <IRQ>  [<ffffffff810b5d29>] sched_show_task+0xa9/0x110
Apr 24 05:21:55 vrni-platform kernel: [358661.738896]  [<ffffffff810b8557>] dump_cpu_task+0x37/0x40
Apr 24 05:21:55 vrni-platform kernel: [358661.738913]  [<ffffffff810ed48e>] rcu_dump_cpu_stacks+0x8e/0xe0
Apr 24 05:21:57 vrni-platform kernel: [358661.738916]  [<ffffffff810f1480>] rcu_check_callbacks+0x500/0x7f0
Apr 24 05:21:57 vrni-platform kernel: [358661.738940]  [<ffffffff8114b52c>] ? acct_account_cputime+0x1c/0x20
Apr 24 05:21:57 vrni-platform kernel: [358661.738942]  [<ffffffff810b8ff9>] ? account_system_time+0x79/0x120
Apr 24 05:21:57 vrni-platform kernel: [358661.738956]  [<ffffffff81107e60>] ? tick_sched_handle.isra.14+0x60/0x60
Apr 24 05:22:04 vrni-platform kernel: [358661.738959]  [<ffffffff810f7b29>] update_process_times+0x39/0x60
Apr 24 05:22:09 vrni-platform kernel: [358661.738961]  [<ffffffff81107e25>] tick_sched_handle.isra.14+0x25/0x60
Apr 24 05:22:09 vrni-platform kernel: [358661.738964]  [<ffffffff81107e9d>] tick_sched_timer+0x3d/0x70
Apr 24 05:22:09 vrni-platform kernel: [358661.738966]  [<ffffffff810f8472>] __hrtimer_run_queues+0x102/0x290
Apr 24 05:22:12 vrni-platform kernel: [358661.738968]  [<ffffffff810f8c68>] hrtimer_interrupt+0xa8/0x1a0
Apr 24 05:22:12 vrni-platform kernel: [358661.773493]  [<ffffffffc02f8400>] ? nf_ct_delete+0x290/0x290 [nf_conntrack]
Apr 24 05:22:27 vrni-platform kernel: [358661.773513]  [<ffffffff8105590e>] local_apic_timer_interrupt+0x3e/0x60
Apr 24 05:22:29 vrni-platform kernel: [358661.773520]  [<ffffffff8186ac4b>] smp_apic_timer_interrupt+0x4b/0x70
Apr 24 05:22:40 vrni-platform kernel: [358661.773525]  [<ffffffff81868394>] apic_timer_interrupt+0xd4/0xe0
Apr 24 05:22:28 vrni-platform rsyslogd-2359: action 'action 16' resumed (module 'builtin:omfwd') [v8.16.0 try http://www.rsyslog.com/e/2359 ]
Apr 24 05:22:44 vrni-platform kernel: [358661.773538]  [<ffffffffc02f8400>] ? nf_ct_delete+0x290/0x290 [nf_conntrack]
Apr 24 05:22:44 vrni-platform kernel: [358661.773544]  [<ffffffff810f0933>] ? __call_rcu.constprop.70+0x23/0x2d0
Apr 24 05:22:45 vrni-platform kernel: [358661.773550]  [<ffffffffc02f8400>] ? nf_ct_delete+0x290/0x290 [nf_conntrack]
Apr 24 05:22:45 vrni-platform kernel: [358661.773553]  [<ffffffff810f0bfa>] kfree_call_rcu+0x1a/0x20
Apr 24 05:22:45 vrni-platform kernel: [358661.773558]  [<ffffffffc02f72f8>] nf_conntrack_free+0x38/0x60 [nf_conntrack]
Apr 24 05:22:45 vrni-platform kernel: [358661.773563]  [<ffffffffc02f7b70>] destroy_conntrack+0xb0/0x100 [nf_conntrack]
Apr 24 05:22:45 vrni-platform kernel: [358661.773606]  [<ffffffff8178cfaa>] nf_conntrack_destroy+0x1a/0x20
Apr 24 05:22:45 vrni-platform kernel: [358661.773611]  [<ffffffffc02f826d>] nf_ct_delete+0xfd/0x290 [nf_conntrack]
Apr 24 05:22:45 vrni-platform kernel: [358661.773616]  [<ffffffffc02f8400>] ? nf_ct_delete+0x290/0x290 [nf_conntrack]
Apr 24 05:22:45 vrni-platform kernel: [358661.773621]  [<ffffffffc02f8412>] death_by_timeout+0x12/0x20 [nf_conntrack]
Apr 24 05:22:45 vrni-platform kernel: [358661.773624]  [<ffffffff810f5707>] call_timer_fn+0x37/0x140
Apr 24 05:22:45 vrni-platform kernel: [358661.773629]  [<ffffffffc02f8400>] ? nf_ct_delete+0x290/0x290 [nf_conntrack]
Apr 24 05:22:45 vrni-platform kernel: [358661.773632]  [<ffffffff810f7014>] run_timer_softirq+0x234/0x330
Apr 24 05:22:45 vrni-platform kernel: [358661.773646]  [<ffffffff8108b4f9>] __do_softirq+0x109/0x2b0
Apr 24 05:22:45 vrni-platform kernel: [358661.773649]  [<ffffffff8108b815>] irq_exit+0xa5/0xb0
Apr 24 05:22:45 vrni-platform kernel: [358661.773652]  [<ffffffff8186ac50>] smp_apic_timer_interrupt+0x50/0x70
Apr 24 05:22:51 vrni-platform kernel: [358661.773655]  [<ffffffff81868394>] apic_timer_interrupt+0xd4/0xe0

Can someone tell me what the above messages mean on Ubuntu 16.04.6? Can they be ignored?

Answer 1

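A jiffy is the kernel's basic unit of time: one tick of the timer interrupt. On a 250 Hz kernel a jiffy is 4 ms; on a 1000 Hz kernel it is 1 ms. In your case the rcu_sched kthread appears to have been held up for just under 60 seconds, assuming a 250 Hz kernel (14996 jiffies × 4 ms ≈ 59.98 s).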
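The conversion above can be sketched in a few lines of Python. The jiffy counts are copied from the log in the question. The helper name `jiffies_to_seconds` is made up for illustration, and the tick rate is an assumption, since the log itself does not say whether the kernel runs at 250 Hz or 1000 Hz.

```python
def jiffies_to_seconds(jiffies: int, hz: int) -> float:
    """Convert a jiffy count to seconds for a kernel tick rate of `hz`."""
    return jiffies / hz


# Values copied from the log in the question.
STALL_JIFFIES = 15015    # from "t=15015 jiffies"
STARVED_JIFFIES = 14996  # from "rcu_sched kthread starved for 14996 jiffies"

if __name__ == "__main__":
    # The tick rate is an assumption; these are the two rates the answer mentions.
    for hz in (250, 1000):
        stall = jiffies_to_seconds(STALL_JIFFIES, hz)
        starved = jiffies_to_seconds(STARVED_JIFFIES, hz)
        print(f"HZ={hz}: stall {stall:.2f} s, kthread starved {starved:.2f} s")
```

At 250 Hz both figures come out at about 60 seconds; at 1000 Hz they would be about 15 seconds, so the tick rate matters when reading these numbers.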

The message is printed by the rcu_check_gp_kthread_starvation routine in kernel/rcu/tree_stall.h, and the code that calls it carries this comment:

    /*
     * OK, time to rat on our buddy...
     * See Documentation/RCU/stallwarn.txt for info on how to debug
     * RCU CPU stall warnings.
     */

By the way, on my system the stall timeout is:

    cat /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout
    60
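
If you want to check the same setting from a script, here is a minimal Python sketch. It reads the sysfs path used in the command above. The function name `read_stall_timeout` is hypothetical, and the file may be missing or unreadable on some kernels, so the sketch returns `None` in that case rather than failing. The kernel expresses this value in seconds.

```python
from pathlib import Path
from typing import Optional

# Path taken from the command above; it may not exist on every kernel build.
STALL_TIMEOUT_PATH = Path("/sys/module/rcupdate/parameters/rcu_cpu_stall_timeout")


def read_stall_timeout(path: Path = STALL_TIMEOUT_PATH) -> Optional[int]:
    """Return the RCU CPU stall timeout in seconds, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # Missing file, permission problem, or unexpected contents.
        return None


if __name__ == "__main__":
    timeout = read_stall_timeout()
    if timeout is None:
        print("rcu_cpu_stall_timeout is not readable on this system")
    else:
        print(f"RCU CPU stall warnings are reported after about {timeout} s")
```

A 60-second timeout would be consistent with the roughly 60-second stall in the question's log, if that kernel ticks at 250 Hz.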
