I am running Ubuntu 16.04.6. From time to time I see messages like the following in syslog:
Apr 24 05:19:46 vrni-platform kernel: [358660.688715] INFO: rcu_sched self-detected stall on CPU
Apr 24 05:21:04 vrni-platform kernel: [358660.922686] INFO: rcu_sched detected stalls on CPUs/tasks:
Apr 24 05:21:04 vrni-platform kernel: [358660.923361] 0-...: (42 ticks this GP) idle=df7/140000000000002/0 softirq=52382057/52382057 fqs=3
Apr 24 05:21:16 vrni-platform kernel: [358660.923361] (detected by 1, t=15015 jiffies, g=17833286, c=17833285, q=62)
Apr 24 05:21:16 vrni-platform kernel: [358660.923371] Task dump for CPU 0:
Apr 24 05:21:16 vrni-platform kernel: [358660.923373] java R running task 0 14071 13936 0x00000088
Apr 24 05:21:16 vrni-platform kernel: [358660.923427] ffffffff818624d1 50b27af1c55e3d3d 00007f1985ce6000 ffff880100847f58
Apr 24 05:21:16 vrni-platform kernel: [358660.923430] 0000000000000006 ffff8800359c2800 ffff88010134d400 ffff880100847f28
Apr 24 05:21:19 vrni-platform kernel: [358660.923432] ffffffff8106eeb1 ffff880100847ef8 0000000000000002 ffff8800359c2868
Apr 24 05:21:19 vrni-platform kernel: [358660.923435] Call Trace:
Apr 24 05:21:19 vrni-platform kernel: [358660.923566] [<ffffffff818624d1>] ? __schedule+0x341/0x810
Apr 24 05:21:19 vrni-platform kernel: [358660.923622] [<ffffffff8106eeb1>] ? __do_page_fault+0x1c1/0x410
Apr 24 05:21:19 vrni-platform kernel: [358660.923625] [<ffffffff8106f122>] ? do_page_fault+0x22/0x30
Apr 24 05:21:29 vrni-platform kernel: [358660.923628] rcu_sched kthread starved for 14996 jiffies! g17833286 c17833285 f0x0 s3 ->state=0x0
Apr 24 05:21:29 vrni-platform kernel: [358661.336282]
Apr 24 05:21:29 vrni-platform kernel: [358661.336413] 0-...: (42 ticks this GP) idle=df7/140000000000002/0 softirq=52382057/52382057 fqs=3
Apr 24 05:21:33 vrni-platform kernel: [358661.511285] (t=15015 jiffies g=17833286 c=17833285 q=65)
Apr 24 05:21:33 vrni-platform kernel: [358661.511441] rcu_sched kthread starved for 14996 jiffies! g17833286 c17833285 f0x2 s3 ->state=0x0
Apr 24 05:21:33 vrni-platform kernel: [358661.738834] Task dump for CPU 0:
Apr 24 05:21:35 vrni-platform kernel: [358661.738839] java R running task 0 14071 13936 0x00000088
Apr 24 05:21:35 vrni-platform kernel: [358661.738844] ffff88010134d400 50b27af1c55e3d3d ffff88043fc03ab8 ffffffff810b5d29
Apr 24 05:21:35 vrni-platform kernel: [358661.738847] 0000000000000000 ffffffff81e577c0 ffff88043fc03ad0 ffffffff810b8557
Apr 24 05:21:42 vrni-platform kernel: [358661.738860] 0000000000000001 ffff88043fc03b00 ffffffff810ed48e ffff88043fc17040
Apr 24 05:21:42 vrni-platform kernel: [358661.738863] Call Trace:
Apr 24 05:21:42 vrni-platform kernel: [358661.738868] <IRQ> [<ffffffff810b5d29>] sched_show_task+0xa9/0x110
Apr 24 05:21:55 vrni-platform kernel: [358661.738896] [<ffffffff810b8557>] dump_cpu_task+0x37/0x40
Apr 24 05:21:55 vrni-platform kernel: [358661.738913] [<ffffffff810ed48e>] rcu_dump_cpu_stacks+0x8e/0xe0
Apr 24 05:21:57 vrni-platform kernel: [358661.738916] [<ffffffff810f1480>] rcu_check_callbacks+0x500/0x7f0
Apr 24 05:21:57 vrni-platform kernel: [358661.738940] [<ffffffff8114b52c>] ? acct_account_cputime+0x1c/0x20
Apr 24 05:21:57 vrni-platform kernel: [358661.738942] [<ffffffff810b8ff9>] ? account_system_time+0x79/0x120
Apr 24 05:21:57 vrni-platform kernel: [358661.738956] [<ffffffff81107e60>] ? tick_sched_handle.isra.14+0x60/0x60
Apr 24 05:22:04 vrni-platform kernel: [358661.738959] [<ffffffff810f7b29>] update_process_times+0x39/0x60
Apr 24 05:22:09 vrni-platform kernel: [358661.738961] [<ffffffff81107e25>] tick_sched_handle.isra.14+0x25/0x60
Apr 24 05:22:09 vrni-platform kernel: [358661.738964] [<ffffffff81107e9d>] tick_sched_timer+0x3d/0x70
Apr 24 05:22:09 vrni-platform kernel: [358661.738966] [<ffffffff810f8472>] __hrtimer_run_queues+0x102/0x290
Apr 24 05:22:12 vrni-platform kernel: [358661.738968] [<ffffffff810f8c68>] hrtimer_interrupt+0xa8/0x1a0
Apr 24 05:22:12 vrni-platform kernel: [358661.773493] [<ffffffffc02f8400>] ? nf_ct_delete+0x290/0x290 [nf_conntrack]
Apr 24 05:22:27 vrni-platform kernel: [358661.773513] [<ffffffff8105590e>] local_apic_timer_interrupt+0x3e/0x60
Apr 24 05:22:29 vrni-platform kernel: [358661.773520] [<ffffffff8186ac4b>] smp_apic_timer_interrupt+0x4b/0x70
Apr 24 05:22:40 vrni-platform kernel: [358661.773525] [<ffffffff81868394>] apic_timer_interrupt+0xd4/0xe0
Apr 24 05:22:28 vrni-platform rsyslogd-2359: action 'action 16' resumed (module 'builtin:omfwd') [v8.16.0 try http://www.rsyslog.com/e/2359 ]
Apr 24 05:22:44 vrni-platform kernel: [358661.773538] [<ffffffffc02f8400>] ? nf_ct_delete+0x290/0x290 [nf_conntrack]
Apr 24 05:22:44 vrni-platform kernel: [358661.773544] [<ffffffff810f0933>] ? __call_rcu.constprop.70+0x23/0x2d0
Apr 24 05:22:45 vrni-platform kernel: [358661.773550] [<ffffffffc02f8400>] ? nf_ct_delete+0x290/0x290 [nf_conntrack]
Apr 24 05:22:45 vrni-platform kernel: [358661.773553] [<ffffffff810f0bfa>] kfree_call_rcu+0x1a/0x20
Apr 24 05:22:45 vrni-platform kernel: [358661.773558] [<ffffffffc02f72f8>] nf_conntrack_free+0x38/0x60 [nf_conntrack]
Apr 24 05:22:45 vrni-platform kernel: [358661.773563] [<ffffffffc02f7b70>] destroy_conntrack+0xb0/0x100 [nf_conntrack]
Apr 24 05:22:45 vrni-platform kernel: [358661.773606] [<ffffffff8178cfaa>] nf_conntrack_destroy+0x1a/0x20
Apr 24 05:22:45 vrni-platform kernel: [358661.773611] [<ffffffffc02f826d>] nf_ct_delete+0xfd/0x290 [nf_conntrack]
Apr 24 05:22:45 vrni-platform kernel: [358661.773616] [<ffffffffc02f8400>] ? nf_ct_delete+0x290/0x290 [nf_conntrack]
Apr 24 05:22:45 vrni-platform kernel: [358661.773621] [<ffffffffc02f8412>] death_by_timeout+0x12/0x20 [nf_conntrack]
Apr 24 05:22:45 vrni-platform kernel: [358661.773624] [<ffffffff810f5707>] call_timer_fn+0x37/0x140
Apr 24 05:22:45 vrni-platform kernel: [358661.773629] [<ffffffffc02f8400>] ? nf_ct_delete+0x290/0x290 [nf_conntrack]
Apr 24 05:22:45 vrni-platform kernel: [358661.773632] [<ffffffff810f7014>] run_timer_softirq+0x234/0x330
Apr 24 05:22:45 vrni-platform kernel: [358661.773646] [<ffffffff8108b4f9>] __do_softirq+0x109/0x2b0
Apr 24 05:22:45 vrni-platform kernel: [358661.773649] [<ffffffff8108b815>] irq_exit+0xa5/0xb0
Apr 24 05:22:45 vrni-platform kernel: [358661.773652] [<ffffffff8186ac50>] smp_apic_timer_interrupt+0x50/0x70
Apr 24 05:22:51 vrni-platform kernel: [358661.773655] [<ffffffff81868394>] apic_timer_interrupt+0xd4/0xe0
Can someone tell me what the messages above mean on Ubuntu 16.04.6? Can they be ignored?
Answer 1
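A jiffy is the kernel's basic unit of time, i.e. one tick at the kernel's tick rate. On a 250 Hz kernel a jiffy is 4 ms; on a 1000 Hz kernel it is 1 ms. In any case, your thread appears to have been held up for almost 60 seconds (14996 jiffies x 4 ms is about 59.98 s, assuming a 250 Hz kernel).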
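The arithmetic can be checked with a small sketch. The function name jiffies_to_seconds is made up for illustration, and the two tick rates are the ones mentioned above; the actual HZ of the affected kernel is an assumption.

```python
def jiffies_to_seconds(jiffies: int, hz: int) -> float:
    """Convert a jiffy count to seconds for a kernel ticking at `hz` Hz."""
    return jiffies / hz


# 14996 is the "starved for ... jiffies" value from the log above.
for hz in (250, 1000):
    print(f"HZ={hz}: {jiffies_to_seconds(14996, hz):.3f} s")
# prints:
# HZ=250: 59.984 s
# HZ=1000: 14.996 s
```

Under the 250 Hz assumption the starvation time comes out at just under a minute; with a 1000 Hz kernel the same count would be about 15 seconds.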
The message is printed by rcu_check_gp_kthread_starvation in kernel/rcu/tree_stall.h, and the routine that calls it carries this comment:
/*
* OK, time to rat on our buddy...
* See Documentation/RCU/stallwarn.txt for info on how to debug
* RCU CPU stall warnings.
*/
That is, see Documentation/RCU/stallwarn.txt in the kernel source tree.
By the way, on my system the stall timeout (in seconds) is:
cat /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout
60
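To compare the starvation time in the log with the configured timeout, something like the following sketch could be used. The helper names, the regular expression, and the 250 Hz tick rate are all assumptions made for illustration; only the sysfs path and the log line come from the text above.

```python
import re
from pathlib import Path
from typing import Optional

# sysfs parameter shown above; its value is in seconds.
STALL_TIMEOUT_PATH = Path("/sys/module/rcupdate/parameters/rcu_cpu_stall_timeout")
# Hypothetical pattern matching the "kthread starved for N jiffies" log text.
STARVED_RE = re.compile(r"kthread starved for (\d+) jiffies")


def read_stall_timeout(path: Path = STALL_TIMEOUT_PATH) -> Optional[int]:
    """Return the stall timeout in seconds, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def starved_jiffies(line: str) -> Optional[int]:
    """Extract the jiffy count from a 'kthread starved' log line, if present."""
    match = STARVED_RE.search(line)
    return int(match.group(1)) if match else None


ASSUMED_HZ = 250  # assumption: the tick rate of the affected kernel
line = "rcu_sched kthread starved for 14996 jiffies! g17833286 c17833285 f0x0 s3 ->state=0x0"
jiffies = starved_jiffies(line)
timeout = read_stall_timeout()
print(f"starved for about {jiffies / ASSUMED_HZ:.1f} s; configured stall timeout: {timeout}")
```

On a machine where the sysfs file is missing or unreadable, the timeout is reported as None rather than raising an error.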