My server has been online since October 2014.
Debian GNU/Linux 7.6
uname -r
3.10.23-xxxx-std-ipv6-64
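It ran fine until last night, when it became unresponsive. I rebooted it from the data center control panel and everything was fine again. Tonight it froze again, at a different time.
I have not installed any new software, only the regular updates.
The server log files show: rcu_sched self-detected stall on CPU
Any ideas?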
Mar 4 01:51:01 server4 kernel: INFO: rcu_sched self-detected stall on CPU { 6} (t=15001 jiffies g=78281006 c=78281005 q=5678)
Mar 4 01:51:01 server4 kernel: sending NMI to all CPUs:
Mar 4 01:51:01 server4 kernel: NMI backtrace for cpu 6
Mar 4 01:51:01 server4 kernel: CPU: 6 PID: 2057 Comm: ps Not tainted 3.10.23-xxxx-std-ipv6-64 #1
Mar 4 01:51:01 server4 kernel: Hardware name: /DH67BL, BIOS BLH6710H.86A.0160.2012.1204.1156 12/04/2012
Mar 4 01:51:01 server4 kernel: task: ffff8803a9dbd620 ti: ffff8807bac0c000 task.ti: ffff8807bac0c000
Mar 4 01:51:01 server4 kernel: RIP: 0010:[<ffffffff81607f80>] [<ffffffff81607f80>] delay_loop+0x30/0x30
Mar 4 01:51:01 server4 kernel: RSP: 0018:ffff88081f383de0 EFLAGS: 00000887
Mar 4 01:51:01 server4 kernel: RAX: 00000000833e8900 RBX: 0000000000002710 RCX: 00000000019e1c28
Mar 4 01:51:01 server4 kernel: RDX: 000000000033599b RSI: 0000000000000060 RDI: 000000000033599c
Mar 4 01:51:01 server4 kernel: RBP: ffff88081f383de8 R08: 0000000000000400 R09: 000000000003d73d
Mar 4 01:51:01 server4 kernel: R10: 0000000000000002 R11: 000000000003d73c R12: 0000000000000006
Mar 4 01:51:01 server4 kernel: R13: ffffffff82168340 R14: ffff88081f38d5e0 R15: 000000000000162e
Mar 4 01:51:01 server4 kernel: FS: 00007fae36658700(0000) GS:ffff88081f380000(0000) knlGS:0000000000000000
Mar 4 01:51:01 server4 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 4 01:51:01 server4 kernel: CR2: 00007fae3665e000 CR3: 00000007eddd3000 CR4: 00000000001407e0
Mar 4 01:51:01 server4 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 4 01:51:01 server4 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 4 01:51:01 server4 kernel: Stack:
Mar 4 01:51:01 server4 kernel: ffffffff8160806c ffff88081f383e08 ffffffff8106338a 0000000000000000
Mar 4 01:51:01 server4 kernel: ffffffff82168340 ffff88081f383e78 ffffffff81115cdd ffff88081f383e28
Mar 4 01:51:01 server4 kernel: ffffffff81117c17 ffff88081f383e68 ffffffff810e97df ffff8807bac0c000
Mar 4 01:51:01 server4 kernel: Call Trace:
Mar 4 01:51:01 server4 kernel: <IRQ>
Mar 4 01:51:01 server4 kernel: [<ffffffff8160806c>] ? __const_udelay+0x2c/0x30
Mar 4 01:51:01 server4 kernel: [<ffffffff8106338a>] arch_trigger_all_cpu_backtrace+0x6a/0xa0
Mar 4 01:51:01 server4 kernel: [<ffffffff81115cdd>] rcu_check_callbacks+0x2ed/0x560
Mar 4 01:51:01 server4 kernel: [<ffffffff81117c17>] ? acct_account_cputime+0x17/0x20
Mar 4 01:51:01 server4 kernel: [<ffffffff810e97df>] ? account_system_time+0xcf/0x180
Mar 4 01:51:01 server4 kernel: [<ffffffff810ca2c3>] update_process_times+0x43/0x80
Mar 4 01:51:01 server4 kernel: [<ffffffff810f95c1>] tick_sched_handle.isra.12+0x31/0x40
Mar 4 01:51:01 server4 kernel: [<ffffffff810f9704>] tick_sched_timer+0x44/0x70
Mar 4 01:51:01 server4 kernel: [<ffffffff810df07a>] __run_hrtimer.isra.29+0x4a/0xd0
Mar 4 01:51:01 server4 kernel: [<ffffffff810df9b5>] hrtimer_interrupt+0xf5/0x230
Mar 4 01:51:01 server4 kernel: [<ffffffff810626c4>] smp_apic_timer_interrupt+0x64/0xa0
Mar 4 01:51:01 server4 kernel: [<ffffffff81d421ca>] apic_timer_interrupt+0x6a/0x70
Mar 4 01:51:01 server4 kernel: <EOI>
Mar 4 01:51:01 server4 kernel: [<ffffffff81606c0a>] ? vsnprintf+0x3ea/0x640
Mar 4 01:51:01 server4 kernel: [<ffffffff81d409fd>] ? _raw_spin_lock+0x1d/0x30
Mar 4 01:51:01 server4 kernel: [<ffffffff8118632a>] __d_lookup+0x7a/0x160
Mar 4 01:51:01 server4 kernel: [<ffffffff8117ad56>] ? path_get+0x26/0x40
Mar 4 01:51:01 server4 kernel: [<ffffffff8117b771>] lookup_fast+0x161/0x2e0
Mar 4 01:51:01 server4 kernel: [<ffffffff811cf2fb>] ? proc_pid_permission+0xcb/0xe0
Mar 4 01:51:01 server4 kernel: [<ffffffff8117d581>] do_last.isra.62+0x171/0xc20
Mar 4 01:51:01 server4 kernel: [<ffffffff8117aab3>] ? inode_permission+0x13/0x50
Mar 4 01:51:01 server4 kernel: [<ffffffff8117bf35>] ? link_path_walk+0x245/0x810
Mar 4 01:51:01 server4 kernel: [<ffffffff8117e0de>] path_openat.isra.63+0xae/0x460
Mar 4 01:51:01 server4 kernel: [<ffffffff8117e4cc>] do_filp_open+0x3c/0x90
Mar 4 01:51:01 server4 kernel: [<ffffffff8118b212>] ? __alloc_fd+0x42/0x100
Mar 4 01:51:01 server4 kernel: [<ffffffff811701ff>] do_sys_open+0xef/0x1d0
Mar 4 01:51:01 server4 kernel: [<ffffffff811702fd>] SyS_open+0x1d/0x20
Mar 4 01:51:01 server4 kernel: [<ffffffff81d41692>] system_call_fastpath+0x16/0x1b
Mar 4 01:51:01 server4 kernel: Code: 89 e5 48 85 c0 74 19 eb 02 66 90 eb 0e 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 48 ff c8 75 fb 48 ff c8 5d c3 66 0f 1f 44 00 00 <55> 48 89 e5 65 44 8b 04 25 34$
Mar 4 01:51:01 server4 kernel: NMI backtrace for cpu 0
Mar 4 01:51:01 server4 kernel: CPU: 0 PID: 2091 Comm: php Not tainted 3.10.23-xxxx-std-ipv6-64 #1
Mar 4 01:51:01 server4 kernel: Hardware name: /DH67BL, BIOS BLH6710H.86A.0160.2012.1204.1156 12/04/2012
Mar 4 01:51:01 server4 kernel: task: ffff8807ebd8bba0 ti: ffff88066503e000 task.ti: ffff88066503e000
Mar 4 01:51:01 server4 kernel: RIP: 0010:[<ffffffff81151243>] [<ffffffff81151243>] get_vmalloc_info+0x63/0xe0
Mar 4 01:51:01 server4 kernel: RSP: 0018:ffff88066503fb88 EFLAGS: 00000287
Mar 4 01:51:01 server4 kernel: RAX: ffff8807eddcda80 RBX: ffff88066503fd80 RCX: ffffc8ffffffffff
Mar 4 01:51:01 server4 kernel: RDX: ffffc90008934000 RSI: 0000000000021000 RDI: ffffc90007cc2000
Mar 4 01:51:01 server4 kernel: RBP: ffff88066503fb98 R08: ffffe8fffffffffe R09: 0000000000018000
Mar 4 01:51:01 server4 kernel: R10: 0000000000000001 R11: 0000000000000202 R12: 000000000042a14e
Mar 4 01:51:01 server4 kernel: R13: 000000000035b072 R14: 000000000004712c R15: ffff880785132800
Mar 4 01:51:01 server4 kernel: FS: 00007f7e85629720(0000) GS:ffff88081f200000(0000) knlGS:0000000000000000
Mar 4 01:51:01 server4 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 4 01:51:01 server4 kernel: CR2: 00007f7e824fdc40 CR3: 00000007e7b15000 CR4: 00000000001407f0
Mar 4 01:51:01 server4 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 4 01:51:01 server4 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 4 01:51:01 server4 kernel: Stack:
Mar 4 01:51:01 server4 kernel: 0000000000000000 00000000003366b3 ffff88066503fe58 ffffffff811d45dc
Mar 4 01:51:01 server4 kernel: ffff88066503fbd8 ffffffff81190164 ffff8807f28b2780 ffff880632895900
Mar 4 01:51:01 server4 kernel: ffff8807e7a6d600 0000000180400028 ffff88066503fc18 ffffffff81190cf0
Mar 4 01:51:01 server4 kernel: Call Trace:
Mar 4 01:51:01 server4 kernel: [<ffffffff811d45dc>] meminfo_proc_show+0xac/0x530
Mar 4 01:51:01 server4 kernel: [<ffffffff81190164>] ? seq_open+0x84/0x160
Mar 4 01:51:01 server4 kernel: [<ffffffff81190cf0>] ? single_open+0x60/0xb0
Mar 4 01:51:01 server4 kernel: [<ffffffff811615dc>] ? kmem_cache_free+0xec/0x100
Mar 4 01:51:01 server4 kernel: [<ffffffff8114bae3>] ? anon_vma_chain_free+0x13/0x20
Mar 4 01:51:01 server4 kernel: [<ffffffff8114d07e>] ? unlink_anon_vmas+0xce/0x1a0
Mar 4 01:51:01 server4 kernel: [<ffffffff81146906>] ? vma_gap_update+0x26/0x30
Mar 4 01:51:01 server4 kernel: [<ffffffff8114726d>] ? vma_adjust+0x3ad/0x660
Mar 4 01:51:01 server4 kernel: [<ffffffff811479fa>] ? vma_merge+0x2fa/0x320
Mar 4 01:51:01 server4 kernel: [<ffffffff81146b8f>] ? __vm_enough_memory+0x2f/0x180
Mar 4 01:51:01 server4 kernel: [<ffffffff81148eaa>] ? mmap_region+0x14a/0x5c0
Mar 4 01:51:01 server4 kernel: [<ffffffff8119037e>] seq_read+0x13e/0x380
Mar 4 01:51:01 server4 kernel: [<ffffffff8119037e>] seq_read+0x13e/0x380
Mar 4 01:51:01 server4 kernel: [<ffffffff811cd3b8>] proc_reg_read+0x38/0x70
Mar 4 01:51:01 server4 kernel: [<ffffffff811712c4>] vfs_read+0xa4/0x180
Mar 4 01:51:01 server4 kernel: [<ffffffff811717cd>] SyS_read+0x4d/0x90
Mar 4 01:51:01 server4 kernel: [<ffffffff81d41692>] system_call_fastpath+0x16/0x1b
Mar 4 01:51:01 server4 kernel: Code: 00 00 4c 8b 4b 08 48 83 e8 30 48 bf 00 00 00 00 00 c9 ff ff 48 b9 ff ff ff ff ff c8 ff ff 49 b8 fe ff ff ff ff e8 ff ff 48 8b 10 <48> 39 ca 76 28 4c 39 c2 77 34$
Mar 4 01:51:01 server4 kernel: NMI backtrace for cpu 2
Mar 4 01:51:01 server4 kernel: CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.10.23-xxxx-std-ipv6-64 #1
Mar 4 01:51:01 server4 kernel: Hardware name: /DH67BL, BIOS BLH6710H.86A.0160.2012.1204.1156 12/04/2012
Mar 4 01:51:01 server4 kernel: task: ffff8807f34bc240 ti: ffff8807f34f0000 task.ti: ffff8807f34f0000
Mar 4 01:51:01 server4 kernel: RIP: 0010:[<ffffffff81649d59>] [<ffffffff81649d59>] intel_idle+0xa9/0x100
Mar 4 01:51:01 server4 kernel: RSP: 0018:ffff8807f34f1dd8 EFLAGS: 00000046
Mar 4 01:51:01 server4 kernel: RAX: 0000000000000020 RBX: 0000000000000008 RCX: 0000000000000001
Mar 4 01:51:01 server4 kernel: RDX: 0000000000000000 RSI: ffff8807f34f1fd8 RDI: 0000000000000002
Mar 4 01:51:01 server4 kernel: RBP: ffff8807f34f1e08 R08: 0000000000000981 R09: 0000000000000010
Mar 4 01:51:01 server4 kernel: R10: 0000000000000f9c R11: 0000000000000000 R12: 0000000000000004
Mar 4 01:51:01 server4 kernel: R13: 0000000000000020 R14: 0000000000000003 R15: ffffffff821890b8
Mar 4 01:51:01 server4 kernel: FS: 0000000000000000(0000) GS:ffff88081f280000(0000) knlGS:0000000000000000
Mar 4 01:51:01 server4 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 4 01:51:01 server4 kernel: CR2: ffffffffff600400 CR3: 0000000002139000 CR4: 00000000001407e0
Mar 4 01:51:01 server4 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 4 01:51:01 server4 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 4 01:51:01 server4 kernel: Stack:
Today's syslog file starts with this:
Mar 5 06:25:08 server4 kernel: BUG: unable to handle kernel paging request at ffff800788fd1c30
Mar 5 06:25:08 server4 kernel: IP: [<ffffffff8119b22f>] __find_get_block_slow+0x9f/0x180
Mar 5 06:25:08 server4 kernel: PGD 0
Mar 5 06:25:08 server4 kernel: Oops: 0000 [#1] SMP
Mar 5 06:25:08 server4 kernel: CPU: 2 PID: 32054 Comm: updatedb.mlocat Tainted: G B 3.10.23-xxxx-std-ipv6-64 #1
Mar 5 06:25:08 server4 kernel: Hardware name: /DH67BL, BIOS BLH6710H.86A.0160.2012.1204.1156 12/04/2012
Mar 5 06:25:08 server4 kernel: task: ffff8807ebf0b500 ti: ffff8807ee3ce000 task.ti: ffff8807ee3ce000
Mar 5 06:25:08 server4 kernel: RIP: 0010:[<ffffffff8119b22f>] [<ffffffff8119b22f>] __find_get_block_slow+0x9f/0x180
Mar 5 06:25:08 server4 kernel: RSP: 0018:ffff8807ee3cfa78 EFLAGS: 00010202
Mar 5 06:25:08 server4 kernel: RAX: 0000000000000001 RBX: 0000000008e02393 RCX: 0000000000000002
Mar 5 06:25:08 server4 kernel: RDX: ffff800788fd1c30 RSI: ffff880788d01b50 RDI: ffff8807eead82b8
Mar 5 06:25:08 server4 kernel: RBP: ffff8807ee3cfae8 R08: 0000000000000002 R09: ffffea001d94cd9c
Mar 5 06:25:08 server4 kernel: R10: ffff880762a20020 R11: ffff8807fe803a00 R12: ffff8807eead80f0
Mar 5 06:25:08 server4 kernel: R13: ffff8807eead8230 R14: ffff800788fd1c30 R15: ffffea001d94cd80
Mar 5 06:25:08 server4 kernel: FS: 00007f7934b31700(0000) GS:ffff88081f280000(0000) knlGS:0000000000000000
Mar 5 06:25:08 server4 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 5 06:25:08 server4 kernel: CR2: ffff800788fd1c30 CR3: 00000007ece8c000 CR4: 00000000001407e0
Mar 5 06:25:08 server4 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 5 06:25:08 server4 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 5 06:25:08 server4 kernel: Stack:
Mar 5 06:25:08 server4 kernel: ffff8807ee3cfa98 ffff8807eead8000 0000000008e02032 ffff8807eead80f0
Mar 5 06:25:08 server4 kernel: ffff8807ee3cfb18 ffffffff8119b258 ffffea0000000000 00000000de831424
Mar 5 06:25:08 server4 kernel: 3534000000215000 0000000000000000 ffff8807eead8000 0000000000001000
Mar 5 06:25:08 server4 kernel: Call Trace:
After that, the log continues with more rcu_sched self-detected stall on CPU messages.
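
To see at a glance which CPUs and processes show up in these reports, something like the following could be run over the syslog file. This is only a minimal sketch: the function name `summarize_stalls` and the two regular expressions are my own assumptions, written against the log lines quoted above, and are not part of any kernel tooling. Other kernel versions may format these lines differently.

```python
import re
from collections import Counter

# Header line of an RCU stall report, e.g.
# "INFO: rcu_sched self-detected stall on CPU { 6} (t=15001 jiffies ..."
# (pattern is an assumption based on the 3.10 log format shown above)
STALL_RE = re.compile(
    r"rcu_sched self-detected stall on CPU \{\s*(\d+)\}\s*\(t=(\d+) jiffies"
)

# Per-CPU line that names the running process, e.g.
# "CPU: 6 PID: 2057 Comm: ps Not tainted ..."
COMM_RE = re.compile(r"CPU: (\d+) PID: (\d+) Comm: (\S+)")


def summarize_stalls(lines):
    """Return (stalls, comms) for an iterable of syslog lines.

    stalls is a list of (cpu, jiffies) tuples, one per stall header.
    comms counts how often each process name appears in a backtrace or oops.
    """
    stalls = []
    comms = Counter()
    for line in lines:
        match = STALL_RE.search(line)
        if match:
            stalls.append((int(match.group(1)), int(match.group(2))))
            continue
        match = COMM_RE.search(line)
        if match:
            comms[match.group(3)] += 1
    return stalls, comms


if __name__ == "__main__":
    # A few lines copied from the log above, used as sample input.
    sample = [
        "Mar 4 01:51:01 server4 kernel: INFO: rcu_sched self-detected stall "
        "on CPU { 6} (t=15001 jiffies g=78281006 c=78281005 q=5678)",
        "Mar 4 01:51:01 server4 kernel: CPU: 6 PID: 2057 Comm: ps "
        "Not tainted 3.10.23-xxxx-std-ipv6-64 #1",
        "Mar 4 01:51:01 server4 kernel: CPU: 0 PID: 2091 Comm: php "
        "Not tainted 3.10.23-xxxx-std-ipv6-64 #1",
    ]
    stalls, comms = summarize_stalls(sample)
    print("stalls (cpu, jiffies):", stalls)
    print("processes seen:", dict(comms))
```

For the real file, the same function can be fed an open file handle, for example `summarize_stalls(open("/var/log/syslog"))`, assuming that is where the kernel messages end up on this system.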