Suddenly, for no apparent reason, my server stopped responding. This is what I found in /var/log/messages. What is going on?
Apr 29 13:40:47 stephan kernel: ------------[ cut here ]------------
Apr 29 13:40:47 stephan kernel: WARNING: at lib/list_debug.c:56 __list_del_entry+0x82/0xd0()
Apr 29 13:40:47 stephan kernel: Hardware name: S5520SC
Apr 29 13:40:47 stephan kernel: list_del corruption. next->prev should be ffff880c86f92000, but was ffff880c86f92800
Apr 29 13:40:47 stephan kernel: Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack iptable_filter ip_tables bonding ip6t_REJECT nf_conntrack_ipv6 nf_defr$
Apr 29 13:40:47 stephan kernel: Pid: 66, comm: kswapd1 Not tainted 3.0.0+ #1
Apr 29 13:40:47 stephan kernel: Call Trace:
Apr 29 13:40:47 stephan kernel: <IRQ> [<ffffffff81062b2f>] warn_slowpath_common+0x7f/0xc0
Apr 29 13:40:47 stephan kernel: [<ffffffff8101b927>] ? intel_pmu_enable_all+0xa7/0x160
Apr 29 13:40:47 stephan kernel: [<ffffffff81062c26>] warn_slowpath_fmt+0x46/0x50
Apr 29 13:40:47 stephan kernel: [<ffffffff81268c72>] __list_del_entry+0x82/0xd0
Apr 29 13:40:47 stephan kernel: [<ffffffff81268cd1>] list_del+0x11/0x40
Apr 29 13:40:47 stephan kernel: [<ffffffff8114ba5b>] free_block+0xcb/0x180
Apr 29 13:40:47 stephan kernel: [<ffffffff8114b8e0>] kmem_cache_free+0x290/0x2b0
Apr 29 13:40:47 stephan kernel: [<ffffffff811ba941>] proc_i_callback+0x31/0x40
Apr 29 13:40:47 stephan kernel: [<ffffffff810ce6bc>] rcu_do_batch+0xdc/0x250
Apr 29 13:40:47 stephan kernel: [<ffffffff810ce8e4>] __rcu_process_callbacks+0xb4/0x1d0
Apr 29 13:40:47 stephan kernel: [<ffffffff810cea25>] rcu_process_callbacks+0x25/0x50
Apr 29 13:40:47 stephan kernel: [<ffffffff81069847>] __do_softirq+0xb7/0x210
Apr 29 13:40:47 stephan kernel: [<ffffffff810898c1>] ? hrtimer_interrupt+0x151/0x240
Apr 29 13:40:47 stephan kernel: [<ffffffff8150317c>] call_softirq+0x1c/0x30
Apr 29 13:40:47 stephan kernel: [<ffffffff8100d345>] do_softirq+0x65/0xa0
Apr 29 13:40:47 stephan kernel: [<ffffffff8106964d>] irq_exit+0xbd/0xe0
Apr 29 13:40:47 stephan kernel: [<ffffffff81503abe>] smp_apic_timer_interrupt+0x6e/0x99
Apr 29 13:40:47 stephan kernel: [<ffffffff81502933>] apic_timer_interrupt+0x13/0x20
Apr 29 13:40:47 stephan kernel: <EOI> [<ffffffffa03d9b08>] ? xfs_perag_get_tag+0x8/0xd0 [xfs]
Apr 29 13:40:47 stephan kernel: [<ffffffffa03f3968>] xfs_reclaim_inode_shrink+0x58/0xb0 [xfs]
Apr 29 13:40:47 stephan kernel: [<ffffffff81113191>] shrink_slab+0x81/0x1a0
Apr 29 13:40:47 stephan kernel: [<ffffffff811162ee>] balance_pgdat+0x70e/0x8f0
Apr 29 13:40:47 stephan kernel: [<ffffffff81116696>] kswapd+0x1c6/0x210
Apr 29 13:40:47 stephan kernel: [<ffffffff811164d0>] ? balance_pgdat+0x8f0/0x8f0
Apr 29 13:40:47 stephan kernel: [<ffffffff81084d16>] kthread+0x96/0xa0
Apr 29 13:40:47 stephan kernel: [<ffffffff81503084>] kernel_thread_helper+0x4/0x10
Apr 29 13:40:47 stephan kernel: [<ffffffff81084c80>] ? kthread_worker_fn+0x1a0/0x1a0
Apr 29 13:40:47 stephan kernel: [<ffffffff81503080>] ? gs_change+0x13/0x13
Apr 29 13:40:47 stephan kernel: ---[ end trace 40eb9c6ec15a76bf ]---
Apr 29 13:40:47 stephan kernel: ------------[ cut here ]------------
Apr 29 13:40:47 stephan kernel: WARNING: at lib/list_debug.c:53 __list_del_entry+0xa1/0xd0()
Apr 29 13:40:47 stephan kernel: Hardware name: S5520SC
Apr 29 13:40:47 stephan kernel: list_del corruption. prev->next should be ffff880c798a3000, but was 7f07e74200000000
Apr 29 13:40:47 stephan kernel: ------------[ cut here ]------------
Apr 29 13:40:47 stephan kernel: WARNING: at lib/list_debug.c:53 __list_del_entry+0xa1/0xd0()
Apr 29 13:40:47 stephan kernel: Hardware name: S5520SC
Apr 29 13:40:47 stephan kernel: list_del corruption. prev->next should be ffff880da7db9000, but was ffff880caa441000
Apr 29 13:40:47 stephan kernel: Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack iptable_filter ip_tables bonding ip6t_REJECT nf_conntrack_ipv6 nf_defr$
Apr 29 13:40:47 stephan kernel: Pid: 66, comm: kswapd1 Tainted: G W 3.0.0+ #1
Apr 29 13:40:47 stephan kernel: Call Trace:
Apr 29 13:40:47 stephan kernel: <IRQ> [<ffffffff81062b2f>] warn_slowpath_common+0x7f/0xc0
Apr 29 13:40:47 stephan kernel: [<ffffffff8101b927>] ? intel_pmu_enable_all+0xa7/0x160
Apr 29 13:40:47 stephan kernel: [<ffffffff81062c26>] warn_slowpath_fmt+0x46/0x50
Apr 29 13:40:47 stephan kernel: [<ffffffff81268c91>] __list_del_entry+0xa1/0xd0
Apr 29 13:40:47 stephan kernel: [<ffffffff81268cd1>] list_del+0x11/0x40
Apr 29 13:40:47 stephan kernel: [<ffffffff8114ba5b>] free_block+0xcb/0x180
Apr 29 13:40:47 stephan kernel: [<ffffffff8114b8e0>] kmem_cache_free+0x290/0x2b0
Apr 29 13:40:47 stephan kernel: [<ffffffff811ba941>] proc_i_callback+0x31/0x40
Apr 29 13:40:47 stephan kernel: [<ffffffff810ce6bc>] rcu_do_batch+0xdc/0x250
Apr 29 13:40:47 stephan kernel: [<ffffffff810ce8e4>] __rcu_process_callbacks+0xb4/0x1d0
Apr 29 13:40:47 stephan kernel: [<ffffffff810cea25>] rcu_process_callbacks+0x25/0x50
Apr 29 13:40:47 stephan kernel: [<ffffffff81069847>] __do_softirq+0xb7/0x210
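I'm running 64-bit CentOS 6 on bare metal (not a VM), and the system ran for a year without any problems. Three months ago I upgraded the CPU to an X5680. I hope the CPU isn't the problem, because it was quite expensive.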
Answer 1
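We would need more information to be sure (kernel version, how long the machine had been up before this happened, hardware details), but I want to draw your attention to the "should be ffff880c86f92000, but was ffff880c86f92800" line quoted below. The two values differ only by 0x800, which means bit #11 (counting from 0) flipped from 0 to 1. If you don't have ECC RAM, I suggest testing your memory.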
Apr 29 13:40:47 stephan kernel: list_del corruption. next->prev should be ffff880c86f92000, but was ffff880c86f92800
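To check the bit-flip observation yourself, you can XOR the expected and actual pointer values from that log line. Here is a minimal Python sketch. The helper name flipped_bits is made up for illustration and is not part of any kernel tooling.

```python
def flipped_bits(expected, actual):
    """Return the 0-indexed positions of the bits that differ between two values."""
    diff = expected ^ actual  # XOR leaves a 1 wherever the two values disagree
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


# Values copied from the "next->prev should be ... but was ..." log line.
expected = 0xffff880c86f92000
actual = 0xffff880c86f92800

print(hex(expected ^ actual))          # 0x800
print(flipped_bits(expected, actual))  # [11]
```

A single differing bit is what you would expect from a memory error. A result with many differing bits points more toward a software overwrite, although neither outcome is conclusive on its own.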