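On a lightly loaded VPS (Debian squeeze, kernel 2.6.38-bpo.2-amd64, 2GB, SSD) I keep hitting runaway episodes: the load sits at zero, then jumps to 30.0 and everything stalls. My guess is that the Xen hypervisor steps in and throttles everything; the provider's control panel shows CPU at 175%. After a few minutes, CPU and load drop back to normal levels and everything recovers.
Here is a copy of the latest kern.log: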
Oct 11 20:10:34 stage kernel: [348092.046302] Clocksource tsc unstable (delta = -8590343613 ns)
Oct 11 20:16:55 stage kernel: [348311.490907] INFO: task bounce:23869 blocked for more than 120 seconds.
Oct 11 20:16:56 stage kernel: [348311.490918] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 11 20:16:56 stage kernel: [348311.490924] bounce D ffff880003085100 0 23869 1153 0x00000000
Oct 11 20:16:56 stage kernel: [348311.490933] ffff880003085100 0000000000000286 0000000000000000 ffff88007aaf3600
Oct 11 20:16:56 stage kernel: [348311.490943] 0000000000013700 ffff88007c16bfd8 ffff88007c16bfd8 0000000000013700
Oct 11 20:16:56 stage kernel: [348311.490953] ffff880003085100 ffff88007c16a010 ffff8800032bc098 ffffffff8103b96e
Oct 11 20:16:56 stage kernel: [348311.490963] Call Trace:
Oct 11 20:16:56 stage kernel: [348311.490973] [<ffffffff8103b96e>] ? __wake_up+0x35/0x46
Oct 11 20:16:56 stage kernel: [348311.490982] [<ffffffff81326e50>] ? _raw_spin_lock_irqsave+0x11/0x2f
Oct 11 20:16:56 stage kernel: [348311.490993] [<ffffffffa00249ec>] ? log_wait_commit+0xc0/0x111 [jbd]
Oct 11 20:16:56 stage kernel: [348311.491000] [<ffffffff8106033d>] ? autoremove_wake_function+0x0/0x2a
Oct 11 20:16:56 stage kernel: [348311.491006] [<ffffffff81326ea0>] ? _raw_spin_unlock_irqrestore+0x10/0x11
Oct 11 20:16:56 stage kernel: [348311.491015] [<ffffffffa00377f6>] ? ext3_sync_file+0xbe/0xec [ext3]
Oct 11 20:16:56 stage kernel: [348311.491024] [<ffffffff81117733>] ? vfs_fsync_range+0x4c/0x73
Oct 11 20:16:56 stage kernel: [348311.491030] [<ffffffff811177db>] ? do_fsync+0x27/0x3c
Oct 11 20:16:56 stage kernel: [348311.491035] [<ffffffff8111780d>] ? sys_fsync+0xb/0xf
Oct 11 20:16:56 stage kernel: [348311.491041] [<ffffffff81009973>] ? sysret_check+0x17/0x5a
Oct 11 20:16:56 stage kernel: [348311.491047] [<ffffffff81009952>] ? system_call_fastpath+0x16/0x1b
Oct 11 20:16:56 stage kernel: [348311.491053] INFO: task bounce:23873 blocked for more than 120 seconds.
Oct 11 20:16:56 stage kernel: [348311.491057] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 11 20:16:56 stage kernel: [348311.491063] bounce D ffff880003083600 0 23873 1153 0x00000000
Oct 11 20:16:56 stage kernel: [348311.491071] ffff880003083600 0000000000000282 0000000000000000 ffff88007aaf3600
Oct 11 20:16:56 stage kernel: [348311.491080] 0000000000013700 ffff88006d859fd8 ffff88006d859fd8 0000000000013700
Oct 11 20:16:56 stage kernel: [348311.491089] ffff880003083600 ffff88006d858010 ffff88006d859eb0 0000000000000000
Oct 11 20:16:56 stage kernel: [348311.491098] Call Trace:
Oct 11 20:16:56 stage kernel: [348311.491104] [<ffffffffa00249ec>] ? log_wait_commit+0xc0/0x111 [jbd]
Oct 11 20:16:56 stage kernel: [348311.491111] [<ffffffff8106033d>] ? autoremove_wake_function+0x0/0x2a
Oct 11 20:16:56 stage kernel: [348311.491118] [<ffffffffa00377f6>] ? ext3_sync_file+0xbe/0xec [ext3]
Oct 11 20:16:56 stage kernel: [348311.491125] [<ffffffff81117733>] ? vfs_fsync_range+0x4c/0x73
Oct 11 20:16:56 stage kernel: [348311.491130] [<ffffffff811177db>] ? do_fsync+0x27/0x3c
Oct 11 20:16:56 stage kernel: [348311.491136] [<ffffffff8111780d>] ? sys_fsync+0xb/0xf
Oct 11 20:16:56 stage kernel: [348311.491141] [<ffffffff81009973>] ? sysret_check+0x17/0x5a
Oct 11 20:16:56 stage kernel: [348311.491146] [<ffffffff81009952>] ? system_call_fastpath+0x16/0x1b
Oct 11 20:16:56 stage kernel: [348311.491152] INFO: task bounce:23875 blocked for more than 120 seconds.
Oct 11 20:16:56 stage kernel: [348311.491157] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 11 20:16:56 stage kernel: [348311.491162] bounce D ffff880003085e80 0 23875 1153 0x00000000
Oct 11 20:16:56 stage kernel: [348311.491170] ffff880003085e80 0000000000000286 0000000000000000 ffff88007aaf3600
Oct 11 20:16:56 stage kernel: [348311.491179] 0000000000013700 ffff88004eecdfd8 ffff88004eecdfd8 0000000000013700
Oct 11 20:16:56 stage kernel: [348311.491188] ffff880003085e80 ffff88004eecc010 ffff880003085e80 0000000000000000
Oct 11 20:16:56 stage kernel: [348311.491198] Call Trace:
Oct 11 20:16:56 stage kernel: [348311.491203] [<ffffffffa00249ec>] ? log_wait_commit+0xc0/0x111 [jbd]
Oct 11 20:16:56 stage kernel: [348311.491210] [<ffffffff8106033d>] ? autoremove_wake_function+0x0/0x2a
Oct 11 20:16:56 stage kernel: [348311.491217] [<ffffffff8100679f>] ? xen_restore_fl_direct_end+0x0/0x1
Oct 11 20:16:56 stage kernel: [348311.491223] [<ffffffff81326ea0>] ? _raw_spin_unlock_irqrestore+0x10/0x11
Oct 11 20:16:56 stage kernel: [348311.491230] [<ffffffffa00377f6>] ? ext3_sync_file+0xbe/0xec [ext3]
Oct 11 20:16:56 stage kernel: [348311.491237] [<ffffffff81117733>] ? vfs_fsync_range+0x4c/0x73
Oct 11 20:16:56 stage kernel: [348311.491243] [<ffffffff811177db>] ? do_fsync+0x27/0x3c
Oct 11 20:16:56 stage kernel: [348311.491248] [<ffffffff8111780d>] ? sys_fsync+0xb/0xf
Oct 11 20:16:56 stage kernel: [348311.491253] [<ffffffff81009973>] ? sysret_check+0x17/0x5a
Oct 11 20:16:56 stage kernel: [348311.491258] [<ffffffff81009952>] ? system_call_fastpath+0x16/0x1b
Oct 11 20:16:56 stage kernel: [348431.612291] INFO: task bounce:23869 blocked for more than 120 seconds.
Oct 11 20:16:56 stage kernel: [348431.612305] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 11 20:16:56 stage kernel: [348431.612314] bounce D ffff880003085100 0 23869 1153 0x00000000
Oct 11 20:16:56 stage kernel: [348431.612327] ffff880003085100 0000000000000286 0000000000000000 ffff88007aaf3600
Oct 11 20:16:56 stage kernel: [348431.612343] 0000000000013700 ffff88007c16bfd8 ffff88007c16bfd8 0000000000013700
Oct 11 20:16:56 stage kernel: [348431.612357] ffff880003085100 ffff88007c16a010 ffff8800032bc098 ffffffff8103b96e
Oct 11 20:16:56 stage kernel: [348431.612372] Call Trace:
Oct 11 20:16:56 stage kernel: [348431.612386] [<ffffffff8103b96e>] ? __wake_up+0x35/0x46
Oct 11 20:16:56 stage kernel: [348431.612397] [<ffffffff81326e50>] ? _raw_spin_lock_irqsave+0x11/0x2f
Oct 11 20:16:56 stage kernel: [348431.612413] [<ffffffffa00249ec>] ? log_wait_commit+0xc0/0x111 [jbd]
Oct 11 20:16:56 stage kernel: [348431.612423] [<ffffffff8106033d>] ? autoremove_wake_function+0x0/0x2a
Oct 11 20:16:56 stage kernel: [348431.612432] [<ffffffff81326ea0>] ? _raw_spin_unlock_irqrestore+0x10/0x11
Oct 11 20:16:56 stage kernel: [348431.612445] [<ffffffffa00377f6>] ? ext3_sync_file+0xbe/0xec [ext3]
Oct 11 20:16:56 stage kernel: [348431.612461] [<ffffffff81117733>] ? vfs_fsync_range+0x4c/0x73
Oct 11 20:16:56 stage kernel: [348431.612470] [<ffffffff811177db>] ? do_fsync+0x27/0x3c
Oct 11 20:16:56 stage kernel: [348431.612478] [<ffffffff8111780d>] ? sys_fsync+0xb/0xf
Oct 11 20:16:56 stage kernel: [348431.612487] [<ffffffff81009973>] ? sysret_check+0x17/0x5a
Oct 11 20:16:56 stage kernel: [348431.612495] [<ffffffff81009952>] ? system_call_fastpath+0x16/0x1b
Oct 11 20:16:56 stage kernel: [348431.612504] INFO: task bounce:23873 blocked for more than 120 seconds.
Oct 11 20:16:56 stage kernel: [348431.612511] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 11 20:16:56 stage kernel: [348431.612520] bounce D ffff880003083600 0 23873 1153 0x00000000
Oct 11 20:16:56 stage kernel: [348431.612531] ffff880003083600 0000000000000282 0000000000000000 ffff88007aaf3600
Oct 11 20:16:56 stage kernel: [348431.612545] 0000000000013700 ffff88006d859fd8 ffff88006d859fd8 0000000000013700
Oct 11 20:16:56 stage kernel: [348431.612559] ffff880003083600 ffff88006d858010 ffff88006d859eb0 0000000000000000
Oct 11 20:16:56 stage kernel: [348431.612573] Call Trace:
Oct 11 20:16:56 stage kernel: [348431.612582] [<ffffffffa00249ec>] ? log_wait_commit+0xc0/0x111 [jbd]
Oct 11 20:16:56 stage kernel: [348431.612591] [<ffffffff8106033d>] ? autoremove_wake_function+0x0/0x2a
Oct 11 20:16:56 stage kernel: [348431.612603] [<ffffffffa00377f6>] ? ext3_sync_file+0xbe/0xec [ext3]
Oct 11 20:16:56 stage kernel: [348431.612612] [<ffffffff81117733>] ? vfs_fsync_range+0x4c/0x73
Oct 11 20:16:56 stage kernel: [348431.612621] [<ffffffff811177db>] ? do_fsync+0x27/0x3c
Oct 11 20:16:56 stage kernel: [348431.612629] [<ffffffff8111780d>] ? sys_fsync+0xb/0xf
Oct 11 20:16:56 stage kernel: [348431.612637] [<ffffffff81009973>] ? sysret_check+0x17/0x5a
Oct 11 20:16:56 stage kernel: [348431.612645] [<ffffffff81009952>] ? system_call_fastpath+0x16/0x1b
Oct 11 20:16:56 stage kernel: [348431.612653] INFO: task bounce:23875 blocked for more than 120 seconds.
Oct 11 20:16:56 stage kernel: [348431.612660] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 11 20:16:56 stage kernel: [348431.612669] bounce D ffff880003085e80 0 23875 1153 0x00000000
Oct 11 20:16:56 stage kernel: [348431.612680] ffff880003085e80 0000000000000286 0000000000000000 ffff88007aaf3600
Oct 11 20:16:56 stage kernel: [348431.612694] 0000000000013700 ffff88004eecdfd8 ffff88004eecdfd8 0000000000013700
Oct 11 20:16:56 stage kernel: [348431.612708] ffff880003085e80 ffff88004eecc010 ffff880003085e80 0000000000000000
Oct 11 20:16:56 stage kernel: [348431.612721] Call Trace:
Oct 11 20:16:56 stage kernel: [348431.612730] [<ffffffffa00249ec>] ? log_wait_commit+0xc0/0x111 [jbd]
Oct 11 20:16:56 stage kernel: [348431.612739] [<ffffffff8106033d>] ? autoremove_wake_function+0x0/0x2a
Oct 11 20:16:56 stage kernel: [348431.612750] [<ffffffff8100679f>] ? xen_restore_fl_direct_end+0x0/0x1
Oct 11 20:16:56 stage kernel: [348431.612759] [<ffffffff81326ea0>] ? _raw_spin_unlock_irqrestore+0x10/0x11
Oct 11 20:16:56 stage kernel: [348431.612770] [<ffffffffa00377f6>] ? ext3_sync_file+0xbe/0xec [ext3]
Oct 11 20:16:56 stage kernel: [348431.612780] [<ffffffff81117733>] ? vfs_fsync_range+0x4c/0x73
Oct 11 20:16:56 stage kernel: [348431.612788] [<ffffffff811177db>] ? do_fsync+0x27/0x3c
Oct 11 20:16:56 stage kernel: [348431.612797] [<ffffffff8111780d>] ? sys_fsync+0xb/0xf
Oct 11 20:16:56 stage kernel: [348431.612804] [<ffffffff81009973>] ? sysret_check+0x17/0x5a
Oct 11 20:16:56 stage kernel: [348431.612812] [<ffffffff81009952>] ? system_call_fastpath+0x16/0x1b
Oct 11 20:16:56 stage kernel: [348431.612820] INFO: task bounce:23876 blocked for more than 120 seconds.
Oct 11 20:16:56 stage kernel: [348431.612827] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 11 20:16:56 stage kernel: [348431.612836] bounce D ffff880003081440 0 23876 1153 0x00000000
Oct 11 20:16:56 stage kernel: [348431.612847] ffff880003081440 0000000000000282 0000000000000000 ffff88007abea1c0
Oct 11 20:16:56 stage kernel: [348431.612861] 0000000000013700 ffff88006d80ffd8 ffff88006d80ffd8 0000000000013700
Oct 11 20:16:56 stage kernel: [348431.612874] ffff880003081440 ffff88006d80e010 ffff88006d80feb0 0000000000000000
Oct 11 20:16:56 stage kernel: [348431.612888] Call Trace:
Oct 11 20:16:56 stage kernel: [348431.612897] [<ffffffffa00249ec>] ? log_wait_commit+0xc0/0x111 [jbd]
Oct 11 20:16:56 stage kernel: [348431.612906] [<ffffffff8106033d>] ? autoremove_wake_function+0x0/0x2a
Oct 11 20:16:56 stage kernel: [348431.612917] [<ffffffffa00377f6>] ? ext3_sync_file+0xbe/0xec [ext3]
Oct 11 20:16:56 stage kernel: [348431.612926] [<ffffffff81117733>] ? vfs_fsync_range+0x4c/0x73
Oct 11 20:16:56 stage kernel: [348431.612935] [<ffffffff811177db>] ? do_fsync+0x27/0x3c
Oct 11 20:16:56 stage kernel: [348431.612943] [<ffffffff8111780d>] ? sys_fsync+0xb/0xf
Oct 11 20:16:56 stage kernel: [348431.612951] [<ffffffff81009973>] ? sysret_check+0x17/0x5a
Oct 11 20:16:56 stage kernel: [348431.612959] [<ffffffff81009952>] ? system_call_fastpath+0x16/0x1b
Oct 11 20:16:56 stage kernel: [348431.612968] INFO: task postdrop:23904 blocked for more than 120 seconds.
Oct 11 20:16:56 stage kernel: [348431.612976] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 11 20:16:56 stage kernel: [348431.612984] postdrop D ffff88007b586540 0 23904 23903 0x00000000
Oct 11 20:16:56 stage kernel: [348431.612995] ffff88007b586540 0000000000000282 0000000000000000 ffff88007abe9440
Oct 11 20:16:56 stage kernel: [348431.613009] 0000000000013700 ffff88004e94ffd8 ffff88004e94ffd8 0000000000013700
Oct 11 20:16:56 stage kernel: [348431.613023] ffff88007b586540 ffff88004e94e010 ffff88007abe9440 0000000000000000
Oct 11 20:16:56 stage kernel: [348431.613036] Call Trace:
Oct 11 20:16:56 stage kernel: [348431.613045] [<ffffffffa00249ec>] ? log_wait_commit+0xc0/0x111 [jbd]
Oct 11 20:16:56 stage kernel: [348431.613054] [<ffffffff8106033d>] ? autoremove_wake_function+0x0/0x2a
Oct 11 20:16:56 stage kernel: [348431.613065] [<ffffffffa00377f6>] ? ext3_sync_file+0xbe/0xec [ext3]
Oct 11 20:16:56 stage kernel: [348431.613075] [<ffffffff81117733>] ? vfs_fsync_range+0x4c/0x73
Oct 11 20:16:56 stage kernel: [348431.613083] [<ffffffff811177db>] ? do_fsync+0x27/0x3c
Oct 11 20:16:56 stage kernel: [348431.613091] [<ffffffff8111780d>] ? sys_fsync+0xb/0xf
Oct 11 20:16:56 stage kernel: [348431.613099] [<ffffffff81009973>] ? sysret_check+0x17/0x5a
Oct 11 20:16:56 stage kernel: [348431.613107] [<ffffffff81009952>] ? system_call_fastpath+0x16/0x1b
Oct 11 20:21:02 stage kernel: [348552.062201] INFO: task bounce:23869 blocked for more than 120 seconds.
Oct 11 20:21:02 stage kernel: [348552.062216] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 11 20:21:02 stage kernel: [348552.062225] bounce D ffff880003085100 0 23869 1153 0x00000000
Oct 11 20:21:02 stage kernel: [348552.062238] ffff880003085100 0000000000000286 0000000000000000 ffff88007aaf3600
Oct 11 20:21:02 stage kernel: [348552.062253] 0000000000013700 ffff88007c16bfd8 ffff88007c16bfd8 0000000000013700
Oct 11 20:21:02 stage kernel: [348552.062268] ffff880003085100 ffff88007c16a010 ffff8800032bc098 ffffffff8103b96e
Oct 11 20:21:02 stage kernel: [348552.062282] Call Trace:
Oct 11 20:21:02 stage kernel: [348552.062301] [<ffffffff8103b96e>] ? __wake_up+0x35/0x46
Oct 11 20:21:02 stage kernel: [348552.062313] [<ffffffff81326e50>] ? _raw_spin_lock_irqsave+0x11/0x2f
Oct 11 20:21:02 stage kernel: [348552.062329] [<ffffffffa00249ec>] ? log_wait_commit+0xc0/0x111 [jbd]
Oct 11 20:21:02 stage kernel: [348552.062339] [<ffffffff8106033d>] ? autoremove_wake_function+0x0/0x2a
Oct 11 20:21:02 stage kernel: [348552.062349] [<ffffffff81326ea0>] ? _raw_spin_unlock_irqrestore+0x10/0x11
Oct 11 20:21:02 stage kernel: [348552.062361] [<ffffffffa00377f6>] ? ext3_sync_file+0xbe/0xec [ext3]
Oct 11 20:21:02 stage kernel: [348552.062373] [<ffffffff81117733>] ? vfs_fsync_range+0x4c/0x73
Oct 11 20:21:02 stage kernel: [348552.062382] [<ffffffff811177db>] ? do_fsync+0x27/0x3c
Oct 11 20:21:02 stage kernel: [348552.062390] [<ffffffff8111780d>] ? sys_fsync+0xb/0xf
Oct 11 20:21:02 stage kernel: [348552.062399] [<ffffffff81009973>] ? sysret_check+0x17/0x5a
Oct 11 20:21:02 stage kernel: [348552.062407] [<ffffffff81009952>] ? system_call_fastpath+0x16/0x1b
Oct 11 20:21:02 stage kernel: [348552.062416] INFO: task bounce:23873 blocked for more than 120 seconds.
Oct 11 20:21:02 stage kernel: [348552.062423] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 11 20:21:02 stage kernel: [348552.062431] bounce D ffff880003083600 0 23873 1153 0x00000000
Oct 11 20:21:02 stage kernel: [348552.062442] ffff880003083600 0000000000000282 0000000000000000 ffff88007aaf3600
Oct 11 20:21:02 stage kernel: [348552.062456] 0000000000013700 ffff88006d859fd8 ffff88006d859fd8 0000000000013700
Oct 11 20:21:02 stage kernel: [348552.062470] ffff880003083600 ffff88006d858010 ffff88006d859eb0 0000000000000000
Oct 11 20:21:02 stage kernel: [348552.062484] Call Trace:
Oct 11 20:21:02 stage kernel: [348552.062493] [<ffffffffa00249ec>] ? log_wait_commit+0xc0/0x111 [jbd]
Oct 11 20:21:02 stage kernel: [348552.062502] [<ffffffff8106033d>] ? autoremove_wake_function+0x0/0x2a
Oct 11 20:21:02 stage kernel: [348552.062513] [<ffffffffa00377f6>] ? ext3_sync_file+0xbe/0xec [ext3]
Oct 11 20:21:02 stage kernel: [348552.062522] [<ffffffff81117733>] ? vfs_fsync_range+0x4c/0x73
Oct 11 20:21:02 stage kernel: [348552.062531] [<ffffffff811177db>] ? do_fsync+0x27/0x3c
Oct 11 20:21:02 stage kernel: [348552.062539] [<ffffffff8111780d>] ? sys_fsync+0xb/0xf
Oct 11 20:21:02 stage kernel: [348552.062547] [<ffffffff81009973>] ? sysret_check+0x17/0x5a
Oct 11 20:21:02 stage kernel: [348552.062554] [<ffffffff81009952>] ? system_call_fastpath+0x16/0x1b
Any ideas on where to look?
[Edit] Adding another soft lockup that happened a few days earlier, unrelated to bounce:
Oct 7 06:26:39 stage kernel: [5729599.258122] BUG: soft lockup - CPU#1 stuck for 80s! [events/1:16]
Oct 7 06:26:39 stage kernel: [5729599.258122] Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp ipt_LOG iptable_filter ip_tables x_tables loop snd_pcm snd_timer evdev snd soundcore snd_page_alloc pcspkr ext3 jbd mbcache xen_netfront xen_blkfront
Oct 7 06:26:39 stage kernel: [5729599.258122] CPU 1:
Oct 7 06:26:39 stage kernel: [5729599.258122] Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp ipt_LOG iptable_filter ip_tables x_tables loop snd_pcm snd_timer evdev snd soundcore snd_page_alloc pcspkr ext3 jbd mbcache xen_netfront xen_blkfront
Oct 7 06:26:39 stage kernel: [5729599.258122] Pid: 16, comm: events/1 Not tainted 2.6.32-5-amd64 #1
Oct 7 06:26:39 stage kernel: [5729599.258122] RIP: e030:[<ffffffff8100922a>] [<ffffffff8100922a>] hypercall_page+0x22a/0x1001
Oct 7 06:26:39 stage kernel: [5729599.258122] RSP: e02b:ffff88007ffb1cc8 EFLAGS: 00000246
Oct 7 06:26:39 stage kernel: [5729599.258122] RAX: 0000000000040001 RBX: ffff880003519780 RCX: ffffffff8100922a
Oct 7 06:26:39 stage kernel: [5729599.258122] RDX: 00000000000116e0 RSI: 0000000000000000 RDI: 0000000000000000
Oct 7 06:26:39 stage kernel: [5729599.258122] RBP: ffff880001a77100 R08: ffff88007ffb0000 R09: ffffffff8100e22f
Oct 7 06:26:54 stage kernel: [5729599.258122] R10: 0000000000000000 R11: 0000000000000246 R12: ffff88007ffa8000
Oct 7 06:26:54 stage kernel: [5729599.258122] R13: ffff8800008e3880 R14: 0000000000000000 R15: 0000000000000000
Oct 7 06:26:54 stage kernel: [5729599.258122] FS: 00007f1e47300720(0000) GS:ffff880003504000(0000) knlGS:0000000000000000
Oct 7 06:26:54 stage kernel: [5729599.258122] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
Oct 7 06:26:54 stage kernel: [5729599.258122] CR2: 000000000065e000 CR3: 0000000001001000 CR4: 0000000000002660
Oct 7 06:26:54 stage kernel: [5729599.258122] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct 7 06:26:54 stage kernel: [5729599.258122] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Oct 7 06:26:54 stage kernel: [5729599.258122] Call Trace:
Oct 7 06:26:54 stage kernel: [5729599.258122] [<ffffffff8100dc35>] ? xen_force_evtchn_callback+0x9/0xa
Oct 7 06:26:54 stage kernel: [5729599.258122] [<ffffffff8100e242>] ? check_events+0x12/0x20
Oct 7 06:26:54 stage kernel: [5729599.258122] [<ffffffff8100e22f>] ? xen_restore_fl_direct_end+0x0/0x1
Oct 7 06:26:54 stage kernel: [5729599.258122] [<ffffffff8100e1e9>] ? xen_irq_enable_direct_end+0x0/0x7
Oct 7 06:26:54 stage kernel: [5729599.258122] [<ffffffff8104827b>] ? finish_task_switch+0x44/0xaf
Oct 7 06:26:54 stage kernel: [5729599.258122] [<ffffffff812fb155>] ? thread_return+0x4e/0xe0
Oct 7 06:26:54 stage kernel: [5729599.258122] [<ffffffff8100e242>] ? check_events+0x12/0x20
Oct 7 06:26:54 stage kernel: [5729599.258122] [<ffffffff8106182b>] ? worker_thread+0xcc/0x21d
Oct 7 06:26:54 stage kernel: [5729599.258122] [<ffffffff810c8060>] ? vmstat_update+0x0/0x39
Oct 7 06:26:54 stage kernel: [5729599.258122] [<ffffffff81064f1a>] ? autoremove_wake_function+0x0/0x2e
Oct 7 06:26:54 stage kernel: [5729599.258122] [<ffffffff8106175f>] ? worker_thread+0x0/0x21d
Oct 7 06:26:54 stage kernel: [5729599.258122] [<ffffffff81064c4d>] ? kthread+0x79/0x81
Oct 7 06:26:54 stage kernel: [5729599.258122] [<ffffffff81011baa>] ? child_rip+0xa/0x20
Oct 7 06:26:54 stage kernel: [5729599.258122] [<ffffffff81010d61>] ? int_ret_from_sys_call+0x7/0x1b
Oct 7 06:26:54 stage kernel: [5729599.258122] [<ffffffff8101151d>] ? retint_restore_args+0x5/0x6
Oct 7 06:26:54 stage kernel: [5729599.258122] [<ffffffff8100e22f>] ? xen_restore_fl_direct_end+0x0/0x1
Oct 7 06:26:54 stage kernel: [5729599.258122] [<ffffffff81011ba0>] ? child_rip+0x0/0x20
Oct 7 06:26:56 stage /USR/SBIN/CRON[12186]: (root) CMD (/root/phptest.sh)
Oct 7 06:27:03 stage kernel: [5729625.855236] BUG: soft lockup - CPU#0 stuck for 65s! [watchdog/0:5]
Oct 7 06:27:03 stage kernel: [5729625.855236] Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp ipt_LOG iptable_filter ip_tables x_tables loop snd_pcm snd_timer evdev snd soundcore snd_page_alloc pcspkr ext3 jbd mbcache xen_netfront xen_blkfront
Oct 7 06:27:03 stage kernel: [5729625.855236] CPU 0:
Oct 7 06:27:03 stage kernel: [5729625.855236] Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp ipt_LOG iptable_filter ip_tables x_tables loop snd_pcm snd_timer evdev snd soundcore snd_page_alloc pcspkr ext3 jbd mbcache xen_netfront xen_blkfront
Oct 7 06:27:03 stage kernel: [5729625.855236] Pid: 5, comm: watchdog/0 Not tainted 2.6.32-5-amd64 #1
Oct 7 06:27:03 stage kernel: [5729625.855236] RIP: e030:[<ffffffff8100922a>] [<ffffffff8100922a>] hypercall_page+0x22a/0x1001
Oct 7 06:27:03 stage kernel: [5729625.855236] RSP: e02b:ffff88007ff6fd48 EFLAGS: 00000246
Oct 7 06:27:03 stage kernel: [5729625.855236] RAX: 0000000000040001 RBX: ffff8800034fb780 RCX: ffffffff8100922a
Oct 7 06:27:03 stage kernel: [5729625.855236] RDX: 00000000000116e0 RSI: 0000000000000000 RDI: 0000000000000000
Oct 7 06:27:03 stage kernel: [5729625.855236] RBP: ffff88007e1c4000 R08: ffff88007ff6e000 R09: ffffffff8100e22f
Oct 7 06:27:05 stage kernel: [5729625.855236] R10: ffff880002d23100 R11: 0000000000000246 R12: ffff88007ff53170
Oct 7 06:27:05 stage kernel: [5729625.855236] R13: ffff8800230b9530 R14: 0000000000000000 R15: 0000000000000000
Oct 7 06:27:05 stage kernel: [5729625.855236] FS: 00007f1e47300720(0000) GS:ffff8800034e6000(0000) knlGS:0000000000000000
Oct 7 06:27:05 stage kernel: [5729625.855236] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
Oct 7 06:27:05 stage kernel: [5729625.855236] CR2: 00007f3d34099120 CR3: 0000000001001000 CR4: 0000000000002660
Oct 7 06:27:05 stage kernel: [5729625.855236] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct 7 06:27:05 stage kernel: [5729625.855236] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Oct 7 06:27:05 stage kernel: [5729625.855236] Call Trace:
Oct 7 06:27:05 stage kernel: [5729625.855236] [<ffffffff8100dc35>] ? xen_force_evtchn_callback+0x9/0xa
Oct 7 06:27:05 stage kernel: [5729625.855236] [<ffffffff8100e242>] ? check_events+0x12/0x20
Oct 7 06:27:06 stage kernel: [5729625.855236] [<ffffffff8100e22f>] ? xen_restore_fl_direct_end+0x0/0x1
Oct 7 06:27:06 stage kernel: [5729625.855236] [<ffffffff8100e1e9>] ? xen_irq_enable_direct_end+0x0/0x7
Oct 7 06:27:06 stage kernel: [5729625.855236] [<ffffffff8104827b>] ? finish_task_switch+0x44/0xaf
Oct 7 06:27:06 stage kernel: [5729625.855236] [<ffffffff812fb155>] ? thread_return+0x4e/0xe0
Oct 7 06:27:06 stage kernel: [5729625.855236] [<ffffffff8104914d>] ? switched_to_rt+0x0/0x58
Oct 7 06:27:06 stage kernel: [5729625.855236] [<ffffffff8100e22f>] ? xen_restore_fl_direct_end+0x0/0x1
Oct 7 06:27:06 stage kernel: [5729625.855236] [<ffffffff810693bf>] ? cpu_clock+0x28/0x30
Oct 7 06:27:06 stage kernel: [5729625.855236] [<ffffffff8109428f>] ? watchdog+0x0/0x75
Oct 7 06:27:06 stage kernel: [5729625.855236] [<ffffffff810942d3>] ? watchdog+0x44/0x75
Oct 7 06:27:06 stage kernel: [5729625.855236] [<ffffffff8109428f>] ? watchdog+0x0/0x75
Oct 7 06:27:06 stage kernel: [5729625.855236] [<ffffffff81064c4d>] ? kthread+0x79/0x81
Oct 7 06:27:06 stage kernel: [5729625.855236] [<ffffffff81011baa>] ? child_rip+0xa/0x20
Oct 7 06:27:06 stage kernel: [5729625.855236] [<ffffffff81010d61>] ? int_ret_from_sys_call+0x7/0x1b
Oct 7 06:27:06 stage kernel: [5729625.855236] [<ffffffff8101151d>] ? retint_restore_args+0x5/0x6
Oct 7 06:27:06 stage kernel: [5729625.855236] [<ffffffff8100e22f>] ? xen_restore_fl_direct_end+0x0/0x1
Oct 7 06:27:06 stage kernel: [5729625.855236] [<ffffffff81011ba0>] ? child_rip+0x0/0x20
Answer 1
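Your kernel did not crash.
Are you running Postfix? As far as I know, it is the only program that ships an executable named "bounce".
It looks like bounce is waiting for a read/write/flock to complete. Indeed, looking at the stack, it entered through sys_fsync and appears to have gone to sleep in ext3_sync_file(), waiting in log_wait_commit for the journal commit to finish.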
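To see at a glance which tasks are hanging and which kernel functions recur in their traces, the log can be summarized with a few lines of Python. This is only a rough sketch: the function name `summarize_hung_tasks` and the regular expressions are my own, written against the line format shown in the question, and it counts frames from every trace in the input, not just the hung-task ones.

```python
import re
from collections import Counter

# Matches e.g. "INFO: task bounce:23869 blocked for more than 120 seconds."
HUNG_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")
# Matches stack frames such as "[<ffffffffa00377f6>] ? ext3_sync_file+0xbe/0xec [ext3]"
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?([\w.]+)\+0x")


def summarize_hung_tasks(lines):
    """Count hung-task reports per (command, pid) and stack frames per function name."""
    tasks = Counter()
    frames = Counter()
    for line in lines:
        hung = HUNG_RE.search(line)
        if hung:
            tasks[(hung.group(1), int(hung.group(2)))] += 1
            continue
        frame = FRAME_RE.search(line)
        if frame:
            frames[frame.group(1)] += 1
    return tasks, frames
```

Feeding it the lines of kern.log (for example `summarize_hung_tasks(open("/var/log/kern.log"))`, assuming that is where the log lives) and printing `frames.most_common(5)` should make it obvious whether every blocked task shares the same fsync path.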
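Is the filesystem mounted with the sync option? Switching it to async will not fix the problem, but it will improve performance once the underlying fault has been resolved.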
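One way to answer the sync question without reading the mount table by eye is to parse it. The sketch below assumes the usual `/proc/mounts` layout (device, mount point, filesystem type, comma-separated options); the helper name `sync_mounted` is hypothetical.

```python
def sync_mounted(mounts_text):
    """Return (mount_point, fstype) pairs whose mount options include 'sync'."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fstype, options = fields[:4]
        # Exact match on the option name, so 'dirsync' and 'async' are not counted.
        if "sync" in options.split(","):
            result.append((mount_point, fstype))
    return result
```

On the affected machine, something like `sync_mounted(open("/proc/mounts").read())` would list the filesystems in question; an empty list means none are mounted with sync.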
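I would first check whether the disk is spun down or disabled, then try running fsck on all partitions on the disk. If everything is fine, check for contention on the files bounce is accessing (for example, if it is Postfix, are multiple instances running? Or was a backup running when this happened?). If nothing shows up, try moving the data files to a different physical disk.
(Also check that your kernel is patched up to date. There seem to be reports of similar problems, although I have not seen any for this specific version.)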