"BUG: unable to handle kernel NULL pointer dereference" on Google Compute Engine

I regularly see GCE instances freeze with the following error message (taken from the serial console):

g[1375589.784755] BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
g[1375589.786206] IP: [<ffffffff810a67d9>] check_preempt_wakeup+0xd9/0x1d0
g[1375589.787341] PGD 5da04067 PUD db83067 PMD 0 
g[1375589.788607] Oops: 0000 [#1] SMP 
g[1375589.788705] Modules linked in: veth xt_addrtype xt_conntrack iptable_filter ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 ip_tables x_tables nf_nat nf_conntrack bridge stp llc aufs(C) softdog crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 processor psmouse parport_pc parport i2c_piix4 i2c_core thermal_sys lrw virtio_net evdev pcspkr serio_raw gf128mul glue_helper ablk_helper cryptd button ext4 crc16 mbcache jbd2 sd_mod crc_t10dif crct10dif_common virtio_scsi scsi_mod virtio_pci virtio virtio_ring
g[1375589.788705] CPU: 1 PID: 1515 Comm: docker Tainted: G         C    3.16.0-0.bpo.4-amd64 #1 Debian 3.16.7-ckt9-3~deb8u1~bpo70+1
g[1375589.788705] Hardware name: Google Google, BIOS Google 01/01/2011
g[1375589.788705] task: ffff88006fffc110 ti: ffff880003ac4000 task.ti: ffff880003ac4000
g[1375589.788705] RIP: 0010:[<ffffffff810a67d9>]  [<ffffffff810a67d9>] check_preempt_wakeup+0xd9/0x1d0
g[1375589.788705] RSP: 0018:ffff880003ac7e30  EFLAGS: 00010002
g[1375589.788705] RAX: 0000000000000001 RBX: ffff880073112ec0 RCX: 0000000000000002
g[1375589.788705] RDX: 0000000000000001 RSI: ffff880009156d20 RDI: ffff880073112f38
g[1375589.788705] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
g[1375589.788705] R10: ffffffffffffffe0 R11: 0000000000000000 R12: ffff88006d2dcd00
g[1375589.788705] R13: ffff88006fffc110 R14: 0000000000000000 R15: 0000000000000000
g[1375589.788705] FS:  000000000323a880(0063) GS:ffff880073100000(0000) knlGS:0000000000000000
g[1375589.788705] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
g[1375589.788705] CR2: 0000000000000078 CR3: 0000000034bff000 CR4: 00000000000406e0
g[1375589.788705] Stack:
g[1375589.788705]  0000000000000000 ffffffff00000000 ffff88000000006e ffff880073112ec0
g[1375589.788705]  ffff8800091573a4 0000000000000286 0000000000012ec0 ffff880073112ec0
g[1375589.788705]  0000000000000002 ffffffff8109cef4 ffff880009156d20 ffffffff810a01a4
g[1375589.788705] Call Trace:
g[1375589.788705]  [<ffffffff8109cef4>] ? check_preempt_curr+0x84/0xa0
g[1375589.788705]  [<ffffffff810a01a4>] ? wake_up_new_task+0xf4/0x1b0
g[1375589.788705]  [<ffffffff8118516d>] ? mprotect_fixup+0x15d/0x250
g[1375589.788705]  [<ffffffff8106d10f>] ? do_fork+0xcf/0x340
g[1375589.788705]  [<ffffffff8154b779>] ? stub_clone+0x69/0x90
g[1375589.788705]  [<ffffffff8154b40d>] ? system_call_fast_compare_end+0x10/0x15
g[1375589.788705] Code: 00 00 83 e8 01 4d 8b 64 24 70 39 d0 7f f4 48 8b 7d 78 49 3b 7c 24 78 74 1d 66 0f 1f 84 00 00 00 00 00 48 8b 6d 70 4d 8b 64 24 70 <48> 8b 7d 78 49 3b 7c 24 78 75 ec 48 85 ff 74 e7 e8 f2 f9 ff ff 
g[1375589.788705] RIP  [<ffffffff810a67d9>] check_preempt_wakeup+0xd9/0x1d0
g[1375589.788705]  RSP <ffff880003ac7e30>
g[1375589.788705] CR2: 0000000000000078
g[1375589.788705] ---[ end trace 5fab7713cb2d171f ]---

The only way I can recover them is to log in to the web interface and reset them manually. Needless to say, that doesn't scale.

I have tried setting up a watchdog device and setting kernel.panic = 10, which in theory should reboot the VM.
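One possible reason the automatic reboot never happens, offered as an assumption and not something confirmed on these instances: kernel.panic only takes effect once the kernel actually panics, and an oops like the one above is not escalated to a panic unless kernel.panic_on_oops is enabled. A minimal sysctl sketch under that assumption (the file name is hypothetical):

```
# /etc/sysctl.d/99-panic-on-oops.conf  (hypothetical file name)

# Reboot 10 seconds after a kernel panic (the value already tried above).
kernel.panic = 10

# Assumption: treat an oops/BUG as a panic so the reboot timer applies.
kernel.panic_on_oops = 1
```

The file can be loaded as root with `sysctl -p /etc/sysctl.d/99-panic-on-oops.conf`. This would only turn a hang into a reboot; it does not address the underlying scheduler bug.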

For these VMs I use "container-vm" as the OS flavor (more or less Debian with Docker preinstalled).

Has anyone else seen this?

Answer 1

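I don't have enough reputation to comment, so I'm posting this as an answer. I ran into the same problem. I searched the internet for bug reports and found that almost every kernel trace includes the do_fork() function. Then I found this: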

http://www.serverphorums.com/read.php?12,1053418

and an updated version of the patch here:

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/kernel/sched/core.c?id=ea86cb4b7621e1298a37197005bf0abcc86348d4

I hope this helps someone.

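I would like this to be fixed in my distribution, but I don't know how to get the distribution maintainers to include this patch in the default kernel.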
