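I was running rsync on my Dell XPS Core 2 Duo tower when it suddenly froze. The machine runs Ubuntu 8.04 LTS with 3 GB of RAM and a software RAID 5 array (mdadm) spread across 3 disks; the system is on a fourth disk. After rebooting, I found this interesting message in /var/log/kern.log: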
Oct 31 02:38:33 myhostname kernel: [617414.584615] Unable to handle kernel NULL pointer dereference at 0000000000000070 RIP:
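Then it happened again this morning, but this time with more information in the log (see below). I'm wondering whether anyone can explain what this means. Unfortunately, the machine now sits in a data center 3000 miles away from me, so swapping out the memory would be tricky.
Thanks in advance for any advice!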
Nov 1 01:24:55 myhostname kernel: [34780.996038] Unable to handle kernel NULL pointer dereference at 0000000000000070 RIP:
Nov 1 01:24:55 myhostname kernel: [34780.996050] [<ffffffff80470a60>] _spin_lock+0x0/0x10
Nov 1 01:24:55 myhostname kernel: [34780.996099] PGD bb0b5067 PUD bbc91067 PMD 0
Nov 1 01:24:55 myhostname kernel: [34780.996121] Oops: 0002 [1] SMP
Nov 1 01:24:55 myhostname kernel: [34780.996140] CPU 1
Nov 1 01:24:55 myhostname kernel: [34780.996156] Modules linked in: nfs lockd nfs_acl sunrpc autofs4 iptable_filter ip_tables x_tables ipv6 parport_pc lp parport loop af_packet serio_raw psmouse button dcdbas intel_agp snd_hda_intel shpchp pci_hotplug iTCO_wdt iTCO_vendor_support evdev snd_pcm snd_timer snd_page_alloc snd_hwdep snd soundcore pcspkr ext3 jbd mbcache sg sr_mod cdrom sd_mod 8139too ata_generic pata_acpi usbhid hid ata_piix 8139cp mii libata scsi_mod ehci_hcd uhci_hcd e1000 usbcore raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod thermal processor fan fbcon tileblit font bitblit softcursor fuse
Nov 1 01:24:55 myhostname kernel: [34780.996422] Pid: 171, comm: kswapd0 Not tainted 2.6.24-16-server #1
Nov 1 01:24:55 myhostname kernel: [34780.996442] RIP: 0010:[<ffffffff80470a60>] [<ffffffff80470a60>] _spin_lock+0x0/0x10
Nov 1 01:24:55 myhostname kernel: [34780.996474] RSP: 0018:ffff8100b904fd48 EFLAGS: 00010202
Nov 1 01:24:55 myhostname kernel: [34780.996492] RAX: 0000000000000001 RBX: ffff8100167d23c8 RCX: 0000000000000000
Nov 1 01:24:55 myhostname kernel: [34780.996514] RDX: 0000000000000001 RSI: 00000000000000d0 RDI: 0000000000000070
Nov 1 01:24:55 myhostname kernel: [34780.996535] RBP: ffff8100167d2550 R08: 0000000000000000 R09: 0000000000000000
Nov 1 01:24:55 myhostname kernel: [34780.996555] R10: 0000000000000000 R11: ffffffff88232010 R12: 0000000000000028
Nov 1 01:24:55 myhostname kernel: [34780.996576] R13: ffff8100167d24d8 R14: 0000000000000000 R15: 0000000000000000
Nov 1 01:24:55 myhostname kernel: [34780.996597] FS: 0000000000000000(0000) GS:ffff8100bd001700(0000) knlGS:0000000000000000
Nov 1 01:24:55 myhostname kernel: [34780.996628] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Nov 1 01:24:55 myhostname kernel: [34780.996647] CR2: 0000000000000070 CR3: 00000000bbd44000 CR4: 00000000000006e0
Nov 1 01:24:55 myhostname kernel: [34780.996668] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 1 01:24:55 myhostname kernel: [34780.996688] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Nov 1 01:24:55 myhostname kernel: [34780.996710] Process kswapd0 (pid: 171, threadinfo ffff8100b904e000, task ffff8100b90487e0)
Nov 1 01:24:55 myhostname kernel: [34780.996741] Stack: ffffffff802dc5b2 ffff8100167d23c8 0000000000000080 0000000000000028
Nov 1 01:24:55 myhostname kernel: [34780.996779] ffff8100b904fd80 0000000000000028 ffffffff802cb244 ffff8100167d20d8
Nov 1 01:24:55 myhostname kernel: [34780.996815] ffff810092da43d8 00000000001c4cec 0000000000067714 000000000000009b
Nov 1 01:24:55 myhostname kernel: [34780.996839] Call Trace:
Nov 1 01:24:55 myhostname kernel: [34780.996868] [remove_inode_buffers+0x42/0x100] remove_inode_buffers+0x42/0x100
Nov 1 01:24:55 myhostname kernel: [34780.996891] [shrink_icache_memory+0x1f4/0x2a0] shrink_icache_memory+0x1f4/0x2a0
Nov 1 01:24:55 myhostname kernel: [34780.996916] [shrink_slab+0x124/0x180] shrink_slab+0x124/0x180
Nov 1 01:24:55 myhostname kernel: [34780.996939] [kswapd+0x391/0x560] kswapd+0x391/0x560
Nov 1 01:24:55 myhostname kernel: [34780.996965] [<ffffffff80254200>] autoremove_wake_function+0x0/0x30
Nov 1 01:24:55 myhostname kernel: [34780.996989] [kswapd+0x0/0x560] kswapd+0x0/0x560
Nov 1 01:24:55 myhostname kernel: [34780.997009] [kthread+0x4b/0x80] kthread+0x4b/0x80
Nov 1 01:24:55 myhostname kernel: [34780.997029] [child_rip+0xa/0x12] child_rip+0xa/0x12
Nov 1 01:24:55 myhostname kernel: [34780.997053] [kthread+0x0/0x80] kthread+0x0/0x80
Nov 1 01:24:55 myhostname kernel: [34780.997072] [child_rip+0x0/0x12] child_rip+0x0/0x12
Nov 1 01:24:55 myhostname kernel: [34780.997091]
Nov 1 01:24:55 myhostname kernel: [34780.997104]
Nov 1 01:24:55 myhostname kernel: [34780.997105] Code: f0 ff 0f 79 09 f3 90 83 3f 00 7e f9 eb f2 c3 90 f0 81 2f 00
Nov 1 01:24:55 myhostname kernel: [34780.997184] RIP [<ffffffff80470a60>] _spin_lock+0x0/0x10
Nov 1 01:24:55 myhostname kernel: [34780.997205] RSP <ffff8100b904fd48>
Nov 1 01:24:55 myhostname kernel: [34780.997221] CR2: 0000000000000070
Nov 1 01:24:55 myhostname kernel: [34780.997458] ---[ end trace 26a2b00c44abedb6 ]---
Answer 1
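OK, this is a fairly standard kernel error (an oops). The trace shows kswapd0 reclaiming inode cache memory (shrink_icache_memory calling remove_inode_buffers) when it tried to take a spinlock at address 0x70, that is, through a NULL pointer. It was probably triggered by the kswapd0 process doing something bad with the disks.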
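Things to check:
1) Run smartctl on all the disks and check that they are operating within their recommended tolerances.
2) Look through dmesg and /var/log/messages to see whether anything unusual happened at the same time.
3) Search Launchpad and the Ubuntu forums for possible causes, or ask for hints in #ubuntu on freenode IRC. You may be asked for more information, such as the output of lspci, lsmod, and so on. There is a good chance someone else has run into a similar problem.
4) Run memtest86 overnight and see whether any serious memory errors turn up.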
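For the log check in item 2, a small script can pull every oops out of kern.log so the crashes are easier to compare and to paste into a bug report. This is only a minimal sketch: the function names (`decode_error_code`, `summarize_oopses`) are made up for illustration, and the regular expressions assume the 2.6.24-style message format shown in the question, so other kernel versions may need different patterns. The error-code decoding follows the x86 page-fault bits (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode).

```python
import re
import sys

# x86 page-fault error code bits, as printed after "Oops:" in the log.
# Each entry: (bit mask, meaning when set, meaning when clear).
PF_BITS = [
    (0x1, "protection violation", "page not present"),
    (0x2, "write access", "read access"),
    (0x4, "user mode", "kernel mode"),
]

# Patterns below are assumptions based on the log format in the question.
WHEN_RE = re.compile(r"^(.*?)\s+\S+\s+kernel:")
FAULT_RE = re.compile(
    r"Unable to handle kernel (?:NULL pointer dereference|paging request)"
    r" at ([0-9a-f]+)"
)
OOPS_RE = re.compile(r"Oops: ([0-9a-f]{4})")
PROC_RE = re.compile(r"Pid: (\d+), comm: (\S+)")
RIP_RE = re.compile(r"RIP \[<[0-9a-f]+>\]\s+(\S+)")


def decode_error_code(code):
    """Translate the numeric page-fault error code into readable flags."""
    return [on if code & bit else off for bit, on, off in PF_BITS]


def summarize_oopses(lines):
    """Return one dict per oops found in an iterable of log lines."""
    events = []
    for line in lines:
        fault = FAULT_RE.search(line)
        if fault:
            # A new oops starts here; remember when and where it faulted.
            when = WHEN_RE.match(line)
            events.append({
                "when": when.group(1) if when else "",
                "address": int(fault.group(1), 16),
            })
            continue
        if not events:
            continue  # ignore everything before the first oops
        event = events[-1]
        oops = OOPS_RE.search(line)
        proc = PROC_RE.search(line)
        rip = RIP_RE.search(line)
        if oops:
            event["error"] = decode_error_code(int(oops.group(1), 16))
        elif proc:
            event["pid"] = int(proc.group(1))
            event["comm"] = proc.group(2)
        elif rip:
            event["rip"] = rip.group(1)
    return events


# Only runs when a log path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as log:
        for event in summarize_oopses(log):
            print(event)
```

Saved under a name of your choice and run with the log path as its only argument (for example /var/log/kern.log), it prints one summary per crash. If both crashes report the same process, the same faulting address and the same RIP symbol, that is worth mentioning when you search Launchpad or ask on IRC.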