Paul

I'm hoping someone can help explain what is happening here:

[ 2081.280253] BUG: unable to handle kernel paging request at ffff8801ad287000
[ 2081.280262] IP: [<ffffffff8000f549>] __sanitize_i387_state+0x29/0x120
[ 2081.280272] PGD 1e30067 PUD 39ab067 PMD 3b15067 PTE 0
[ 2081.280277] Oops: 0000 [#4] SMP
[ 2081.280281] last sysfs file: /sys/devices/xen-backend/vbd-5-51715/uevent
[ 2081.280285] CPU 1
[ 2081.280286] Modules linked in: tun md5 ip6table_filter ip6_tables iptable_filter         ip_tables x_tables usbbk gntdev netbk blkbk blkback_pagemap blktap xenbus_be evtchn nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs bridge stp llc edd sbs sbshc max6650 lm75 coretemp domctl snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device adm1021 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx dm_mod snd_hda_codec_hdmi 8250_pci snd_hda_codec_realtek snd_hda_intel snd_hda_codec ir_lirc_codec lirc_dev ir_sony_decoder ir_jvc_decoder snd_hwdep ir_rc6_decoder ir_rc5_decoder rc_rc6_mce sg ir_nec_decoder nouveau ttm tpm_tis tpm mceusb ir_core i2c_i801 e1000e snd_pcm pcspkr tpm_bios iTCO_wdt iTCO_vendor_support snd_timer 8250 serial_core snd soundcore snd_page_alloc ext4 jbd2 crc16 drm_kms_helper drm i2c_algo_bit i2c_core video output ehci_hcd usbcore button xenblk cdrom xennet fan processor thermal thermal_sys hwmon ata_generic
[ 2081.280350]
[ 2081.280354] Pid: 6623, comm: block Tainted: G      D     2.6.37.6-0.5-xen #1                  /DQ67OW
[ 2081.280359] RIP: e030:[<ffffffff8000f549>]  [<ffffffff8000f549>] __sanitize_i387_state+0x29/0x120
[ 2081.280365] RSP: e02b:ffff88006bb0dd98  EFLAGS: 00010246
[ 2081.280368] RAX: 0000000000000000 RBX: ffff8801ad286e00 RCX: ffff88006bb0dfd8
[ 2081.280371] RDX: ffff88006bae4440 RSI: 0000000000000200 RDI: ffff88006bae4440
[ 2081.280375] RBP: ffff88006bae4440 R08: ffff88006bb0df58 R09: 0000000000000000
[ 2081.280378] R10: 0000000000000000 R11: 00000000ffffffff R12: 0000000000000011
[ 2081.280381] R13: ffff88006bb0df58 R14: 00007fffc379b800 R15: 00007fffc379b638
[ 2081.280388] FS:  00007f89c8b00700(0000) GS:ffff8801e651d000(0000) knlGS:0000000000000000
[ 2081.280391] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2081.280394] CR2: ffff8801ad287000 CR3: 000000006bb10000 CR4: 0000000000002660
[ 2081.280398] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2081.280408] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2081.280412] Process block (pid: 6623, threadinfo ffff88006bb0c000, task ffff88006bae4440)
[ 2081.280415] Stack:
[ 2081.280417]  00007fffc379b800 ffff88006bae4440 0000000000000011 ffffffff8000f90a
[ 2081.280422]  ffff88006bb0dee8 ffff88006bae4998 0000000000000011 ffffffff80006a22
[ 2081.280426]  ffff8801d88d65c0 ffff88006bae4440 ffff88006bb0de68 000000116bae4440
[ 2081.280431] Call Trace:
[ 2081.280438]  [<ffffffff8000f90a>] save_i387_xstate+0x1aa/0x210
[ 2081.280444]  [<ffffffff80006a22>] __setup_rt_frame+0x2f2/0x370
[ 2081.280449]  [<ffffffff80006dd1>] handle_signal+0x201/0x2b0
[ 2081.280454]  [<ffffffff80006f09>] do_signal+0x89/0x1b0
[ 2081.280459]  [<ffffffff800070b5>] do_notify_resume+0x65/0x90
[ 2081.280464]  [<ffffffff8000770e>] int_signal+0x12/0x17
[ 2081.280471]  [<00007f89c7fb1090>] 0x7f89c7fb1090
[ 2081.280474] Code: 00 00 41 54 55 53 48 8b 9f 10 05 00 00 48 85 db 0f 84 9c 00 00 00 48 8b 47 08 f6 40 14 01 0f 85 ef 00 00 00 48 8b 05 37 55 89 00 <48> 8b ab 00 02 00 00 48 89 c2 48 21 ea 48 39 d0 74 75 48 89 e8
[ 2081.280499] RIP  [<ffffffff8000f549>] __sanitize_i387_state+0x29/0x120
[ 2081.280504]  RSP <ffff88006bb0dd98>
[ 2081.280506] CR2: ffff8801ad287000
[ 2081.284005] ---[ end trace 56e37f97ef72fda4 ]---

This is a new server build running openSUSE 11.4 with kernel 2.6.37.6-0.5-xen on an i2500 with 8GB RAM.

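I have tried a few different kernels (updates happened to arrive via zypper), and I have tried each of the two RAM sticks (4GB) on its own and swapped their slots. The DQ67OW motherboard has integrated graphics, so I tried a discrete graphics card in case the integrated graphics was claiming memory the kernel didn't know about. The error can occur on any CPU core.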

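It doesn't seem to be triggered by any particular activity - I'm running mdadm raid5, and it is usually the "block" process that triggers the oops, but bash and udevd have triggered it too.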

It seems that if a sufficiently critical process hits the problem, the whole server hangs with the Caps Lock and Scroll Lock LEDs flashing.

The processor, motherboard and RAM are all new. I expect this is caused by either a hardware fault or a driver bug. Maybe the network card driver...?

Any suggestions on how to narrow down the culprit would be very helpful.

Cheers,

Paul

A later trace:

[17836.273843] BUG: unable to handle kernel paging request at ffff8801ad287000
[17836.273853] IP: [<ffffffff8000f549>] __sanitize_i387_state+0x29/0x120
[17836.273863] PGD 1e30067 PUD 39ab067 PMD 3b15067 PTE 0
[17836.273868] Oops: 0000 [#6] SMP
[17836.273871] last sysfs file: /sys/devices/xen-backend/vbd-6-51715/statistics/wr_sect
[17836.273875] CPU 1
[17836.273876] Modules linked in: usb_storage uas tun md5 ip6table_filter ip6_tables iptable_filter ip_tables x_tables usbbk gntdev netbk blkbk blkback_pagemap blktap xenbus_be evtchn nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs bridge stp llc edd sbs sbshc max6650 lm75 coretemp domctl snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device adm1021 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx dm_mod snd_hda_codec_hdmi 8250_pci snd_hda_codec_realtek snd_hda_intel snd_hda_codec ir_lirc_codec lirc_dev ir_sony_decoder ir_jvc_decoder snd_hwdep ir_rc6_decoder ir_rc5_decoder rc_rc6_mce sg ir_nec_decoder nouveau ttm tpm_tis tpm mceusb ir_core i2c_i801 e1000e snd_pcm pcspkr tpm_bios iTCO_wdt iTCO_vendor_support snd_timer 8250 serial_core snd soundcore snd_page_alloc ext4 jbd2 crc16 drm_kms_helper drm i2c_algo_bit i2c_core video output ehci_hcd usbcore button xenblk cdrom xennet fan processor thermal thermal_sys hwmon ata_generic
[17836.273940]
[17836.273943] Pid: 9479, comm: bash Tainted: G      D     2.6.37.6-0.5-xen #1                  /DQ67OW
[17836.273949] RIP: e030:[<ffffffff8000f549>]  [<ffffffff8000f549>] __sanitize_i387_state+0x29/0x120
[17836.273954] RSP: e02b:ffff88002afebd98  EFLAGS: 00010246
[17836.273957] RAX: 0000000000000000 RBX: ffff8801ad286e00 RCX: ffff88002afebfd8
[17836.273960] RDX: ffff88002ad62800 RSI: 0000000000000200 RDI: ffff88002ad62800
[17836.273964] RBP: ffff88002ad62800 R08: ffff88002afebf58 R09: 0000000000000000
[17836.273967] R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000011
[17836.273970] R13: ffff88002afebf58 R14: 00007fff522ce400 R15: 00007fff522ce238
[17836.273976] FS:  00007f5908ab2700(0000) GS:ffff8801e651d000(0000) knlGS:0000000000000000
[17836.273979] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[17836.273982] CR2: ffff8801ad287000 CR3: 00000000fa6a2000 CR4: 0000000000002660
[17836.273986] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[17836.273989] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[17836.273993] Process bash (pid: 9479, threadinfo ffff88002afea000, task ffff88002ad62800)
[17836.273996] Stack:
[17836.273998]  00007fff522ce400 ffff88002ad62800 0000000000000011 ffffffff8000f90a
[17836.274003]  ffff88002afebee8 ffff88002ad62d58 0000000000000011 ffffffff80006a22
[17836.274007]  ffff8801d91c4e80 ffff88002ad62800 ffff88002afebe68 000000112ad62800
[17836.274011] Call Trace:
[17836.274019]  [<ffffffff8000f90a>] save_i387_xstate+0x1aa/0x210
[17836.274025]  [<ffffffff80006a22>] __setup_rt_frame+0x2f2/0x370
[17836.274030]  [<ffffffff80006dd1>] handle_signal+0x201/0x2b0
[17836.274035]  [<ffffffff80006f09>] do_signal+0x89/0x1b0
[17836.274040]  [<ffffffff800070b5>] do_notify_resume+0x65/0x90
[17836.274046]  [<ffffffff8000770e>] int_signal+0x12/0x17
[17836.274052]  [<00007f5907ecfd80>] 0x7f5907ecfd80
[17836.274055] Code: 00 00 41 54 55 53 48 8b 9f 10 05 00 00 48 85 db 0f 84 9c 00 00 00 48 8b 47 08 f6 40 14 01 0f 85 ef 00 00 00 48 8b 05 37 55 89 00 <48> 8b ab 00 02 00 00 48 89 c2 48 21 ea 48 39 d0 74 75 48 89 e8
[17836.274081] RIP  [<ffffffff8000f549>] __sanitize_i387_state+0x29/0x120
[17836.274085]  RSP <ffff88002afebd98>
[17836.274088] CR2: ffff8801ad287000
[17836.274091] ---[ end trace 56e37f97ef72fda6 ]---

Answer 1

This is usually caused by bad memory, but as you say, it can also be caused by a software bug. (It is the kernel-space equivalent of a segfault.)
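
To make the segfault analogy concrete, here is a small check against the numbers in the first trace. It assumes the bytes marked `<48> 8b ab 00 02 00 00` on the `Code:` line decode to a 64-bit load from offset 0x200 past RBX; the variable names are only for illustration.

```python
# Values copied from the first oops in the question.
rbx = 0xffff8801ad286e00   # RBX from the register dump
cr2 = 0xffff8801ad287000   # CR2: the address the CPU could not access
disp = 0x200               # displacement bytes "00 02 00 00" in the faulting instruction

fault_addr = rbx + disp

# The computed address should equal CR2 if the decode above is right.
print(hex(fault_addr), fault_addr == cr2)

# Low 12 bits of zero mean the access landed exactly on a 4 KiB page boundary.
print(hex(cr2 & 0xfff))
```

The read runs off the end of one page onto the next, and the `PTE 0` in the page-table line says that next page is not mapped. On its own this does not tell you whether a bad pointer or bad RAM is to blame.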

Run memtest overnight. Once the package is installed, it should show up as a boot option.

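If that finds nothing, it is most likely a software problem. Compare the different crash logs and see whether the address reported on the first line, or the call trace given partway through, have anything in common. If they are all very similar, it is most likely a software bug. Report it as a kernel bug to your distribution and see what help you can get.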
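
The comparison can be done by eye, but a rough sketch of automating it might look like the following. The helper name `oops_signature` and the trimmed excerpts are made up for illustration; the log lines themselves are copied from the two traces in the question.

```python
import re

def oops_signature(log_text):
    """Pull out the fault address, faulting symbol and kernel call trace."""
    sig = {"fault_addr": None, "ip": None, "trace": []}
    in_trace = False
    for line in log_text.splitlines():
        # Drop the leading "[ 2081.280253] " dmesg timestamp.
        body = re.sub(r"^\[\s*\d+\.\d+\]\s?", "", line).strip()
        m = re.match(r"BUG: unable to handle kernel paging request at ([0-9a-f]+)", body)
        if m:
            sig["fault_addr"] = m.group(1)
        m = re.match(r"IP: \[<[0-9a-f]+>\] (\S+)", body)
        if m:
            sig["ip"] = m.group(1)
        if body == "Call Trace:":
            in_trace = True
            continue
        if in_trace:
            # Kernel frames carry a symbol name; the user-space frame does not.
            m = re.match(r"\[<[0-9a-f]+>\] ([A-Za-z_]\w*)\+0x", body)
            if m:
                sig["trace"].append(m.group(1))
            else:
                in_trace = False
    return sig

OOPS_A = """\
[ 2081.280253] BUG: unable to handle kernel paging request at ffff8801ad287000
[ 2081.280262] IP: [<ffffffff8000f549>] __sanitize_i387_state+0x29/0x120
[ 2081.280431] Call Trace:
[ 2081.280438]  [<ffffffff8000f90a>] save_i387_xstate+0x1aa/0x210
[ 2081.280444]  [<ffffffff80006a22>] __setup_rt_frame+0x2f2/0x370
[ 2081.280471]  [<00007f89c7fb1090>] 0x7f89c7fb1090
"""

OOPS_B = """\
[17836.273843] BUG: unable to handle kernel paging request at ffff8801ad287000
[17836.273853] IP: [<ffffffff8000f549>] __sanitize_i387_state+0x29/0x120
[17836.274011] Call Trace:
[17836.274019]  [<ffffffff8000f90a>] save_i387_xstate+0x1aa/0x210
[17836.274025]  [<ffffffff80006a22>] __setup_rt_frame+0x2f2/0x370
[17836.274052]  [<00007f5907ecfd80>] 0x7f5907ecfd80
"""

first = oops_signature(OOPS_A)
second = oops_signature(OOPS_B)
for key in first:
    print(key, "same" if first[key] == second[key] else "differs")
```

For the two traces posted here, the fault address, the faulting function and the kernel part of the call trace all match even though the processes differ (block and bash).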

Answer 2

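Although I didn't run memtest for very long, I became suspicious of the openSUSE install. It was a fresh install, but my gut feeling was that it was a kernel issue or something similar.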

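So I installed Debian onto a different partition and started up my virtual machines and everything else, and there hasn't been a single fault since.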

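I think the most likely cause is that the Debian Xen kernel is 2.6.32 whereas the openSUSE one is 2.6.37. It could be a bug in the kernel, or just a configuration incompatibility.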

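When I have time, I will compare the .configs. It has now been running for a few days; I was averaging about one error per hour, and now I no longer get any...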
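
For the .config comparison, something along these lines should do as a first pass. It is only a sketch: the helper names are mine, and the two fragments use placeholder option names (`CONFIG_EXAMPLE_A`, `CONFIG_EXAMPLE_B`) rather than real values from either kernel.

```python
def parse_kconfig(text):
    """Map option name -> value, treating '# CONFIG_X is not set' as 'n'."""
    opts = {}
    suffix = " is not set"
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            opts[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            opts[name] = value
    return opts

def diff_kconfig(a, b):
    """Return {option: (value_in_a, value_in_b)} for options that differ."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }

# Placeholder fragments, NOT taken from the real openSUSE or Debian configs.
CONFIG_A = """\
CONFIG_SMP=y
# CONFIG_EXAMPLE_A is not set
CONFIG_EXAMPLE_B=m
"""
CONFIG_B = """\
CONFIG_SMP=y
CONFIG_EXAMPLE_A=y
"""

changes = diff_kconfig(parse_kconfig(CONFIG_A), parse_kconfig(CONFIG_B))
for name, (old, new) in changes.items():
    print(name, old, "->", new)
```

In practice the two strings would be read from each distribution's config file (typically found under /boot). Because the kernels are different versions, many options will exist on only one side and show up with `None` for the other.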
