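We asked our server provider to increase the memory on our server (from 4GB to 6GB). After they added the RAM and restarted the server, it started printing the following messages: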
Message from syslogd@www at Jul 1 14:39:04 ...
kernel:[ 30.426516] Oops: 0002 [#1] SMP
Message from syslogd@www at Jul 1 14:39:04 ...
kernel:[ 30.426677] last sysfs file: /sys/devices/virtual/net/lo/operstate
Message from syslogd@www at Jul 1 14:39:04 ...
kernel:[ 30.431797] Stack:
Message from syslogd@www at Jul 1 14:39:04 ...
kernel:[ 30.432892] Call Trace:
Message from syslogd@www at Jul 1 14:39:04 ...
kernel:[ 30.433380] Code: 00 00 55 53 49 8b 6c 24 08 48 89 fb 4c 39 e5 48 0f 44 e8 48 85 ed 74 3d 48 83 bf 00 02 00 00 00 75 09 83 3d 4e 3f 27 00 00 74 2a <80> 0c 25 5c 00 00 00 01 ff 45 54 ff 83 54 05 00 00 48 83 bb 00
Message from syslogd@www at Jul 1 14:39:04 ...
kernel:[ 30.436159] CR2: 000000000000005c
Since then our ERP (Odoo) has stopped working. Our provider says the RAM upgrade is not the reason Odoo stopped working, yet everything was running fine before the upgrade.
I don't know whether it is related, but PostgreSQL on this server had been tuned and the kernel shared memory limits increased (before the RAM upgrade) - http://www.postgresql.org/docs/9.1/static/kernel-resources.html
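Since the shared memory tuning came up: the PostgreSQL 9.1 kernel-resources page linked above notes that on Linux SHMMAX is expressed in bytes while SHMALL is expressed in pages, so SHMALL times the page size should be at least SHMMAX. The sketch below is only a quick sanity check of that relationship, assuming the standard procfs paths /proc/sys/kernel/shmmax and /proc/sys/kernel/shmall; the helper names are hypothetical, and a consistent result says nothing either way about the kernel Oops.

```python
import os

def read_kernel_param(name, proc_dir="/proc/sys/kernel"):
    """Read an integer kernel parameter (e.g. shmmax, shmall) from procfs."""
    with open(os.path.join(proc_dir, name)) as f:
        return int(f.read().split()[0])

def shm_limits_consistent(shmmax_bytes, shmall_pages, page_size):
    """SHMMAX is in bytes, SHMALL in pages: the total allowed shared memory
    (SHMALL * page size) should cover at least one maximum-size segment."""
    return shmall_pages * page_size >= shmmax_bytes

# Only query the live values where procfs is available (i.e. on Linux).
if __name__ == "__main__" and os.path.isdir("/proc/sys/kernel"):
    page = os.sysconf("SC_PAGE_SIZE")
    shmmax = read_kernel_param("shmmax")
    shmall = read_kernel_param("shmall")
    print(f"shmmax={shmmax} bytes, shmall={shmall} pages, page size={page}")
    print("consistent:", shm_limits_consistent(shmmax, shmall, page))
```

For example, a 4GB SHMMAX with 4096-byte pages needs SHMALL of at least 1048576 pages.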
Does this look like a memory-related problem? To me it does, but our server provider doesn't seem able to fix it.
PS: The server runs Linux Debian Squeeze. Kernel info: 2.6.32-5-amd64. The server runs in a virtual machine.
Update: the system log also shows this (a snippet of the log):
Jul 1 15:20:56 www snmpd[1288]: /etc/snmp/snmpd.conf: line 146: Error: unknown payload OID
Jul 1 15:20:56 www snmpd[1288]: Unknown payload OID: fileName
Jul 1 15:20:56 www snmpd[1288]: /etc/snmp/snmpd.conf: line 146: Error: Unknown payload OID
Jul 1 15:20:56 www snmpd[1288]: payload OID: fileErrorMsg
Jul 1 15:20:56 www snmpd[1288]: /etc/snmp/snmpd.conf: line 146: Error: unknown payload OID
Jul 1 15:20:56 www snmpd[1288]: Unknown payload OID: fileErrorMsg
Jul 1 15:20:56 www snmpd[1288]: /etc/snmp/snmpd.conf: line 146: Error: Unknown payload OID
Jul 1 15:20:56 www snmpd[1288]: trigger OID: fileErrorFlag
Jul 1 15:20:56 www snmpd[1288]: /etc/snmp/snmpd.conf: line 146: Error: unknown monitor OID
Jul 1 15:20:56 www snmpd[1288]: payload OID: snmperrErrMessage
Jul 1 15:20:56 www snmpd[1288]: /etc/snmp/snmpd.conf: line 146: Error: unknown payload OID
Jul 1 15:20:56 www snmpd[1288]: Unknown payload OID: snmperrErrMessage
Jul 1 15:20:56 www snmpd[1288]: /etc/snmp/snmpd.conf: line 146: Error: Unknown payload OID
Jul 1 15:20:56 www snmpd[1288]: trigger OID: snmperrErrorFlag
Jul 1 15:20:56 www snmpd[1288]: /etc/snmp/snmpd.conf: line 146: Error: unknown monitor OID
Jul 1 15:20:56 www snmpd[1288]: net-snmp: 33 error(s) in config file(s)
Update 2
It also shows this error. It may be the root cause, but I'd like to know why it happens and how to fix it.
Jul 1 15:57:50 www kernel: [ 866.578614] BUG: unable to handle kernel NULL pointer dereference at 000000000000005c
Jul 1 15:57:50 www kernel: [ 866.578788] IP: [<ffffffff8128c528>] tcp_send_fin+0x37/0x1ab
Jul 1 15:57:50 www kernel: [ 866.578894] PGD 1bcee6067 PUD 1bd558067 PMD 0
Jul 1 15:57:50 www kernel: [ 866.579027] Oops: 0002 [#15] SMP
Jul 1 15:57:50 www kernel: [ 866.579129] last sysfs file: /sys/devices/virtual/net/lo/operstate
Jul 1 15:57:50 www kernel: [ 866.579211] CPU 0
Jul 1 15:57:50 www kernel: [ 866.579281] Modules linked in: iptable_filter iptable_mangle ip_tables x_tables loop evdev snd_pcm snd_timer snd soundcore snd_page_alloc pcspkr parport_pc parport psmouse serio_raw container shpchp pci_hotplug processor i2c_piix4 button ac i2c_core ext3 jbd mbcache dm_mod vmw_pvscsi vmxnet3 sg sd_mod sr_mod cdrom ata_generic crc_t10dif ata_piix libata floppy mptspi mptscsih mptbase e1000 scsi_transport_spi scsi_mod thermal thermal_sys [last unloaded: scsi_wait_scan]
Jul 1 15:57:50 www kernel: [ 866.581127] Pid: 2639, comm: python Tainted: G D 2.6.32-5-amd64 #1 VMware Virtual Platform
Jul 1 15:57:50 www kernel: [ 866.581248] RIP: 0010:[<ffffffff8128c528>] [<ffffffff8128c528>] tcp_send_fin+0x37/0x1ab
Jul 1 15:57:50 www kernel: [ 866.581386] RSP: 0018:ffff8801bc937f08 EFLAGS: 00010286
Jul 1 15:57:50 www kernel: [ 866.581463] RAX: 0000000000000000 RBX: ffff8801bcf9d480 RCX: ffff8801bdfac901
Jul 1 15:57:50 www kernel: [ 866.581555] RDX: ffff880006e189d8 RSI: 0000000000000004 RDI: ffff8801bcf9d480
Jul 1 15:57:50 www kernel: [ 866.581648] RBP: ffff8801bae70000 R08: ffff8801badcd030 R09: 0000000000000002
Jul 1 15:57:50 www kernel: [ 866.581740] R10: 0000000000000002 R11: ffff8801bcf9d480 R12: ffff8801bcf9d548
Jul 1 15:57:50 www kernel: [ 866.581832] R13: ffff8801bdd6a3c0 R14: 00000000ffffffff R15: 00000000ffffffff
Jul 1 15:57:50 www kernel: [ 866.581924] FS: 00007fca17123700(0000) GS:ffff880006e00000(0000) knlGS:0000000000000000
Jul 1 15:57:50 www kernel: [ 866.582037] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 1 15:57:50 www kernel: [ 866.582124] CR2: 000000000000005c CR3: 00000001bd570000 CR4: 00000000000406f0
Jul 1 15:57:50 www kernel: [ 866.582234] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 1 15:57:50 www kernel: [ 866.582329] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jul 1 15:57:50 www kernel: [ 866.582421] Process python (pid: 2639, threadinfo ffff8801bc936000, task ffff8801bd510e20)
Jul 1 15:57:50 www kernel: [ 866.582537] Stack:
Jul 1 15:57:50 www kernel: [ 866.582588] ffff8801bcf9d480 0000000000000000 0000000000000002 ffffffff8129b6af
Jul 1 15:57:50 www kernel: [ 866.582757] <0> 000000000000002c ffff8801bdd6a3c0 0000000000000001 0000000000000001
Jul 1 15:57:50 www kernel: [ 866.583032] <0> 00007fca33f36829 ffffffff812423e7 00007fca171236a8 00007fca246bbc20
Jul 1 15:57:50 www kernel: [ 866.583291] Call Trace:
Jul 1 15:57:50 www kernel: [ 866.583350] [<ffffffff8129b6af>] ? inet_shutdown+0x97/0xdd
Jul 1 15:57:50 www kernel: [ 866.583431] [<ffffffff812423e7>] ? sys_shutdown+0x3d/0x5d
Jul 1 15:57:50 www kernel: [ 866.583512] [<ffffffff81010b22>] ? system_call_fastpath+0x16/0x1b
Jul 1 15:57:50 www kernel: [ 866.583596] Code: 00 00 55 53 49 8b 6c 24 08 48 89 fb 4c 39 e5 48 0f 44 e8 48 85 ed 74 3d 48 83 bf 00 02 00 00 00 75 09 83 3d 4e 3f 27 00 00 74 2a <80> 0c 25 5c 00 00 00 01 ff 45 54 ff 83 54 05 00 00 48 83 bb 00
Jul 1 15:57:50 www kernel: [ 866.585279] RIP [<ffffffff8128c528>] tcp_send_fin+0x37/0x1ab
Jul 1 15:57:50 www kernel: [ 866.585400] RSP <ffff8801bc937f08>
Jul 1 15:57:50 www kernel: [ 866.585468] CR2: 000000000000005c
Jul 1 15:57:50 www kernel: [ 866.585716] ---[ end trace 29537c3dcdc7a93f ]---
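The Call Trace above shows the crash happening inside a shutdown() system call made by a python process (the Odoo server): sys_shutdown, then inet_shutdown, then tcp_send_fin, which is where the kernel queues the FIN that closes the sending side of a TCP connection. The minimal sketch below shows the ordinary user-space operation that takes roughly this path on Linux: half-closing a loopback TCP connection with shutdown(SHUT_WR). It is an illustration of the code path only, not a reproducer of the bug; on a correct kernel it simply completes, and the function name is hypothetical.

```python
import socket

def close_write_side():
    """Open a loopback TCP connection, send a few bytes, then half-close the
    client with shutdown(SHUT_WR). On Linux this syscall goes through
    sys_shutdown -> inet_shutdown, and the TCP layer then sends a FIN."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
        server.listen(1)
        with socket.create_connection(server.getsockname()) as client:
            conn, _ = server.accept()
            with conn:
                client.sendall(b"ping")
                client.shutdown(socket.SHUT_WR)  # FIN is queued after the data
                received = b""
                while True:
                    chunk = conn.recv(1024)
                    if not chunk:  # empty read: the peer's FIN has arrived
                        break
                    received += chunk
                return received

if __name__ == "__main__":
    print(close_write_side())
```

The server side reads the data and then gets an empty read once the FIN arrives, which is the normal outcome; on the affected kernel, the same kind of call from Odoo is what ended in the NULL pointer dereference.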
Answer 1
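This is a kernel bug: Debian bug #789037, a.k.a. upstream bug #99161. It was introduced in a recent kernel update, and you are probably seeing it only after the reboot (to install the RAM) because that reboot is what put the updated kernel into use.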
The fix is already available; you need to install it (and reboot).
(By the way: I saw this on some of our servers after an unexpected reboot caused by a tripped circuit breaker... The key part to Google for is "tcp_send_fin+0x37".)
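Since the searchable part of an Oops is the function+offset on its "IP:" line, here is a small sketch that pulls those signatures out of kernel log lines in the format shown in the question. The regular expression is written against those sample lines only; other kernel versions may format the line differently, and the helper name is hypothetical.

```python
import re

# Matches Oops lines such as
#   "... IP: [<ffffffff8128c528>] tcp_send_fin+0x37/0x1ab"
# and captures "function+offset" (here "tcp_send_fin+0x37"), the part worth
# searching for. "RIP: 0010:[<...>]" register-dump lines do not match.
IP_LINE = re.compile(r"IP: \[<[0-9a-f]+>\] (\S+?\+0x[0-9a-f]+)/0x[0-9a-f]+")

def oops_signatures(lines):
    """Return the faulting function+offset from every Oops 'IP:' line, in order."""
    signatures = []
    for line in lines:
        match = IP_LINE.search(line)
        if match:
            signatures.append(match.group(1))
    return signatures
```

On Debian the kernel messages usually end up in /var/log/kern.log (an assumption about the default syslog setup), so passing the lines of that file to oops_signatures and de-duplicating the result gives a short list of strings to search for.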