My server crashes every night at 00:50 and I cannot figure out why. In the log file I found the suspicious listing below, but I don't know what it means. Can anyone help me?
kernel: php5-cgi invoked oom-killer: gfp_mask=0x84d0, order=0, oom_adj=0, oom_score_adj=0
kernel: php5-cgi cpuset=/ mems_allowed=0
kernel: Pid: 20316, comm: php5-cgi Not tainted 2.6.38.2-xxxx-std-ipv6-64 #2
kernel: Call Trace:
kernel: [<ffffffff810de9e8>] ? dump_header+0x88/0x1d0
kernel: [<ffffffff810aa6e3>] ? ktime_get_ts+0xb3/0xe0
kernel: [<ffffffff810de931>] ? oom_unkillable_task+0x91/0xc0
kernel: [<ffffffff8150c7e5>] ? ___ratelimit+0xa5/0x120
kernel: [<ffffffff810def0c>] ? oom_kill_process+0x8c/0x2e0
kernel: [<ffffffff810dedd3>] ? select_bad_process+0x93/0x140
kernel: [<ffffffff810df398>] ? out_of_memory+0x238/0x3e0
kernel: [<ffffffff810e45bd>] ? __alloc_pages_nodemask+0x86d/0x8a0
kernel: [<ffffffff8110ff4a>] ? alloc_pages_current+0xaa/0x120
kernel: [<ffffffff81069d46>] ? pte_alloc_one+0x16/0x40
kernel: [<ffffffff810fa159>] ? __pte_alloc+0x29/0xd0
kernel: [<ffffffff810fa363>] ? handle_mm_fault+0x163/0x200
kernel: [<ffffffff81066077>] ? do_page_fault+0x197/0x410
kernel: [<ffffffff81100556>] ? do_brk+0x286/0x390
kernel: [<ffffffff81a7419f>] ? page_fault+0x1f/0x30
kernel: Mem-Info:
kernel: Node 0 DMA per-cpu:
kernel: CPU 0: hi: 0, btch: 1 usd: 0
kernel: Node 0 DMA32 per-cpu:
kernel: CPU 0: hi: 186, btch: 31 usd: 156
kernel: active_anon:468626 inactive_anon:383 isolated_anon:0
kernel: active_file:66 inactive_file:101 isolated_file:64
kernel: unevictable:0 dirty:0 writeback:0 unstable:0
kernel: free:3426 slab_reclaimable:1691 slab_unreclaimable:13557
kernel: mapped:380 shmem:404 pagetables:10150 bounce:0
kernel: Node 0 DMA free:7932kB min:44kB low:52kB high:64kB active_anon:7056kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15684kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:388kB kernel_stack:16kB pagetables:488kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
kernel: lowmem_reserve[]: 0 1967 1967 1967
kernel: Node 0 DMA32 free:5772kB min:5648kB low:7060kB high:8472kB active_anon:1867448kB inactive_anon:1532kB active_file:264kB inactive_file:404kB unevictable:0kB isolated(anon):0kB isolated(file):256kB present:2014316kB mlocked:0kB dirty:0kB writeback:0kB mapped:1520kB shmem:1616kB slab_reclaimable:6764kB slab_unreclaimable:53840kB kernel_stack:1912kB pagetables:40112kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1423 all_unreclaimable? no
kernel: lowmem_reserve[]: 0 0 0 0
kernel: Node 0 DMA: 45*4kB 56*8kB 29*16kB 8*32kB 7*64kB 4*128kB 2*256kB 0*512kB 1*1024kB 0*2048kB 1*4096kB = 7940kB
kernel: Node 0 DMA32: 149*4kB 1*8kB 1*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB 0*2048kB 1*4096kB = 5772kB
kernel: 646 total pagecache pages
kernel: 0 pages in swap cache
kernel: Swap cache stats: add 0, delete 0, find 0/0
kernel: Free swap = 0kB
kernel: Total swap = 0kB
kernel: 515824 pages RAM
kernel: 12393 pages reserved
kernel: 310091 pages shared
kernel: 436769 pages non-shared
kernel: [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
kernel: [ 1562] 0 1562 4238 40 0 0 0 upstart-udev-br
kernel: [ 1564] 0 1564 4230 74 0 -17 -1000 udevd
kernel: [ 1656] 0 1656 4229 75 0 -17 -1000 udevd
kernel: [ 1661] 0 1661 4229 73 0 -17 -1000 udevd
kernel: [ 2620] 0 2620 12328 141 0 -17 -1000 sshd
kernel: [ 2631] 101 2631 13713 221 0 0 0 rsyslogd
kernel: [ 2648] 0 2648 1532 30 0 0 0 getty
kernel: [ 2652] 0 2652 1532 29 0 0 0 getty
kernel: [ 2655] 0 2655 1532 29 0 0 0 getty
kernel: [ 2656] 0 2656 1532 29 0 0 0 getty
kernel: [ 2659] 0 2659 1532 30 0 0 0 getty
kernel: [ 2666] 0 2666 5281 65 0 0 0 cron
kernel: [ 2701] 102 2701 32660 5339 0 0 0 named
kernel: [ 2708] 104 2708 63627 11103 0 0 0 mysqld
kernel: [ 2723] 0 2723 3482 38 0 0 0 couriertcpd
kernel: [ 2725] 0 2725 980 17 0 0 0 courierlogger
kernel: [ 2735] 0 2735 3482 38 0 0 0 couriertcpd
kernel: [ 2737] 0 2737 980 17 0 0 0 courierlogger
kernel: [ 2744] 0 2744 3482 41 0 0 0 couriertcpd
kernel: [ 2747] 0 2747 1013 26 0 0 0 courierlogger
kernel: [ 2754] 0 2754 3482 38 0 0 0 couriertcpd
kernel: [ 2757] 0 2757 980 18 0 0 0 courierlogger
kernel: [ 3313] 65534 3313 15729 79 0 0 0 memcached
kernel: [ 3383] 1002 3383 12937 1590 0 0 0 sw-cp-serverd
kernel: [ 3393] 0 3393 4894 58 0 0 0 xinetd
kernel: [ 3535] 2522 3535 1027 28 0 0 0 qmail-send
kernel: [ 3536] 2022 3536 1015 26 0 0 0 splogger
kernel: [ 3537] 0 3537 1025 33 0 0 0 qmail-lspawn
kernel: [ 3538] 2521 3538 1025 17 0 0 0 qmail-rspawn
kernel: [ 3539] 2520 3539 1014 22 0 0 0 qmail-clean
kernel: [ 3621] 0 3621 68797 3424 0 0 0 apache2
kernel: [ 3622] 0 3622 40565 1745 0 0 0 apache2
kernel: [ 3922] 106 3922 40087 38746 0 0 0 drwebd.real
kernel: [ 3985] 0 3985 3163 37 0 0 0 mdadm
kernel: [ 4024] 0 4024 1532 30 0 0 0 getty
kernel: [24625] 0 24625 28299 11913 0 0 0 spamd
kernel: [24626] 110 24626 28299 11912 0 0 0 spamd
kernel: [24628] 110 24628 28299 11912 0 0 0 spamd
kernel: [12008] 33 12008 68960 3226 0 0 0 apache2
kernel: [12016] 33 12016 68946 3232 0 0 0 apache2
kernel: [12568] 33 12568 68952 3229 0 0 0 apache2
kernel: [13362] 33 13362 68933 3220 0 0 0 apache2
kernel: [16894] 33 16894 68946 3204 0 0 0 apache2
kernel: [16895] 33 16895 68902 3189 0 0 0 apache2
kernel: [18991] 106 18991 40087 38745 0 0 0 drwebd.real
kernel: [18992] 106 18992 40087 38745 0 0 0 drwebd.real
kernel: [18993] 106 18993 40087 38745 0 0 0 drwebd.real
kernel: [18994] 106 18994 40087 38745 0 0 0 drwebd.real
kernel: [19165] 33 19165 68995 3216 0 0 0 apache2
kernel: [19178] 33 19178 68947 3225 0 0 0 apache2
kernel: [19918] 33 19918 68961 3218 0 0 0 apache2
If you need any more information, please let me know.
多马内尼
Answer 1
A solution is reported here: http://www.hskupin.info/2010/06/17/how-to-fix-the-oom-killer-crashe-under-linux/
So what happened? The cause is easy to explain: by default the Linux kernel happily grants memory whenever an application asks for it, without really checking whether that much memory is actually available. Because of this behaviour, applications can allocate more memory than the system can back, and at some point that inevitably leads to an out-of-memory situation. The OOM killer is then invoked and kills a process:
Jun 11 11:35:21 vsrv03 kernel: [378878.356858] php-cgi invoked oom-killer: gfp_mask=0x1280d2, order=0, oomkilladj=0
Jun 11 11:36:11 vsrv03 kernel: [378878.356880] Pid: 8490, comm: php-cgi Not tainted 2.6.26-2-xen-amd64 #1
The downside is that all the other running processes are affected as well. In the end the whole virtual machine becomes unresponsive and has to be rebooted.
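If you want to confirm when the OOM killer fired on your own machine, the kernel log normally records it. A minimal sketch of commands you could run (log file paths vary by distribution and are only examples here):

# recent kernel messages mentioning the OOM killer
dmesg | grep -i -E 'oom|killed process'
# or search the persisted kernel logs (paths depend on the distro)
grep -i 'invoked oom-killer' /var/log/kern.log /var/log/messages 2>/dev/null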
To fix this, the kernel's behaviour has to be changed so that it no longer overcommits memory for application requests. I have put the following values into the /etc/sysctl.conf file so that they are applied automatically at boot:
vm.overcommit_memory = 2
vm.overcommit_ratio = 80
(Reboot to apply the changes.)
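Alternatively, if a reboot is inconvenient, the same settings can normally be applied to the running kernel with sysctl. A minimal sketch using standard procps commands (not part of the original answer):

# reload everything from /etc/sysctl.conf
sysctl -p
# or set just the two values on the running kernel
sysctl -w vm.overcommit_memory=2
sysctl -w vm.overcommit_ratio=80
# verify the values currently in effect
cat /proc/sys/vm/overcommit_memory /proc/sys/vm/overcommit_ratio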
More information about overcommit: http://www.win.tue.nl/~aeb/linux/lk/lk-9.html#ss9.6
Answer 2
What happened is simple; why it happened is less so (that would need more information). At that point php5-cgi started using so much memory (possibly a memory leak, possibly a side effect of something else) that the system ran out of memory, so the kernel killed it (the oom-killer is the kernel's out-of-memory killer) to keep the system as a whole stable.
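Since the crash happens at the same time every night, it may help to capture which processes are eating memory just before it. A rough sketch, assuming you add a root crontab entry; the snapshot time and the log path /var/log/mem-snapshot.log are only examples:

# /etc/crontab entry: at 00:45 record overall memory state and the top memory consumers
45 0 * * * root (date; free -m; ps aux --sort=-rss | head -n 20) >> /var/log/mem-snapshot.log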
This looks like a VPS; is it, and what kind? On a physical machine with enough RAM (1 GB+) and swap space (at least 2x RAM), OOM errors are usually quite rare.
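For reference, the log above shows Total swap = 0kB. If the machine (or VPS plan) allows it, adding a swap file is a common way to give the system some headroom. A minimal sketch, where the path /swapfile and the 2 GiB size are arbitrary examples:

# create and enable a 2 GiB swap file
dd if=/dev/zero of=/swapfile bs=1M count=2048
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
# make it persistent across reboots
echo '/swapfile none swap sw 0 0' >> /etc/fstab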