我有一台服务器,它总是崩溃。我知道服务器崩溃的原因有很多。但如果原因是系统在崩溃前内存不足,我该如何确认这是原因?我应该查看哪些日志文件?我应该查找哪些行/错误消息?我正在运行 CentOS。大量使用 php 解析 xml 文件,最多超过 2 GB。该服务器有 16GB 内存。
编辑1
[root@61540 ~]# free -m
total used free shared buffers cached
Mem: 16035 1526 14509 0 40 1002
-/+ buffers/cache: 483 15552
Swap: 8197 0 8197
编辑 2 /var/log/messages
Feb 17 20:38:26 61540 syslogd 1.4.1: restart.
Feb 17 20:38:26 61540 proftpd[3896]: 66.90.101.85 - received SIGHUP -- master server reparsing configuration file
Feb 17 22:23:06 61540 avahi-daemon[3984]: recvmsg(): Resource temporarily unavailable
Feb 17 23:07:37 61540 proftpd[10620] - (Several lines of ftp session)
Feb 18 23:03:48 61540 syslogd 1.4.1: restart.
Feb 18 23:03:48 61540 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Feb 18 23:03:48 61540 kernel: Linux version 2.6.18-308.el5 ([email protected]) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-52)) #1 SMP Tue Feb 21 20:06:06 EST 2012
Feb 18 23:03:48 61540 kernel: Command line: ro root=LABEL=/
Feb 18 23:03:48 61540 kernel: BIOS-provided physical RAM map:
Feb 18 23:03:48 61540 kernel: BIOS-e820: 0000000000010000 - 000000000009a000 (usable)
Feb 18 23:03:48 61540 kernel: BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
Feb 18 23:03:48 61540 kernel: BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
Feb 18 23:03:48 61540 kernel: BIOS-e820: 0000000000100000 - 00000000cfda0000 (usable)
Feb 18 23:03:48 61540 kernel: BIOS-e820: 00000000cfda0000 - 00000000cfdd1000 (ACPI NVS)
Feb 18 23:03:48 61540 kernel: BIOS-e820: 00000000cfdd1000 - 00000000cfe00000 (ACPI data)
Feb 18 23:03:48 61540 kernel: BIOS-e820: 00000000cfe00000 - 00000000cff00000 (reserved)
Feb 18 23:03:48 61540 kernel: BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
Feb 18 23:03:48 61540 kernel: BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
Feb 18 23:03:48 61540 kernel: BIOS-e820: 0000000100000000 - 000000042f000000 (usable)
Feb 18 23:03:48 61540 kernel: DMI 2.4 present.
Feb 18 23:03:48 61540 kernel: No NUMA configuration found
Feb 18 23:03:48 61540 kernel: Faking a node at 0000000000000000-000000042f000000
Feb 18 23:03:48 61540 kernel: Bootmem setup node 0 0000000000000000-000000042f000000
Feb 18 23:03:48 61540 kernel: Memory for crash kernel (0x0 to 0x0) notwithin permissible range
Feb 18 23:03:48 61540 kernel: disabling kdump
Feb 18 23:03:48 61540 kernel: ACPI: PM-Timer IO Port: 0x808
Feb 18 23:03:48 61540 kernel: ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Feb 18 23:03:48 61540 kernel: Processor #0 5:1 APIC version 16
Feb 18 23:03:48 61540 kernel: ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
Feb 18 23:03:48 61540 kernel: Processor #1 5:1 APIC version 16
Feb 18 23:03:48 61540 kernel: ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
Feb 18 23:03:48 61540 kernel: Processor #2 5:1 APIC version 16
Feb 18 23:03:48 61540 kernel: ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
Feb 18 23:03:48 61540 kernel: Processor #3 5:1 APIC version 16
Feb 18 23:03:48 61540 kernel: ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] enabled)
Feb 18 23:03:48 61540 kernel: Processor #4 5:1 APIC version 16
Feb 18 23:03:48 61540 kernel: ACPI: LAPIC (acpi_id[0x05] lapic_id[0x05] enabled)
Feb 18 23:03:48 61540 kernel: Processor #5 5:1 APIC version 16
Feb 18 23:03:48 61540 kernel: ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] enabled)
Feb 18 23:03:48 61540 kernel: Processor #6 5:1 APIC version 16
Feb 18 23:03:48 61540 kernel: ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] enabled)
Feb 18 23:03:48 61540 kernel: Processor #7 5:1 APIC version 16
答案1
答案2
如果 Linux 内存不足,它通常会启动 OOM killer(内存不足)。这是一个内核进程,它会终止其他进程以释放内存。如果发生这种情况,您在输入时应该会看到相应的日志dmesg
。
试试这个:dmesg | grep -i oom
。如果没有输出,OOM 杀手可能没有杀死你的进程。