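I have a CentOS 6.2 64-bit install that had been running for 44 days. It suddenly crashed, so I logged in to the KVM to check, and I managed to grab this screenshot.
Screenshot: http://picpaste.com/1-cgYdKDAy.png (I'm a new user, so I can't upload images here)
Any idea what caused this? I asked the data center to hard-reboot the server, and now everything works and I can log in over SSH. Which logs should I check?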
Update
Here is the requested log from /var/log/messages:
Jun 28 12:24:27 la-noc lfd[13058]: cannot chdir to /etc/csf from /tmp/.wapi: Permission denied, aborting. at /usr/sbin/lfd line 4790.
Jun 28 12:24:27 la-noc lfd[13058]: daemon stopped
Jun 28 12:25:55 la-noc proftpd[12732]: 96.44.184.123 (115.133.56.39[115.133.56.39]) - Client session idle timeout, disconnected
Jun 28 12:25:55 la-noc proftpd[12732]: 96.44.184.123 (115.133.56.39[115.133.56.39]) - FTP session closed.
Jun 28 12:26:28 la-noc lfd[13114]: cannot chdir to /etc/csf from /tmp/.wapi: Permission denied, aborting. at /usr/sbin/lfd line 4790.
Jun 28 12:26:28 la-noc lfd[13114]: daemon stopped
Jun 28 12:26:42 la-noc lfd[13125]: DynDNS - update IP addresses
Jun 28 12:28:06 la-noc proftpd[13188]: 96.44.184.123 (127.0.0.1[127.0.0.1]) - FTP session opened.
Jun 28 12:28:06 la-noc proftpd[13188]: 96.44.184.123 (127.0.0.1[127.0.0.1]) - FTP session closed.
Jun 28 12:28:28 la-noc lfd[13204]: cannot chdir to /etc/csf from /tmp/.wapi: Permission denied, aborting. at /usr/sbin/lfd line 4790.
Jun 28 12:28:28 la-noc lfd[13204]: daemon stopped
Jun 28 12:28:55 la-noc kernel: Firewall: *TCP_IN Blocked* IN=eth0 OUT= MAC=00:e0:81:43:95:42:00:04:80:5c:17:25:08:00 SRC=79.169.210.214 DST=96.44.184.126 LEN=60 TOS=0x$
Jun 28 12:28:58 la-noc kernel: Firewall: *TCP_IN Blocked* IN=eth0 OUT= MAC=00:e0:81:43:95:42:00:04:80:5c:17:25:08:00 SRC=79.169.210.214 DST=96.44.184.126 LEN=60 TOS=0x$
Jun 28 12:30:29 la-noc lfd[13291]: cannot chdir to /etc/csf from /tmp/.wapi: Permission denied, aborting. at /usr/sbin/lfd line 4790.
Jun 28 12:30:29 la-noc lfd[13291]: daemon stopped
Jun 28 12:31:43 la-noc lfd[13332]: DynDNS - update IP addresses
Jun 28 12:32:29 la-noc lfd[13363]: cannot chdir to /etc/csf from /tmp/.wapi: Permission denied, aborting. at /usr/sbin/lfd line 4790.
Jun 28 12:32:29 la-noc lfd[13363]: daemon stopped
Jun 28 12:34:02 la-noc proftpd[13415]: 96.44.184.123 (127.0.0.1[127.0.0.1]) - FTP session opened.
Jun 28 12:34:02 la-noc proftpd[13415]: 96.44.184.123 (127.0.0.1[127.0.0.1]) - FTP session closed.
Jun 28 12:34:29 la-noc lfd[13434]: cannot chdir to /etc/csf from /tmp/.wapi: Permission denied, aborting. at /usr/sbin/lfd line 4790.
Jun 28 12:34:29 la-noc lfd[13434]: daemon stopped
Jun 28 12:36:29 la-noc lfd[13493]: cannot chdir to /etc/csf from /tmp/.wapi: Permission denied, aborting. at /usr/sbin/lfd line 4790.
Jun 28 12:36:29 la-noc lfd[13493]: daemon stopped
Jun 28 12:36:44 la-noc lfd[13506]: DynDNS - update IP addresses
Jun 28 12:38:29 la-noc lfd[13555]: cannot chdir to /etc/csf from /tmp/.wapi: Permission denied, aborting. at /usr/sbin/lfd line 4790.
Jun 28 12:38:29 la-noc lfd[13555]: daemon stopped
Jun 28 12:39:03 la-noc proftpd[13600]: 96.44.184.123 (127.0.0.1[127.0.0.1]) - FTP session opened.
Jun 28 12:39:03 la-noc proftpd[13600]: 96.44.184.123 (127.0.0.1[127.0.0.1]) - FTP session closed.
Jun 28 12:40:29 la-noc lfd[13648]: cannot chdir to /etc/csf from /tmp/.wapi: Permission denied, aborting. at /usr/sbin/lfd line 4790.
Jun 28 12:40:29 la-noc lfd[13648]: daemon stopped
Jun 28 12:41:44 la-noc lfd[13680]: DynDNS - update IP addresses
Jun 28 12:42:29 la-noc lfd[13712]: cannot chdir to /etc/csf from /tmp/.wapi: Permission denied, aborting. at /usr/sbin/lfd line 4790.
Jun 28 12:42:29 la-noc lfd[13712]: daemon stopped
Jun 28 12:44:29 la-noc lfd[13771]: cannot chdir to /etc/csf from /tmp/.wapi: Permission denied, aborting. at /usr/sbin/lfd line 4790.
Jun 28 12:44:29 la-noc lfd[13771]: daemon stopped
Jun 28 12:44:30 la-noc proftpd[13781]: 96.44.184.123 (127.0.0.1[127.0.0.1]) - FTP session opened.
Jun 28 12:44:30 la-noc proftpd[13781]: 96.44.184.123 (127.0.0.1[127.0.0.1]) - FTP session closed.
Jun 28 15:56:26 la-noc kernel: imklog 4.6.2, log source = /proc/kmsg started.
Jun 28 15:56:26 la-noc rsyslogd: [origin software="rsyslogd" swVersion="4.6.2" x-pid="1459" x-info="http://www.rsyslog.com"] (re)start
Jun 28 15:56:26 la-noc kernel: Initializing cgroup subsys cpuset
Jun 28 15:56:26 la-noc kernel: Initializing cgroup subsys cpu
Jun 28 15:56:26 la-noc kernel: Linux version 2.6.32-220.17.1.el6.x86_64 ([email protected]) (gcc version 4.4.6 20110731 (Red Hat 4.4.6-3) (GCC) ) #1 S$
Jun 28 15:56:26 la-noc kernel: Command line: ro root=/dev/mapper/VolGroup-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD rd_LVM_LV=VolGroup/lv_swap SYSFONT=latarcyrheb-s$
Jun 28 15:56:26 la-noc kernel: KERNEL supported cpus:
Jun 28 15:56:26 la-noc kernel: Intel GenuineIntel
Jun 28 15:56:26 la-noc kernel: AMD AuthenticAMD
Jun 28 15:56:26 la-noc kernel: Centaur CentaurHauls
Jun 28 15:56:26 la-noc kernel: BIOS-provided physical RAM map:
Jun 28 15:56:26 la-noc kernel: BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
Jun 28 15:56:26 la-noc kernel: BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
Jun 28 15:56:26 la-noc kernel: BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
Jun 28 15:56:26 la-noc kernel: BIOS-e820: 0000000000100000 - 00000000fbff0000 (usable)
Jun 28 15:56:26 la-noc kernel: BIOS-e820: 00000000fbff0000 - 00000000fbfff000 (ACPI data)
Jun 28 15:56:26 la-noc kernel: BIOS-e820: 00000000fbfff000 - 00000000fc000000 (ACPI NVS)
Jun 28 15:56:26 la-noc kernel: BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)
Jun 28 15:56:26 la-noc kernel: BIOS-e820: 0000000100000000 - 0000000400000000 (usable)
Jun 28 15:56:26 la-noc kernel: DMI 2.3 present.
Jun 28 15:56:26 la-noc kernel: SMBIOS version 2.3 @ 0xF7570
Jun 28 15:56:26 la-noc kernel: AMI BIOS detected: BIOS may corrupt low RAM, working around it.
Jun 28 15:56:26 la-noc kernel: last_pfn = 0x400000 max_arch_pfn = 0x400000000
Jun 28 15:56:26 la-noc kernel: x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
Jun 28 15:56:26 la-noc kernel: total RAM covered: 16320M
Jun 28 15:56:26 la-noc kernel: Found optimal setting for mtrr clean up
Jun 28 15:56:26 la-noc kernel: gran_size: 64K chunk_size: 128M num_reg: 4 lose cover RAM: 0G
Jun 28 15:56:26 la-noc kernel: last_pfn = 0xfbff0 max_arch_pfn = 0x400000000
Jun 28 15:56:26 la-noc kernel: init_memory_mapping: 0000000000000000-00000000fbff0000
Jun 28 15:56:26 la-noc kernel: init_memory_mapping: 0000000100000000-0000000400000000
Jun 28 15:56:26 la-noc kernel: RAMDISK: 37217000 - 37fefcd2
Jun 28 15:56:26 la-noc kernel: ACPI: RSDP 00000000000f6f20 00024 (v02 ACPIAM)
Jun 28 15:56:26 la-noc kernel: ACPI: XSDT 00000000fbff0100 00054 (v01 A M I OEMXSDT 07000626 MSFT 00000097)
Jun 28 15:56:26 la-noc kernel: ACPI: FACP 00000000fbff0281 000F4 (v01 A M I OEMFACP 07000626 MSFT 00000097)
Jun 28 15:56:26 la-noc kernel: ACPI: DSDT 00000000fbff0410 03751 (v01 0AAAA 0AAAA000 00000000 INTL 02002026)
Jun 28 15:56:26 la-noc kernel: ACPI: FACS 00000000fbfff000 00040
Jun 28 15:56:26 la-noc kernel: ACPI: APIC 00000000fbff0380 00084 (v01 A M I OEMAPIC 07000626 MSFT 00000097)
Jun 28 15:56:26 la-noc kernel: ACPI: OEMB 00000000fbfff040 00041 (v01 A M I OEMBIOS 07000626 MSFT 00000097)
Jun 28 15:56:26 la-noc kernel: ACPI: SRAT 00000000fbff3b70 00110 (v01 A M I OEMSRAT 07000626 MSFT 00000097)
Jun 28 15:56:26 la-noc kernel: ACPI: ASF! 00000000fbff3cc0 00086 (v01 AMIASF AMDSTRET 00000001 INTL 02002026)
Jun 28 15:56:26 la-noc kernel: SRAT: PXM 0 -> APIC 0 -> Node 0
Jun 28 15:56:26 la-noc kernel: SRAT: PXM 0 -> APIC 1 -> Node 0
Jun 28 15:56:26 la-noc kernel: SRAT: PXM 1 -> APIC 2 -> Node 1
Jun 28 15:56:26 la-noc kernel: SRAT: PXM 1 -> APIC 3 -> Node 1
Jun 28 15:56:26 la-noc kernel: SRAT: Node 0 PXM 0 100000-fc000000
Jun 28 15:56:26 la-noc kernel: SRAT: Node 1 PXM 1 200000000-400000000
Jun 28 15:56:26 la-noc kernel: SRAT: Node 0 PXM 0 100000000-200000000
Jun 28 15:56:26 la-noc kernel: SRAT: Node 0 PXM 0 0-9fc00
Jun 28 15:56:26 la-noc kernel: Bootmem setup node 0 0000000000000000-0000000200000000
Jun 28 15:56:26 la-noc kernel: NODE_DATA [0000000000028040 - 000000000005c03f]
Jun 28 15:56:26 la-noc kernel: bootmap [000000000005d000 - 000000000009cfff] pages 40
Jun 28 15:56:26 la-noc kernel: (9 early reservations) ==> bootmem [0000000000 - 0200000000]
Jun 28 15:56:26 la-noc kernel: #0 [0000000000 - 0000001000] BIOS data page ==> [0000000000 - 0000001000]
Jun 28 15:56:26 la-noc kernel: #1 [0000006000 - 0000008000] TRAMPOLINE ==> [0000006000 - 0000008000]
Jun 28 15:56:26 la-noc kernel: #2 [0001000000 - 000200c864] TEXT DATA BSS ==> [0001000000 - 000200c864]
Jun 28 15:56:26 la-noc kernel: #3 [0037217000 - 0037fefcd2] RAMDISK ==> [0037217000 - 0037fefcd2]
Jun 28 15:56:26 la-noc kernel: #4 [000009f400 - 0000100000] BIOS reserved ==> [000009f400 - 0000100000]
Update: here is the sar output:
root@la-noc [~]# sar
Linux 2.6.32-220.13.1.el6.x86_64 (server.abc.com) 06/28/2012 _x86_64_ (4 CPU)
12:00:01 AM CPU %user %nice %system %iowait %steal %idle
12:10:01 AM all 0.87 0.01 0.34 0.35 0.00 98.44
12:20:01 AM all 0.51 0.01 0.25 0.18 0.00 99.04
12:30:01 AM all 0.62 0.01 0.26 0.22 0.00 98.89
12:40:01 AM all 0.78 0.01 0.31 0.27 0.00 98.63
12:50:01 AM all 0.52 0.01 0.25 0.18 0.00 99.04
01:00:01 AM all 0.71 0.01 0.25 0.22 0.00 98.81
01:10:01 AM all 0.61 0.19 0.33 0.33 0.00 98.54
01:20:01 AM all 0.51 0.01 0.24 0.19 0.00 99.05
01:30:01 AM all 0.55 0.01 0.26 0.21 0.00 98.97
01:40:01 AM all 0.56 0.01 0.31 0.21 0.00 98.92
01:50:01 AM all 0.40 0.01 0.21 0.18 0.00 99.20
02:00:01 AM all 0.55 0.01 0.25 0.23 0.00 98.96
02:10:01 AM all 0.60 0.01 0.29 0.36 0.00 98.75
02:20:01 AM all 0.66 0.01 0.24 0.19 0.00 98.91
02:30:01 AM all 2.65 0.01 0.43 0.24 0.00 96.66
02:40:01 AM all 1.90 0.01 0.54 0.26 0.00 97.29
02:50:01 AM all 3.31 0.02 0.54 0.31 0.00 95.82
03:00:01 AM all 1.48 0.01 0.33 0.27 0.00 97.91
03:10:01 AM all 0.88 0.01 0.33 0.44 0.00 98.34
03:20:01 AM all 0.62 0.19 0.40 0.24 0.00 98.54
03:30:01 AM all 0.94 0.01 0.41 0.19 0.00 98.45
03:40:01 AM all 1.17 0.01 0.35 0.21 0.00 98.26
03:50:01 AM all 0.82 0.02 0.37 0.20 0.00 98.59
04:00:01 AM all 0.61 0.01 0.30 0.18 0.00 98.91
04:10:01 AM all 0.66 0.01 0.28 0.35 0.00 98.70
04:20:01 AM all 0.37 0.01 0.23 0.17 0.00 99.22
04:30:01 AM all 0.72 0.01 0.25 0.16 0.00 98.86
04:40:01 AM all 0.83 0.02 0.29 0.18 0.00 98.69
04:50:01 AM all 0.51 0.01 0.24 0.21 0.00 99.03
05:00:01 AM all 0.63 0.01 0.25 0.22 0.00 98.89
05:10:01 AM all 0.80 0.01 0.34 0.39 0.00 98.47
05:20:01 AM all 0.56 0.19 0.26 0.22 0.00 98.77
05:30:01 AM all 0.69 0.02 0.35 0.26 0.00 98.69
05:40:01 AM all 0.79 0.01 0.51 0.24 0.00 98.45
05:50:01 AM all 0.45 0.01 0.23 0.16 0.00 99.15
06:00:01 AM all 0.52 0.01 0.26 0.21 0.00 98.99
06:10:01 AM all 0.95 0.01 0.33 0.44 0.00 98.27
06:20:01 AM all 0.79 0.02 0.30 0.24 0.00 98.65
06:30:01 AM all 1.16 0.01 0.31 0.20 0.00 98.33
06:40:01 AM all 0.70 0.01 0.29 0.23 0.00 98.77
06:50:01 AM all 0.77 0.01 0.25 0.21 0.00 98.77
07:00:01 AM all 0.76 0.01 0.27 0.26 0.00 98.70
07:00:01 AM CPU %user %nice %system %iowait %steal %idle
07:10:01 AM all 0.68 0.20 0.32 0.40 0.00 98.40
07:20:01 AM all 1.03 0.01 0.37 0.21 0.00 98.38
07:30:01 AM all 0.67 0.01 0.25 0.19 0.00 98.89
07:40:01 AM all 0.77 0.01 0.31 0.25 0.00 98.66
07:50:01 AM all 1.09 0.01 0.30 0.33 0.00 98.27
08:00:01 AM all 1.27 0.02 0.36 0.23 0.00 98.13
08:10:01 AM all 0.70 0.01 0.29 0.37 0.00 98.64
08:20:01 AM all 0.54 0.01 0.24 0.19 0.00 99.03
08:30:01 AM all 0.73 0.01 0.27 0.27 0.00 98.73
08:40:01 AM all 0.67 0.01 0.28 0.27 0.00 98.77
08:50:01 AM all 0.48 0.02 0.23 0.16 0.00 99.11
09:00:01 AM all 0.52 0.01 0.24 0.21 0.00 99.02
09:10:01 AM all 0.63 0.18 0.32 0.34 0.00 98.52
09:20:01 AM all 0.86 0.01 0.31 0.23 0.00 98.60
09:30:01 AM all 0.84 0.01 0.28 0.29 0.00 98.57
09:40:01 AM all 1.36 0.02 0.34 0.27 0.00 98.01
09:50:01 AM all 1.12 0.01 0.31 0.26 0.00 98.29
10:00:01 AM all 0.49 0.01 0.25 0.20 0.00 99.05
10:10:01 AM all 0.55 0.01 0.26 0.34 0.00 98.84
10:20:01 AM all 0.61 0.01 0.27 0.23 0.00 98.89
10:30:01 AM all 0.76 0.02 0.28 0.28 0.00 98.66
10:40:01 AM all 0.60 0.01 0.30 0.25 0.00 98.84
10:50:01 AM all 0.71 0.01 0.37 0.27 0.00 98.65
11:00:01 AM all 0.58 0.01 0.35 0.25 0.00 98.81
11:10:01 AM all 1.03 0.21 0.44 0.43 0.00 97.89
11:20:01 AM all 0.74 0.02 0.27 0.26 0.00 98.72
11:30:01 AM all 0.78 0.01 0.27 0.29 0.00 98.66
11:40:01 AM all 0.79 0.01 0.29 0.20 0.00 98.70
11:50:01 AM all 0.90 0.01 0.55 0.54 0.00 98.00
12:00:01 PM all 0.84 0.01 0.53 0.73 0.00 97.89
12:10:01 PM all 0.92 0.02 0.90 1.50 0.00 96.66
12:20:01 PM all 0.87 0.01 0.87 1.44 0.00 96.81
12:30:01 PM all 0.89 0.01 0.86 1.42 0.00 96.82
12:40:01 PM all 0.88 0.01 0.86 1.31 0.00 96.93
Average: all 0.82 0.02 0.34 0.32 0.00 98.49
03:56:19 PM LINUX RESTART
04:00:01 PM CPU %user %nice %system %iowait %steal %idle
04:10:01 PM all 0.96 0.19 0.41 1.10 0.00 97.34
04:20:01 PM all 0.47 0.01 0.22 0.30 0.00 99.00
04:30:01 PM all 0.52 0.01 0.24 0.33 0.00 98.90
04:40:01 PM all 0.88 0.02 0.33 0.65 0.00 98.12
04:50:01 PM all 1.35 0.01 0.30 0.27 0.00 98.06
05:00:01 PM all 0.66 0.01 0.26 0.26 0.00 98.82
05:10:01 PM all 0.46 0.01 0.23 0.23 0.00 99.08
05:20:01 PM all 0.51 0.01 0.22 0.23 0.00 99.03
05:30:01 PM all 0.64 0.01 0.30 0.26 0.00 98.78
05:40:01 PM all 0.73 0.01 0.29 0.41 0.00 98.56
05:50:01 PM all 0.60 0.01 0.22 0.23 0.00 98.94
06:00:01 PM all 0.61 0.01 0.35 0.26 0.00 98.78
06:10:01 PM all 0.55 0.01 0.26 0.29 0.00 98.89
06:20:01 PM all 0.67 0.21 0.27 0.31 0.00 98.55
06:30:01 PM all 1.07 0.01 0.36 0.33 0.00 98.23
06:40:01 PM all 0.95 0.01 0.51 0.39 0.00 98.14
06:50:01 PM all 0.75 0.01 0.39 0.24 0.00 98.61
07:00:01 PM all 0.84 0.01 0.50 0.23 0.00 98.43
Average: all 0.73 0.03 0.31 0.35 0.00 98.57
root@la-noc [~]#
Update: I was uploading huge video files to my server over FTP at 1.1 Mbps. Could a hard disk failure cause the server to crash?
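Answer 1
That is the output of a kernel panic (the bottom of it); the most interesting part is at the top. Since the server has already been rebooted, your best bet is to look for errors in /var/log/messages.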
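When reading /var/log/messages after a crash, a useful first step is to find where logging went silent, since the last lines before that gap are the ones written closest to the panic. In the log posted in the question, the last entry is at 12:44:30 and the next one is the boot at 15:56:26. The following is a minimal Python sketch of that search; the function names, the 10-minute threshold, and the assumed year (classic syslog timestamps carry no year) are illustrative choices, not part of any tool.

```python
from datetime import datetime, timedelta


def parse_syslog_time(line, year):
    """Return the timestamp of a classic syslog line, or None if it has none.

    Classic syslog lines start with 'Mon DD HH:MM:SS' (15 characters) and
    omit the year, so the caller has to supply one (an assumption).
    """
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def find_gaps(lines, min_gap=timedelta(minutes=10), year=2012):
    """Return (before, after, last_line) for every silence of at least min_gap."""
    gaps = []
    prev_ts = None
    prev_line = None
    for line in lines:
        ts = parse_syslog_time(line, year)
        if ts is None:
            continue  # skip continuation lines or anything without a timestamp
        if prev_ts is not None and ts - prev_ts >= min_gap:
            gaps.append((prev_ts, ts, prev_line.rstrip()))
        prev_ts, prev_line = ts, line
    return gaps


# Two lines copied from the log in the question, on either side of the crash.
sample = [
    "Jun 28 12:44:30 la-noc proftpd[13781]: 96.44.184.123 (127.0.0.1[127.0.0.1]) - FTP session closed.",
    "Jun 28 15:56:26 la-noc kernel: imklog 4.6.2, log source = /proc/kmsg started.",
]

for before, after, last_line in find_gaps(sample):
    print(f"no log entries between {before} and {after} ({after - before})")
    print(f"last line before the gap: {last_line}")
```

To run it against the real file, pass an open file handle instead of `sample`, for example `find_gaps(open("/var/log/messages", errors="replace"))`. Keep in mind that a panic often prevents the final kernel messages from reaching disk, so an uneventful tail before the gap does not rule anything out.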
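Answer 2
Do you have sysstat (the sar command) installed? If so, it can give you very useful historical information about server load, memory usage, disk IOPS, and so on. It won't give you a definitive answer, but knowing what the server was doing before the kernel panic always helps.
If you don't have it installed yet, I'd install it for future use.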
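One way to use the sar history is to pick out the intervals where a metric leaves its normal range. The sketch below parses the default CPU report as pasted in the question and lists the intervals whose %iowait reaches a threshold. It assumes the 12-hour AM/PM time format and the six-column layout shown there; the function names and the 1.0 threshold are illustrative only.

```python
def parse_sar_cpu(lines):
    """Parse 'all' CPU rows from default sar output into a list of dicts.

    Assumes rows shaped like:
    '12:10:01 PM all 0.92 0.02 0.90 1.50 0.00 96.66'
    Header, 'Average:' and 'LINUX RESTART' lines are skipped.
    """
    rows = []
    for line in lines:
        parts = line.split()
        if len(parts) != 9 or parts[1] not in ("AM", "PM") or parts[2] != "all":
            continue
        try:
            user, nice, system, iowait, steal, idle = map(float, parts[3:])
        except ValueError:
            continue  # not a numeric data row
        rows.append({
            "time": f"{parts[0]} {parts[1]}",
            "user": user,
            "system": system,
            "iowait": iowait,
            "idle": idle,
        })
    return rows


def high_iowait(rows, threshold=1.0):
    """Return the rows whose %iowait is at or above the threshold."""
    return [row for row in rows if row["iowait"] >= threshold]


# A few lines copied from the sar output in the question.
sample = [
    "07:00:01 AM CPU %user %nice %system %iowait %steal %idle",
    "11:50:01 AM all 0.90 0.01 0.55 0.54 0.00 98.00",
    "12:10:01 PM all 0.92 0.02 0.90 1.50 0.00 96.66",
    "12:40:01 PM all 0.88 0.01 0.86 1.31 0.00 96.93",
    "Average: all 0.82 0.02 0.34 0.32 0.00 98.49",
    "03:56:19 PM LINUX RESTART",
    "04:10:01 PM all 0.96 0.19 0.41 1.10 0.00 97.34",
]

for row in high_iowait(parse_sar_cpu(sample)):
    print(row["time"], row["iowait"])
```

On the full output in the question, this flags the four samples from 12:10 PM to 12:40 PM (the last ones before the restart) and the first sample after boot. The values are still small in absolute terms, so this shows that disk activity was up before the crash, not that the disk caused it.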
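Answer 3
There could be many causes: hardware failure, a driver fault, cooling, and so on. Without a full stack trace it is hard to diagnose.
I think there are some basic things to check:
* Keep software (including the kernel) up to date.
* Make sure you only use Linux-compatible hardware.
* Check system/CPU temperatures. Use the RAID controller utility to check that the RAID and HDDs are healthy.
* Enable kernel core dumps (search Google for instructions).
* If the server can be taken offline, run some stress tests (bonnie++/fio/iozone) and capture data via sar.
Cheers, Chida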
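For the kernel core dump item, it helps to confirm whether crash capture is actually armed before the next panic. The sketch below checks two things that standard Linux kernels expose: a `crashkernel=` parameter on the kernel command line (`/proc/cmdline`) and the `/sys/kernel/kexec_crash_loaded` flag, which reads `1` when a capture kernel is loaded. The function names are hypothetical, and this only reports status; setting up kdump itself is distribution-specific and not covered here.

```python
from pathlib import Path


def crashkernel_param(cmdline):
    """Return the value of crashkernel= from a kernel command line, or None."""
    for token in cmdline.split():
        if token.startswith("crashkernel="):
            return token.split("=", 1)[1]
    return None


def kdump_status(cmdline_path="/proc/cmdline",
                 loaded_path="/sys/kernel/kexec_crash_loaded"):
    """Report whether memory is reserved for, and a capture kernel is loaded for, kdump."""
    cmdline = Path(cmdline_path).read_text()
    loaded_file = Path(loaded_path)
    # The sysfs file may be absent on kernels built without kexec support.
    loaded = loaded_file.exists() and loaded_file.read_text().strip() == "1"
    return {
        "crashkernel": crashkernel_param(cmdline),
        "capture_kernel_loaded": loaded,
    }
```

Calling `kdump_status()` on the server returns a small dict; if `crashkernel` is `None` or `capture_kernel_loaded` is `False`, a panic will not leave a vmcore behind to analyze.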