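I have a hosted Ubuntu server, and on several occasions it has stopped responding to everything until a hard reboot was done. I have pulled the logs, but I need some help working out what they mean, whether they are actually related to the hangs, or whether you think this could be a hardware problem:
Syslog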
Mar 1 15:11:01 xxxxxxxx CRON[24473]: (root) CMD (/usr/local/rtm/bin/rtm 35 > /dev/null 2> /dev/null)
Mar 1 15:12:01 xxxxxxxx CRON[24530]: (root) CMD (/usr/local/rtm/bin/rtm 35 > /dev/null 2> /dev/null)
Mar 1 15:13:01 xxxxxxxx CRON[24585]: (root) CMD (/usr/local/rtm/bin/rtm 35 > /dev/null 2> /dev/null)
Mar 1 15:14:01 xxxxxxxx CRON[24654]: (root) CMD (/usr/local/rtm/bin/rtm 35 > /dev/null 2> /dev/null)
Mar 1 15:15:01 xxxxxxxx CRON[24713]: (root) CMD (/usr/local/rtm/bin/rtm 35 > /dev/null 2> /dev/null)
Mar 1 15:16:01 xxxxxxxx CRON[24770]: (root) CMD (/usr/local/rtm/bin/rtm 35 > /dev/null 2> /dev/null)
Mar 1 15:17:01 xxxxxxxx CRON[24827]: (root) CMD (/usr/local/rtm/bin/rtm 35 > /dev/null 2> /dev/null)
Mar 1 15:17:01 xxxxxxxx CRON[24828]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Mar 1 15:17:05 xxxxxxxx postfix/pickup[23311]: 3CFEE2E3CF: uid=0 from=<root>
Mar 1 15:17:05 xxxxxxxx postfix/cleanup[24880]: 3CFEE2E3CF: message-id=<[email protected]>
Mar 1 15:17:05 xxxxxxxx postfix/qmgr[3886]: 3CFEE2E3CF: from=<[email protected]>, size=2080, nrcpt=1 (queue active)
Mar 1 15:17:05 xxxxxxxx postfix/smtp[24882]: 3CFEE2E3CF: to=<[email protected]>, relay=xxxxxxxxxxxxxx.dyndns.org[xxx.xxx.xxx.xxx]:25, delay=0.56, delays=0.08/0/0.21/0.26, dsn=2.6.0, status=sent (250 2.6.0 <[email protected]> Queued mail for delivery)
Mar 1 15:17:05 xxxxxxxx postfix/qmgr[3886]: 3CFEE2E3CF: removed
Mar 1 15:18:01 xxxxxxxx CRON[24897]: (root) CMD (/usr/local/rtm/bin/rtm 35 > /dev/null 2> /dev/null)
Mar 1 15:19:01 xxxxxxxx CRON[24944]: (root) CMD (/usr/local/rtm/bin/rtm 35 > /dev/null 2> /dev/null)
Mar 1 15:20:01 xxxxxxxx CRON[24999]: (root) CMD (/usr/local/rtm/bin/rtm 35 > /dev/null 2> /dev/null)
Mar 1 15:21:01 xxxxxxxx CRON[25046]: (root) CMD (/usr/local/rtm/bin/rtm 35 > /dev/null 2> /dev/null)
Mar 1 16:02:40 xxxxxxxx kernel: imklog 4.6.4, log source = /proc/kmsg started.
Mar 1 16:02:40 xxxxxxxx rsyslogd: [origin software="rsyslogd" swVersion="4.6.4" x-pid="3425" x-info="http://www.rsyslog.com"] (re)start
Mar 1 16:02:40 xxxxxxxx rsyslogd: rsyslogd's groupid changed to 103
Mar 1 16:02:40 xxxxxxxx rsyslogd: rsyslogd's userid changed to 101
Mar 1 16:02:40 xxxxxxxx rsyslogd-2039: Could no open output pipe '/dev/xconsole' [try http://www.rsyslog.com/e/2039 ]
Mar 1 16:02:40 xxxxxxxx kernel: Initializing cgroup subsys cpuset
Mar 1 16:02:40 xxxxxxxx kernel: Linux version 3.2.13-grsec-xxxx-grs-ipv6-64 ([email protected]) (gcc version 4.3.2 (Debian 4.3.2-1.1) ) #1 SMP Thu Mar 29 09:48:59 UTC 2012
Mar 1 16:02:40 xxxxxxxx kernel: Command line: root=/dev/sda1 console=tty0 BOOT_IMAGE=bzImage-2.6-xxxx-grs-ipv6-64
Mar 1 16:02:40 xxxxxxxx kernel: BIOS-provided physical RAM map:
Mar 1 16:02:40 xxxxxxxx kernel: BIOS-e820: 0000000000000000 - 000000000009d800 (usable)
Mar 1 16:02:40 xxxxxxxx kernel: BIOS-e820: 000000000009d800 - 00000000000a0000 (reserved)
Mar 1 16:02:40 xxxxxxxx kernel: BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
Mar 1 16:02:40 xxxxxxxx kernel: BIOS-e820: 0000000000100000 - 00000000df790000 (usable)
Mar 1 16:02:40 xxxxxxxx kernel: BIOS-e820: 00000000df790000 - 00000000df79e000 (ACPI data)
Mar 1 16:02:40 xxxxxxxx kernel: BIOS-e820: 00000000df79e000 - 00000000df7d0000 (ACPI NVS)
Mar 1 16:02:40 xxxxxxxx kernel: BIOS-e820: 00000000df7d0000 - 00000000df7e0000 (reserved)
Mar 1 16:02:40 xxxxxxxx kernel: BIOS-e820: 00000000df7ec000 - 00000000f0000000 (reserved)
Mar 1 16:02:40 xxxxxxxx kernel: BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
Mar 1 16:02:40 xxxxxxxx kernel: BIOS-e820: 00000000ffc00000 - 0000000100000000 (reserved)
Mar 1 16:02:40 xxxxxxxx kernel: BIOS-e820: 0000000100000000 - 0000000620000000 (usable)
Mar 1 16:02:40 xxxxxxxx kernel: NX (Execute Disable) protection: active
Mar 1 16:02:40 xxxxxxxx kernel: DMI present.
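As you can see, the server stops right after a cron job runs: the last entry is at 15:21:01 and nothing more is logged until the reboot at 16:02:40. No complex jobs are running here.
Can you give me some advice on diagnosing this problem?
Thanks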
Answer 1
rsyslogd-2039: Could no open output pipe '/dev/xconsole' [try http://www.rsyslog.com/e/2039 ]
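This is probably a confirmed bug in the version of Ubuntu you are using, so I would hope it is what is causing the problem and would try to fix it first.
You can either upgrade to fix it (for example by running apt-get upgrade), or try the suggested workaround, which involves editing your /etc/rsyslog.d/50-default.conf file.
Failing that, stop the cron job that runs just before the server hangs and look at what it is doing that might cause the hang. If nothing else, fixing the rsyslog error may let you capture some useful log information that points you in the right direction.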
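To work out which job ran just before each hang, it can help to scan the syslog for long silences. The following is only a rough sketch, not a definitive tool: the function name `find_gaps`, the 5-minute threshold, and the year 2013 are my own assumptions (classic syslog timestamps carry no year, so one has to be supplied), and it assumes English month abbreviations. The two sample lines are taken from the log above.

```python
from datetime import datetime, timedelta


def find_gaps(lines, year, threshold=timedelta(minutes=5)):
    """Return (line_before, line_after, gap) for every silence longer than threshold.

    Hypothetical helper: expects classic syslog lines such as
    'Mar 1 15:21:01 host prog[pid]: message'. The year is an assumption
    passed in by the caller because the log format omits it.
    """
    gaps = []
    prev_time = None
    prev_line = None
    for raw in lines:
        line = raw.rstrip("\n")
        parts = line.split(None, 3)  # month, day, time, rest of the line
        if len(parts) < 4:
            continue  # not a syslog-style line
        try:
            stamp = datetime.strptime(
                f"{year} {parts[0]} {parts[1]} {parts[2]}", "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            continue  # timestamp did not parse, skip the line
        if prev_time is not None and stamp - prev_time > threshold:
            gaps.append((prev_line, line, stamp - prev_time))
        prev_time, prev_line = stamp, line
    return gaps


# Two consecutive lines from the log in the question.
sample = [
    "Mar 1 15:21:01 xxxxxxxx CRON[25046]: (root) CMD (/usr/local/rtm/bin/rtm 35 > /dev/null 2> /dev/null)",
    "Mar 1 16:02:40 xxxxxxxx kernel: imklog 4.6.4, log source = /proc/kmsg started.",
]

for before, after, gap in find_gaps(sample, year=2013):
    print(f"silent for {gap}")
    print(f"  last line before: {before}")
    print(f"  first line after: {after}")
```

On the excerpt above this flags the roughly 41-minute silence between the last rtm cron entry and the rsyslogd restart. Running the same function over the whole log file (for example `find_gaps(open("/var/log/syslog"), year=2013)`) would show whether every hang is preceded by the same job, which is worth knowing before disabling anything.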