Nagios service is running, website errors: Could not read host and service status information

Upgraded Nagios from 3.5.1 to 4.0.8.

I wanted to ask this on the Nagios support forum, but an hour later I still had not received the confirmation email to set up my account...

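Nagios seems to run fine as a service, but the web CGIs do not work, and there are no errors in either Apache's error.log or nagios.log. I have checked permissions and looked at some of the C code that contains this embedded error: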

Whoops! Error: Could not read host and service status information!

Almost every menu item on the left side of the Nagios home page produces the same error.

On start and stop, nagios.log looks like this, beginning with initialization:

[1431102009] Nagios 4.0.8 starting... (PID=27779)
[1431102009] Local time is Fri May 08 13:20:09 ADT 2015
[1431102009] LOG VERSION: 2.0
[1431102009] qh: Socket '/usr/local/nagios/var/rw/query.sh' successfully initialized
[1431102009] qh: core query handler registered
[1431102009] nerd: Channel hostchecks registered successfully
[1431102009] nerd: Channel servicechecks registered successfully
[1431102009] nerd: Channel opathchecks registered successfully
[1431102009] nerd: Fully initialized and ready to rock!
[1431102009] wproc: Successfully registered manager as @wproc with query handler
[1431102009] wproc: Registry request: name=Core Worker 27785;pid=27785
[1431102009] wproc: Registry request: name=Core Worker 27786;pid=27786
[1431102009] wproc: Registry request: name=Core Worker 27782;pid=27782
[1431102009] wproc: Registry request: name=Core Worker 27781;pid=27781
[1431102009] wproc: Registry request: name=Core Worker 27783;pid=27783
[1431102009] wproc: Registry request: name=Core Worker 27784;pid=27784
[1431102009] Successfully launched command file worker with pid 27787
[1431102022] Caught SIGTERM, shutting down...
[1431102022] Successfully shutdown... (PID=27779)
[1431102022] Event broker module 'NERD' deinitialized successfully.

Running with -v comes back clean:

# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Nagios Core 4.0.8
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 08-12-2014
License: GPL

Website: http://www.nagios.org
Reading configuration data...
   Read main config file okay...
   Read object config files okay...

Running pre-flight check on configuration data...

Checking objects...
        Checked 816 services.
        Checked 826 hosts.
        Checked 11 host groups.
        Checked 0 service groups.
        Checked 18 contacts.
        Checked 13 contact groups.
        Checked 61 commands.
        Checked 6 time periods.
        Checked 0 host escalations.
        Checked 0 service escalations.
Checking for circular paths...
        Checked 826 hosts
        Checked 0 service dependencies
        Checked 0 host dependencies
        Checked 6 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check

In addition, check_nagios says we are running fine:

# /usr/local/nagios/libexec/check_nagios /var/log/nagios.log 5 '/usr/local/nagios/bin/nagios'
NAGIOS OK: 8 processes, status log updated 11 seconds ago

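One possibility is that the error means the CGIs cannot access the nagios.cfg file. I have checked that every directory along that path is accessible to "other" (to cover the apache user). In any case, if there were a permissions problem, it would cause an Apache error. I have been at this for several hours and cannot find the point of failure, or what changed.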
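The permission check described above can be sketched as a small shell helper. This is only an illustration: `walk_path` is a hypothetical name, and the helper just lists each component from the file up to `/` so that a missing read or execute bit for "other" stands out.

```shell
#!/bin/sh
# Print the permissions of a file and of every parent directory up to "/".
# walk_path is a hypothetical helper name, not part of Nagios.
walk_path() {
    target=$1
    while :; do
        # -d lists the directory entry itself instead of its contents
        ls -ld "$target"
        # Stop at the filesystem root, or at "." for relative paths
        if [ "$target" = "/" ] || [ "$target" = "." ]; then
            break
        fi
        target=$(dirname "$target")
    done
}
```

For the path in the question this would be called as `walk_path /usr/local/nagios/etc/nagios.cfg`. On Linux, `namei -l` from util-linux prints similar information.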

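The home page also shows "Unable to get process status" under the Nagios Core logo. This comes from main.php calling statusjson.cgi. I am not sure what it looks at, but when I run the CGI query from main.php by hand (cgi-bin/statusjson.cgi?query=programstatus), the page is blank. I have searched Google and the Nagios forums for this, but everyone else seems to have some log error that gives more of a clue.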
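When a CGI returns a blank page through the browser, one way to see its raw output is to run the binary from a shell with a minimal CGI environment, bypassing Apache. The sketch below assumes the standard CGI variables REQUEST_METHOD and QUERY_STRING are enough; `run_cgi` is a hypothetical helper, and both the `nagiosadmin` user and the CGI location in the usage note are assumptions that may not match this installation.

```shell
#!/bin/sh
# Run a CGI program directly with a minimal CGI environment so its raw
# output (or crash message) is visible without the web server in between.
# run_cgi is a hypothetical helper; arguments:
#   $1 = path to the CGI binary
#   $2 = query string, e.g. query=programstatus
#   $3 = authenticated user name (assumed default: nagiosadmin)
run_cgi() {
    cgi=$1
    query=$2
    REQUEST_METHOD=GET \
    QUERY_STRING="$query" \
    REMOTE_USER="${3:-nagiosadmin}" \
    "$cgi"
}
```

Assuming the CGIs were installed under /usr/local/nagios/sbin, the failing query could be tried with `run_cgi /usr/local/nagios/sbin/statusjson.cgi 'query=programstatus'`, ideally as the user Apache runs under, so that any permission problem is reproduced.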

I do have one anomaly...

I found another nagios.log that only changes by a few lines each time the service starts:

# cat /usr/local/nagios/var/nagios.log
[1431088940] Error: Cannot open main configuration file '/' for reading!
[1431088940] Error: Failed to process config file '/'. Aborting

Maybe something is wrong with the init script or a cfg file, but I cannot find it. As another test, I can su to nagios and run the daemon by hand.

su - nagios
[nagios@atlas ~]$ /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg

Nagios Core 4.0.8
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 08-12-2014
License: GPL

Website: http://www.nagios.org
Nagios 4.0.8 starting... (PID=23234)
Local time is Fri May 08 13:45:12 ADT 2015
nerd: Channel hostchecks registered successfully
nerd: Channel servicechecks registered successfully
nerd: Channel opathchecks registered successfully
nerd: Fully initialized and ready to rock!
wproc: Successfully registered manager as @wproc with query handler
wproc: Registry request: name=Core Worker 23235;pid=23235
wproc: Registry request: name=Core Worker 23236;pid=23236
wproc: Registry request: name=Core Worker 23237;pid=23237
wproc: Registry request: name=Core Worker 23238;pid=23238
wproc: Registry request: name=Core Worker 23239;pid=23239
wproc: Registry request: name=Core Worker 23240;pid=23240
Successfully launched command file worker with pid 23241

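I hoped this would rule out any quirks in the init script. It does not touch /usr/local/nagios/var/nagios.log (as expected), but it does not change the errors from the web CGIs either. Another clue: when starting nagios by hand like this, I see no logging on screen for host and status items. If I start it via init, nagios.log gets some warnings about the performance of certain hosts, flapping, and the usual chatter, but when started from the command line as the nagios user it says nothing beyond the output above.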

Answer 1

This question did eventually make it onto the Nagios Core support forum, and it was resolved there.

http://support.nagios.com/forum/viewtopic.php?f=7&t=32795

In this particular case, we were missing the config entries

state_retention, status file

But many different kinds of errors can also produce a web interface error that begins with "Whoops!".
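A quick way to check for missing entries like the ones mentioned above is to look them up in nagios.cfg and test whether the files they point to are readable. This is a minimal sketch: `check_directives` is a hypothetical helper, and `status_file` and `state_retention_file` are my reading of which nagios.cfg directives the answer refers to, not something the answer spells out.

```shell
#!/bin/sh
# For each directive name, report whether it is set in the given Nagios
# main config file and whether the file it points to is readable by the
# current user. check_directives is a hypothetical helper name.
check_directives() {
    cfg=$1
    shift
    for name in "$@"; do
        # Take the value of the last uncommented "name=value" line
        value=$(sed -n "s/^[[:space:]]*$name[[:space:]]*=[[:space:]]*//p" "$cfg" | tail -n 1)
        if [ -z "$value" ]; then
            echo "$name: MISSING"
        elif [ -r "$value" ]; then
            echo "$name: $value (readable)"
        else
            echo "$name: $value (not readable)"
        fi
    done
}
```

With the config path from the question, this would be run as `check_directives /usr/local/nagios/etc/nagios.cfg status_file state_retention_file`. Running it as the web server user as well shows whether the CGIs can read the status file they depend on.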
