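I'm doing some monitoring with Nagios using passive alerts. I'm seeing some strange behavior: Nagios receives the passive alerts, but it insists that they are stale.
Here are some logs. Why does Nagios keep generating SERVICE ALERT entries when it has only just received an OK result?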
[1527969438] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;ldap-uat-sh.example.com;ldap_base;0;OK
[1527969440] PASSIVE SERVICE CHECK: ldap-uat-sh.example.com;ldap_base;0;OK
[1527969440] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;OK;HARD;6;OK
[1527969440] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;CRITICAL;SOFT;1;CRITICAL: Passive check is stale
[1527969440] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;CRITICAL;SOFT;2;CRITICAL: Passive check is stale
...
[1527969440] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;CRITICAL;HARD;6;CRITICAL: Passive check is stale
[1527969851] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;ldap-uat-sh.example.com;ldap_base;0;OK
[1527969855] PASSIVE SERVICE CHECK: ldap-uat-sh.example.com;ldap_base;0;OK
[1527969855] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;OK;HARD;6;OK
[1527969855] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;CRITICAL;SOFT;1;CRITICAL: Passive check is stale
[1527969855] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;CRITICAL;SOFT;2;CRITICAL: Passive check is stale
...
[1527969860] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;CRITICAL;HARD;6;CRITICAL: Passive check is stale
[1527970279] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;ldap-uat-sh.example.com;ldap_base;0;OK
[1527970280] PASSIVE SERVICE CHECK: ldap-uat-sh.example.com;ldap_base;0;OK
[1527970280] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;OK;HARD;6;OK
[1527970285] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;CRITICAL;SOFT;1;CRITICAL: Passive check is stale
[1527970285] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;CRITICAL;SOFT;2;CRITICAL: Passive check is stale
...
[1527970295] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;CRITICAL;HARD;6;CRITICAL: Passive check is stale
Here is the relevant configuration:
define service {
    use                     ldap-nprod-service-template
    hostgroup_name          ldap-aws-uat-all-hostgroup
    service_description     ldap_base
    active_checks_enabled   0
    passive_checks_enabled  1
    check_freshness         1
    freshness_threshold     900
    check_command           check_freshness_critical
}
define host {
    use                     ldap-nprod-host-template
    host_name               ldap-uat-sh.example.com
    alias                   ldap-uat-sh.example.com
    address                 ldap-uat-sh.example.com
    check_command           check_dummy_host
}
define hostgroup {
    hostgroup_name          ldap-aws-uat-all-hostgroup
    alias                   LDAP AWS UAT ALL Group
    members                 ldap-uat-sh.example.com
}
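As a sanity check, the gaps between the passive submissions in the logs can be compared with the configured freshness_threshold of 900 seconds. The following is only a rough sketch: the function name submission_gaps and the trimmed LOG sample are illustrative, and it looks only at the EXTERNAL COMMAND lines quoted above, not at Nagios internals.

```python
import re

FRESHNESS_THRESHOLD = 900  # seconds, taken from the service definition

# Only the EXTERNAL COMMAND lines from the log excerpt in the question.
LOG = """\
[1527969438] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;ldap-uat-sh.example.com;ldap_base;0;OK
[1527969851] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;ldap-uat-sh.example.com;ldap_base;0;OK
[1527970279] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;ldap-uat-sh.example.com;ldap_base;0;OK
"""

PATTERN = re.compile(
    r"^\[(\d+)\] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;([^;]+);([^;]+);"
)


def submission_gaps(log_text, host, service):
    """Return the seconds between consecutive passive results for one service."""
    times = []
    for line in log_text.splitlines():
        match = PATTERN.match(line)
        if match and match.group(2) == host and match.group(3) == service:
            times.append(int(match.group(1)))
    # Pair each timestamp with the next one and take the difference.
    return [later - earlier for earlier, later in zip(times, times[1:])]


gaps = submission_gaps(LOG, "ldap-uat-sh.example.com", "ldap_base")
print(gaps)                                          # [413, 428]
print(all(g < FRESHNESS_THRESHOLD for g in gaps))    # True
```

Both gaps are well under 900 seconds, so on the face of it the results are arriving often enough that they should not be considered stale.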
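Answer 1
I removed the problematic monitor from Nagios, restarted Nagios, and then re-added the monitor. That resolved the problem.
My guess is that Nagios has a bug in how it decides when flapping is occurring, and that the timing with which it receives passive alerts can push it into this odd state.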