Nagios receives OK passive alerts but still reports "Passive check is stale"

I'm doing some monitoring with Nagios using passive alerts, and I've run into some strange behavior: Nagios receives the passive alerts, yet insists that they are stale.
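For context, a passive service result like the ones in the logs is normally handed to Nagios by writing a PROCESS_SERVICE_CHECK_RESULT external command line to its command file. The following Python sketch formats that line. The helper names and the default command-file path (/usr/local/nagios/var/rw/nagios.cmd) are assumptions, so check the command_file setting in your nagios.cfg:

```python
import time
from typing import Optional

# Nagios plugin return codes: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN
VALID_RETURN_CODES = (0, 1, 2, 3)


def format_passive_service_result(host: str, service: str, return_code: int,
                                  output: str, timestamp: Optional[int] = None) -> str:
    """Build one PROCESS_SERVICE_CHECK_RESULT line (hypothetical helper)."""
    if return_code not in VALID_RETURN_CODES:
        raise ValueError(f"return_code must be one of {VALID_RETURN_CODES}")
    if "\n" in output:
        raise ValueError("plugin output must be a single line")
    if timestamp is None:
        timestamp = int(time.time())
    return (f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{return_code};{output}\n")


def submit(line: str, command_file: str = "/usr/local/nagios/var/rw/nagios.cmd") -> None:
    """Write the command to the Nagios command file (a named pipe).

    The default path is an assumption; use the command_file value from nagios.cfg.
    """
    with open(command_file, "w") as pipe:
        pipe.write(line)
```

For example, format_passive_service_result("ldap-uat-sh.example.com", "ldap_base", 0, "OK", timestamp=1527969438) produces the same command text as the first EXTERNAL COMMAND line in the logs.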

Here are some logs. Why does Nagios keep generating SERVICE ALERT entries right after it has just received an OK result?

[1527969438] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;ldap-uat-sh.example.com;ldap_base;0;OK
[1527969440] PASSIVE SERVICE CHECK: ldap-uat-sh.example.com;ldap_base;0;OK
[1527969440] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;OK;HARD;6;OK
[1527969440] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;CRITICAL;SOFT;1;CRITICAL: Passive check is stale
[1527969440] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;CRITICAL;SOFT;2;CRITICAL: Passive check is stale
...
[1527969440] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;CRITICAL;HARD;6;CRITICAL: Passive check is stale
[1527969851] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;ldap-uat-sh.example.com;ldap_base;0;OK
[1527969855] PASSIVE SERVICE CHECK: ldap-uat-sh.example.com;ldap_base;0;OK
[1527969855] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;OK;HARD;6;OK
[1527969855] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;CRITICAL;SOFT;1;CRITICAL: Passive check is stale
[1527969855] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;CRITICAL;SOFT;2;CRITICAL: Passive check is stale
...
[1527969860] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;CRITICAL;HARD;6;CRITICAL: Passive check is stale
[1527970279] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;ldap-uat-sh.example.com;ldap_base;0;OK
[1527970280] PASSIVE SERVICE CHECK: ldap-uat-sh.example.com;ldap_base;0;OK
[1527970280] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;OK;HARD;6;OK
[1527970285] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;CRITICAL;SOFT;1;CRITICAL: Passive check is stale
[1527970285] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;CRITICAL;SOFT;2;CRITICAL: Passive check is stale
...
[1527970295] SERVICE ALERT: ldap-uat-sh.example.com;ldap_base;CRITICAL;HARD;6;CRITICAL: Passive check is stale
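One way to check whether the configured freshness_threshold (900 seconds) could plausibly be exceeded is to measure the gap between consecutive passive results in the log. A minimal Python sketch, assuming the log line format shown above; the sample lines are copied from these logs, and the function names are illustrative only:

```python
import re

# PASSIVE SERVICE CHECK lines copied from the logs above
LOG = """\
[1527969440] PASSIVE SERVICE CHECK: ldap-uat-sh.example.com;ldap_base;0;OK
[1527969855] PASSIVE SERVICE CHECK: ldap-uat-sh.example.com;ldap_base;0;OK
[1527970280] PASSIVE SERVICE CHECK: ldap-uat-sh.example.com;ldap_base;0;OK
"""

# Captures: epoch timestamp, host name, service description
PASSIVE_RE = re.compile(r"^\[(\d+)\] PASSIVE SERVICE CHECK: ([^;]+);([^;]+);")

FRESHNESS_THRESHOLD = 900  # seconds, from the service definition


def passive_timestamps(log_text: str, host: str, service: str) -> list[int]:
    """Return the timestamps of passive results for one host/service pair."""
    stamps = []
    for line in log_text.splitlines():
        m = PASSIVE_RE.match(line)
        if m and m.group(2) == host and m.group(3) == service:
            stamps.append(int(m.group(1)))
    return stamps


def gaps(stamps: list[int]) -> list[int]:
    """Seconds elapsed between consecutive results."""
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


stamps = passive_timestamps(LOG, "ldap-uat-sh.example.com", "ldap_base")
for gap in gaps(stamps):
    verdict = "exceeds" if gap > FRESHNESS_THRESHOLD else "within"
    print(f"{gap} s: {verdict} freshness_threshold")
```

For this excerpt the gaps are 415 and 425 seconds, both well under 900. In addition, the first stale alert of each burst is logged in the same second as the OK result, or 5 seconds after it. This suggests the time since the last result is not what triggers the alerts.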

Here is the relevant configuration:

define service {
    use                     ldap-nprod-service-template
    hostgroup_name          ldap-aws-uat-all-hostgroup
    service_description     ldap_base
    active_checks_enabled   0          
    passive_checks_enabled  1          
    check_freshness         1          
    freshness_threshold     900        
    check_command           check_freshness_critical
}
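This service relies on Nagios freshness checking. With check_freshness 1 and freshness_threshold 900, Nagios should treat the last passive result as stale once it is older than roughly 900 seconds. It then runs the check_command (here check_freshness_critical), which is presumably what emits "Passive check is stale". Below is a simplified Python model of that age comparison. It assumes a plain "age > threshold" test. Real Nagios also adds some extra latency (the additional_freshness_latency setting in nagios.cfg) and runs freshness checks only periodically, and this sketch approximates the latency only through a hypothetical extra_latency parameter:

```python
def is_stale(last_check: int, now: int, freshness_threshold: int,
             extra_latency: int = 0) -> bool:
    """Simplified freshness test: stale once the result's age exceeds the threshold.

    extra_latency is a hypothetical stand-in for the additional latency Nagios
    allows on top of freshness_threshold.
    """
    age = now - last_check
    return age > freshness_threshold + extra_latency


LAST_OK = 1527969440  # timestamp of a "PASSIVE SERVICE CHECK ... OK" line in the log
print(is_stale(LAST_OK, 1527969440, 900))      # first stale alert, same second -> False
print(is_stale(LAST_OK, LAST_OK + 901, 900))   # 901 s later -> True
```

Under this model the result is 0 seconds old when the first stale alert appears, so a plain threshold comparison would not flag it. That points away from the 900-second threshold as the cause.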

define host {
    use         ldap-nprod-host-template
    host_name   ldap-uat-sh.example.com
    alias       ldap-uat-sh.example.com
    address     ldap-uat-sh.example.com
    check_command check_dummy_host
}

define hostgroup {
    hostgroup_name  ldap-aws-uat-all-hostgroup
    alias           LDAP AWS UAT ALL Group
    members         ldap-uat-sh.example.com
}

Answer 1

I removed the problematic monitor from Nagios, restarted Nagios, and then added the monitor back. That fixed the problem.

My guess is that Nagios has a bug in how it decides when flapping occurs, and the timing of the passive alerts it receives can push it into this odd state.
