I use Nagios Core 3.5.1 to monitor a server over the WAN with ping and HTTPS checks. Here is the host's alert history.
June 23, 2015 18:00
Service Ok[06-23-2015 18:13:47] SERVICE ALERT: webserver;PING;OK;HARD;3;PING OK - Packet loss = 0%, RTA = 33.72 ms
Service Ok[06-23-2015 18:13:40] SERVICE ALERT: webserver;HTTPS;OK;HARD;3;HTTP OK: HTTP/1.1 200 OK - 359 bytes in 0.201 second response time
Host Up[06-23-2015 18:06:29] HOST ALERT: webserver;UP;SOFT;8;PING OK - Packet loss = 0%, RTA = 33.92 ms
Host Down[06-23-2015 18:05:25] HOST ALERT: webserver;DOWN;SOFT;7;CRITICAL - Time to live exceeded (1.2.)
Host Down[06-23-2015 18:04:19] HOST ALERT: webserver;DOWN;SOFT;6;PING CRITICAL - Packet loss = 100%
Service Critical[06-23-2015 18:03:53] SERVICE ALERT: webserver;PING;CRITICAL;HARD;3;PING CRITICAL - Packet loss = 100%
Host Down[06-23-2015 18:03:49] HOST ALERT: webserver;DOWN;SOFT;5;PING CRITICAL - Packet loss = 100%
Service Critical[06-23-2015 18:03:49] SERVICE ALERT: webserver;HTTPS;CRITICAL;HARD;3;CRITICAL - Socket timeout after 10 seconds
Host Down[06-23-2015 18:02:19] HOST ALERT: webserver;DOWN;SOFT;4;(Host check timed out after 30.01 seconds)
Service Critical[06-23-2015 18:01:53] SERVICE ALERT: webserver;PING;CRITICAL;SOFT;2;PING CRITICAL - Packet loss = 100%
Service Critical[06-23-2015 18:01:49] SERVICE ALERT: webserver;HTTPS;CRITICAL;SOFT;2;CRITICAL - Socket timeout after 10 seconds
Host Down[06-23-2015 18:01:48] HOST ALERT: webserver;DOWN;SOFT;3;(Host check timed out after 30.01 seconds)
Host Down[06-23-2015 18:00:18] HOST ALERT: webserver;DOWN;SOFT;2;PING CRITICAL - Packet loss = 100%
June 23, 2015 17:00
Service Critical[06-23-2015 17:59:53] SERVICE ALERT: webserver;PING;CRITICAL;SOFT;1;PING CRITICAL - Packet loss = 100%
Service Critical[06-23-2015 17:59:49] SERVICE ALERT: webserver;HTTPS;CRITICAL;SOFT;1;CRITICAL - Socket timeout after 10 seconds
Host Down[06-23-2015 17:58:48] HOST ALERT: webserver;DOWN;SOFT;1;(Host check timed out after 30.02 seconds)
Service Ok[06-23-2015 17:29:48] SERVICE ALERT: webserver;PING;OK;SOFT;2;PING OK - Packet loss = 0%, RTA = 34.72 ms
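So at 17:29 everything was fine. From 17:58 to 18:05 there was 100% packet loss and socket timeouts.
My question is: why didn't I receive a notification?
Over the past few days, and today as well, I have received "WARNING" notifications, but I have never received a "CRITICAL" notification.
Here is my contacts.cfg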
define contact{
        contact_name    nagiosadmin         ; Short name of user
        use             generic-contact     ; Inherit default values from generic-contact template (defined above)
        alias           Nagios Admin        ; Full name of user
        email           user@localhost      ; <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******
        }
Here is my templates.cfg
define contact{
        name                            generic-contact         ; The name of this contact template
        service_notification_period     24x7                    ; service notifications can be sent anytime
        host_notification_period        24x7                    ; host notifications can be sent anytime
        service_notification_options    w,u,c,r,f,s             ; send notifications for all service states, flapping events, and scheduled downtime events
        host_notification_options       d,u,r,f,s               ; send notifications for all host states, flapping events, and scheduled downtime events
        service_notification_commands   notify-service-by-email ; send service notifications via email
        host_notification_commands      notify-host-by-email    ; send host notifications via email
        register                        0                       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL CONTACT, JUST A TEMPLATE!
        }
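Answer 1
You did not receive notifications because all of the reported states are SOFT. Notifications are only sent for HARD states.
You need to look at your service configuration and check the value of max_check_attempts for the service.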
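As a rough illustration of where that setting lives, here is a minimal sketch of a host and a service definition. The asker's real object definitions are not shown in the question, so the template names (linux-server, generic-service), the check_ping thresholds, and every value below are assumptions for illustration only. In the log above the host is still in a SOFT DOWN state at attempt 7, so the host's max_check_attempts may be worth checking alongside the service's.

```cfg
# Hypothetical definitions; names and values are illustrative, not the asker's actual config.
define host{
        use                     linux-server            ; assumed template name
        host_name               webserver
        max_check_attempts      3                       ; DOWN turns HARD on the 3rd consecutive failed check
        }

define service{
        use                     generic-service         ; assumed template name
        host_name               webserver
        service_description     PING
        check_command           check_ping!100.0,20%!500.0,60%  ; assumed thresholds
        max_check_attempts      3                       ; non-OK turns HARD on the 3rd consecutive failed check
        retry_interval          1                       ; time units between rechecks while the state is SOFT
        notification_options    w,u,c,r                 ; must include c for CRITICAL notifications
        }
```

A lower max_check_attempts makes a problem reach the HARD state (and so become eligible for notification) sooner, at the cost of more alerts for short-lived outages.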