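We are having some challenges with our Nagios check_icmp monitors
...our network is affected by microbursts, which can cause traffic through the firewall to be dropped for 1 to 2 milliseconds. We are working on the microburst problem at the firewall, but in the meantime the microbursts are triggering false host-down alerts from Nagios...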
Sun Jul 14 00:00:37 CDT 2013 [1373778037] HOST ALERT: host1;DOWN;SOFT;1;CRITICAL - 105.195.240.6: rta nan, lost 100%
Sun Jul 14 00:00:37 CDT 2013 [1373778037] HOST ALERT: host2;DOWN;SOFT;1;CRITICAL - 105.195.115.33: rta nan, lost 100%
Sun Jul 14 00:00:37 CDT 2013 [1373778037] HOST ALERT: host3;DOWN;SOFT;1;CRITICAL - 105.193.26.8: rta nan, lost 100%
Sun Jul 14 00:00:37 CDT 2013 [1373778037] HOST ALERT: host4;DOWN;SOFT;1;CRITICAL - 105.193.221.73: rta nan, lost 100%
Sun Jul 14 00:00:37 CDT 2013 [1373778037] HOST ALERT: host5;DOWN;SOFT;1;CRITICAL - 105.194.18.91: rta nan, lost 100%
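The cause is that check_icmp uses a ridiculous default for the inter-packet interval... the default is so low that the entire ping cycle fits inside a single microburst through the firewall... This is what we see when running check_icmp -n 5 -t 3 -v 10.19.26.29: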
[mpenning@target1 ~]$ sudo tshark -i eth0 icmp and host nagios.domain.local
[sudo] password for mpenning:
Running as user "root" and group "root". This could be dangerous.
Capturing on eth0
0.000000 10.19.20.16 -> 10.19.26.29 ICMP Echo (ping) request
0.000028 10.19.26.29 -> 10.19.20.16 ICMP Echo (ping) reply
0.000348 10.19.20.16 -> 10.19.26.29 ICMP Echo (ping) request
0.000358 10.19.26.29 -> 10.19.20.16 ICMP Echo (ping) reply
0.000572 10.19.20.16 -> 10.19.26.29 ICMP Echo (ping) request
0.000581 10.19.26.29 -> 10.19.20.16 ICMP Echo (ping) reply
0.000792 10.19.20.16 -> 10.19.26.29 ICMP Echo (ping) request
0.000801 10.19.26.29 -> 10.19.20.16 ICMP Echo (ping) reply
0.001017 10.19.20.16 -> 10.19.26.29 ICMP Echo (ping) request
0.001025 10.19.26.29 -> 10.19.20.16 ICMP Echo (ping) reply
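As a rough sanity check on the capture above, the gaps between echo requests and the length of the whole cycle can be worked out from the timestamps. This is only a sketch: the numbers are copied by hand from the tshark output, and the variable names are illustrative.

```python
# Echo request timestamps (seconds), copied from the tshark capture above.
request_times = [0.000000, 0.000348, 0.000572, 0.000792, 0.001017]
# Timestamp of the final echo reply in the same capture.
last_reply_time = 0.001025

# Gap between consecutive echo requests, in milliseconds.
gaps_ms = [(b - a) * 1000 for a, b in zip(request_times, request_times[1:])]

# Duration of the whole 5-ping cycle, first request to last reply.
cycle_ms = (last_reply_time - request_times[0]) * 1000

print("gaps between requests (ms):", [round(g, 3) for g in gaps_ms])
print("whole cycle (ms):", round(cycle_ms, 3))
```

Every gap is well under half a millisecond and the full cycle lasts about 1 ms, so a 1 to 2 ms microburst can swallow all five pings at once.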
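Although check_icmp has a -i switch that allegedly controls the inter-packet spacing, for some reason it does not give me 500ms inter-packet spacing... even if I run it as check_icmp -n 5 -t 3 -i 2000 -v 10.19.26.29, the timing does not substantially change...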
[mpenning@target1 ~]$ sudo tshark -i eth0 icmp and host nagios.domain.local
Running as user "root" and group "root". This could be dangerous.
Capturing on eth0
0.000000 10.19.20.16 -> 105.19.26.29 ICMP Echo (ping) request
0.000018 10.19.26.29 -> 105.19.20.16 ICMP Echo (ping) reply
0.000327 10.19.20.16 -> 105.19.26.29 ICMP Echo (ping) request
0.000338 10.19.26.29 -> 105.19.20.16 ICMP Echo (ping) reply
0.000540 10.19.20.16 -> 105.19.26.29 ICMP Echo (ping) request
0.000552 10.19.26.29 -> 105.19.20.16 ICMP Echo (ping) reply
0.000813 10.19.20.16 -> 105.19.26.29 ICMP Echo (ping) request
0.000824 10.19.26.29 -> 105.19.20.16 ICMP Echo (ping) reply
0.001075 10.19.20.16 -> 105.19.26.29 ICMP Echo (ping) request
0.001087 10.19.26.29 -> 105.19.20.16 ICMP Echo (ping) reply
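Is there any way to force the Nagios check_icmp or check_ping methods to increase the inter-packet interval between pings to 500 milliseconds? I know I could ask Nagios to send 5000 pings per host, but that seems like a real waste of system and network resources just to work around this problem.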
Answer 1
check_icmp offers a few command-line tunables that may be useful. Run check_icmp -h from the command line for more information.
 -i
    max packet interval (currently 80.000ms)
 -I
    max target interval (currently 0.000ms)
 -m
    number of alive hosts required for success
 -l
    TTL on outgoing packets (currently 0)
 -t
    timeout value (seconds, currently 10)
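Answer 2
As I understand it, the value passed to -i appears to be given in microseconds:
-i    max packet interval (currently 80.000ms)
-i 2000 (2.000 ms)
-i 80000 (80.000 ms)
-i 500000 (500.000 ms)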
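Taking this answer's reading at face value, the conversion can be sketched as below. The assumption that -i is read as microseconds comes only from the list above and should be checked against check_icmp -h on the installed version. The helper names are hypothetical, and the script only prints the command line in the form used in the question; it does not run it.

```python
def interval_arg_from_ms(interval_ms):
    """Convert milliseconds to a -i value, ASSUMING -i is read as microseconds."""
    return int(round(interval_ms * 1000))


def min_send_time_s(packets, interval_ms):
    """Time spent waiting between packets alone: (packets - 1) gaps, in seconds."""
    return (packets - 1) * interval_ms / 1000.0


def build_command(host, packets, timeout_s, interval_ms):
    """Build the argument list in the same form the question uses."""
    return [
        "check_icmp",
        "-n", str(packets),
        "-t", str(timeout_s),
        "-i", str(interval_arg_from_ms(interval_ms)),
        "-v", host,
    ]


cmd = build_command("10.19.26.29", packets=5, timeout_s=3, interval_ms=500)
print(" ".join(cmd))
print("minimum send time (s):", min_send_time_s(5, 500))
```

Five packets spaced 500 ms apart need at least 2 s between the first and last send, so the 3 s timeout from the question leaves little slack. Whether check_icmp actually honors the value is best confirmed with a tshark capture like the ones in the question.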