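I have the following environment:
- a master
- some satellites assigned to the master
- many agents assigned to the satellites, plus some agents assigned directly to the master (without a satellite).
All systems are up and the PKI setup is complete. Most of the default checks (apt, disk, cpu) are running, and I can see their current state on the master. I have now started implementing custom checks (for example check_eth, to monitor network traffic). I have distributed the script to all hosts and defined the following command on all of them: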
object CheckCommand "check_eth" {
  import "plugin-check-command"
  command = [ "/usr/bin/sudo", PluginDir + "/check_eth" ]
  arguments = {
    "-w" = {
      value = "$eth_warning$"
      description = "Percent free/used when to warn"
      required = true
    }
    "-c" = {
      value = "$eth_critical$"
      description = "Percent free/used when critical"
      required = true
    }
    "-i" = {
      value = "$eth_interface$"
      description = "Given network interface"
      required = true
    }
  }
  vars.eth_interface = "enp0s31f6"
  vars.eth_warning = "2048G"
  vars.eth_critical = "4096G"
}
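I can run the script manually on every host. On the master, on the satellites, and on all hosts assigned directly to the master, the check result is visible. On all hosts whose parent is a satellite, the state is UNKNOWN. That is my problem... why?
The master configuration is as follows: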
# master: /etc/icinga2/zones.conf
object Endpoint "monitor.domain" {
}
object Zone "master" {
  endpoints = [ "monitor.domain" ]
}
object Endpoint "satellite1.domain" {
  host = "<ip>"
  port = "<port>"
}
object Zone "satellite1.domain" {
  parent = "master"
  endpoints = [ "satellite1.domain" ]
}
The satellite configuration is as follows:
# master: /etc/icinga2/zones.d/satellite1.domain/hosts.conf
object Host "satellite1.domain" {
  import "generic-host"
  check_command = "hostalive"
  zone = "master"
  address = "<ipv4>"
  address6 = "<ipv6>"
  vars.agent_endpoint = name
  ...
}
object Host "agent1.domain" {
  import "generic-host"
  check_command = "hostalive"
  zone = "satellite1.domain"
  address = "<ipv4>"
  address6 = "<ipv6>"
  vars.agent_endpoint = name
  ...
}
...
The zones below the satellite (including their endpoints) are also defined on the master:
# master: /etc/icinga2/zones.d/satellite1.domain/zones.conf
object Zone "agent1.domain" {
  parent = "satellite1.domain"
  endpoints = [ "agent1.domain" ]
}
object Endpoint "agent1.domain" {
  host = "<ip>"
  port = "<port>"
}
The command is then applied to the hosts (this is also defined on the master):
# master: /etc/icinga2/zones.d/satellite1.domain/services.conf
apply Service "Network Traffic" {
  import "generic-service"
  check_command = "check_eth"
  command_endpoint = host_name
  assign where host.name == "satellite1.domain"
}
apply Service "Network Traffic" {
  import "generic-service"
  check_command = "check_eth"
  command_endpoint = host_name
  assign where host.name == "agent1.domain"
}
What am I missing?
Answer 1
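Ah, now I have found the problem. The check command definition contains a default value for eth_interface, and that interface exists on the satellite and on the master. The virtual machines, however, have a differently named interface. If I remove the default variable from the check command and set the variable on each host object instead, everything works fine.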
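A minimal sketch of that change, in the same config style as the question. The interface name "ens3" is a made-up placeholder for whatever the agent's interface is really called, and the thresholds are simply carried over from the original definition. With the default removed and the argument still marked required, a host that lacks the variable should fail with an explicit missing-macro error rather than silently checking an interface that does not exist.

```
# CheckCommand without a default interface; thresholds keep their defaults
object CheckCommand "check_eth" {
  import "plugin-check-command"
  command = [ "/usr/bin/sudo", PluginDir + "/check_eth" ]
  arguments = {
    "-w" = {
      value = "$eth_warning$"
      required = true
    }
    "-c" = {
      value = "$eth_critical$"
      required = true
    }
    "-i" = {
      value = "$eth_interface$"
      required = true
    }
  }
  vars.eth_warning = "2048G"
  vars.eth_critical = "4096G"
  # no vars.eth_interface here: each host must provide its own
}

# master: /etc/icinga2/zones.d/satellite1.domain/hosts.conf
object Host "agent1.domain" {
  import "generic-host"
  check_command = "hostalive"
  zone = "satellite1.domain"
  address = "<ipv4>"
  vars.agent_endpoint = name
  # hypothetical interface name; use the one that exists on this VM
  vars.eth_interface = "ens3"
}
```

Because Icinga 2 resolves runtime macros from the service, then the host, then the command, setting vars.eth_interface on the host is enough for the "-i" argument to pick it up. The same variable can also be set inside an apply Service rule if it should differ per service.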