Adding service checks to agents in an Icinga master-satellite-agent infrastructure

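I have the following environment:

  • one master
  • several satellites assigned to the master
  • many agents assigned to the satellites, plus some agents assigned directly to the master (no satellite in between).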

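All systems are set up and the PKI setup is complete. Most of the default checks (apt, disk, cpu) are running, and I can see their current state on the master. I have now started implementing custom checks (for example check_eth, to monitor network traffic). I have deployed the script to all hosts and defined the following command on all of them: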

object CheckCommand "check_eth" {
  import "plugin-check-command"
  command = [ "/usr/bin/sudo", PluginDir + "/check_eth" ]

  arguments = {
    "-w" = {
      value       = "$eth_warning$"
      description = "Percent free/used when to warn"
      required    = true
    }
    "-c" = {
      value       = "$eth_critical$"
      description = "Percent free/used when critical"
      required    = true
    }
    "-i" = {
      value       = "$eth_interface$"
      description = "Given network interface"
      required    = true
    }
  }

  vars.eth_interface = "enp0s31f6"
  vars.eth_warning   = "2048G"
  vars.eth_critical  = "4096G"
}

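I can run the script manually on all hosts. On the master, on the satellites, and on all hosts assigned directly to the master, the check result is visible. On all hosts whose parent is a satellite, the state is UNKNOWN. That is my problem... why?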

The zone and endpoint objects on the master look like this:

# master: /etc/icinga2/zones.conf

object Endpoint "monitor.domain" {
}

object Zone "master" {
  endpoints = [ "monitor.domain" ]
}

object Endpoint "satellite1.domain" {
    host = "<ip>"
    port = "<port>"
}

object Zone "satellite1.domain" {
    parent = "master"
    endpoints = [ "satellite1.domain" ]
}

The configuration for the satellite zone looks like this:

# master: /etc/icinga2/zones.d/satellite1.domain/hosts.conf

object Host "satellite1.domain" {
    import "generic-host"
    check_command = "hostalive"
    zone = "master"

    address = "<ipv4>"
    address6 = "<ipv6>"
    
    vars.agent_endpoint = name
    ...
}

object Host "agent1.domain" {
    import "generic-host"
    check_command = "hostalive"
    zone = "satellite1.domain"

    address = "<ipv4>"
    address6 = "<ipv6>"
    
    vars.agent_endpoint = name
    ...
}
...

The zones below the satellite (including their endpoints) are also defined on the master:

# master: /etc/icinga2/zones.d/satellite1.domain/zones.conf
object Zone "agent1.domain" {
    parent = "satellite1.domain"
    endpoints = [ "agent1.domain" ]
}

object Endpoint "agent1.domain" {
    host = "<ip>"
    port = "<port>"
}

The command is then applied to the hosts (this is also defined on the master):

# master: /etc/icinga2/zones.d/satellite1.domain/services.conf

apply Service "Network Traffic" {
  import "generic-service"

  check_command = "check_eth"
  command_endpoint = host_name

  assign where host.name == "satellite1.domain"
}

apply Service "Network Traffic" {
  import "generic-service"

  check_command = "check_eth"
  command_endpoint = host_name

  assign where host.name == "agent1.domain"
}

What am I missing?

Answer 1

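Ah, now I have found the problem. The check command definition contains a default value for eth_interface, and that interface exists on the satellites and on the master. The virtual machines, however, have a different interface. If I remove the default variable from the check command and assign the variable on each host object instead, everything works.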
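
As a rough sketch of what that change could look like: the command below is the one from the question with the eth_interface default removed, and the host object sets the interface itself. The interface name "ens18" is a made-up placeholder, not a value from the original setup. Keeping the two threshold defaults in the command is also an assumption, and the host's other attributes are omitted here.

```
# CheckCommand without a hard-coded interface default.
# Keeping the threshold defaults is an assumption; they could
# equally be moved to the host or service objects.
object CheckCommand "check_eth" {
  import "plugin-check-command"
  command = [ "/usr/bin/sudo", PluginDir + "/check_eth" ]

  arguments = {
    "-w" = {
      value    = "$eth_warning$"
      required = true
    }
    "-c" = {
      value    = "$eth_critical$"
      required = true
    }
    "-i" = {
      value    = "$eth_interface$"
      required = true
    }
  }

  vars.eth_warning  = "2048G"
  vars.eth_critical = "4096G"
}

# Each host now states which interface to monitor.
object Host "agent1.domain" {
  import "generic-host"
  check_command = "hostalive"
  zone = "satellite1.domain"

  address = "<ipv4>"
  address6 = "<ipv6>"

  vars.agent_endpoint = name
  vars.eth_interface = "ens18"  # hypothetical interface name for this VM
}
```

Icinga 2 resolves the $eth_interface$ macro from the service's custom variables first, then the host's, and only then falls back to the command's own vars, so a value set on the host is what the "Network Traffic" service ends up passing to -i.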
