Nagios: set the state to CRITICAL only after 2 checks

2024-5-30

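I have a Nagios server that also acts as a backup server. It receives automatic backup files from more than 30 network devices on my network. Each device sends a backup file once an hour, but they do not all send at the same time. I have a simple script that checks whether a backup file was created in the last 30 minutes: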

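#!/bin/bash
# Nagios plugin: checks that a backup file whose name starts with $1
# was written to /backupdir/ within the last 30 minutes.

PROGNAME=$(basename "$0")
PROGPATH=$(echo "$0" | sed -e 's,[\\/][^\\/][^\\/]*$,,')

# utils.sh provides the STATE_* exit codes used below
. "$PROGPATH/utils.sh"

# The device name prefix is a required argument
if [ -z "$1" ]
 then
  echo -e " Use : $PROGNAME -- Ex : $PROGNAME /etc/hosts \n "
  exit $STATE_UNKNOWN
fi

# CRITICAL if no regular file named "$1*" was modified in the last 30 minutes
if [[ -z $(find /backupdir/ -name "$1*" -mmin -30 -type f) ]]
 then
  echo "CRITICAL - $1 : backup not working for the last hour"
  exit $STATE_CRITICAL
 else
  echo "OK : $1 config backup is working"
  exit $STATE_OK
fi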

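Because some devices may have backed up on time but outside the 30-minute window being checked, is there a way to configure the service so that it goes to CRITICAL only after 2 checks within one hour? I tried the following, but it does not seem to work: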

 # 'check backup'
 define service {
         hostgroup_name                  ciscos
         service_description             auto backup config check
         check_command                   check_cisco_backup
         use                             generic-service
         normal_check_interval           30
         max_check_attempts              4
         retry_check_interval            4
         notification_interval           60
 }

I don't have enough reputation to comment on your reply. The following example is meant to clarify my question:

- Router R1 backs up its config file to Nagios server N1 at the first minute of every hour.
- R2 backs up to N1 at the 31st minute of every hour.
- I want N1 to run the 'auto backup config check' service every 30 minutes.
- So the first time the service runs, one of the two routers will be reported as CRITICAL and the other as OK; the second time it runs, the one that was OK will be CRITICAL, and vice versa.
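To make the alternation concrete, here is a small simulation of that timeline. It is only a sketch under assumptions: the checks are taken to run at minutes 0 and 30 of each hour, the backups are assumed to have been running long enough that a previous hourly file always exists, and `age_at` and `state_for` are hypothetical helper names, not part of the plugin.

```shell
#!/bin/bash
# age_at CHECK_MIN BACKUP_MIN
# Minutes since the most recent backup, for a device that backs up
# at minute BACKUP_MIN of every hour (steady state assumed).
age_at() {
  local check=$1 backup=$2
  echo $(( ((check - backup) % 60 + 60) % 60 ))
}

# state_for AGE
# Same rule as the script: OK only if the newest file is under 30 minutes old.
state_for() {
  if [ "$1" -lt 30 ]; then echo OK; else echo CRITICAL; fi
}

# Assumed check times: every 30 minutes, starting on the hour.
for check in 0 30 60 90; do
  r1=$(state_for "$(age_at "$check" 1)")    # R1 backs up at minute 1
  r2=$(state_for "$(age_at "$check" 31)")   # R2 backs up at minute 31
  echo "check at minute $check: R1=$r1 R2=$r2"
done
```

With these assumed times, each router's newest file is either 29 or 59 minutes old when a check runs, so the two results swap on every run.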

Please let me know if you can help me define the service or modify the script in the best way.

Answer 1

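Which version of Nagios are you using? My guess is that if you exit with a critical state on every check, there won't be any escalation. You could exit with STATE_WARNING instead and use check escalation; cf. "Nagios: check frequency of a service depending on the service state".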
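One way to read the STATE_WARNING suggestion is to grade the age of the newest backup file instead of using a single 30-minute cutoff. The following is a minimal sketch, not a tested drop-in replacement: `check_backup_age` is a hypothetical helper name, the 30 and 60 minute thresholds are assumptions based on the hourly backup schedule, and the STATE_* values are defined inline only so the snippet is self-contained (the plugin would normally take them from utils.sh).

```shell
#!/bin/bash
# Standard Nagios plugin exit codes (normally sourced from utils.sh).
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

# check_backup_age DIR PREFIX [WARN_MIN] [CRIT_MIN]
# Hypothetical helper: OK if a matching file is newer than WARN_MIN minutes,
# WARNING if the newest one is between WARN_MIN and CRIT_MIN minutes old,
# CRITICAL if nothing newer than CRIT_MIN minutes exists.
check_backup_age() {
  local dir=$1 prefix=$2 warn_min=${3:-30} crit_min=${4:-60}

  if [ -z "$dir" ] || [ -z "$prefix" ]; then
    echo "UNKNOWN - usage: check_backup_age DIR PREFIX [WARN_MIN] [CRIT_MIN]"
    return $STATE_UNKNOWN
  fi

  # A file modified inside the warning window means the backup is fresh.
  if [ -n "$(find "$dir" -name "${prefix}*" -type f -mmin "-${warn_min}")" ]; then
    echo "OK : $prefix config backup is working"
    return $STATE_OK
  fi

  # Nothing fresh, but a file inside the critical window still exists.
  if [ -n "$(find "$dir" -name "${prefix}*" -type f -mmin "-${crit_min}")" ]; then
    echo "WARNING - $prefix : no backup in the last $warn_min minutes"
    return $STATE_WARNING
  fi

  echo "CRITICAL - $prefix : no backup in the last $crit_min minutes"
  return $STATE_CRITICAL
}
```

In the plugin this could be called as `check_backup_age /backupdir/ "$1"; exit $?`. A device that backed up 31 to 59 minutes ago would then report WARNING rather than CRITICAL. Whether WARNING states generate notifications depends on the service's `notification_options`, so that setting may need adjusting as well.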
