check_nrpe command on the nagios server no longer works since a Debian upgrade


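Yesterday I upgraded a server from Debian 9 to Debian 10. This server is monitored by nagios. After the upgrade I received an alert with status UNKNOWN and the following content:

Volumegroup array03-0 wasn't valid or wasn't specified with "-v Volumegroup", bye. false

The service monitors the usage of the VG array03-0, and its command is check_nrpe!check_vgs_array03-0. Its purpose is to raise an alert when the storage on the array is almost full.

The check_nrpe command is the standard one: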

# 'check_NRPE' command definition
define command{
        command_name check_nrpe
        command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }

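If I am not mistaken, this means there is a check_vgs_array03-0 command in /etc/nagios/nrpe.cfg on the supervised server. Let's look at it; here it is:

command[check_vgs_array03-0]=/usr/lib/nagios/plugins/check_vg_size -w 20 -c 10 -v array03-0

If I simply type this command on the supervised server, there is no error and it works correctly:

VG array03-0 OK Available space is 805 GB;| array03-0=805GB;20;10;0;19155

If I enter the name of a volume group that does not exist, for example, I do get an error.

The check_vg_size plugin script is as follows: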

#!/bin/bash
#check_vg_size
#set -x
# Plugin for Nagios
# Written by M. Koettenstorfer ([email protected])
# Some additions by J. Schoepfer ([email protected])
# Major changes into functions and input/output values J. Veverka ([email protected])
# Last Modified: 2012-11-06
#
# Description:
#
# This plugin will check howmany space in volume groups is free

# Nagios return codes
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3
STATE_DEPENDENT=4

SERVICEOUTPUT=""
SERVICEPERFDATA=""

PROGNAME=$(basename $0)

vgs_bin=`/usr/bin/whereis -b -B /sbin /bin /usr/bin /usr/sbin -f vgs | awk '{ print $2 }'`
_vgs="$vgs_bin --units=g"

bc_bin=`/usr/bin/whereis -b -B /sbin /bin /usr/bin /usr/sbin -f bc | awk '{ print $2 }'`

exitstatus=$STATE_OK #default
declare -a volumeGroups;
novg=0; #number of volume groups
allVG=false; #Will we use all volume groups we can find on system?
inPercent=false; #Use percentage for comparison?

unitsGB="GB"
unitsPercent="%"
units=$unitsGB

########################################################################
### DEFINE FUNCTIONS
########################################################################

print_usage() {
        echo "Usage: $PROGNAME  -w <min size warning level in gb> -c <min size critical level in gb> -v <volumegroupname> [-a] [-p]"
        echo "If '-a' and '-v' are specified: all volumegroups defined by -v will be ommited and the remaining groups which are found on system are checked"
        echo "If '-p' is specified: the warning and critical levels are represented as the percent space left on device"
    echo ""
}

print_help() {
        print_usage
        echo ""
        echo "This plugin will check how much space is free in volume groups"
        echo "usage: "
        exit $STATE_UNKNOWN
}


checkArgValidity () {
# Check arguments for validity
        if [[ -z $critlevel || -z $warnlevel ]] # Did we get warn and crit values?
        then
                echo "You must specify a warning and critical level"
                print_usage
                exitstatus=$STATE_UNKNOWN
                exit $exitstatus
        elif [ $warnlevel -le $critlevel ] # Do the warn/crit values make sense?
        then
        if [ $inPercent != 'true' ]
        then
            echo "CRITICAL value of $critlevel GB is less than WARNING level of $warnlevel GB"
            print_usage
            exitstatus=$STATE_UNKNOWN
            exit $exitstatus
        else
            echo "CRITICAL value of $critlevel % is higher than WARNING level of $warnlevel %"
            print_usage
            exitstatus=$STATE_UNKNOWN
            exit $exitstatus
        fi
        fi
}

#Does volume group actually exist?
volumeGroupExists () {
        local volGroup="$@"
        VGValid=$($_vgs 2>/dev/null | grep "$volGroup" | wc -l )

        if [[  -z "$volGroup" ||  $VGValid = 0 ]]
        then
                echo "Volumegroup $volGroup wasn't valid or wasn't specified"
                echo "with \"-v Volumegroup\", bye."
                echo false
                return 1
        else
                #The volume group exists
                echo true
                return 0
        fi
}

getNumberOfVGOnSystem () {
        local novg=$($_vgs 2>/dev/null | wc -l)
        let novg--
        echo $novg
}

getAllVGOnSystem () {
        novg=$(getNumberOfVGOnSystem)
        local found=false;
        for (( i=0; i < novg; i++)); do
                volumeGroups[$i]=$($_vgs | tail -n  $(($i+1)) | head -n 1 | awk '{print $1}')
                found=true;
        done
        if ( ! $found ); then
                echo "$found"
                echo "No Volumegroup wasn't valid or wasn't found"
                exit $STATE_UNKNOWN
        fi
}

getColumnNoByName () {
        columnName=$1
        result=$($_vgs 2>/dev/null | head -n1 | awk -v name=$columnName '
                BEGIN{}
                        { for(i=1;i<=NF;i++){
                              if ($i ~ name)
                                  {print i } }
                        }')

        echo $result
}

convertToPercent () {
#$1 = xx%
#$2 = 100%
    # Make values numbers only
        local input="$(echo $1 | sed 's/g//i')"
        local max="$(echo $2 | sed 's/g//i')"
        local onePercent='';
        local freePercent='';
        if [ -x "$bc_bin" ] ; then
                onePercent=$( echo "scale=2; $max / 100" | bc );
                freePercent=$( echo "$input / $onePercent" | bc );
        else
                freePercent=$(perl -e "print int((($max-$input)*100/$max))")
        fi
        echo $freePercent;
        return 0;
}

getSizesOfVolume () {
        volumeName="$1";
        #Check the actual sizes
        cnFree=`getColumnNoByName "VFree"`;
        cnSize=`getColumnNoByName "VSize"`;
        freespace=`$_vgs $volumeName 2>/dev/null | awk -v n=$cnFree '/[0-9]/{print $n}' | sed -e 's/[\.,\,].*//'`;
        fullspace=`$_vgs $volumeName 2>/dev/null | awk -v n=$cnSize '/[0-9]/{print $n}' | sed -e 's/[\.,\,].*//'`;

        if ( $inPercent ); then
        #Convert to Percents
                freespace="$(convertToPercent $freespace $fullspace)"
        fi
}

setExitStatus () {
        local status=$1
        local volGroup="$2"
        local formerStatus=$exitstatus

        if [ $status -gt $formerStatus ]
        then
                formerStatus=$status
        fi

        if [ $status = $STATE_UNKNOWN ] ; then
                SERVICEOUTPUT="${volGroup}"
                exitstatus=$STATE_UNKNOWN
                return
        fi

        if [ "$freespace" -le "$critlevel" ]
        then
                SERVICEOUTPUT=$SERVICEOUTPUT" VG $volGroup CRITICAL Available space is $freespace $units;"
                exitstatus=$STATE_CRITICAL
        elif [ "$freespace" -le "$warnlevel" ]
        then
                SERVICEOUTPUT=$SERVICEOUTPUT"VG $volGroup WARNING Available space is $freespace $units;"
                exitstatus=$STATE_WARNING
        else
                SERVICEOUTPUT=$SERVICEOUTPUT"VG $volGroup OK Available space is $freespace $units;"
                exitstatus=$STATE_OK
        fi

        SERVICEPERFDATA="$SERVICEPERFDATA $volGroup=$freespace$units;$warnlevel;$critlevel"
        if [ $inPercent != 'true' ] ; then

                SERVICEPERFDATA="${SERVICEPERFDATA};0;$fullspace"
        fi

        if [ $formerStatus -gt $exitstatus ]
        then
                exitstatus=$formerStatus
        fi
}


checkVolumeGroups () {
checkArgValidity
        for (( i=0; i < novg; i++ )); do
                local status="$STATE_OK"
                local currentVG="${volumeGroups[$i]}"

                local groupExists="$(volumeGroupExists "$currentVG" )"

                if [ "$groupExists" = 'true' ]; then
                        getSizesOfVolume "$currentVG"
                        status=$STATE_OK
                else
                        status=$STATE_UNKNOWN
                        setExitStatus $status "${groupExists}"
                        break
                fi

                setExitStatus $status "$currentVG"
        done
}

########################################################################
### RUN PROGRAM
########################################################################


########################################################################
#Read input values
while getopts ":w:c:v:h:ap" opt ;do
        case $opt in
                h)
                        print_help;
                        exit $exitstatus;
                        ;;
                w)
                        warnlevel=$OPTARG;
                        ;;
                c)
                        critlevel=$OPTARG;
                        ;;
                v)
                        if ( ! $allVG ); then
                                volumeGroups[$novg]=$OPTARG;
                                let novg++;
                        fi
                        ;;
                a)
                        allVG=true;
                        getAllVGOnSystem;
                        ;;
                p)
                        inPercent=true;
                        units=$unitsPercent
                        ;;
                \?)
                        echo "Invalid option: -$OPTARG" >&2
                        ;;
        esac
done

checkVolumeGroups


echo $SERVICEOUTPUT"|"$SERVICEPERFDATA
exit $exitstatus

If I use another argument (another script) with the check_nrpe command, it works.

For example:

root@nagiosserver:/usr/local/nagios# /usr/local/nagios/libexec/check_nrpe -H srv-supervised04 -c check_load
OK - load average: 3.79, 2.99, 1.83|load1=3.790;25.000;30.000;0; load5=2.990;20.000;25.000;0; load15=1.830;15.000;20.000;0;

The VG array03-0 really exists:

root@srv-supervised04:/usr/lib/nagios/plugins# vgdisplay
  --- Volume group ---
  VG Name               array03-0
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  34
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                5
  Open LV               4
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               <18,71 TiB
  PE Size               4,00 MiB
  Total PE              4903887
  Alloc PE / Size       4697600 / <17,92 TiB
  Free  PE / Size       206287 / <805,81 GiB
  VG UUID               OgzAMF-DGbW-3t3L-Wk7k-gY1g-s6fH-zYEKad

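So: the VG really exists. The check_vg_size plugin works when run locally, and check_nrpe works from the nagios server with another plugin, but check_vg_size does not work when called through check_nrpe from the nagios server. The error message claims that array03-0 does not exist, although it does. I did not change any of these files. The problem appeared with the Debian upgrade from 9 to 10 (during the upgrade I chose to keep my modified nrpe.cfg).

Does anyone know where this comes from? The Debian version? A newer bash, perhaps? An incompatibility between the nagios server (still Debian 9) and the supervised server (Debian 10)?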

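Answer 1

Well, I think this is a common problem: NRPE, Nagios and similar tools run as the unprivileged user nagios, whereas you are testing the plugins and commands as root.

I am not sure whether anything changed in LVM between Debian 9 and 10, but on newer systems you definitely need root to view LVM information: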

$ /sbin/lvs
  WARNING: Running as a non-root user. Functionality may be unavailable.
  /run/lock/lvm/P_global:aux: open failed: Permission denied
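
This would also explain why the alert talks about an invalid volume group rather than about permissions: volumeGroupExists in the posted script runs vgs with 2>/dev/null and counts the matching lines, so a vgs that fails looks exactly like a missing volume group. A minimal sketch of that effect, using a hypothetical fake_vgs function in place of the real vgs (the function name and its exit code are assumptions for illustration only):

```shell
#!/bin/bash
# Hypothetical stand-in for vgs run by an unprivileged user: it writes only
# an error to stderr and fails (this function is not part of the plugin).
fake_vgs() {
    echo "  WARNING: Running as a non-root user. Functionality may be unavailable." >&2
    return 1
}

# Same pipeline as volumeGroupExists in the plugin, with the vgs command
# passed in as the first argument so it can be swapped out.
count_matches() {
    local vgs_cmd="$1" volGroup="$2"
    $vgs_cmd 2>/dev/null | grep "$volGroup" | wc -l
}

matches=$(count_matches fake_vgs array03-0)
if [ "$matches" -eq 0 ]; then
    # stderr was discarded, so the permission problem is invisible here
    echo "Volumegroup array03-0 wasn't valid or wasn't specified"
fi
```

Temporarily dropping the 2>/dev/null in the plugin, or running it with the commented-out set -x enabled, should make the underlying vgs error visible.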

People usually work around this by allowing the nagios user to run certain commands through sudo:

nagios ALL=(root) NOPASSWD: /usr/lib/nagios/plugins/check_vg_size
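
Note that the sudoers rule only grants the permission; NRPE still has to call the plugin through sudo. A sketch of the matching line in /etc/nagios/nrpe.cfg on the supervised server, assuming sudo is installed at /usr/bin/sudo and reusing the thresholds from the question (this adjusted line is an assumption, not something taken from the original configuration):

```cfg
# /etc/nagios/nrpe.cfg on the supervised server
# Assumed adjustment: same plugin call as before, prefixed with sudo
command[check_vgs_array03-0]=/usr/bin/sudo /usr/lib/nagios/plugins/check_vg_size -w 20 -c 10 -v array03-0
```

The NRPE daemon has to be restarted or reloaded to pick up the change, and validating the sudoers file with visudo -c beforehand is a cheap safeguard.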

Please test the plugin as the nagios user, and then try it with sudo.
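
One way to run that test is a small wrapper that prints the plugin output together with the Nagios state its exit code maps to (0 to 3, as defined at the top of check_vg_size). This is only a sketch: state_name and run_check are hypothetical helper names, and the commented-out calls assume that sudo is installed and that the account you use may run commands as nagios.

```shell
#!/bin/bash
# Translate a plugin exit code into the state names used by check_vg_size.
state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Run the given command, let its output through, then report the exit code.
run_check() {
    local rc
    "$@"
    rc=$?
    echo "=> exit code $rc ($(state_name "$rc"))"
    return "$rc"
}

# Intended use on the supervised server (not executed by this sketch):
#   as nagios, without sudo; should reproduce the alert if permissions are the cause
#   run_check sudo -u nagios /usr/lib/nagios/plugins/check_vg_size -w 20 -c 10 -v array03-0
#   as nagios, through the sudoers rule
#   run_check sudo -u nagios sudo -n /usr/lib/nagios/plugins/check_vg_size -w 20 -c 10 -v array03-0
```

If the first call fails and the second succeeds, the permission explanation is confirmed and the NRPE command needs to go through sudo as well.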
