I really hope someone can offer me a solution. I am using check_memory to check my remote host, but whatever I try, the Nagios dashboard shows "NRPE: Unable to read output".
My nrpe.cfg:
command[check_memory]=/usr/local/nagios/libexec/check_memory -w 90 -c 5 -f
My service check .cfg:
define service {
use generic-service
host_name host.server.local
service_description Memory Usage
check_command check_nrpe!check_memory!
}
The check_memory file is located in /usr/local/nagios/libexec and comes from: Exchange
#!/usr/bin/env bash
#Set script name
SCRIPT=`basename ${BASH_SOURCE[0]}`
#Set default values
optMW=95
optMC=98
optSW=95
optSC=98
# help function
function printHelp {
    echo -e \\n"Help for $SCRIPT"\\n
    echo -e "Basic usage: $SCRIPT -w {warning} -c {critical} -W {warning} -C {critical}"\\n
    echo "Command switches are optional, default values for warning is 95% and critical is 98%"
    echo "-w - Sets warning value for Memory Usage. Default is 95%"
    echo "-c - Sets critical value for Memory Usage. Default is 98%"
    echo "-W - Sets warning value for Swap Usage. Default is 95%"
    echo "-C - Sets critical value for Swap Usage. Default is 98%"
    echo -e "-h - Displays this help message"\\n
    echo -e "Example: $SCRIPT -w 80 -c 90 -W 40 -C 60"\\n
    exit 1
}
# regex to check is OPTARG an integer
re='^[0-9]+$'
while getopts :w:c:W:C:h FLAG; do
    case $FLAG in
        w)
            if ! [[ $OPTARG =~ $re ]] ; then
                echo "error: Not a number" >&2; exit 1
            else
                optMW=$OPTARG
            fi
            ;;
        c)
            if ! [[ $OPTARG =~ $re ]] ; then
                echo "error: Not a number" >&2; exit 1
            else
                optMC=$OPTARG
            fi
            ;;
        W)
            if ! [[ $OPTARG =~ $re ]] ; then
                echo "error: Not a number" >&2; exit 1
            else
                optSW=$OPTARG
            fi
            ;;
        C)
            if ! [[ $OPTARG =~ $re ]] ; then
                echo "error: Not a number" >&2; exit 1
            else
                optSC=$OPTARG
            fi
            ;;
        h)
            printHelp
            ;;
        \?)
            echo -e \\n"Option - $OPTARG not allowed."
            printHelp
            exit 2
            ;;
    esac
done
shift $((OPTIND-1))
array=( $(cat /proc/meminfo | egrep 'MemTotal|MemFree|Buffers|Cached|SwapTotal|SwapFree' |awk '{print $1 " " $2}' |tr '\n' ' ' |tr -d ':' |awk '{ printf("%i %i %i %i %i %i %i", $2, $4, $6, $8, $10, $12, $14) }') )
memTotal_k=${array[0]}
memTotal_b=$(($memTotal_k*1024))
memFree_k=${array[1]}
memFree_b=$(($memFree_k*1024))
memBuffer_k=${array[2]}
memBuffer_b=$(($memBuffer_k*1024))
memCache_k=${array[3]}
memCache_b=$(($memCache_k*1024))
memTotal_m=$(($memTotal_k/1024))
memFree_m=$(($memFree_k/1024))
memBuffer_m=$(($memBuffer_k/1024))
memCache_m=$(($memCache_k/1024))
memUsed_b=$(($memTotal_b-$memFree_b-$memBuffer_b-$memCache_b))
memUsed_m=$(($memTotal_m-$memFree_m-$memBuffer_m-$memCache_m))
memUsedPrc=$((($memUsed_b*100)/$memTotal_b))
swapTotal_k=${array[5]}
swapTotal_b=$(($swapTotal_k*1024))
swapFree_k=${array[6]}
swapFree_b=$(($swapFree_k*1024))
swapUsed_k=$(($swapTotal_k-$swapFree_k))
swapUsed_b=$(($swapUsed_k*1024))
swapTotal_m=$(($swapTotal_k/1024))
swapFree_m=$(($swapFree_k/1024))
swapUsed_m=$(($swapTotal_m-$swapFree_m))
if [ $swapTotal_k -eq 0 ]; then
    swapUsedPrc=0
else
    swapUsedPrc=$((($swapUsed_k*100)/$swapTotal_k))
fi
message="[MEMORY] Total: $memTotal_m MB - Used: $memUsed_m MB - $memUsedPrc% [SWAP] Total: $swapTotal_m MB - Used: $swapUsed_m MB - $swapUsedPrc% | MTOTAL=$memTotal_b;;;; MUSED=$memUsed_b;;;; MCACHE=$memCache_b;;;; MBUFFER=$memBuffer_b;;;; STOTAL=$swapTotal_b;;;; SUSED=$swapUsed_b;;;;"
if [ $memUsedPrc -ge $optMC ] || [ $swapUsedPrc -ge $optSC ]; then
    echo -e $message
    $(exit 2)
elif [ $memUsedPrc -ge $optMW ] || [ $swapUsedPrc -ge $optSW ]; then
    echo -e $message
    $(exit 1)
else
    echo -e $message
    $(exit 0)
fi
My tail -f /var/log/messages:
nagios nagios: SERVICE NOTIFICATION: admins;host.server.local;Memory Usage;UNKNOWN;notify-service-by-email;NRPE: Unable to read output
When I run a "forced" check, /var/log/messages shows the following:
nagios nagios: EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;host.server.local;Memory Usage;1612190795
Checking manually on the command line on the remote host:
[root@host libexec]# ./check_memory -w 80 -c 95
[MEMORY] Total: 1828 MB - Used: 360 MB - 19% [SWAP] Total: 2047 MB - Used: 0 MB - 0% | MTOTAL=1917046784;;;; MUSED=376434688;;;; MCACHE=119762944;;;; MBUFFER=2158592;;;; STOTAL=2147479552;;;; SUSED=0;;;;
Checking manually from the Nagios server:
[root@nagios libexec]# ./check_nrpe -H host check_memory
NRPE v4.0.3
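I would be very grateful if someone could point out what I am doing wrong. If any other information is needed, please let me know.
Answer 1
Check which user the NRPE daemon runs as, then rerun the script from the command line as that user.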
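The check described above could be sketched roughly as follows. This is only a minimal sketch under several assumptions: the helper names nrpe_user and summarize are hypothetical, the daemon's process is assumed to be named nrpe, sudo is assumed to be available on the remote host, and the plugin path and arguments are copied from the nrpe.cfg in the question. The idea is to see the exit status and the first line of stdout that the plugin produces when it runs as the daemon's user rather than as root. An empty stdout is, as far as I know, the situation NRPE reports as "Unable to read output".

```shell
#!/usr/bin/env bash
# Sketch only: nrpe_user and summarize are hypothetical helpers, not part of NRPE.

# Print the user owning the first running process named "nrpe"
# (assumed process name), or nothing if no such process is found.
nrpe_user() {
    ps -e -o user= -o comm= | awk '$2 == "nrpe" { print $1; exit }'
}

# Run the given command and summarize what a caller would receive:
# the exit status and the first line written to stdout.
# stderr is left untouched so permission errors stay visible.
summarize() {
    local out status first_line
    out=$("$@")
    status=$?
    if [ -z "$out" ]; then
        echo "exit=$status stdout=<empty>"
    else
        first_line=${out%%$'\n'*}
        echo "exit=$status stdout=$first_line"
    fi
}

# Example use on the remote host, as root (path and arguments from nrpe.cfg):
#   summarize sudo -u "$(nrpe_user)" /usr/local/nagios/libexec/check_memory -w 90 -c 5 -f
```

If the summary differs from what root sees when running ./check_memory directly, the difference (permissions on the script, the user's environment, or the arguments configured in nrpe.cfg) is the place to look. From the Nagios server, the configured command is requested by name with the -c option of check_nrpe, for example ./check_nrpe -H host -c check_memory.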