Script executes slowly

2024-5-24

I have a strange problem...

My script:

#!/bin/bash
SECONDS=0

lock_file="/tmp/$0.lock"

[ -e "$lock_file" ] && exit

touch "$lock_file"

echo "<<<db2_hadr_check>>>"
instances=( db2inst1 db2inst2 )

plog() {
        lfile="/tmp/$0.log"

        echo "[$(date +%T)] $1" >> "$lfile"
}

for instance in "${instances[@]}"; do
        plog "get dbname for instance $instance"
        name=$(su - "$instance" -c 'echo $dbname')
        plog "get hadr info for instance $instance"
        arr=($(su - "$instance" -c 'db2pd -db $dbname -hadr | grep -Ei "HADR_CONNECT_STATUS |HADR_LOG_GAP" | cut -d "=" -f 2'))
        plog "check result for instance $instance"

        s=${arr[0]}
        lg=${arr[1]}

        [ -z "$s" ] && s="DISCONNECTED"
        [ -z "$lg" ] && lg=0

        plog "return result for instance $instance"

        #dbname hadr_conn_status log_gap
        echo "$name $s $lg"
done
plog "Execution time: $SECONDS"
plog "############################################################"
rm "$lock_file"

I ran this script manually on my test server (Fedora 21)...

Log from my manual run:

[11:46:36] get dbname for instance db2inst1
[11:46:37] get hadr info for instance db2inst1
[11:46:37] check result for instance db2inst1
[11:46:37] return result for instance db2inst1
[11:46:37] get dbname for instance db2inst2
[11:46:38] get hadr info for instance db2inst2
[11:46:39] check result for instance db2inst2
[11:46:39] return result for instance db2inst2
[11:46:39] Execution time: 3
[11:46:39] ############################################################

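After that, I added this check to my monitoring server (nagios + check_mk). If I run check_mk_agent myself (it then calls my script), I get the following execution time:

real    0m2.923s
user    0m0.755s
sys     0m1.982s

That's fine.

Here is my problem...

Log when the script is run by the check_mk server: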

[11:48:38] get dbname for instance db2inst1
[11:49:04] get hadr info for instance db2inst1
[11:49:30] check result for instance db2inst1
[11:49:30] return result for instance db2inst1
[11:49:30] get dbname for instance db2inst2
[11:49:55] get hadr info for instance db2inst2
[11:50:21] check result for instance db2inst2
[11:50:21] return result for instance db2inst2
[11:50:21] Execution time: 103
[11:50:21] ############################################################

When the script is called from check_mk (nagios), it runs for 103 seconds!
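Breaking the 103 seconds down per step shows where the time goes. Below is a small sketch, assuming the `[HH:MM:SS] message` format that `plog` writes and a log that does not cross midnight; `step_deltas` is a hypothetical helper name. Each printed duration belongs to the step on that line, because the next `plog` call only happens after that step's command has finished.

```bash
#!/bin/bash
# step_deltas: read plog-style lines ("[HH:MM:SS] message") on stdin and print
# how many seconds passed between each line and the next one, i.e. how long
# the step logged on that line took. Assumes the log does not cross midnight.
step_deltas() {
    local re='^\[([0-9]{2}):([0-9]{2}):([0-9]{2})] (.*)$'
    local line secs first="" prev="" prev_msg=""
    while IFS= read -r line; do
        [[ $line =~ $re ]] || continue   # skip lines in any other format
        # 10# forces base 10, so "08" and "09" are not parsed as octal.
        secs=$(( 10#${BASH_REMATCH[1]} * 3600 + 10#${BASH_REMATCH[2]} * 60 + 10#${BASH_REMATCH[3]} ))
        if [[ -z $first ]]; then
            first=$secs
        fi
        if [[ -n $prev ]]; then
            printf '%3d s  %s\n' "$(( secs - prev ))" "$prev_msg"
        fi
        prev=$secs
        prev_msg=${BASH_REMATCH[4]}
    done
    if [[ -n $prev ]]; then
        printf 'total: %d s\n' "$(( prev - first ))"
    fi
}
```

Fed the check_mk log above (for example `step_deltas < /tmp/<script>.log`), it attributes 26, 26, 25 and 26 seconds to the four steps that call `su -` and 0 seconds to everything else, which adds up to the reported 103 seconds.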

Can anyone explain this?

Additional info #1:

The check_mk server uses xinetd to run check_mk_agent on the host.

Config file (/etc/xinetd.d/check_mk):

service check_mk
{
        type           = UNLISTED
        port           = 6556
        socket_type    = stream
        protocol       = tcp
        wait           = no
        user           = root
        server         = /usr/bin/check_mk_agent

        log_on_success =

        disable        = no
}

/etc/pam.d/su:

#%PAM-1.0
auth            sufficient      pam_rootok.so
# Uncomment the following line to implicitly trust users in the "wheel" group.
#auth           sufficient      pam_wheel.so trust use_uid
# Uncomment the following line to require a user to be in the "wheel" group.
#auth           required        pam_wheel.so use_uid
auth            substack        system-auth
auth            include         postlogin
account         sufficient      pam_succeed_if.so uid = 0 use_uid quiet
account         include         system-auth
password        include         system-auth
session         include         system-auth
session         include         postlogin
session         optional        pam_xauth.so

Additional info #2:

After @Lambert's comment, I did the following:

On the nagios server, I added the db2 server's IP to /etc/hosts; on the db2 server, I added the nagios server's IP to /etc/hosts.

I tried two more things.

On the nagios server:

Command:

time ssh root@db2server /usr/bin/check_mk_agent

Result:

...

real    0m5.917s
user    0m0.025s
sys     0m0.028s

Command:

time telnet db2server 6556

Result:

...

real    0m51.859s
user    0m0.005s
sys     0m0.014s

I ran both commands several times and got the same results...
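The xinetd path can also be timed non-interactively by reading the agent output with bash's built-in `/dev/tcp` redirection instead of `telnet`. This is a minimal sketch: `time_agent` is a hypothetical helper, it assumes bash was built with `/dev/tcp` support, and it reports whole seconds via `SECONDS`, which is enough to tell roughly 7 seconds from roughly 52.

```bash
#!/bin/bash
# time_agent HOST [PORT]: connect to the check_mk agent port the way the
# check_mk server does, read the whole reply, discard it, and print the
# elapsed seconds. Requires bash with /dev/tcp redirection support.
time_agent() {
    local host=$1 port=${2:-6556}
    local start=$SECONDS
    # xinetd starts check_mk_agent on connect and the socket is closed when
    # the agent exits, so cat returns once the agent has finished.
    cat < "/dev/tcp/$host/$port" > /dev/null || return 1
    echo "$(( SECONDS - start ))"
}
```

Running `time_agent db2server` on the nagios server and `time_agent localhost` on db2server would mirror the two telnet tests.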

In /var/log/messages on db2server I found this:

Oct 30 14:48:55 db2server su: (to db2inst1) root on pts/0 <- 1. command
Oct 30 14:50:59 db2server su: (to db2inst1) root on none  <- 2. command
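The two `su` entries differ in the terminal field (`pts/0` versus `none`). To correlate that with the slow steps, the script's own log could record a similar field. Below is a minimal sketch of a modified `plog`, assuming a change to the log format is acceptable. `PLOG_FILE` is a hypothetical override added only so the function can be tried outside the script. `tty` reports the terminal on standard input, which may not be exactly what `su` inspects.

```bash
#!/bin/bash
# Variant of plog that also records the terminal on stdin, or "none".
# PLOG_FILE is a hypothetical override; the default mirrors the original
# /tmp/<script>.log path, using basename so a full $0 path still works.
plog() {
    local lfile=${PLOG_FILE:-/tmp/$(basename "$0").log}
    local term
    # tty exits non-zero (and prints "not a tty") when stdin is not a terminal.
    term=$(tty 2>/dev/null) || term="none"
    echo "[$(date +%T)] [$term] $1" >> "$lfile"
}
```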

Additional info #3:

Command:

time ssh root@db2server telnet localhost 6556

Result:

real    0m7.510s
user    0m0.029s
sys     0m0.034s

Any ideas?
