Wrong exit from a while loop when scripts run at the same time in different sessions

2024-6-3

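I have two different scripts, A1.sh and A2.sh. They are used to start middleware services for different applications; i.e., A1.sh starts the services for one application and A2.sh starts the services for another. Both run on the same host (AIX).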

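Because the services take some time to start (roughly 7 to 15 minutes), both scripts contain the function below. It watches the log and waits until the service has started, or times out after 1000 seconds if the service has not started within that period. The scripts work fine when run one after the other. However, if I run A1.sh in one session, then open another session (on the same host) and run A2.sh, one of the scripts fails with a timeout, even though the service does start and keeps running in the background. The timeout is wrong: 1000 seconds have not yet passed. Here is the code: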

### wait_for_log
### Waits for a goal message in the specified log; if the message isn't found
### within the timeout period, triggers an error message in the script log.
###
### usage: wait_for_log [ log_name ] [ start | stop ] [ app_name ] [ timeout ] [ goal_message ]
wait_for_log() {

    FILE_NAME=$1
    ACTION=$2
    APP_NAME=$3
    GOAL_MESSAGE=$5
    GOAL_MESSAGE2=$6
    TIMEOUT=$4

    ELAPSED_TIME=0
    START_TIME=$SECONDS

    alert "info" "${ACTION^^} ${APP_NAME^^}"
    alert "info" "Waiting for ${APP_NAME} ${ACTION}." -n

    tail -0lf $FILE_NAME | while read -t $TIMEOUT LOGLINE
    do
            echo -n "."

            if [ ! -z "$GOAL_MESSAGE2" ]; then
                    if [[ "${LOGLINE}" == *$GOAL_MESSAGE2* ]]; then
                            ps -ef | grep "[t]ail " | awk {'print $2'} | xargs kill
                            return 2
                    fi
            fi

            if [[ "${LOGLINE}" == *$GOAL_MESSAGE* ]]; then
                    ps -ef | grep "[t]ail " | awk {'print $2'} | xargs kill
                    return 2
            fi
    done
    EXIT_CODE=$?
    ELAPSED_TIME=$(($SECONDS - $START_TIME))

    if [ $EXIT_CODE -eq 2 ];then
            printf "\e[1;32m[OK]\e[0m\n"
            alert "success" "${APP_NAME} took ${ELAPSED_TIME}s to ${ACTION}."
            GLOBAL_ELAPSED_TIME=$((GLOBAL_ELAPSED_TIME + ELAPSED_TIME))
            RETVAL=0
            return 0
    fi

    printf "\e[1;31m[FAIL]\e[0m\n"
    alert "error" "${APP_NAME} ${ACTION} failure, exceed the ${ELAPSED_TIME}s timeout to ${ACTION}."

    RETVAL=1
    exit_script $ACTION

}

FILE_NAME is different in the two scripts. One of the scripts fails as shown below.

<Info>    START RPM
Inside wait for log proc, recieved r2TIMEOUT value: 1000
<Info>    STARTING NODEMANAGER
<Info>    Waiting for NodeManager starting..[FAIL]
<Error>   NodeManager starting failure, exceed the 6s timeout to starting.
<Error>   Ocurred an ERROR when RPM trying to starting.

Any idea what goes wrong with the while loop when the scripts run at the same time?

Answer 1

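I think your grep for "tail" in the ps output is the cause. Try grepping for $FILE_NAME instead, so that you don't accidentally kill the other script's tail process and only kill the one that feeds the right loop.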
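
To make the suggestion concrete: `ps -ef | grep "[t]ail "` matches every tail process on the host, so when one script sees its goal message it also kills the tail that feeds the other script's loop. That loop then ends with a status other than 2 and the function reports [FAIL] after only a few seconds. The following is a minimal sketch of a narrower filter, not a tested AIX fix. The helper name `tail_pids_for_file` and the sample process lines are made up for illustration, and the sketch assumes the log file name is the last argument on the tail command line, as it is in the question's code.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original scripts): print the PIDs of
# tail processes that are following one specific file. It reads `ps -ef`-style
# lines on stdin, so the filter can be tried on sample text first.
tail_pids_for_file() {
    # $1: log file name exactly as it was passed to tail.
    # Keep lines that mention "tail " and whose last field is that file name;
    # field 2 of `ps -ef` output is the PID.
    awk -v f="$1" 'index($0, "tail ") > 0 && $NF == f { print $2 }'
}

# Made-up sample lines standing in for real `ps -ef` output.
sample='app1 1234 1000 0 10:00:00 pts/0 0:00 tail -0f /logs/a1.log
app2 5678 1001 0 10:00:01 pts/1 0:00 tail -0f /logs/a2.log'

printf '%s\n' "$sample" | tail_pids_for_file /logs/a1.log   # prints 1234

# Possible use inside wait_for_log, replacing the two kill pipelines:
#   ps -ef | tail_pids_for_file "$FILE_NAME" | xargs kill
```

With this filter each script only signals the tail that reads its own log, so the two sessions no longer interfere as long as their FILE_NAME values differ.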
