Reliable way to get the exit code from a background process while monitoring it and killing it if necessary

I came up with a setup that I thought would do this, but it does not work:

#!/bin/bash

echo "Launching a background process that may take hours to finish.."
myprog &
pid=$!
retval=
##At this time pid should hold the process id of myprog
echo "pid=${pid}"

{
    ##check if the process is still running
    psl=$(ps -f -p ${pid} | grep -E "\bmyprog\b")
    killit=
    while [[ ! -z ${psl} ]]
    do
        ##if a file named "kill_flag" is detected, kill the process
        if [[ -e "kill_flag" ]]
        then
            killit=YES
            break
        fi
        #check every 3 seconds
        sleep 3
        psl=$(ps -f -p ${pid} | grep -E "\bmyprog\b")
    done

    ##killit not set, normal exit, read from fd5
    if [[ -z ${killit} ]]
    then
        read <&5 retval
    else
        ##kill here, the wait will return and the sub process ends
        kill ${pid}
    fi

} 5< <( wait ${pid} > /dev/null 2>&1; echo $? )

echo "retval=$retval"

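The first run seemed fine: I could terminate the process with touch kill_flag, and otherwise the script waited until myprog finished normally. But then I noticed that retval always ends up as -1, even though myprog returns 0, as a normal run confirms. Further investigation showed that the "echo $?" part runs immediately after the script starts, not after the wait command exits. I would like to know what is going on here. I am fairly new to bash.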

Answer 1

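wait only works on child processes of the current shell process. The subshell that interprets the code inside <(...) is a separate process, so myprog is its sibling rather than its child, and it cannot wait for a sibling. The wait fails at once, which is why "echo $?" runs immediately.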
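
The difference can be seen in a small bash experiment. This is only a sketch: the background job that exits with status 3 is made up for illustration, and the exact status reported inside the subshell may depend on the bash version (current versions typically report 127, "not a child of this shell").

```shell
#!/bin/bash
# Illustrative background job: exits with status 3 after one second.
( sleep 1; exit 3 ) &
pid=$!

# A command substitution runs in a subshell, i.e. a separate process.
# $pid is that subshell's sibling, not its child, so wait fails right
# away instead of blocking until the job ends.
sub_status=$( wait "$pid" 2>/dev/null; echo $? )

# The shell that started the job can wait for it and gets the real status.
# (Written with || so it also behaves under "set -e".)
parent_status=0
wait "$pid" || parent_status=$?

echo "subshell saw: $sub_status, parent saw: $parent_status"
```

The same thing happens with <( wait ${pid}; echo $? ) in the question: the status written to fd 5 is the failure status of wait itself, not the exit status of myprog.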

The wait has to be done by the same shell process that started pid. With zsh instead of bash (assuming here that no other background jobs are running):

cmd & pid=$!
while (($#jobstates)) {
  [[ -e killfile ]] && kill $pid
  sleep 3
}
wait $pid; echo $?
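
For readers who want to stay with bash, the same idea might look roughly like the following. This is a sketch rather than a tested replacement for the zsh version: run_monitored and its arguments are hypothetical names, and it uses kill -0 (which sends no signal and only checks that the pid exists) in place of zsh's $#jobstates.

```shell
#!/bin/bash
# Hypothetical helper (name and arguments are illustrative):
#   run_monitored FLAG_FILE POLL_SECONDS COMMAND [ARGS...]
# Runs COMMAND in the background, kills it if FLAG_FILE appears, and
# returns COMMAND's exit status.
run_monitored() {
    local flag=$1 interval=$2
    shift 2

    "$@" &                  # the job is a child of THIS shell
    local pid=$!

    # kill -0 delivers no signal; it fails once the pid is gone.
    while kill -0 "$pid" 2>/dev/null; do
        if [[ -e $flag ]]; then
            kill "$pid"     # sends SIGTERM
            break
        fi
        sleep "$interval"
    done

    # wait works because $pid is our own child. bash keeps the status of
    # a finished background job until it is waited for; a job killed by
    # a signal yields 128 + the signal number.
    wait "$pid"
}

# Possible usage with the names from the question:
#   run_monitored kill_flag 3 myprog
#   echo "retval=$?"
```

Because the launch, the polling loop, and the wait all happen in one shell process, no process substitution or extra file descriptor is needed to carry the status back. A job stopped through the flag file returns 143 (128 + 15 for SIGTERM) rather than its normal exit code.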

Answer 2

I worked out a version that works:

#!/bin/bash
export retval=
##At this time pid should hold the process id of myprog
{
    ##This is the subshell that launched and monitoring myprog
    subsh=$!

    ##Since myprog is probably the only child process of this subsh, it should be pretty safe
    pid=$(ps -f --ppid ${subsh} | grep -E "\bmyprog\b" | gawk '{print $2}' )
    ##check if the process is still running
    psl=$(ps -f -p ${pid} | grep -E "\bmyprog\b")
    killit=
    while [[ ! -z ${psl} ]]
    do
        ##if a file named "kill_flag" is detected, kill the process
        if [[ -e "kill_flag" ]]
        then
            killit=YES
            break
        fi
        #check every 3 seconds
        sleep 3
        psl=$(ps -f -p ${pid} | grep -E "\bmyprog\b")
    done

    ##killit not set, normal exit, read from fd5
    if [[ -z ${killit} ]]
    then
        read <&5 retval
    else
        ##kill here, the wait will return and the sub process ends
        kill ${pid}
    fi

} 5< <( myprog >>logfile 2>&1; echo $? )

echo "retval=$retval"

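The only annoyance is that when I kill myprog using the semaphore (the kill_flag file), an error is reported because the process substitution has died, but it is easy to trap.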
