I came up with a setup that I thought would do this, but it does not work:
#!/bin/bash
echo "Launching a background process that may take hours to finish.."
myprog &
pid=$!
retval=
##At this time pid should hold the process id of myprog
echo "pid=${pid}"
{
    ##check if the process is still running
    psl=$(ps -f -p ${pid} | grep -E "\bmyprog\b")
    killit=
    while [[ ! -z ${psl} ]]
    do
        ##if a file named "kill_flag" is detected, kill the process
        if [[ -e "kill_flag" ]]
        then
            killit=YES
            break
        fi
        #check every 3 seconds
        sleep 3
        psl=$(ps -f -p ${pid} | grep -E "\bmyprog\b")
    done
    ##killit not set, normal exit, read from fd5
    if [[ -z ${killit} ]]
    then
        read <&5 retval
    else
        ##kill here, the wait will return and the sub process ends
        kill ${pid}
    fi
} 5< <( wait ${pid} > /dev/null 2>&1; echo $? )
echo "retval=$retval"
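On the first run everything seemed fine: I could terminate the process with "touch kill_flag", and otherwise the script waited until myprog finished normally. But then I noticed that I always get -1 in retval, even though myprog returns 0, as a normal run confirms. Further investigation showed that the "echo $?" part runs immediately after the script starts, not after the wait command exits. I would like to know what is going on here. I am fairly new to bash.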
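Answer 1
wait only works on child processes of the current shell process. The subshell that interprets the code inside <(...) cannot wait for a sibling process.
The wait has to be done by the same shell process that started the pid. With zsh instead of bash (assuming here that no other background jobs are running):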
cmd & pid=$!
while (($#jobstates)) {
  [[ -e killfile ]] && kill $pid
  sleep 3
}
wait $pid; echo $?
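The same idea can be sketched in bash: start the job, poll, and call wait from the very shell that launched it. This is only a minimal sketch under some assumptions. The function name run_with_kill_flag, its arguments, the temporary flag path, and the "sh -c" stand-in for myprog are all illustrative and not part of the original scripts. It uses kill -0, which sends no signal and only tests whether the pid still exists.

```shell
#!/bin/bash
# Sketch: run a command in the background, poll for a flag file, and collect
# the exit status with wait from the same shell that started the command.
# run_with_kill_flag and its parameters are hypothetical names.
run_with_kill_flag() {
    local flag=$1 interval=$2
    shift 2

    "$@" &                          # the job is a direct child of this shell
    local pid=$!

    # kill -0 delivers no signal; it only checks that the pid still exists
    while kill -0 "$pid" 2>/dev/null; do
        if [[ -e $flag ]]; then
            kill "$pid" 2>/dev/null || true   # SIGTERM by default
            break
        fi
        sleep "$interval"
    done

    # wait succeeds here because $pid was started by this very shell
    retval=0
    wait "$pid" || retval=$?
}

# Stand-in for myprog (assumption): runs for a second, then exits with 3.
flag="${TMPDIR:-/tmp}/kill_flag.$$"
run_with_kill_flag "$flag" 1 sh -c 'sleep 1; exit 3'
echo "retval=$retval"
```

With the real program, the call would look like "run_with_kill_flag kill_flag 3 myprog". If the flag file appears and the job is killed, bash reports the status as 128 plus the signal number, so a job terminated by the default SIGTERM shows up as retval=143.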
Answer 2
I worked out a version that works:
#!/bin/bash
export retval=
##myprog is launched by the process substitution attached to fd 5 below
{
    ##This is the subshell that launched and is monitoring myprog
    subsh=$!
    ##Since myprog is probably the only child process of this subsh, it should be pretty safe
    pid=$(ps -f --ppid ${subsh} | grep -E "\bmyprog\b" | gawk '{print $2}' )
    ##check if the process is still running
    psl=$(ps -f -p ${pid} | grep -E "\bmyprog\b")
    killit=
    while [[ ! -z ${psl} ]]
    do
        ##if a file named "kill_flag" is detected, kill the process
        if [[ -e "kill_flag" ]]
        then
            killit=YES
            break
        fi
        #check every 3 seconds
        sleep 3
        psl=$(ps -f -p ${pid} | grep -E "\bmyprog\b")
    done
    ##killit not set, normal exit, read from fd5
    if [[ -z ${killit} ]]
    then
        read <&5 retval
    else
        ##kill here, the process substitution's subshell then ends
        kill ${pid}
    fi
} 5< <( myprog >>logfile 2>&1; echo $? )
echo "retval=$retval"
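The only annoying thing is that when I kill myprog via the flag file, an error is reported because the process substitution has died, but that is easy to trap.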