I'm trying to write a script for managing jobs on a supercomputer. The details don't matter much; the key point is that the script starts a `tail -f` on a file as soon as that file appears. Left to itself this runs forever, but I want to stop it cleanly and exit the script once it detects that the job has finished.
Unfortunately, I'm stuck. I have tried several solutions, but none of them exits the script: it keeps running even after the job's end has been detected. The version below seems to me the most logical one, but it too runs forever.
How should I solve this? I'm familiar with bash, but not very advanced.
#!/bin/bash
# get the path to the job script, print help if not passed
jobscr="$1"
[[ -z "$jobscr" ]] && echo "Usage: submit-and-follow [script to submit]" && exit 2
# submit the job via SLURM (the job scheduler), and get the
# job ID (4-5-digit number) from its output, exit if failed
jobmsg=$(sbatch "$jobscr")
ret=$?
echo "$jobmsg"
if [ ! $ret -eq 0 ]; then exit $ret; fi
jobid=$(echo "$jobmsg" | cut -d " " -f 4)
# get the stdout and stderr file the job is using, we will log them in another
# file while we `tail -f` them (this is necessary due to a file corruption
# bug in the supercomputer, just assume it makes sense)
outf="$(scontrol show job $jobid | awk -F= '/StdOut=/{print $2}')"
errf="$(scontrol show job $jobid | awk -F= '/StdErr=/{print $2}')"
logf="${outf}.bkp"
# wait for job to start
echo "### Waiting for job $jobid to start..."
until [ -f "$outf" ]; do sleep 5; done
# ~~~~ HERE COMES THE PART IN QUESTION ~~~~ #
# Once it started, start printing the content of stdout and stderr
# and copy them into the log file
echo "### Job $jobid started, stdout and stderr:"
tail -f -n 100000 "$outf" "$errf" | tee "$logf" &
tail_pid=$! # catch the pid of the child process
# watch for job end (the job id disappears from the queue; consider this
# detection working), and kill the tail process
while : ; do
sleep 5
if [[ -z "$(squeue | grep $jobid)" ]]; then
echo "### Job $jobid finished!"
kill -2 $tail_pid
break
fi
done
I also tried another version with `tail` in the main process and the `while` loop running in a subprocess that kills the main process once the job ends, but that didn't work either. Either way, the script never terminates.
Answer 1
Thanks to @Paul_Pedant's comment, I managed to find the problem. Since I piped `tail` into `tee` in my original script, `$!` contained the PID of `tee`, not `tail`, so only `tee` was killed. The latter got a SIGPIPE, but that alone is not enough to stop it: `tail -f` only receives the SIGPIPE when it next writes to the broken pipe, which never happens once the job stops producing output.
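The pitfall can be reproduced in isolation, independent of SLURM. A minimal sketch (`sleep` and `cat` are just stand-ins for `tail` and `tee`):

```shell
#!/bin/bash
# After backgrounding a pipeline, $! holds the PID of the *last*
# command in the pipeline (cat here), not the first (sleep).
sleep 100 | cat &
pid=$!
ps -o comm= -p "$pid"   # prints "cat", not "sleep"
kill "$pid"             # kills only cat; sleep keeps running
```

This is exactly why `kill -2 $tail_pid` in my script only ever reached `tee`.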
The solution is in the following answer: https://stackoverflow.com/a/8048493/5099168
Implemented in my script, the relevant lines take the following form:
tail -f -n 100000 "$outf" "$errf" > >(tee "$logf") &
tail_pid=$!
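With process substitution the background job is `tail` itself, so `$!` now holds the right PID. A quick self-contained check (following `/dev/null` and writing to a throwaway temp file instead of the SLURM output files):

```shell
#!/bin/bash
# With > >(tee ...) the background job is tail itself,
# so $! is tail's PID and kill stops the whole construct.
tmp=$(mktemp)
tail -f /dev/null > >(tee "$tmp") &
tail_pid=$!
ps -o comm= -p "$tail_pid"   # prints "tail"
kill "$tail_pid"
rm -f "$tmp"
```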