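I wrote a script that loops over files and processes each one by calling an external process. I check the exit code of that call to decide whether the file should be moved (on success). The problem is that when the process fails with an exception, it never exits. How can I detect that an exception occurred, so the script can continue with the next file?
Relevant part of the script: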
# Stream data
sstableloader -d "$3" "$tablepathfull"
# On success, move data to target dir
if [[ $? != 0 ]]; then
    echo "Error: Table failed - $tablepathfull"
else
    echo "Table OK - $tablepathfull"
    trgtdir="$2/$hostname/$keyspacename/$typename/$timestamp/$keyspacename/$tablename"
    mkdir -p "$trgtdir"
    mv "$tablepathfull"/* "$trgtdir"
    rmdir "$tablepathfull"
fi
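If there is no "official" way, is it possible to capture the output of the process call (see below) and simply kill the process when an exception occurs?
Exception output: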
Exception in thread "STREAM-OUT-/XX.XX.XXX.88" Exception in thread "STREAM-OUT-/XX.XX.XXX.92" java.lang.NullPointerException
at org.apache.cassandra.streaming.ConnectionHandler$MessageHandler.signalCloseDone(ConnectionHandler.java:249)
at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:375)
at java.lang.Thread.run(Thread.java:744)
java.lang.NullPointerException
at org.apache.cassandra.streaming.ConnectionHandler$MessageHandler.signalCloseDone(ConnectionHandler.java:249)
at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:375)
at java.lang.Thread.run(Thread.java:744)
Answer 1
The only workaround I can think of is to run the loader in a background subprocess and log its output to a file:
TEMP_FILE='/tmp/some_file.txt'
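function load_table() {
    if [ $# -lt 2 ]; then
        printf "1" > "${TEMP_FILE}"
        return 1
    fi
    local param1="$1"
    local table_full_path="$2"
    local exit_code
    # Stream data; capture stderr as well, because Java prints uncaught
    # exceptions there and the file is later searched for them
    sstableloader -d "${param1}" "${table_full_path}" >> "${TEMP_FILE}" 2>&1
    exit_code=$?
    # Record the exit code as the last line of the file
    printf "\n%s" "${exit_code}" >> "${TEMP_FILE}"
}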
function is_process_running() {
    if [ $# -eq 0 ]; then
        return 1
    fi
    local process_id="$1"
    # kill -0 sends no signal; it only checks that the PID exists.
    # (Grepping the ps output would also match PIDs that merely contain this one.)
    kill -0 "${process_id}" 2>/dev/null
}
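function exceptions_count() {
    # Succeeds (status 0) while no exception appears in the last lines of the log.
    # The raw count is not returned directly: exit statuses wrap around at 256.
    local count
    count=$(tail -10 "${TEMP_FILE}" 2>/dev/null | grep -c "Exception")
    [ "${count}" -eq 0 ]
}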
…
load_table "$3" "${tablepathfull}" &
# PID of the background job that was just started
load_table_job_pid=$!
# Poll while the loader is still running and no exception has been logged
while is_process_running "${load_table_job_pid}" && exceptions_count; do
    sleep 5
done
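exit_code=0
if is_process_running "${load_table_job_pid}"; then
    # An exception was logged but the loader is still hanging: kill it.
    # ("local" is only valid inside a function, so a plain assignment is used.)
    load_table_job_gid=$(ps -o pgid= -p "${load_table_job_pid}" | tr -d ' ')
    if [ "${load_table_job_gid}" = "${load_table_job_pid}" ]; then
        # The job leads its own process group (job control on): kill the group
        kill -TERM -- "-${load_table_job_gid}" >/dev/null 2>&1
    else
        # The job shares the script's group, so a group kill would also hit this
        # script; kill the job and its children instead (assumes pkill is available)
        pkill -TERM -P "${load_table_job_pid}" >/dev/null 2>&1
        kill -TERM "${load_table_job_pid}" >/dev/null 2>&1
    fi
    exit_code=1
else
    exit_code=$(tail -1 "${TEMP_FILE}")
fi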
rm -f "${TEMP_FILE}"
# Your code
# On success, move data to target dir
if [ "$exit_code" -ne 0 ]; then
    echo "Error: Table failed - $tablepathfull"
else
    echo "Table OK - $tablepathfull"
    trgtdir="$2/$hostname/$keyspacename/$typename/$timestamp/$keyspacename/$tablename"
    mkdir -p "$trgtdir"
    mv "$tablepathfull"/* "$trgtdir"
    rmdir "$tablepathfull"
fi
You can improve the code by adding a retry count or in other ways.
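As one possible way to add such a retry count, here is a minimal sketch of a generic retry wrapper in bash. The names `retry`, `flaky`, and `calls` are hypothetical and not part of the original script; `flaky` only stands in for the real per-table work so that the example runs on its own.

```shell
#!/usr/bin/env bash

# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it exits with status 0
# or MAX_ATTEMPTS attempts have been made.
retry() {
    local max_attempts="$1" delay="$2"
    shift 2
    local attempt=1
    while true; do
        # Stop as soon as the command succeeds
        "$@" && return 0
        if [ "${attempt}" -ge "${max_attempts}" ]; then
            echo "Giving up after ${attempt} attempt(s): $*" >&2
            return 1
        fi
        echo "Attempt ${attempt} failed, retrying in ${delay}s: $*" >&2
        attempt=$((attempt + 1))
        sleep "${delay}"
    done
}

# Stand-in for the real work: fails on the first two calls, then succeeds.
calls=0
flaky() {
    calls=$((calls + 1))
    [ "${calls}" -ge 3 ]
}

retry 5 0 flaky
echo "status=$? calls=${calls}"
```

In the script above, the load-and-watch steps could be wrapped in a function of their own (for example a hypothetical `process_table`) and invoked as `retry 3 30 process_table "$3" "$tablepathfull"`. Retrying only makes sense if a partially streamed table can safely be streamed again, which is an assumption to verify for your setup.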