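I have a script that is executed every minute by cron in an Alpine container. The script may run for a long time, which is why it handles locking on its own.
I just noticed that busybox crond refrains from executing the script during the first 10 minutes of a long-running instance; only after that does the expected schedule resume. This behavior was verified by logging from the script. During that period, whenever the script would be expected to fire, cron (see below) also logs it:
crond: user abc: process already running: sync.sh
Why is that? Where is this logic documented?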
$ crond --help
BusyBox v1.36.1 (2023-07-27 17:12:24 UTC) multi-call binary.
Usage: crond [-fbS] [-l N] [-d N] [-L LOGFILE] [-c DIR]
-f Foreground
-b Background (default)
-S Log to syslog (default)
-l N Set log level. Most verbose 0, default 8
-d N Set log level, log to stderr
-L FILE Log to FILE
-c DIR Cron dir. Default:/var/spool/cron/crontabs
$ cat /etc/crontabs/abc
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=""
# m h dom mon dow command
# ============================================================================
*/1 * * * * sync.sh
Cron is started in the container with:
/usr/sbin/crond -f -l 8 -L /dev/stdout -c /etc/crontabs
# from docker host:
$ docker logs my-container
usermod: no changes
crond: crond (busybox 1.36.1) started, log level 8
crond: USER abc pid 27 cmd sync.sh
crond: user abc: process already running: sync.sh
crond: USER root pid 158 cmd run-parts /etc/periodic/15min
crond: user abc: process already running: sync.sh
crond: user abc: process already running: sync.sh
crond: user abc: process already running: sync.sh
crond: user abc: process already running: sync.sh
crond: user abc: process already running: sync.sh
crond: user abc: process already running: sync.sh
crond: user abc: process already running: sync.sh
crond: user abc: process already running: sync.sh
/* around here the 10 minute mark is hit, and crond starts invoking sync.sh again */
crond: USER abc pid 193 cmd sync.sh
sendmail: can't connect to remote host (127.0.0.1): Connection refused
crond: USER abc pid 219 cmd sync.sh
sendmail: can't connect to remote host (127.0.0.1): Connection refused
crond: USER abc pid 245 cmd sync.sh
sendmail: can't connect to remote host (127.0.0.1): Connection refused
A stripped-down reproduction of the script above:
#!/usr/bin/env bash
#####################################
readonly SELF="${0##*/}"
DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd)"
JOB_ID="test-$$"
LOG="/config/${SELF}.log"
STATE_FILE=/tmp/test.state
#####################################
_prepare_locking() { eval "exec 9>\"/tmp/test.lock\""; }
exlock_now() { flock -xn 9; }

_log() {
    local lvl msg
    readonly lvl="$1"
    readonly msg="$2"
    echo -e "[$(date '+%F %T')] [$JOB_ID]\t$lvl  $msg" | tee -a "$LOG" >&2
    return 0
}

info() {
    _log INFO "$*"
}

update_statefile() {
    local time time_d

    _write_state() {
        echo -n "$time" > "$STATE_FILE"
    }

    time="$(date +%s)"
    if [[ -s "$STATE_FILE" ]]; then
        time_d="$((time - $(cat -- "$STATE_FILE")))"
        if [[ "$time_d" -ge 600 ]]; then
            info "$SELF has been running for at least ${time_d}s..."
            exit 1
        fi
    else
        _write_state
    fi
    info 'unable to obtain lock, process is already running'
    exit 0  # exit, not return
}

#### ENTRY ####
_prepare_locking || exit 1
exlock_now || update_statefile
info "$SELF starting up..."
sleep 20m  # block
exit 0
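For readers less familiar with the locking pattern the script relies on, here is a minimal sketch of it in isolation. It assumes a flock command is available (util-linux or the BusyBox applet); the lock file name is hypothetical and used only for this illustration. A second attempt on a separately opened descriptor stands in for the next cron invocation.

```shell
#!/bin/sh
# Hypothetical lock path, used only for this illustration.
LOCK_FILE="${TMPDIR:-/tmp}/flock-demo.$$.lock"

# Open the lock file on fd 9; the lock is held for as long as the fd stays open.
exec 9>"$LOCK_FILE"

# -x: exclusive lock, -n: fail immediately instead of waiting.
if flock -xn 9; then
    first=acquired
else
    first=busy
fi

# Simulate a second invocation: a subshell opens the same file on its own
# descriptor (a separate open file description) and tries to lock it.
if ( exec 8>"$LOCK_FILE"; flock -xn 8 ); then
    second=acquired
else
    second=busy
fi

echo "first=$first second=$second"
rm -f -- "$LOCK_FILE"
```

The second attempt is refused while the first descriptor still holds the lock, which is the branch that sends sync.sh into update_statefile.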
Edit: so far the only conclusion is that this is a busybox cron quirk; this comment in a related thread seems to confirm it.
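Explicitly backgrounding the process with & in the crontab makes the phenomenon go away. The downside is that killing the process group via kill -- -pid no longer works: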
container-hostname:/# ps -ef | grep sync.sh
34 abc 0:00 bash /usr/local/sbin/sync.sh
1633 root 0:00 grep sync.sh
container-hostname:/# kill -9 -- -34
bash: kill: (-34) - No such process
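A negative argument to kill addresses a process group, so the error above suggests that 34 is presumably no longer a process group ID once the job is backgrounded. A possible way around it is to start the job in its own session so that its PID doubles as its PGID. The sketch below assumes Linux (/proc) and an available setsid command, neither of which comes from the original post; sleep 300 stands in for sync.sh, and the approach has not been tested against BusyBox crond.

```shell
#!/usr/bin/env bash
# Sketch: give a background job its own process group so `kill -- -PID` reaches it.
# Assumptions: Linux /proc, a `setsid` command; `sleep 300` stands in for sync.sh.

setsid sleep 300 &
pid=$!

# Give setsid a moment to create the new session before inspecting it.
sleep 1

# Strip everything up to the closing parenthesis of the command name in
# /proc/PID/stat; the remaining fields start with state, ppid, pgrp.
pgid=$(sed 's/.*) //' "/proc/$pid/stat" | cut -d' ' -f3)
echo "pid=$pid pgid=$pgid"

# A negative argument targets the whole group; this only succeeds when a
# group with that ID exists, i.e. when the job is its own group leader.
if kill -- "-$pid"; then
    killed=yes
else
    killed=no
fi
wait "$pid" 2>/dev/null
echo "killed=$killed"
```

In a crontab the equivalent would be something like setsid sync.sh & in place of sync.sh &, again as an untested assumption. If setsid is not an option, looking up the real group ID first (for example with ps -o pid,pgid,args, where the ps build supports it) and passing that to kill should also work.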
Answer 1
Answer 2
I can confirm this behavior on OpenWrt 23.05.3.
To verify, use this crontab:
* * * * * sleep 48 ; echo 48
* * * * * sleep 49 ; echo 49
* * * * * sleep 51 ; echo 51
Watching with logread -f, you will probably see that the later commands may only show up every other minute...