Fault-tolerant crond replacement

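I am looking for a replacement for crond, or possibly an extension to it.

The feature I absolutely want is fault tolerance. For example, if a job could not run because the computer was not powered on at the scheduled time (e.g. due to a power failure), or if the job did not run successfully (i.e. rc != 0, e.g. because the internet was unreachable), then the software should retry periodically until the next scheduled run. At that point it resumes its regular operation, assuming that run succeeds.

Other nice-to-have features:

  • Remote control, e.g. via a REST interface
  • Better logging

If no such software is available, could someone point me in the right direction as to which is the better idea: extending existing software or writing something from scratch?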

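Answer 1

I have several jobs that need to run at least once a day. What I do is start the script for these jobs every hour (or more often); the script itself then checks whether the job has already run by looking at a status file on disk.

If the status file exists and is up to date, the script exits.

If the file is too old (i.e. last written the previous day) or does not exist, the script runs and writes the status file on successful termination.

If you cannot build this functionality into an existing program, you can simply make a wrapper script that checks whether the program has to run, calls it if necessary, and writes the status file on success (exit value, parsed output).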
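
As a rough illustration of the "exit value, parsed output" part, the success check of such a wrapper might look like the sketch below. The function name `run_succeeded` and the `required_text` parameter are made up for this example; they are not part of the script that follows, which relies on the exit value only.

```python
import subprocess
import sys


def run_succeeded(cmd, required_text=None):
    """Return True if cmd exits with status 0 and, when required_text is
    given, that text also appears in the combined stdout/stderr output."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # decode the output to str
    )
    if result.returncode != 0:
        return False
    if required_text is not None and required_text not in result.stdout:
        return False
    return True


if __name__ == "__main__":
    # hypothetical job: a child Python process that prints a marker line
    ok = run_succeeded([sys.executable, "-c", "print('sync complete')"],
                       required_text="complete")
    print("write status file" if ok else "leave status file alone")
```

Only when the check returns True would the wrapper write the status file; otherwise the next cron invocation finds the old status and tries again.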


/usr/local/bin/catchup.simple:

#!/usr/bin/env python3

"""
The first parameter is a path to a file /..../daily/some_name
That is a status/script file, and the /daily/ indicates it needs to run at
least once a day (after reboot, after midnight).

The rest of the parameters are the command to execute and its arguments.
If there are no parameters beyond the first, the actual status
file is /..../daily/some_name.status and is expected to be updated by calling
the /....daily/some_name script (which has to be executable). That
script doesn't need to know about the frequency and gets called with
the status file as first (and only) argument.

Valid directory names and their functioning:

   /daily/  run once a day (UTC)
   /hourly/ run once an hour

The actual scheduling, i.e. how often to check whether a run is necessary,
is done using a crontab entry:

CU=/usr/local/bin/catchup.simple
CUD=/root/catchup

# minute, hour, day_of_month, month, day_of_week, command
*/5 * * * * $CU $CUD/daily/getlogs curl ....

If multiple days (or hours) have gone by, no runs are made for skipped
days.

If subprocess.check_output() fails the status file is not updated.
"""

import os
import sys
import datetime
import subprocess

verbose = False  # set to True to debug


def main():
    if len(sys.argv) < 2:
        print('not enough parameters for', sys.argv[0], file=sys.stderr)
        return 1
    if len(sys.argv) == 2:
        # script mode: the script gets its status file as its only argument,
        # as described in the docstring above
        status_file_name = sys.argv[1] + '.status'
        cmd = [sys.argv[1], status_file_name]
    else:
        status_file_name = sys.argv[1]
        cmd = sys.argv[2:]

    # the name of the parent directory ('daily' or 'hourly') is the frequency
    freq = os.path.basename(os.path.dirname(sys.argv[1]))
    if verbose:
        print('cmd', cmd)
        print('status', status_file_name)
        print('frequency', freq)
    try:
        with open(status_file_name) as fp:
            last_status = datetime.datetime.strptime(
                fp.read().split('.')[0].strip(),
                "%Y-%m-%dT%H:%M:%S",
            )
    except (OSError, ValueError):
        # missing or unparsable status file: treat the job as long overdue
        last_status = datetime.datetime(2000, 1, 1)

    # naive UTC timestamp, truncated to whole seconds
    now = datetime.datetime.now(datetime.timezone.utc).replace(
        tzinfo=None, microsecond=0)
    if verbose:
        print(last_status)
        print('now', now.isoformat())
    if freq == 'daily':
        due = last_status.date() < now.date()
    elif freq == 'hourly':
        due = (last_status.date(), last_status.hour) < (now.date(), now.hour)
    else:
        print('unknown frequency directory:', repr(freq), file=sys.stderr)
        return 1
    if not due:
        if verbose:
            print('already done', 'today' if freq == 'daily' else 'this hour')
        return 0

    # check_output() raises CalledProcessError on a non-zero exit status,
    # so the status file is only written after a successful run
    subprocess.check_output(cmd)
    with open(status_file_name, 'w') as fp:
        fp.write(now.isoformat())
    return 0


if __name__ == "__main__":
    sys.exit(main())
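
The part of this approach most worth checking in isolation is the decision whether a run is due. A minimal sketch follows, assuming the same UTC day/hour rule as the script above; the function name `is_due` is hypothetical and only used here for illustration.

```python
import datetime


def is_due(last_run, now, freq):
    """Return True if no successful run is recorded for the current
    UTC day (freq == 'daily') or UTC hour (freq == 'hourly')."""
    if freq == 'daily':
        return last_run.date() < now.date()
    if freq == 'hourly':
        # compare (date, hour) pairs so a change of day also counts
        return (last_run.date(), last_run.hour) < (now.date(), now.hour)
    raise ValueError('unknown frequency: %r' % (freq,))


if __name__ == "__main__":
    d = datetime.datetime
    # last success yesterday evening, cron tick shortly after midnight
    print(is_due(d(2024, 1, 1, 23, 59), d(2024, 1, 2, 0, 4), 'daily'))   # True
    # already succeeded today: later ticks do nothing
    print(is_due(d(2024, 1, 2, 0, 4), d(2024, 1, 2, 23, 59), 'daily'))   # False
    # machine was off for a week: due once, skipped days are not replayed
    print(is_due(d(2024, 1, 1, 8, 0), d(2024, 1, 8, 9, 0), 'daily'))     # True
```

Because the status timestamp only advances after a successful run, a failed job stays due and is retried at every cron tick (every five minutes in the crontab example) until it succeeds, which is the retry behaviour asked for in the question.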
