Best practice for automatically restarting a failed background process?

2024-6-9

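I have some software (consul) that runs in the background as an agent, and I want to make sure it is always running. This is the command I use to start the agent: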

nohup consul agent -server -bootstrap-expect 1 \
-data-dir /tmp/consul \
-bind=$(hostname -i) \
-client=0.0.0.0 \
-node=$(hostname) \
-config-dir /etc/consul.d \
-ui-dir /opt/consul/ &

Right now I have a check that runs in /etc/rc.local:

#!/bin/sh -e
while true; do
    if [ -z "$(ps aux | grep "consul agent" | grep -v grep)" ]; then 
        sh /etc/rc.local2; 
    fi;
    sleep 3;
done
exit 0

If the consul agent stops, /etc/rc.local2 kicks in and runs:

#!/bin/sh -e
nohup consul agent -server -bootstrap-expect 1 \
-data-dir /tmp/consul \
-bind=$(hostname -i) \
-client=0.0.0.0 \
-node=$(hostname) \
-config-dir /etc/consul.d \
-ui-dir /opt/consul/ &

exit 0

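This setup works, but the problem is that the servers I provision (all of which run various other consul servers and clients) have to be rebooted before it actually takes effect. Even if I run sudo nohup /etc/rc.local & myself, consul still fails occasionally.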

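This is just a workaround I put together, and I am fairly sure it is not the best solution available. What is the best way to check and make sure this process is always running?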
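
For comparison with the polling check in /etc/rc.local above, a supervising loop can run the agent in the foreground and start it again whenever it exits, so no ps/grep check is needed. This is only a minimal sketch under assumptions: the helper name run_forever and its two leading arguments (a restart limit and a delay in seconds) are hypothetical and not part of consul, and the restart limit is there mainly so the loop can be tried out safely.

```shell
#!/bin/sh
# Hypothetical helper (not part of consul): run a command in the foreground
# and start it again each time it exits.
# Usage: run_forever MAX_RUNS DELAY_SECONDS COMMAND [ARGS...]
# MAX_RUNS=0 means "no limit".
run_forever() {
    max_runs=$1
    delay=$2
    shift 2
    runs=0
    while [ "$max_runs" -eq 0 ] || [ "$runs" -lt "$max_runs" ]; do
        # No nohup and no trailing &: the loop blocks here until the
        # command exits, whatever its exit status.
        "$@" || true
        runs=$((runs + 1))
        # Pause before the next start to avoid a tight restart loop.
        sleep "$delay"
    done
}
```

With this helper, something like run_forever 0 3 consul agent -server ... (with the same flags as above) would stand in for both scripts. The loop itself still has to be started at boot and kept alive, which is a job an init system or a process supervisor is generally better suited to.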
