Monit: How best to monitor a URL

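My web server runs nginx and php5-fpm. When something goes wrong, it is usually php5-fpm that hangs, which results in a "Bad Gateway" server error. Of course, I cannot rule out that nginx will crash some day as well.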

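When something goes wrong, both processes (and their threads) usually still exist and need to be restarted. I am not very interested in the cause of the problem at that moment; I just want to restart both processes. For this purpose I created two bash scripts, /etc/monit/webserver.start.sh and /etc/monit/webserver.stop.sh.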

Here is my monit configuration file (in conf.d):

check process webserver with pidfile /var/run/nginx.pid
   start program = "/etc/monit/webserver.start.sh"
   stop program  = "/etc/monit/webserver.stop.sh"
   if failed (url https://www.myserver.com/example/ and content == 'test string' and timeout 20 seconds)
     then alert
   if failed (url https://www.myserver.com/example/ and content == 'test string' and timeout 20 seconds)
     for 2 cycles
     then restart
   if failed (url https://www.myserver.com/example/ and content == 'test string' and timeout 20 seconds)
     for 4 cycles
     then exec "/sbin/reboot"

This is not completely wrong, but there are still some issues:

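  1. I do not actually want to monitor the nginx process here, but rather the port/URL. Is there another check I can use instead of check process?
  2. To take different actions after 1, 2, and 4 failures, I need three if failed conditions, which results in three server requests. Is there a way to run one request per cycle and take different actions after different numbers of failures?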

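I tried to find the answers in the official monit reference, but evidently I did not understand the options described there. I would therefore appreciate your advice.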

Update

After spending some time reading the monit man page (which I find better structured than the online manual), I found this optimization:

CHECK HOST webserver WITH ADDRESS 127.0.0.1
  START PROGRAM = "/etc/monit/webserver.start.sh"
  STOP PROGRAM  = "/etc/monit/webserver.stop.sh"
  IF NOT EXIST THEN ALERT
  IF FAILED (url https://www.mydomain.tld/example/ and content == 'test content' and timeout 20 seconds)
    FOR 2 CYCLES
    THEN RESTART
  IF 2 RESTARTS WITHIN 5 CYCLES
    THEN EXEC "/sbin/reboot"

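This modification does not include the alert on the first URL failure (a workaround would be to use dummy start/stop commands), but it restarts the services after 2 failures and reboots the server after 4 failures, with only one server request per cycle.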

It is still not perfect. If someone knows how to do this better, I would still appreciate your advice :) Thanks!

Update

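After some testing, I do not recommend using monit's timeout feature (IF 2 RESTARTS WITHIN...) for second-order actions. In some cases, the timeout action seems to be run again after a reboot. In my case, this resulted in multiple reboots: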

[CET Dec 28 05:59:50] error    : skipping queued event /var/monit/id - unknown data format
[CET Dec 28 05:59:50] error    : skipping queued event /var/monit/state - unknown data format
[CET Dec 30 03:10:52] error    : 'webserver' failed protocol test [HTTP] at INET[www.myserver.com/example/] via TCPSSL -- HTTP: Error receiving data -- Resource temporarily unavailable
[CET Jan  1 03:08:10] error    : 'webserver' failed protocol test [HTTP] at INET[www.myserver.com/example/] via TCPSSL -- HTTP: Error receiving data -- Resource temporarily unavailable
[CET Jan  1 03:09:30] error    : 'webserver' failed protocol test [HTTP] at INET[www.myserver.com/example/] via TCPSSL -- HTTP: Error receiving data -- Resource temporarily unavailable
[CET Jan  1 03:09:31] info     : 'webserver' trying to restart
[CET Jan  1 03:09:31] info     : 'webserver' stop: /etc/monit/webserver.stop.sh
[CET Jan  1 03:09:31] info     : 'webserver' start: /etc/monit/webserver.start.sh
[CET Jan  1 03:10:31] error    : 'webserver' failed, cannot open a connection to INET[www.myserver.com/example/] via TCPSSL
[CET Jan  1 03:10:31] info     : 'webserver' trying to restart
[CET Jan  1 03:10:31] info     : 'webserver' stop: /etc/monit/webserver.stop.sh
[CET Jan  1 03:10:31] info     : 'webserver' start: /etc/monit/webserver.start.sh
[CET Jan  1 03:10:31] error    : 'php-fpm' process is not running
[CET Jan  1 03:10:31] info     : 'php-fpm' trying to restart
[CET Jan  1 03:10:31] info     : 'php-fpm' start: /usr/sbin/service
[CET Jan  1 03:10:31] error    : 'nginx' process is not running
[CET Jan  1 03:10:31] info     : 'nginx' trying to restart
[CET Jan  1 03:10:31] info     : 'nginx' start: /usr/sbin/service
[CET Jan  1 03:11:32] error    : 'webserver' service restarted 2 times within 2 cycles(s) - exec
[CET Jan  1 03:11:32] info     : 'webserver' exec: /sbin/reboot
[CET Jan  1 03:12:24] info     : Starting monit daemon with http interface at [0.0.0.0:2812]
[CET Jan  1 03:12:24] info     : Monit start delay set -- pause for 240s
[CET Jan  1 03:16:24] info     : Starting monit HTTP server at [0.0.0.0:2812]
[CET Jan  1 03:16:24] info     : monit HTTP server started
[CET Jan  1 03:16:24] info     : 'Memory' Monit started
[CET Jan  1 03:16:24] error    : skipping queued event /var/monit/id - unknown data format
[CET Jan  1 03:16:24] error    : skipping queued event /var/monit/state - unknown data format
[CET Jan  1 03:16:24] error    : 'webserver' service restarted 2 times within 2 cycles(s) - exec
[CET Jan  1 03:16:24] info     : 'webserver' exec: /sbin/reboot
[CET Jan  1 03:17:04] info     : Starting monit daemon with http interface at [0.0.0.0:2812]
[CET Jan  1 03:17:04] info     : Monit start delay set -- pause for 240s
[CET Jan  1 03:21:04] info     : Starting monit HTTP server at [0.0.0.0:2812]
[CET Jan  1 03:21:04] info     : monit HTTP server started
[CET Jan  1 03:21:04] info     : 'Memory' Monit started
[CET Jan  1 03:21:04] error    : skipping queued event /var/monit/id - unknown data format
[CET Jan  1 03:21:04] error    : skipping queued event /var/monit/state - unknown data format
[CET Jan  1 03:21:04] error    : 'webserver' service restarted 2 times within 2 cycles(s) - exec
[CET Jan  1 03:21:04] info     : 'webserver' exec: /sbin/reboot
[CET Jan  1 03:21:44] info     : Starting monit daemon with http interface at [0.0.0.0:2812]
[CET Jan  1 03:21:44] info     : Monit start delay set -- pause for 240s
[CET Jan  1 03:25:44] info     : Starting monit HTTP server at [0.0.0.0:2812]
[CET Jan  1 03:25:44] info     : monit HTTP server started
[CET Jan  1 03:25:44] info     : 'Memory' Monit started
[CET Jan  1 03:25:44] error    : skipping queued event /var/monit/id - unknown data format
[CET Jan  1 03:25:44] error    : skipping queued event /var/monit/state - unknown data format
[CET Jan  1 03:25:44] error    : 'webserver' service restarted 2 times within 2 cycles(s) - exec
[CET Jan  1 03:25:44] info     : 'webserver' exec: /sbin/reboot

Unless someone has a good idea, I will switch back to multiple requests. In the end, they are not that time-consuming…
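
For reference, switching back to multiple requests while keeping the host check from the first update might look like the sketch below. It only combines the CHECK HOST header with the three original if failed rules. The address www.myserver.com is an assumption derived from the URL, and each rule still issues its own request per cycle.

```
# Sketch only: reuses the URL, test string and scripts from the question.
# The address www.myserver.com is assumed from the URL.
check host webserver with address www.myserver.com
  start program = "/etc/monit/webserver.start.sh"
  stop program  = "/etc/monit/webserver.stop.sh"
  # First failed cycle: send an alert only.
  if failed (url https://www.myserver.com/example/ and content == 'test string' and timeout 20 seconds)
    then alert
  # Two consecutive failed cycles: restart via the stop/start scripts.
  if failed (url https://www.myserver.com/example/ and content == 'test string' and timeout 20 seconds)
    for 2 cycles
    then restart
  # Four consecutive failed cycles: reboot the machine.
  if failed (url https://www.myserver.com/example/ and content == 'test string' and timeout 20 seconds)
    for 4 cycles
    then exec "/sbin/reboot"
```

This variant does not rely on the restart counter that appears to have been evaluated again after each reboot in the log above.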

BurninLeo

Answer 1

I do not actually want to monitor the nginx process here, but rather the port/URL. Is there another check I can use instead of check process?

You can use a host check. Here is an example from the monit site:

check host mmonit.com with address mmonit.com 
    if failed
        port 80 protocol http
        with http headers [Host: mmonit.com, Cache-Control: no-cache, Cookie: csrftoken=nj1bI3CnMCaiNv4beqo8ZaCfAQQvpgLH]
        and request /monit/ with content = "Monit [0-9.]+"
    then alert
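
Adapted to the setup in the question, the same pattern could look roughly like the sketch below. It assumes a Monit version that accepts protocol https (older releases may need a different SSL syntax), and it reuses the host name, path and test string from the question.

```
# Sketch only: host name, path and test string are taken from the question;
# "protocol https" is assumed to be supported by the installed Monit version.
check host webserver with address www.myserver.com
    if failed
        port 443 protocol https
        and request "/example/" with content = "test string"
        with timeout 20 seconds
    then alert
```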

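To take different actions after 1, 2, and 4 failures, I need three if failed conditions, which results in three server requests. Is there a way to run one request per cycle and take different actions after different numbers of failures?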

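EXEC can be used to execute an arbitrary program and send an alert. If you choose this action, you must state the program to execute, and if the program requires arguments, you must enclose the program and its arguments in a quoted string. You may optionally specify the uid and gid the executed program should switch to when it starts. For example: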

exec "/usr/local/tomcat/bin/startup.sh"
    as uid nobody and gid nobody
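
The EXEC action on its own does not reduce the number of requests. One possible way to get a single request per cycle is a program check: Monit runs the program once per cycle, and several status rules can then evaluate the same exit code. The sketch below assumes a hypothetical script /etc/monit/check_url.sh (not part of the original setup) that fetches the URL once, for example with curl, and exits non-zero when the request fails or the test string is missing. The service name webserver_url is also made up for illustration.

```
# Sketch only: /etc/monit/check_url.sh is a hypothetical script that performs
# ONE request to https://www.myserver.com/example/ and exits with 0 on
# success, non-zero on failure or missing test string.
check program webserver_url with path "/etc/monit/check_url.sh"
    with timeout 30 seconds
    start program = "/etc/monit/webserver.start.sh"
    stop program  = "/etc/monit/webserver.stop.sh"
    # All three rules look at the exit status of the same script run.
    if status != 0 then alert
    if status != 0 for 2 cycles then restart
    if status != 0 for 4 cycles then exec "/sbin/reboot"
```

As far as I know, Monit evaluates the exit status of a program check in the cycle after the program ran, so each reaction may lag by one cycle. Whether the restart action behaves as expected for a program check should be verified on the installed version before relying on it.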
