Good morning everyone,
I regularly receive these monitoring alerts, roughly once or twice a day:
Connection failed Service amavisd
Date: Wed, 20 Jul 2022 09:04:58
Action: restart
Host: (hidden).com
Description: failed protocol test [SMTP] at [localhost]:10024 [TCP/IP] -- Error receiving data from the mailserver -- Resource temporarily unavailable
Your faithful employee,
Monit
About 20 seconds later the service is back up, and Monit sends the following email alert:
Connection succeeded Service amavisd
Date: Wed, 20 Jul 2022 09:07:03
Action: alert
Host: (hidden).com
Description: connection succeeded to [localhost]:10024 [TCP/IP]
Your faithful employee,
Monit
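This is a bit too noisy for me, since I already receive a lot of email every day. Can this configuration be improved so that it only alerts me after a few retries? Or, conversely, how can I investigate which error the mail server returned?
This is the current Monit configuration: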
check process amavisd with pidfile /var/run/amavis/amavisd.pid
  group mail
  start program = "/etc/init.d/amavis start"
  stop program = "/etc/init.d/amavis stop"
  if failed port 10024 protocol smtp then restart
  if 3 restarts within 3 cycles then alert
  if 6 restarts within 6 cycles then timeout
  depends on amavisd_bin
  depends on amavisd_rc

check file amavisd_bin with path /usr/sbin/amavisd-new
  group mail
  if failed checksum then unmonitor
  if failed permission 755 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor

check file amavisd_rc with path /etc/init.d/amavis
  group mail
  if failed checksum then unmonitor
  if failed permission 755 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor
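Can you spot the problem?
Thanks, M.
Answer 1
In Monit you can add fault tolerance to a rule so that it acts only after several failures rather than on the first one, which avoids false alarms. For example: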
if failed port 10024 for 3 times within 5 cycles then alert
For more details, see the documentation: https://mmonit.com/monit/documentation/monit.html#FAULT-TOLERANCE
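Applied to the process check from the question, the same idea might look like the sketch below. It is only an illustration under a few assumptions: the thresholds (3 failures within 5 cycles) are taken from the example above rather than tuned for this server, and the 15-second timeout is an arbitrary value added on the guess that "Resource temporarily unavailable" means amavisd was slow to send its SMTP banner. Validate any change with `monit -t` before reloading.

```
check process amavisd with pidfile /var/run/amavis/amavisd.pid
  group mail
  start program = "/etc/init.d/amavis start"
  stop program = "/etc/init.d/amavis stop"
  # Assumption: a longer read timeout (15 seconds is an arbitrary choice)
  # gives a busy amavisd more time to answer the SMTP probe.
  # The rule triggers only after 3 failed probes within 5 cycles,
  # so a single slow response no longer causes a restart and an alert.
  if failed port 10024 protocol smtp with timeout 15 seconds
     for 3 times within 5 cycles
  then restart
  if 6 restarts within 6 cycles then timeout
  depends on amavisd_bin
  depends on amavisd_rc
```

With the tolerance on the port test itself, the failure event (and the alert that accompanies it) should only be raised once the threshold is reached, so the separate "3 restarts within 3 cycles then alert" rule is left out of this sketch; keep it if you still want a distinct notification for repeated restarts.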