Postfix starts before /etc/resolv.conf is ready and cannot resolve DNS

2024-6-2

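I have configured Postfix to send all email to a smarthost, and it works for weeks on end without problems. Every now and then, however, it stops working and logs messages like the following (the first two lines show the last successful delivery, the last two show the first message that failed):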

Nov 24 20:05:30 nextcloud postfix/smtp[443568]: 3882C1B5D8A: to=<xxxxxx>, relay=relay.grnet.gr[83.212.2.185]:587, delay=2.3, delays=1.3/0/0.02/1, dsn=2.0.0, status=sent (250 OK id=1oyGbJ-0001PM-H2)
Nov 24 20:05:30 nextcloud postfix/qmgr[193834]: 3882C1B5D8A: removed
Nov 24 20:44:43 nextcloud postfix/postfix-script[1563]: warning: symlink leaves directory: /etc/postfix/./makedefs.out
Nov 24 20:44:46 nextcloud postfix/postfix-script[1751]: warning: /var/spool/postfix/etc/resolv.conf and /etc/resolv.conf differ
Nov 24 20:44:46 nextcloud postfix/postfix-script[1772]: starting the Postfix mail system
Nov 24 20:44:46 nextcloud postfix/master[1774]: daemon started -- version 3.4.13, configuration /etc/postfix
Nov 24 21:05:19 nextcloud postfix/smtpd[4252]: warning: dict_nis_init: NIS domain name not set - NIS lookups disabled
Nov 24 21:05:19 nextcloud postfix/smtpd[4252]: connect from localhost[127.0.0.1]
Nov 24 21:05:19 nextcloud postfix/smtpd[4252]: E76A51B5819: client=localhost[127.0.0.1]
Nov 24 21:05:20 nextcloud postfix/cleanup[4257]: E76A51B5819: message-id=<[email protected]>
Nov 24 21:05:20 nextcloud postfix/smtpd[4252]: disconnect from localhost[127.0.0.1] ehlo=1 mail=1 rcpt=1 data=1 quit=1 commands=5
Nov 24 21:05:20 nextcloud postfix/qmgr[1776]: E76A51B5819: from=<xxxxxx>, size=37076, nrcpt=1 (queue active)
Nov 24 21:05:21 nextcloud postfix/smtp[4258]: E76A51B5819: to=<xxxxxx>, relay=none, delay=0.36, delays=0.33/0.03/0/0, dsn=4.4.3, status=deferred (Host or domain name not found. Name service error for name=relay.grnet.gr type=MX: Host not found, try again)

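After that it stays in this state (once it was stuck for about three weeks before I noticed); all messages fail and remain in the queue. When I restart Postfix, it works again, sends all the deferred messages, and keeps working until the next time the problem occurs.

How can I configure it to be resilient so that it doesn't need a restart?

The problem originally occurred with Postfix 3.4.13 on Ubuntu 20.04, and it still happens on a machine upgraded to 22.04 with Postfix 3.6.4-1ubuntu1.3. Here is /etc/postfix/main.cf: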

myorigin = /etc/mailname
biff = no
compatibility_level = 3
relayhost = relay.grnet.gr:587
mynetworks = 127.0.0.0/8 [::ffff:127.0.0.0]/104 [::1]/128
inet_interfaces = loopback-only
virtual_alias_maps = hash:/etc/postfix/virtual
masquerade_domains = grnet.gr

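Edit 2024-03-07

It turns out I got some things wrong. The actual problem is that resolv.conf is not copied correctly when the machine reboots.

Postfix runs in a chroot, so it does not actually use /etc/resolv.conf but /var/spool/postfix/etc/resolv.conf. Postfix ships with a script that performs some housekeeping tasks, including copying the system resolv.conf into the chroot, and that script runs when Postfix starts.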
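A minimal sketch of how the state of the chroot copy could be checked, assuming Python 3 is available on the machine. The function name `chroot_copy_status` and its return labels are made up for illustration; only the two file paths come from the description above.

```python
from pathlib import Path


def chroot_copy_status(system_path, chroot_path):
    """Classify the chroot copy as 'missing', 'empty', 'differs' or 'in sync'.

    Hypothetical helper for illustration; not part of Postfix.
    """
    chroot = Path(chroot_path)
    system = Path(system_path)
    if not chroot.exists():
        return "missing"
    chroot_bytes = chroot.read_bytes()
    if not chroot_bytes:
        # A zero-length copy means the resolver inside the chroot has no nameserver.
        return "empty"
    # read_bytes() follows symlinks, so a symlinked /etc/resolv.conf is read via its target.
    if not system.exists() or chroot_bytes != system.read_bytes():
        return "differs"
    return "in sync"


if __name__ == "__main__":
    print(chroot_copy_status("/etc/resolv.conf",
                             "/var/spool/postfix/etc/resolv.conf"))
```

Anything other than "in sync" suggests that the Postfix processes in the chroot are seeing a resolver configuration that differs from the rest of the system.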

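What I had not noticed is that the problem occurs after the machine reboots (I am not sure whether it happens every time). Apparently the Postfix script that copies resolv.conf runs before the system has had a chance to set the file up properly. Restarting Postfix afterwards fixes the problem.

There are several reports of this happening, for various reasons. I do not yet know the cause in my case.

Edit 2024-03-10

I increased the precision of the syslog timestamps and rebooted the system. It turns out that the Postfix startup script copies /etc/resolv.conf about half a second before the file is ready: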

lrwxrwxrwx 1 root root  29 2018-08-24 11:37:47.299687838 +0300 /etc/resolv.conf -> ../run/resolvconf/resolv.conf
-rw-r--r-- 1 root root 328 2024-03-10 13:05:17.268000000 +0200 /run/resolvconf/resolv.conf
-rw-r--r-- 1 root root   0 2024-03-10 13:05:16.756000000 +0200 /var/spool/postfix/etc/resolv.conf
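As a quick check of the "about half a second" figure, the gap between the two modification times in the listing can be computed directly. This is only a sketch of the arithmetic; the timestamps are copied from the listing and truncated to microseconds, and both carry the same +0200 offset, so the offset is left out.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"  # %f accepts at most six fractional digits


def gap_seconds(earlier, later):
    """Return the number of seconds from `earlier` to `later`."""
    delta = datetime.strptime(later, FMT) - datetime.strptime(earlier, FMT)
    return delta.total_seconds()


# Modification times taken from the listing above.
chroot_copy = "2024-03-10 13:05:16.756000"   # /var/spool/postfix/etc/resolv.conf
system_file = "2024-03-10 13:05:17.268000"   # /run/resolvconf/resolv.conf

print(gap_seconds(chroot_copy, system_file))
```

The chroot copy is the older of the two files and has size 0, which is consistent with the copy having been made before the system file was populated.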

Postfix's first trace in the log comes much later:

2024-03-10T13:05:21.516151+02:00 nextcloud postfix[1418]: Postfix is running with backwards-compatible default settings
...
2024-03-10T13:05:38.486795+02:00 nextcloud postfix/postfix-script[1832]: warning: /var/spool/postfix/etc/resolv.conf and /etc/resolv.conf differ

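This machine gets its IP address via DHCP, and interestingly that seems to happen after /etc/resolv.conf is written (13:05:17.44). This probably does not matter, because resolv.conf says to use systemd-resolved on localhost:53.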

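Answer 1

The way you have specified the relayhost makes Postfix do an MX lookup for the given name and then use the name returned by that lookup to find the address of the server to connect to on port 587. That is why the log says "type=MX: Host not found, try again".

If relay.grnet.gr is the literal name of the mail server (rather than a domain name that has MX records pointing to the mail servers), put it in square brackets to suppress the MX lookup and use the A record: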

relayhost = [relay.grnet.gr]:587
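To make the difference between the two forms concrete, here is a simplified sketch of how a relayhost value can be split into host, port, and whether an MX lookup is implied. `parse_relayhost` is a hypothetical helper written for illustration, not Postfix's own parser, and it ignores cases such as unbracketed IPv6 addresses.

```python
import re


def parse_relayhost(value):
    """Split a relayhost value into (host, port, mx_lookup).

    Illustrative only: a bracketed host means "no MX lookup";
    an unbracketed name is treated as a domain subject to MX lookup.
    """
    m = re.fullmatch(r"\[(?P<host>[^\]]+)\](?::(?P<port>\w+))?", value)
    if m:
        return m.group("host"), m.group("port"), False
    host, sep, port = value.partition(":")
    return host, (port if sep else None), True


if __name__ == "__main__":
    for value in ("relay.grnet.gr:587", "[relay.grnet.gr]:587"):
        print(value, "->", parse_relayhost(value))
```

Both values name the same host and port; only the third element, the MX lookup flag, differs.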

See also man 5 postconf.
