My NTP server works for a few hours and then stops working, with every host showing "reach: 0", like this:
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 64-250-105-227. .PPS.            1 u   9h 1024    0   66.644    5.476   0.000
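The reach column is the key symptom here. ntpq displays it as an 8-bit shift register in octal. Each poll shifts it left by one bit, and the low bit is set only when a valid reply was accepted, so 377 means the last eight polls all succeeded and 0 means none of them did. The sketch below is a simplified model (the function names are made up for illustration, and real ntpd splits the shift and the bit-set between its transmit and receive paths). It shows how the value decays once replies stop being accepted.

```python
# Simplified model of the per-peer reachability register shown as "reach" by ntpq -p.
# Assumption: one shift per poll, with bit 0 set if a valid reply was accepted
# during that poll interval. Function names are hypothetical.

def update_reach(reach: int, got_valid_reply: bool) -> int:
    """Shift the 8-bit register left by one and record this poll's outcome."""
    reach = (reach << 1) & 0xFF  # keep only the last 8 polls
    if got_valid_reply:
        reach |= 1
    return reach


def simulate_silence(start: int = 0o377, polls: int = 8) -> list[str]:
    """Octal values ntpq would show while no valid reply is accepted."""
    history = []
    reach = start
    for _ in range(polls):
        reach = update_reach(reach, got_valid_reply=False)
        history.append(format(reach, "o"))
    return history


if __name__ == "__main__":
    print(simulate_silence())  # ['376', '374', '370', '360', '340', '300', '200', '0']
    # At "poll 10" (2**10 = 1024 s), eight missed polls take this many hours:
    print(8 * 2**10 / 3600)
```

At poll 10 that is 8 × 1024 s, or about 2.3 hours from the last accepted reply to reach 0. Because reach only counts replies that ntpd accepted, it can sit at 0 while packets are still visibly flowing in both directions.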
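If I restart ntpd, they work again for about 8 hours but eventually end up back in this state. tcpdump shows that packets are still being sent and received normally (the routing is a bit odd because our ISP blocks NTP traffic, but we work around that with some policy-based routing and a guest running OpenVPN):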
12:05:43.513183 IP (tos 0xc0, ttl 64, id 57760, offset 0, flags [DF], proto UDP (17), length 76)
pvelocalhost.ntp > 64-250-105-227.ethoplex.com.ntp: [bad udp cksum 0x40e6 -> 0x6cec!] NTPv4, length 48
Client, Leap indicator: (0), Stratum 2 (secondary reference), poll 10 (1024s), precision -23
Root Delay: 0.066635, Root dispersion: 0.601440, Reference-ID: 64-250-105-227.ethoplex.com
Reference Timestamp: 3696656842.987997412 (2017/02/21 03:07:22)
Originator Timestamp: 3696656843.552259385 (2017/02/21 03:07:23)
Receive Timestamp: 3696656843.580105364 (2017/02/21 03:07:23)
Transmit Timestamp: 3696689143.513155341 (2017/02/21 12:05:43)
Originator - Receive Timestamp: +0.027845976
Originator - Transmit Timestamp: +32299.960896015
12:05:43.513708 IP (tos 0xc0, ttl 63, id 57760, offset 0, flags [DF], proto UDP (17), length 76)
gateway.example.com.ntp > 64-250-105-227.ethoplex.com.ntp: [udp sum ok] NTPv4, length 48
Client, Leap indicator: (0), Stratum 2 (secondary reference), poll 10 (1024s), precision -23
Root Delay: 0.066635, Root dispersion: 0.601440, Reference-ID: 64-250-105-227.ethoplex.com
Reference Timestamp: 3696656842.987997412 (2017/02/21 03:07:22)
Originator Timestamp: 3696656843.552259385 (2017/02/21 03:07:23)
Receive Timestamp: 3696656843.580105364 (2017/02/21 03:07:23)
Transmit Timestamp: 3696689143.513155341 (2017/02/21 12:05:43)
Originator - Receive Timestamp: +0.027845976
Originator - Transmit Timestamp: +32299.960896015
12:05:43.573035 IP (tos 0x8, ttl 52, id 38657, offset 0, flags [DF], proto UDP (17), length 76)
64-250-105-227.ethoplex.com.ntp > gateway.example.com.ntp: [udp sum ok] NTPv4, length 48
Server, Leap indicator: (0), Stratum 1 (primary reference), poll 10 (1024s), precision -18
Root Delay: 0.000000, Root dispersion: 0.001205, Reference-ID: PPS^@
Reference Timestamp: 3696689128.863678634 (2017/02/21 12:05:28)
Originator Timestamp: 3696689143.513155341 (2017/02/21 12:05:43)
Receive Timestamp: 3696689143.547838270 (2017/02/21 12:05:43)
Transmit Timestamp: 3696689143.548149943 (2017/02/21 12:05:43)
Originator - Receive Timestamp: +0.034682918
Originator - Transmit Timestamp: +0.034994553
12:05:43.573264 IP (tos 0x8, ttl 51, id 38657, offset 0, flags [DF], proto UDP (17), length 76)
64-250-105-227.ethoplex.com.ntp > pvelocalhost.ntp: [udp sum ok] NTPv4, length 48
Server, Leap indicator: (0), Stratum 1 (primary reference), poll 10 (1024s), precision -18
Root Delay: 0.000000, Root dispersion: 0.001205, Reference-ID: PPS^@
Reference Timestamp: 3696689128.863678634 (2017/02/21 12:05:28)
Originator Timestamp: 3696689143.513155341 (2017/02/21 12:05:43)
Receive Timestamp: 3696689143.547838270 (2017/02/21 12:05:43)
Transmit Timestamp: 3696689143.548149943 (2017/02/21 12:05:43)
Originator - Receive Timestamp: +0.034682918
Originator - Transmit Timestamp: +0.034994553
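The timestamps in the capture above can be cross-checked with a little arithmetic. NTP timestamps count seconds from 1900-01-01 UTC, so subtracting 2,208,988,800 gives a Unix timestamp. The sketch below uses plain Python floats, which are precise to roughly a microsecond at this magnitude. It converts values copied from the capture, recomputes tcpdump's "Originator - Transmit" figure for the outgoing request, and checks that the server's reply echoes the request's Transmit Timestamp as its Originator Timestamp. Converted to UTC, the request leaves at 18:05:43, so tcpdump appears to be printing local time six hours behind UTC.

```python
from datetime import datetime, timezone

# Seconds between the NTP era-0 epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_OFFSET = 2_208_988_800


def ntp_to_utc(ts: float) -> datetime:
    """Convert an NTP timestamp (seconds since 1900, era 0) to an aware UTC datetime."""
    return datetime.fromtimestamp(ts - NTP_UNIX_OFFSET, tz=timezone.utc)


# Values copied from the capture (client request, then server reply).
request = {
    "org": 3696656843.552259385,  # Originator Timestamp in the request
    "rec": 3696656843.580105364,  # Receive Timestamp in the request
    "xmt": 3696689143.513155341,  # Transmit Timestamp in the request
}
reply = {
    "org": 3696689143.513155341,  # Originator Timestamp in the reply
    "xmt": 3696689143.548149943,  # Transmit Timestamp in the reply
}

# Age of the request's echoed fields (tcpdump's "Originator - Transmit").
staleness = request["xmt"] - request["org"]

# Does the reply echo the request's transmit timestamp? A client uses this
# match to pair a reply with its outstanding request.
reply_matches_request = reply["org"] == request["xmt"]

if __name__ == "__main__":
    for name in ("org", "rec", "xmt"):
        print("request", name, ntp_to_utc(request[name]).isoformat())
    print(f"staleness: {staleness:.3f} s = {staleness / 3600:.2f} h")
    print("reply org == request xmt:", reply_matches_request)
```

The gap is about 8.97 hours, which lines up with the 9h in ntpq's when column (time since the last packet was received from that peer). In the NTP on-wire protocol, the Originator and Receive fields of a client request normally echo the last server packet the client processed. That suggests ntpd on pvelocalhost has not processed a reply since about 03:07 local time, even though a reply with a matching Originator Timestamp is visible on the wire at 12:05:43.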
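Long story short, you can see the packet leave and head toward its destination, 64-250-105-227.ethoplex.com.ntp, and you can see the response come back the same way. The first UDP checksum is bad, probably because of TOE, but once gateway masquerades the source IP and recomputes the packet's checksum, everything seems to be fine.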
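The checksum behaviour described above follows from how the UDP checksum is defined. RFC 768 computes it over a pseudo-header that includes the source and destination IPv4 addresses, so a gateway that masquerades the source address must recompute the checksum or adjust it. The sketch below is a minimal illustration, not what gateway actually runs. The addresses are hypothetical RFC 5737 documentation addresses standing in for pvelocalhost, gateway and the server, and the 48-byte payload is a placeholder rather than a real NTP packet.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit ones' complement sum, padding odd-length input with a zero byte."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def pseudo_header(src_ip: str, dst_ip: str, udp_len: int) -> bytes:
    """IPv4 pseudo-header: src addr, dst addr, zero byte, protocol 17, UDP length."""
    return (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!BBH", 0, socket.IPPROTO_UDP, udp_len)
    )


def udp_segment(src_port: int, dst_port: int, payload: bytes, csum: int = 0) -> bytes:
    """UDP header (ports, length, checksum) followed by the payload."""
    return struct.pack("!HHHH", src_port, dst_port, 8 + len(payload), csum) + payload


def udp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum for a segment whose checksum field is still zero."""
    s = ones_complement_sum(pseudo_header(src_ip, dst_ip, len(segment)) + segment)
    csum = ~s & 0xFFFF
    return csum or 0xFFFF  # 0 means "no checksum" for IPv4 UDP, so send 0xFFFF


def checksum_ok(src_ip: str, dst_ip: str, segment: bytes) -> bool:
    """A received segment is valid when the sum including its checksum is 0xFFFF."""
    return ones_complement_sum(pseudo_header(src_ip, dst_ip, len(segment)) + segment) == 0xFFFF


INNER_HOST = "192.0.2.10"   # hypothetical address for pvelocalhost
GATEWAY = "203.0.113.1"     # hypothetical public address of gateway
SERVER = "198.51.100.7"     # hypothetical address for the NTP server
PAYLOAD = bytes(range(48))  # placeholder 48-byte body, not a real NTP packet

blank = udp_segment(123, 123, PAYLOAD)
sent = udp_segment(123, 123, PAYLOAD, udp_checksum(INNER_HOST, SERVER, blank))
# After masquerading, the old checksum no longer matches the new pseudo-header.
fixed = udp_segment(123, 123, PAYLOAD, udp_checksum(GATEWAY, SERVER, blank))

if __name__ == "__main__":
    print("valid from inner host:", checksum_ok(INNER_HOST, SERVER, sent))
    print("stale after source rewrite:", not checksum_ok(GATEWAY, SERVER, sent))
    print("valid after recompute:", checksum_ok(GATEWAY, SERVER, fixed))
```

This is also why a checksum that looks wrong on the sending host is not necessarily a problem. With checksum offload, tcpdump on the sender can capture the packet before the NIC fills the field in, and the gateway then recomputes the checksum anyway after rewriting the source address.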
What is going on? Other than setting up a cron job to restart NTP every few hours, what options do I have?