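Background: I have a few internal stratum 1 NTP clocks with GPS receivers, plus two public NTP servers, virtualized on top of VMware ESXi, that get their time from the S1 clocks and distribute it. Apart from the problem described below, this setup works well and provides good time compared with other public servers.
Problem: when I reboot the virtual machines, they do not start syncing properly and get stuck in an unsynchronized state. Here is the ntpq -p output after a reboot.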
root@server:~$ ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 192.168.1.40    .GPS.            1 u   27   64     3   1.533  -258.43 5948.73
 192.168.2.40    .GPS.            1 u   24   64     3   1.118  -258.47 6138.19
 192.168.3.42    .GPS.            1 u   24   64     3   0.709  -258.42 5655.02
 194.100.49.151  194.100.49.134   2 u   22   64     3   8.124  -258.74 7131.65
 gbg1.ntp.se     .PPS.            1 u   26   64     3  21.856  -258.43 4876.90
 ntp2.sptime.se  .PPS.            1 u   23   64     3  19.991  -258.42 7764.97
 ntp1.sptime.se  .PPS.            1 u   27   64     3  20.489  -258.41 8574.46
If I restart the ntp service, I get the following:
root@server:~$ ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 192.168.1.40    .GPS.            1 u    2   64     1   1.517  -258.45   0.065
 192.168.2.40    .GPS.            1 u    1   64     1   1.126  -258.46   0.025
 192.168.3.42    .GPS.            1 u    2   64     1   0.719  -258.42   0.020
 194.100.49.151  194.100.49.134   2 u    5   64     1   8.041  -258.72   0.000
 gbg1.ntp.se     .PPS.            1 u    6   64     1  21.839  -258.41   0.000
 ntp2.sptime.se  .PPS.            1 u    4   64     1  19.968  -258.41   0.000
 ntp1.sptime.se  .PPS.            1 u    3   64     1  20.418  -258.43   0.000
A second later, it steps the clock:
root@server:~$ ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 192.168.1.40    .STEP.          16 u    2   64     0   0.000    0.000   0.000
 192.168.2.40    .STEP.          16 u    2   64     0   0.000    0.000   0.000
 192.168.3.42    .STEP.          16 u    8   64     0   0.000    0.000   0.000
 194.100.49.151  194.100.49.134   2 u    -   64     1   7.976   -0.261   0.000
 gbg1.ntp.se     .PPS.            1 u    -   64     1  21.840    0.060   0.000
 ntp2.sptime.se  .STEP.          16 u    6   64     0   0.000    0.000   0.000
 ntp1.sptime.se  .STEP.          16 u    6   64     0   0.000    0.000   0.000
After that, we are back to normal operation:
root@server:~$ ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 192.168.1.40    .GPS.            1 u    1   64     1   1.474    0.044   0.017
*192.168.2.40    .GPS.            1 u    1   64     1   1.102    0.030   0.005
 192.168.3.42    .GPS.            1 u    1   64     1   0.674    0.049   0.009
 194.100.49.151  194.100.49.134   2 u    8   64     1   7.976   -0.261   0.000
 gbg1.ntp.se     .PPS.            1 u    8   64     1  21.840    0.060   0.000
 ntp2.sptime.se  .PPS.            1 u    6   64     1  19.979    0.059   0.000
 ntp1.sptime.se  .PPS.            1 u    5   64     1  20.440    0.048   0.000
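So it looks like the system clock is quite far off after a reboot, which is to be expected, but why ntpd does not panic and simply adjust the clock on its own is beyond me.
Here is my ntp.conf: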
tinker panic 0
# /etc/ntp.conf, configuration for ntpd; see ntp.conf(5) for help
driftfile /var/lib/ntp/ntp.drift
# Enable this if you want statistics to be logged.
statsdir /var/log/ntpstats/
statistics loopstats peerstats clockstats
filegen loopstats file loopstats type day enable
filegen peerstats file peerstats type day enable
filegen clockstats file clockstats type day enable
# You do need to talk to an NTP server or two (or three).
#server ntp.your-provider.example
# pool.ntp.org maps to about 1000 low-stratum NTP servers. Your server will
# pick a different set every time it starts up. Please consider joining the
# pool: <http://www.pool.ntp.org/join.html>
server 192.168.1.40 iburst
server 192.168.2.40 iburst
server 192.168.3.42 iburst
server time1.mikes.fi
server ntp1.gbg.netnod.se
server ntp2.sptime.se
server ntp1.sptime.se
# Access control configuration; see /usr/share/doc/ntp-doc/html/accopt.html for
# details. The web page <http://support.ntp.org/bin/view/Support/AccessRestrictions>
# might also be helpful.
#
# Note that "restrict" applies to both servers and clients, so a configuration
# that might be intended to block requests from certain clients could also end
# up blocking replies from your own upstream servers.
# By default, exchange time with everybody, but don't allow configuration.
restrict -4 default kod notrap nomodify nopeer noquery
restrict -6 default kod notrap nomodify nopeer noquery
# Local users may interrogate the ntp server more closely.
restrict 127.0.0.1
restrict ::1
# Clients from this (example!) subnet have unlimited access, but only if
# cryptographically authenticated.
#restrict 192.168.123.0 mask 255.255.255.0 notrust
# If you want to provide time to your local subnet, change the next line.
# (Again, the address is an example only.)
#broadcast 192.168.123.255
# If you want to listen to time broadcasts on your local subnet, de-comment the
# next lines. Please do this only if you trust everybody on the network!
#disable auth
#broadcastclient
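Answer 1
ntpd's default step threshold is 0.128 seconds, and its default panic threshold after the first packet is 1000 seconds. In other words, "outside design conditions" means offset jumps of more than about 15 minutes.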
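As a rough illustration of how those two thresholds partition an offset, here is a minimal Python sketch. It assumes the documented ntpd defaults of 0.128 s (step) and 1000 s (panic). The function name and return labels are hypothetical, and the sketch deliberately ignores the stepout interval and the -g option, so it is not how ntpd is actually implemented.

```python
# Simplified model of ntpd's offset handling; names are illustrative only.
STEP_THRESHOLD_S = 0.128    # documented default, tunable with "tinker step"
PANIC_THRESHOLD_S = 1000.0  # documented default, tunable with "tinker panic"


def classify_offset(offset_s,
                    step_threshold=STEP_THRESHOLD_S,
                    panic_threshold=PANIC_THRESHOLD_S):
    """Return 'slew', 'step' or 'panic' for an offset given in seconds.

    A threshold of 0 disables the corresponding check, mirroring
    "tinker panic 0" and "tinker step 0".
    """
    magnitude = abs(offset_s)
    if panic_threshold > 0 and magnitude > panic_threshold:
        return "panic"  # ntpd would log an error and exit
    if step_threshold > 0 and magnitude > step_threshold:
        return "step"   # clock is set in one jump
    return "slew"       # clock is adjusted gradually


if __name__ == "__main__":
    # The offset from the question is -258.43 ms.
    print(classify_offset(-0.25843))
```

With the -258.43 ms offset from the question, the result falls in the step range, which matches the .STEP. refid in the third ntpq listing.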
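You captured the initial packets, the step, and finally the peer selection. Because of how the NTP algorithms work, it takes a minute or two to settle, even with the iburst option. A reach of 3 means that only two packets have been received so far. If you are not dropping NTP packets, just wait longer.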
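The reach column in ntpq -p is an 8-bit shift register printed in octal, with the lowest bit standing for the most recent poll. The following Python sketch decodes it; the function name is hypothetical and only serves to show why a reach of 3 corresponds to two received packets.

```python
from __future__ import annotations


def decode_reach(reach_octal: str) -> list[bool]:
    """Decode the ntpq 'reach' column into the last eight poll results.

    The value is an octal rendering of an 8-bit shift register.
    Index 0 of the returned list is the most recent poll.
    """
    value = int(reach_octal, 8)
    return [bool((value >> bit) & 1) for bit in range(8)]


if __name__ == "__main__":
    history = decode_reach("3")
    print(history)
    print(sum(history), "of the last 8 polls were answered")
```

A fully reachable server eventually shows 377, meaning all eight of the most recent polls were answered.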
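If the initial offset or the step is unacceptable, you can wait until ntpd or the operating system reports synchronization. With systemd on Linux, try depending on systemd-time-wait-sync.service.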
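One way to check whether ntpd itself reports synchronization is to look for the `*` tally code that ntpq -p puts in front of the selected system peer, as in the last listing in the question. The Python sketch below only parses text that has already been captured. The function name is hypothetical, and how you obtain the output (for example by running ntpq -p from a script and polling until a peer appears) is left as an assumption.

```python
def selected_peer(ntpq_output):
    """Return the remote marked with '*' in `ntpq -p` output, or None.

    The '*' tally code in the first column marks the current system peer,
    so its presence means ntpd has selected a synchronization source.
    """
    for line in ntpq_output.splitlines():
        if line.startswith("*"):
            fields = line[1:].split()
            if fields:
                return fields[0]
    return None


if __name__ == "__main__":
    sample = (
        " 192.168.1.40 .GPS. 1 u 1 64 1 1.474 0.044 0.017\n"
        "*192.168.2.40 .GPS. 1 u 1 64 1 1.102 0.030 0.005\n"
    )
    print(selected_peer(sample))
```

Applied to the first listing after the reboot, where no line carries a `*`, the function returns None, which corresponds to the unsynchronized state described in the question.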