NTP server does not synchronize at startup

2024-6-1

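Background: I have a few internal stratum 1 NTP clocks with GPS receivers, plus 2 public NTP servers, virtualized on VMware ESXi, that take their time from the S1 clocks and distribute it. Apart from the problem described below, this setup works well and provides good time compared with other public servers.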

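Problem: when I reboot the virtual machines, they do not start synchronizing properly and get stuck in an unsynchronized state. Here is the ntpq -p output after a reboot.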

root@server:~$ ntpq -p
 remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 192.168.1.40    .GPS.            1 u   27   64    3    1.533  -258.43 5948.73
 192.168.2.40    .GPS.            1 u   24   64    3    1.118  -258.47 6138.19
 192.168.3.42    .GPS.            1 u   24   64    3    0.709  -258.42 5655.02
 194.100.49.151  194.100.49.134   2 u   22   64    3    8.124  -258.74 7131.65
 gbg1.ntp.se     .PPS.            1 u   26   64    3   21.856  -258.43 4876.90
 ntp2.sptime.se  .PPS.            1 u   23   64    3   19.991  -258.42 7764.97
 ntp1.sptime.se  .PPS.            1 u   27   64    3   20.489  -258.41 8574.46

If I restart the ntp service, I get the following:

root@server:~$ ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 192.168.1.40    .GPS.            1 u    2   64    1    1.517  -258.45   0.065
 192.168.2.40    .GPS.            1 u    1   64    1    1.126  -258.46   0.025
 192.168.3.42    .GPS.            1 u    2   64    1    0.719  -258.42   0.020
 194.100.49.151  194.100.49.134   2 u    5   64    1    8.041  -258.72   0.000
 gbg1.ntp.se     .PPS.            1 u    6   64    1   21.839  -258.41   0.000
 ntp2.sptime.se  .PPS.            1 u    4   64    1   19.968  -258.41   0.000
 ntp1.sptime.se  .PPS.            1 u    3   64    1   20.418  -258.43   0.000

A second later, it steps the clock:

root@server:~$ ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 192.168.1.40    .STEP.          16 u    2   64    0    0.000    0.000   0.000
 192.168.2.40    .STEP.          16 u    2   64    0    0.000    0.000   0.000
 192.168.3.42    .STEP.          16 u    8   64    0    0.000    0.000   0.000
 194.100.49.151  194.100.49.134   2 u    -   64    1    7.976   -0.261   0.000
 gbg1.ntp.se     .PPS.            1 u    -   64    1   21.840    0.060   0.000
 ntp2.sptime.se  .STEP.          16 u    6   64    0    0.000    0.000   0.000
 ntp1.sptime.se  .STEP.          16 u    6   64    0    0.000    0.000   0.000

After that, we are back to normal operation:

root@server:~$ ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 192.168.1.40    .GPS.            1 u    1   64    1    1.474    0.044   0.017
*192.168.2.40    .GPS.            1 u    1   64    1    1.102    0.030   0.005
 192.168.3.42    .GPS.            1 u    1   64    1    0.674    0.049   0.009
 194.100.49.151  194.100.49.134   2 u    8   64    1    7.976   -0.261   0.000
 gbg1.ntp.se     .PPS.            1 u    8   64    1   21.840    0.060   0.000
 ntp2.sptime.se  .PPS.            1 u    6   64    1   19.979    0.059   0.000
 ntp1.sptime.se  .PPS.            1 u    5   64    1   20.440    0.048   0.000

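So it looks like the system clock is quite far off after a reboot, which is to be expected, but I find it hard to understand why ntpd does not just step the clock on its own, without a manual restart.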

Here is my ntp.conf:

tinker panic 0
# /etc/ntp.conf, configuration for ntpd; see ntp.conf(5) for help

driftfile /var/lib/ntp/ntp.drift


# Enable this if you want statistics to be logged.
statsdir /var/log/ntpstats/

statistics loopstats peerstats clockstats
filegen loopstats file loopstats type day enable
filegen peerstats file peerstats type day enable
filegen clockstats file clockstats type day enable


# You do need to talk to an NTP server or two (or three).
#server ntp.your-provider.example

# pool.ntp.org maps to about 1000 low-stratum NTP servers.  Your server will
# pick a different set every time it starts up.  Please consider joining the
# pool: <http://www.pool.ntp.org/join.html>
server 192.168.1.40  iburst
server 192.168.2.40 iburst
server 192.168.3.42 iburst
server time1.mikes.fi
server ntp1.gbg.netnod.se
server ntp2.sptime.se
server ntp1.sptime.se

# Access control configuration; see /usr/share/doc/ntp-doc/html/accopt.html for
# details.  The web page <http://support.ntp.org/bin/view/Support/AccessRestrictions>
# might also be helpful.
#
# Note that "restrict" applies to both servers and clients, so a configuration
# that might be intended to block requests from certain clients could also end
# up blocking replies from your own upstream servers.

# By default, exchange time with everybody, but don't allow configuration.
restrict -4 default kod notrap nomodify nopeer noquery
restrict -6 default kod notrap nomodify nopeer noquery

# Local users may interrogate the ntp server more closely.
restrict 127.0.0.1
restrict ::1

# Clients from this (example!) subnet have unlimited access, but only if
# cryptographically authenticated.
#restrict 192.168.123.0 mask 255.255.255.0 notrust


# If you want to provide time to your local subnet, change the next line.
# (Again, the address is an example only.)
#broadcast 192.168.123.255

# If you want to listen to time broadcasts on your local subnet, de-comment the
# next lines.  Please do this only if you trust everybody on the network!
#disable auth
#broadcastclient

Answer 1

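ntpd's default step threshold is 0.128 seconds, and the panic threshold after the first packet is 1000 seconds. In other words, ntpd only treats an offset as outside its design conditions when it jumps by more than about 15 minutes. The roughly 258 ms offset in your output is above the step threshold and far below the panic threshold, so ntpd steps the clock rather than slewing it or exiting.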
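The threshold logic can be sketched as follows. This is a simplified illustration, not ntpd's actual code: the function name `classify_offset` and its parameters are hypothetical, the defaults assume a 0.128 s step threshold and a 1000 s panic threshold, and timing details such as the stepout interval are ignored.

```python
STEP_THRESHOLD_S = 0.128     # assumed ntpd default step threshold, in seconds
PANIC_THRESHOLD_S = 1000.0   # assumed ntpd default panic threshold, in seconds


def classify_offset(offset_ms, step_s=STEP_THRESHOLD_S, panic_s=PANIC_THRESHOLD_S):
    """Classify an offset (milliseconds, as shown by ntpq -p).

    Returns "slew", "step", or "panic". A panic_s of 0 disables the
    panic check, mirroring `tinker panic 0` in ntp.conf.
    """
    magnitude_s = abs(offset_ms) / 1000.0
    if panic_s > 0 and magnitude_s > panic_s:
        return "panic"   # offset too large: ntpd would refuse to correct it
    if magnitude_s > step_s:
        return "step"    # offset above the step threshold: clock is set directly
    return "slew"        # small offset: clock is adjusted gradually


if __name__ == "__main__":
    print(classify_offset(-258.43))  # offset seen right after the reboot
    print(classify_offset(0.044))    # offset seen once things settled
```

With the values from the question, the post-reboot offset falls in the step range and the settled offsets fall in the slew range.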

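You captured the initial packets, the step, and the eventual peer selection. Because of how the NTP algorithms work, it takes a minute or two to settle, even with the iburst option. A reach of 3 means only two packets have been received so far. Unless you are dropping NTP packets, wait longer.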
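To read the reach column: ntpq prints it in octal, and it represents an 8-bit shift register in which each 1 bit marks a poll that got a reply. A minimal sketch of decoding it (the helper name `decode_reach` is made up for this example):

```python
def decode_reach(reach_field):
    """Decode the octal reach value printed by ntpq -p.

    Returns a tuple (bits, replies): the 8-bit register as a string,
    most recent poll on the right, and how many of the last eight
    polls received a reply.
    """
    register = int(reach_field, 8) & 0xFF   # parse octal, keep 8 bits
    bits = format(register, "08b")
    return bits, bits.count("1")


if __name__ == "__main__":
    for value in ("1", "3", "377"):
        print(value, decode_reach(value))
```

A reach of 3 decodes to 00000011, that is, two replies so far, while 377 means all of the last eight polls were answered.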

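If the initial offset or the step is unacceptable, you can wait until ntpd or the operating system reports that the clock is synchronized. On Linux with systemd, try depending on systemd-time-wait-sync.service.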
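One way to check whether ntpd reports synchronization is to look for the `*` tally code that ntpq -p puts in front of the selected system peer. The sketch below is only an illustration under that assumption; the function name `selected_peer` is hypothetical, and it parses text in the same layout as the ntpq -p output shown in the question.

```python
def selected_peer(ntpq_output):
    """Return the remote name of the peer marked '*' in ntpq -p output,
    or None if no system peer has been selected yet."""
    for line in ntpq_output.splitlines():
        if line.startswith("*"):
            # Drop the tally code; the first column is the remote.
            return line[1:].split()[0]
    return None


if __name__ == "__main__":
    sample = (
        "     remote           refid      st t when poll reach   delay   offset  jitter\n"
        "==============================================================================\n"
        " 192.168.1.40    .GPS.            1 u    1   64    1    1.474    0.044   0.017\n"
        "*192.168.2.40    .GPS.            1 u    1   64    1    1.102    0.030   0.005\n"
    )
    print(selected_peer(sample))
```

On a live system, the input could come from something like `subprocess.run(["ntpq", "-p"], capture_output=True, text=True).stdout`, polled until the function returns a peer.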
