Chrony 3.1 refuses to synchronize with NTP server

I have 70 machines running CentOS 7.2 with chrony version 2.1.1, and they synchronize perfectly with my NTP server (NTP protocol version 3).

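Recently I added 30 machines running CentOS 7.4 with chrony version 3.1, but these 30 machines refuse to synchronize. I have followed every troubleshooting procedure I could find, and I have no idea how to fix this. Command output: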

chronyc tracking
Reference ID    : 00000000 ()
Stratum         : 0
Ref time (UTC)  : Thu Jan 01 00:00:00 1970
System time     : 0.000000013 seconds fast of NTP time
Last offset     : +0.000000000 seconds
RMS offset      : 0.000000000 seconds
Frequency       : 11.390 ppm fast
Residual freq   : +0.000 ppm
Skew            : 0.000 ppm
Root delay      : 1.000000000 seconds
Root dispersion : 1.000000000 seconds
Update interval : 0.0 seconds
Leap status     : Not synchronised


chronyc sources
210 Number of sources = 1
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^? 172.17.172.220                4   7   377   644   -11.6s[ -11.6s] +/- 8147ms



tcpdump -n -i lo port 323 [Note: I ran "chronyc sources" in another terminal but nothing was captured; on the working machines it captures some packets!]

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 262144 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel


 tcpdump -n -i eno2  port 123
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eno2, link-type EN10MB (Ethernet), capture size 262144 bytes
15:03:09.870958 IP 192.168.0.100.44841 > 172.17.172.220.ntp: NTPv4, Client, length 48
15:03:10.112707 IP 172.17.172.220.ntp > 192.168.0.100.44841: NTPv3, Server, length 48
15:11:45.678320 IP 192.168.0.100.46832 > 172.17.172.220.ntp: NTPv4, Client, length 48
15:11:45.892482 IP 172.17.172.220.ntp > 192.168.0.100.46832: NTPv3, Server, length 48
15:20:22.634981 IP 192.168.0.100.41310 > 172.17.172.220.ntp: NTPv4, Client, length 48
15:20:22.871226 IP 172.17.172.220.ntp > 192.168.0.100.41310: NTPv3, Server, length 48
15:28:55.820943 IP 192.168.0.100.39143 > 172.17.172.220.ntp: NTPv4, Client, length 48
15:28:55.873988 IP 172.17.172.220.ntp > 192.168.0.100.39143: NTPv3, Server, length 48
15:37:35.840998 IP 192.168.0.100.57333 > 172.17.172.220.ntp: NTPv4, Client, length 48
15:37:35.913139 IP 172.17.172.220.ntp > 192.168.0.100.57333: NTPv3, Server, length 48
15:46:15.814980 IP 192.168.0.100.56932 > 172.17.172.220.ntp: NTPv4, Client, length 48
15:46:15.882518 IP 172.17.172.220.ntp > 192.168.0.100.56932: NTPv3, Server, length 48
15:54:48.587705 IP 192.168.0.100.33711 > 172.17.172.220.ntp: NTPv4, Client, length 48
15:54:48.632963 IP 172.17.172.220.ntp > 192.168.0.100.33711: NTPv3, Server, length 48
^C
14 packets captured
14 packets received by filter
0 packets dropped by kernel

chronyc activity
200 OK
1 sources online
0 sources offline
0 sources doing burst (return to online)
0 sources doing burst (return to offline)
0 sources with unknown address


chronyc ntpdata  172.17.172.220
Remote address  : 172.17.172.220 (AC11ACDC)
Remote port     : 123
Local address   : 192.168.0.100 (C0A80064)
Leap status     : Normal
Version         : 3
Mode            : Server
Stratum         : 4
Poll interval   : 8 (256 seconds)
Precision       : -6 (0.015625000 seconds)
Root delay      : 0.031219 seconds
Root dispersion : 8.063156 seconds
Reference ID    : AC11AC88 ()
Reference time  : Sun Nov 12 09:21:36 2017
Offset          : +11.719727516 seconds
Peer delay      : 0.215471357 seconds
Peer dispersion : 0.015626255 seconds
Response time   : 0.000000000 seconds
Jitter asymmetry: -0.47
NTP tests       : 111 111 1101
Interleaved     : No
Authenticated   : No
TX timestamping : Kernel
RX timestamping : Kernel
Total TX        : 35
Total RX        : 35
Total valid RX  : 35


chronyc serverstats
NTP packets received       : 0
NTP packets dropped        : 0
Command packets received   : 6
Command packets dropped    : 0
Client log records dropped : 0

What should I do to fix the following?

Reference ID    : 00000000 ()
Stratum         : 0
NTP packets received       : 0

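I have rebooted the whole OS and tried every relevant chronyc command, such as makestep and waitsync, but nothing works. I also searched for reported bugs but could not find anything relevant.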

Note that firewalld is disabled. /etc/chrony.conf is an exact copy of the one on the 70 working machines.

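Update:
With tcpdump in verbose mode, it looks as if chrony 3.1's timestamps are corrupted. Even after running chronyc makestep 1 -1 it does not synchronize. I also ran chronyd in debug mode (see below):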

tcpdump -n -i eno2  port 123 -vvvvv
tcpdump: listening on eno2, link-type EN10MB (Ethernet), capture size 262144 bytes
20:25:15.708374 IP (tos 0x0, ttl 64, id 399, offset 0, flags [DF], proto UDP (17), length 76)
    192.168.0.100.49105 > 172.17.172.220.ntp: [bad udp cksum 0x1a45 -> 0xf15f!] NTPv4, length 48
        Client, Leap indicator:  (0), Stratum 0 (unspecified), poll 6 (64s), precision 32
        Root Delay: 0.000000, Root dispersion: 0.000000, Reference-ID: (unspec)
          Reference Timestamp:  0.000000000
          Originator Timestamp: 3719492661.028820399 (2017/11/12 20:24:21)
          Receive Timestamp:    1089474065.361510029 (2070/08/17 02:09:21)
          Transmit Timestamp:   2540453432.493019109 (1980/07/03 13:30:32)
            Originator - Receive Timestamp:  +1664948700.332689629
            Originator - Transmit Timestamp: -1179039228.535801290
20:25:15.964038 IP (tos 0x0, ttl 122, id 18400, offset 0, flags [none], proto UDP (17), length 76)
    172.17.172.220.ntp > 192.168.0.100.49105: [udp sum ok] NTPv3, length 48
        Server, Leap indicator:  (0), Stratum 4 (secondary reference), poll 6 (64s), precision -6
        Root Delay: 0.031219, Root dispersion: 8.154785, Reference-ID: 172.17.172.136
          Reference Timestamp:  3719467375.940868199 (2017/11/12 13:22:55)
          Originator Timestamp: 2540453432.493019109 (1980/07/03 13:30:32)
          Receive Timestamp:    3719492726.471868199 (2017/11/12 20:25:26)
          Transmit Timestamp:   3719492726.471868199 (2017/11/12 20:25:26)
            Originator - Receive Timestamp:  +1179039293.978849090
            Originator - Transmit Timestamp: +1179039293.978849090
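To read the raw timestamps in the verbose capture independently of tcpdump's own rendering, they can be converted by hand. A minimal Python sketch; the helper name ntp_to_utc is hypothetical, and it assumes the standard NTP epoch of 1900-01-01 plus the RFC 4330 rule that a seconds value with its most significant bit clear belongs to era 1 (2036-2104):

```python
from datetime import datetime, timezone

NTP_UNIX_OFFSET = 2208988800  # seconds from 1900-01-01 (NTP epoch) to 1970-01-01 (Unix epoch)
ERA_SECONDS = 2**32           # length of one NTP era in seconds


def ntp_to_utc(ntp_seconds: float) -> datetime:
    """Convert a seconds.fraction NTP timestamp (as tcpdump prints it) to aware UTC.

    RFC 4330 convention: if the top bit of the 32-bit seconds field is clear,
    the value is taken to be in era 1, so one full era is added first.
    """
    if ntp_seconds < 2**31:
        ntp_seconds += ERA_SECONDS
    return datetime.fromtimestamp(ntp_seconds - NTP_UNIX_OFFSET, tz=timezone.utc)


if __name__ == "__main__":
    # Raw values copied from the verbose tcpdump output above.
    capture = {
        "client Originator": 3719492661.028820399,
        "client Receive": 1089474065.361510029,
        "client Transmit": 2540453432.493019109,
        "server Originator": 2540453432.493019109,
        "server Receive": 3719492726.471868199,
    }
    for label, value in capture.items():
        print(f"{label:18} {ntp_to_utc(value):%Y-%m-%d %H:%M:%S} UTC")
```

Converted this way, every value lands exactly four hours earlier than the date tcpdump printed (for example 2017-11-12 16:24:21 UTC for the client's Originator Timestamp), which suggests tcpdump rendered them in the host's local time. The capture also shows the server's Originator Timestamp is an exact copy of the client's Transmit Timestamp (2540453432.493019109), which is what an NTP server reply is expected to contain.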

Debug mode output:

/usr/sbin/chronyd -d -d
2017-11-12T17:32:37Z main.c:473:(main) chronyd version 3.1 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SECHASH +SIGND +ASYNCDNS +IPV6 +DEBUG)
2017-11-12T17:32:37Z conf.c:406:(CNF_ReadFile) Reading /etc/chrony.conf
2017-11-12T17:32:37Z conf.c:572:(CNF_ParseLine) commandkey directive is no longer supported
2017-11-12T17:32:37Z conf.c:572:(CNF_ParseLine) generatecommandkey directive is no longer supported
2017-11-12T17:32:37Z local.c:149:(calculate_sys_precision) Clock precision 0.000000016 (-26)
2017-11-12T17:32:37Z sys_linux.c:317:(get_version_specific_details) Linux kernel major=3 minor=10 patch=0
2017-11-12T17:32:37Z sys_linux.c:338:(get_version_specific_details) hz=100 nominal_tick=10000 max_tick_bias=1000
2017-11-12T17:32:37Z local.c:663:(lcl_RegisterSystemDrivers) Local freq=11.390ppm
2017-11-12T17:32:37Z util.c:1172:(UTI_DropRoot) Dropped root privileges: UID 998 GID 996
2017-11-12T17:32:37Z reference.c:209:(REF_Initialise) Frequency 11.390 +/- 0.031 ppm read from /var/lib/chrony/drift
2017-11-12T17:32:37Z sys_generic.c:251:(update_slew) slew offset=0.000000e+00 corr_rate=0.000000e+00 base_freq=11.389873 total_freq=11.389862 slew_freq=-1.093958e-11 duration=10000.000000 slew_error=1.203354e-13
2017-11-12T17:32:37Z ntp_core.c:1089:(transmit_timeout) Transmit timeout for [172.17.172.220:123]
2017-11-12T17:32:37Z ntp_io.c:831:(NIO_SendPacket) Sent 48 bytes to 172.17.172.220:123 from [UNSPEC] fd 8
2017-11-12T17:32:37Z ntp_io_linux.c:652:(NIO_Linux_ProcessMessage) Received 90 (48) bytes from error queue for 172.17.172.220:123 fd=8 if=3 tss=1
2017-11-12T17:32:37Z ntp_core.c:1994:(update_tx_timestamp) Updated TX timestamp delay=0.000010086
2017-11-12T17:32:38Z ntp_io.c:669:(process_message) Received 48 bytes from 172.17.172.220:123 to 192.168.0.100 fd=8 if=3 tss=1 delay=0.000014398
2017-11-12T17:32:38Z ntp_core.c:1563:(receive_packet) NTP packet lvm=34 stratum=4 poll=6 prec=-6 root_delay=0.031219 root_disp=8.201569 refid=ac11ac88 []
2017-11-12T17:32:38Z ntp_core.c:1568:(receive_packet) reference=1510478575.936134800 origin=3724568162.405584875 receive=1510507968.499134800 transmit=1510507968.499134800
2017-11-12T17:32:38Z ntp_core.c:1570:(receive_packet) offset=10.547374307 delay=0.099570973 dispersion=0.015824 root_delay=0.130790 root_dispersion=8.217393
2017-11-12T17:32:38Z ntp_core.c:1573:(receive_packet) remote_interval=0.000000000 local_interval=0.099570973 server_interval=0.000000000 txs=K rxs=K
2017-11-12T17:32:38Z ntp_core.c:1577:(receive_packet) test123=111 test567=111 testABCD=1111 kod_rate=0 interleaved=0 presend=0 valid=1 good=1 updated=1
2017-11-12T17:32:38Z sources.c:353:(SRC_AccumulateSample) ip=[172.17.172.220] t=1510507957.951760493 ofs=-10.547374 del=0.130790 disp=8.217393 str=4
2017-11-12T17:32:38Z sourcestats.c:658:(SST_GetSelectionData) n=1 off=-10.547374 dist=8.282888 sd=4.000000 first_ago=0.049800 last_ago=0.049800 selok=0
2017-11-12T17:32:38Z sources.c:770:(SRC_SelectSource) badstat=1 sel=0 badstat_reach=1 sel_reach=0 max_reach_ago=0.000000

Confirming the problem is in version 3.1:

After removing 3.1 with yum remove chrony and reverting to chronyd version 2.1.1 with yum localinstall /home/chrony-2.1.1-1.el7.centos.x86_64.rpm, synchronization works perfectly!

Answer 1

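A similar bug exists in RH Bugzilla and was closed as NOTABUG. The problem is a combination of a poor-quality time server and a change in the defaults of newer chrony versions, which no longer use such servers.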

https://bugzilla.redhat.com/show_bug.cgi?id=1525833

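"The server is ignored for clock synchronization because it is too inaccurate. In the 'chronyc sources' output there is '+/- 4695ms', which is larger than the default maximum distance of 3 seconds. The maxdistance option was added in chrony-2.2; that is why it works with chrony-2.1, which has only a hard-coded limit requiring the root dispersion to be less than 16 seconds.

The tcpdump output shows that the root dispersion of the NTP server is about 3.6 seconds. Is it a Windows NTP server? You can also check the root dispersion with 'chronyc ntpdata'.

A larger maxdistance needs to be set in chrony.conf to allow chronyd to use the server for synchronization."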
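A minimal Python sketch of the check described in the bug report, assuming chrony's root distance is roughly half the accumulated root delay plus the accumulated root dispersion (chrony adds further small terms that this ignores). The input values are the root_delay and root_dispersion from the chronyd -d -d log in the question; the function names are hypothetical.

```python
# Approximate chrony's source-selection distance check for one source.
# Assumption: root distance ~= root_delay / 2 + root_dispersion; chrony adds
# further small terms that this sketch ignores.

DEFAULT_MAXDISTANCE = 3.0           # seconds; default maxdistance per the bug report
CHRONY_2_1_DISPERSION_LIMIT = 16.0  # seconds; hard-coded limit quoted for chrony 2.1


def root_distance(root_delay: float, root_dispersion: float) -> float:
    """Half the accumulated round-trip delay plus the accumulated dispersion."""
    return root_delay / 2.0 + root_dispersion


def passes_maxdistance(root_delay: float, root_dispersion: float,
                       maxdistance: float = DEFAULT_MAXDISTANCE) -> bool:
    """True if the source would not be rejected for its distance (chrony >= 2.2)."""
    return root_distance(root_delay, root_dispersion) <= maxdistance


def passes_chrony_2_1(root_dispersion: float) -> bool:
    """True if the source passes the root-dispersion limit quoted for chrony 2.1."""
    return root_dispersion < CHRONY_2_1_DISPERSION_LIMIT


if __name__ == "__main__":
    # root_delay and root_dispersion from the chronyd -d -d log in the question.
    delay, disp = 0.130790, 8.217393
    print(f"root distance ~ {root_distance(delay, disp):.3f} s")
    print("chrony 3.1 defaults:  ", passes_maxdistance(delay, disp))
    print("maxdistance 16.0:     ", passes_maxdistance(delay, disp, 16.0))
    print("chrony 2.1 dispersion:", passes_chrony_2_1(disp))
```

With those numbers the distance comes out just above 8.28 s, close to the dist=8.282888 that chronyd itself logged, so the source fails the default 3-second limit while still passing chrony 2.1's 16-second root dispersion limit and a maxdistance of 16.0.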

Answer 2

For chrony 3.1.

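We pieced together a solution from the following threads, but for a concise, easy-to-check answer, try this. Check the status of the time synchronization you are receiving with the following command (-v explains the columns):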

chronyc sources -v

The rightmost column (e.g. +/- 10.5s) tells you the "estimated error" of the time updates received from the server in question.

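Our problem was that the time received from our Windows NTP server exceeded the 3-second "maximum estimated error" threshold (it showed +/- 10 seconds), so chrony did not update the system time. Switching our servers to the UK NTP pool servers fixed the problem (+/- 50 ms).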
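The check in this answer can be scripted. A minimal Python sketch, assuming the chronyc sources line format shown in the question (a trailing "+/- <value><unit>" field) and treating that value, as this answer does, as the number to compare with the 3-second default; the helper names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

DEFAULT_MAXDISTANCE = 3.0  # seconds; the default threshold discussed above

# Scale factors for the unit suffixes that follow "+/-".
_UNIT_SCALE = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0}
# Matches the trailing "+/- 8147ms" field of a chronyc sources line.
_ERROR_RE = re.compile(r"\+/-\s*([0-9]+(?:\.[0-9]+)?)(ns|us|ms|s)\s*$")


def estimated_error_seconds(line: str) -> Optional[float]:
    """Return the estimated error of one source line in seconds, or None."""
    m = _ERROR_RE.search(line)
    if m is None:
        return None  # header, separator, or unrelated line
    value, unit = m.groups()
    return float(value) * _UNIT_SCALE[unit]


def too_inaccurate(sources_output: str,
                   limit: float = DEFAULT_MAXDISTANCE) -> List[Tuple[str, float]]:
    """List (line, error) pairs whose estimated error exceeds the limit."""
    flagged = []
    for line in sources_output.splitlines():
        err = estimated_error_seconds(line)
        if err is not None and err > limit:
            flagged.append((line.strip(), err))
    return flagged


if __name__ == "__main__":
    # Source line copied from the question's chronyc sources output.
    sample = "^? 172.17.172.220                4   7   377   644   -11.6s[ -11.6s] +/- 8147ms"
    for line, err in too_inaccurate(sample):
        print(f"{err:.3f} s > {DEFAULT_MAXDISTANCE} s: {line}")
```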

Answer 3

If you have a Windows-based NTP server, this may be your solution (it worked for me on a similar problem):

https://chrony.tuxfamily.org/faq.html

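3.4. Using a Windows NTP server? A common issue with Windows NTP servers is that they report a very large root dispersion (e.g. three seconds or more), which causes chronyd to ignore the server for being too inaccurate. The sources command may show a valid measurement, but the server is not selected for synchronisation. You can check the root dispersion of the server with chronyc's ntpdata command.

The maxdistance value needs to be increased in chrony.conf to enable synchronisation to such a server. For example:

maxdistance 16.0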
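Because the new machines all share one copied chrony.conf, the change can be applied with a small script. A minimal Python sketch, not part of the FAQ: the function name set_maxdistance is hypothetical, and it assumes an active directive starts at the beginning of a line (optionally indented) while lines starting with # are comments.

```python
import re

# An active maxdistance directive: optional indentation, then the keyword.
# Commented-out lines such as "# maxdistance 5" do not match.
_DIRECTIVE_RE = re.compile(r"\s*maxdistance\b")


def set_maxdistance(conf_text: str, value: float = 16.0) -> str:
    """Return chrony.conf text with exactly one active 'maxdistance <value>' line.

    Existing active maxdistance lines are dropped and a single new one is
    appended at the end, so applying it twice gives the same result.
    """
    kept = [line for line in conf_text.splitlines() if not _DIRECTIVE_RE.match(line)]
    kept.append(f"maxdistance {value}")
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # Hypothetical config text; the question does not show the real chrony.conf.
    sample = "server 172.17.172.220 iburst\nmaxdistance 3.0\n# maxdistance 5\n"
    print(set_maxdistance(sample), end="")
```

The script only transforms text; after writing the result back to /etc/chrony.conf, chronyd must be restarted (for example with systemctl restart chronyd on CentOS 7) before the new limit takes effect.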
