如何修复由 NTP 服务器同步大量机器的时间

Question 1

当您谈论时间的转换时，您通常谈论的是少量的时间。修复是通过调用来执行的adjtime()，或者在 Linux 上可能是adjtimex()。

来自 ntpd 手册页：

   -x     Normally, the time is slewed if the offset is less than the step
          threshold,  which is 128 ms by default, and stepped if above the
          threshold.  This option sets the threshold to 600  s,  which  is
          well  within  the  accuracy  window  to  set the clock manually.
          Note: Since the slew rate of typical Unix kernels is limited  to
          0.5  ms/s,  each  second  of adjustment requires an amortization
          interval of 2000 s.  Thus, an adjustment as much as 600  s  will
          take  almost  14 days to complete.  This option can be used with
          the -g and -q options.  Note: The kernel time discipline is dis‐
          abled with this option.

那么，我怀疑您是否愿意以这种速度等待 7 小时的校正。这将需要一年多的时间。在 Linux 上，32 位系统上的 adjtime 实际上被限制在约 2000 秒的增量内。64 位系统可能不会引起这个问题，但速度这一变化何时生效仍是一个令人担忧的问题。

因此，在 Linux 实现中存在一个阈值，大概在其他实现中也是如此，在此阈值之下，您会得到一个非常缓慢的“摆动”，但在此阈值之上，主机和客户端上的系统时钟将会逐步提高，这可以进行得更快。

还会有另一个阈值，如果主服务器和客户端之间的时间差异太大，客户端将认为存在错误并且不会更新。来自 ntpd 手册页：

   -g     Normally, ntpd exits with a message to the  system  log  if  the
          offset  exceeds the panic threshold, which is 1000 s by default.
          This option allows the time to  be  set  to  any  value  without
          restriction; however, this can happen only once.  If the thresh‐
          old is exceeded after that, ntpd will exit with a message to the
          system log.  This option can be used with the -q and -x options.

请注意，该-g选项几乎肯定不是为守护进程设置的。它通常用作ntpd -gq，在系统启动时一次性运行，或手动运行，其行为与非常相似ntpdate。不过，恐慌阈值可能在编译时可配置，因此请查看操作系统供应商的手册页。

编写一个程序，使用您选择的任何频率和调整大小进行一系列时间调整，这非常简单。您可以在 ntp master 上执行此操作，它会将调整后的时间提供给其客户端，但您需要知道客户端系统将接受的最大调整大小，以及什么最小阈值会导致它们执行非常缓慢的调整。为了安全起见，您应该调查客户端系统上的 ntp 实现。

如果您要更新的系统具有与 Linux 上默认 ntpd 类似的特性，但没有此-x选项，那么您可以使用每 5 秒进行半秒调整的机制，并在大约 3 天内实现同步。进行不跨越秒边界的亚秒级调整可能有助于避免触发两次 cron 作业之类的事情，但您可能会发现某种副作用。

如果您遇到服务器不再同步的情况，那么事情就会变得更加混乱。如果可行的话，我希望监控时差，如果某些服务器不再同步，则自动停止自动定期更新并发出警报。

Answer

当您谈论时间的转换时，您通常谈论的是少量的时间。修复是通过调用来执行的adjtime()，或者在 Linux 上可能是adjtimex()。