Different DHCP behavior between machines with and without NetworkManager

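Can anybody explain the differences I list below, and perhaps why NetworkManager does things differently? Please let me know whether NetworkManager's behavior can be changed to be more like the non-NetworkManager scenario.

Both servers run CentOS 7.8 and use dhclient, but one of them is controlled by NetworkManager. Every few days, both go through the same switch/NIC down/up event (we cannot control this at the moment, for several reasons, and we are remote).

Server #0, which uses NetworkManager, tries to request DHCP immediately after the down/up outage. It gets no response from DHCP (a separate switch issue), then cancels the DHCP transaction and changes its state to timeout. After that it does nothing until NetworkManager is restarted (which obviously can only be done from the console). See the whole sequence below.

Server #1, which does not use NetworkManager, survives these down/up outages. It seems to simply hold on to its lease while the NIC is down, does not even renew when the NIC comes back up, and just keeps using its IP! Later it is able to renew over DHCP at the regular lease renewal interval. See the whole sequence below.

Please let me know whether NetworkManager's behavior can be changed to be more like plain dhclient. Maybe it can be configured to simply keep the current lease after a down/up event and renew at the regular lease renewal interval? Thanks!!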

Server #0:

-- Last regular DHCP renew:
Feb 26 09:31:21 server0 dhclient[4766]: DHCPREQUEST on enp96s0f0 to 10.20.20.131 port 67 (xid=0x58eefe09)
Feb 26 09:31:21 server0 dhclient[4766]: DHCPACK from 10.20.20.131 (xid=0x58eefe09)
Feb 26 09:31:21 server0 NetworkManager[3701]: <info>  [1614349881.5084] dhcp4 (enp96s0f0):   address 10.20.20.223
Feb 26 09:31:21 server0 NetworkManager[3701]: <info>  [1614349881.5090] dhcp4 (enp96s0f0):   plen 22 (255.255.252.0)
Feb 26 09:31:21 server0 NetworkManager[3701]: <info>  [1614349881.5090] dhcp4 (enp96s0f0):   gateway 10.20.20.1
Feb 26 09:31:21 server0 NetworkManager[3701]: <info>  [1614349881.5090] dhcp4 (enp96s0f0):   lease time 18000
Feb 26 09:31:21 server0 NetworkManager[3701]: <info>  [1614349881.5090] dhcp4 (enp96s0f0):   nameserver '10.20.20.49'
Feb 26 09:31:21 server0 NetworkManager[3701]: <info>  [1614349881.5091] dhcp4 (enp96s0f0):   nameserver '10.20.20.48'
Feb 26 09:31:21 server0 NetworkManager[3701]: <info>  [1614349881.5091] dhcp4 (enp96s0f0):   domain name 'dom.com'
Feb 26 09:31:21 server0 NetworkManager[3701]: <info>  [1614349881.5091] dhcp4 (enp96s0f0): state changed bound -> bound
Feb 26 09:31:21 server0 dhclient[4766]: bound to 10.20.20.223 -- renewal in 8129 seconds.
Feb 26 09:31:21 server0 systemd: Starting Network Manager Script Dispatcher Service...
Feb 26 09:31:21 server0 systemd: Started Network Manager Script Dispatcher Service.
Feb 26 09:31:21 server0 nm-dispatcher: req:1 'dhcp4-change' [enp96s0f0]: new request (4 scripts)
Feb 26 09:31:21 server0 nm-dispatcher: req:1 'dhcp4-change' [enp96s0f0]: start running ordered scripts...
-- Random switch outage:
Feb 26 10:49:10 SERVER0 kernel: i40e 0000:60:00.0 enp96s0f0: NIC Link is Down
Feb 26 10:49:16 SERVER0 NetworkManager[3701]: <info>  [1614354556.8263] device (enp96s0f0): state change: activated -> unavailable (reason 'carrier-changed', sys-iface-state: 'managed')
Feb 26 10:49:16 SERVER0 NetworkManager[3701]: <info>  [1614354556.8467] dhcp4 (enp96s0f0): canceled DHCP transaction, DHCP client pid 4766
Feb 26 10:49:16 SERVER0 NetworkManager[3701]: <info>  [1614354556.8468] dhcp4 (enp96s0f0): state changed bound -> done
Feb 26 10:49:16 SERVER0 NetworkManager[3701]: <info>  [1614354556.8679] manager: NetworkManager state is now CONNECTED_LOCAL
Feb 26 10:49:16 SERVER0 systemd: Starting Network Manager Script Dispatcher Service...
Feb 26 10:49:16 SERVER0 systemd: Started Network Manager Script Dispatcher Service.
Feb 26 10:49:16 SERVER0 nm-dispatcher: req:1 'down' [enp96s0f0]: new request (4 scripts)
Feb 26 10:49:16 SERVER0 nm-dispatcher: req:1 'down' [enp96s0f0]: start running ordered scripts...
Feb 26 10:49:16 SERVER0 nm-dispatcher: req:2 'connectivity-change': new request (4 scripts)
Feb 26 10:49:16 SERVER0 nm-dispatcher: req:2 'connectivity-change': start running ordered scripts...
Feb 26 10:58:46 SERVER0 kernel: i40e 0000:60:00.0 enp96s0f0: NIC Link is Up, 1000 Mbps Full Duplex, Flow Control: None
-- Machine is not accessible
-- NetworkManager tries to recover and request DHCP:
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info>  [1614355126.6768] device (enp96s0f0): carrier: link connected
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info>  [1614355126.6783] device (enp96s0f0): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info>  [1614355126.6823] policy: auto-activating connection 'enp96s0f0' (7bdb7768-49c5-4cc4-a740-ee0a86cd90d5)
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info>  [1614355126.6835] device (enp96s0f0): Activation: starting connection 'enp96s0f0' (7bdb7768-49c5-4cc4-a740-ee0a86cd90d5)
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info>  [1614355126.6837] device (enp96s0f0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info>  [1614355126.6844] manager: NetworkManager state is now CONNECTING
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info>  [1614355126.6848] device (enp96s0f0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info>  [1614355126.7360] device (enp96s0f0): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info>  [1614355126.7369] dhcp4 (enp96s0f0): activation: beginning transaction (timeout in 45 seconds)
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info>  [1614355126.7435] dhcp4 (enp96s0f0): dhclient started with pid 44653
Feb 26 10:58:46 SERVER0 dhclient[44653]: DHCPREQUEST on enp96s0f0 to 255.255.255.255 port 67 (xid=0x161525b4)
Feb 26 10:58:54 SERVER0 dhclient[44653]: DHCPREQUEST on enp96s0f0 to 255.255.255.255 port 67 (xid=0x161525b4)
Feb 26 10:59:13 SERVER0 dhclient[44653]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 3 (xid=0x2f70b1a3)
Feb 26 10:59:16 SERVER0 dhclient[44653]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 6 (xid=0x2f70b1a3)
Feb 26 10:59:22 SERVER0 dhclient[44653]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 9 (xid=0x2f70b1a3)
Feb 26 10:59:31 SERVER0 dhclient[44653]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 14 (xid=0x2f70b1a3)
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <warn>  [1614355171.8451] dhcp4 (enp96s0f0): request timed out
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.8451] dhcp4 (enp96s0f0): state changed unknown -> timeout
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.8540] dhcp4 (enp96s0f0): canceled DHCP transaction, DHCP client pid 44653
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.8541] dhcp4 (enp96s0f0): state changed timeout -> done
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.8545] device (enp96s0f0): state change: ip-config -> failed (reason 'ip-config-unavailable', sys-iface-state: 'managed')
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.8553] manager: NetworkManager state is now CONNECTED_LOCAL
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <warn>  [1614355171.8559] device (enp96s0f0): Activation: failed for connection 'enp96s0f0'
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.8563] device (enp96s0f0): state change: failed -> disconnected (reason 'none', sys-iface-state: 'managed')
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.8606] policy: auto-activating connection 'enp96s0f0' (7bdb7768-49c5-4cc4-a740-ee0a86cd90d5)
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.8615] device (enp96s0f0): Activation: starting connection 'enp96s0f0' (7bdb7768-49c5-4cc4-a740-ee0a86cd90d5)
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.8617] device (enp96s0f0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
-- NetworkManager tries to recover and request DHCP again following a different process:
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.8624] manager: NetworkManager state is now CONNECTING
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.8628] device (enp96s0f0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.9420] device (enp96s0f0): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.9429] dhcp4 (enp96s0f0): activation: beginning transaction (timeout in 45 seconds)
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info>  [1614355171.9489] dhcp4 (enp96s0f0): dhclient started with pid 44712
Feb 26 10:59:32 SERVER0 dhclient[44712]: DHCPREQUEST on enp96s0f0 to 255.255.255.255 port 67 (xid=0x5bd6c866)
Feb 26 10:59:36 SERVER0 dhclient[44712]: DHCPREQUEST on enp96s0f0 to 255.255.255.255 port 67 (xid=0x5bd6c866)
Feb 26 10:59:44 SERVER0 dhclient[44712]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 5 (xid=0x3ffbeab4)
Feb 26 10:59:49 SERVER0 dhclient[44712]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 5 (xid=0x3ffbeab4)
Feb 26 10:59:54 SERVER0 dhclient[44712]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 7 (xid=0x3ffbeab4)
Feb 26 10:59:59 SERVER0 NetworkManager[3701]: <info>  [1614355199.5823] device (enp96s0f0): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'managed')
Feb 26 10:59:59 SERVER0 NetworkManager[3701]: <info>  [1614355199.5846] device (enp96s0f0): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'managed')
Feb 26 10:59:59 SERVER0 NetworkManager[3701]: <info>  [1614355199.5850] device (enp96s0f0): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed')
Feb 26 10:59:59 SERVER0 NetworkManager[3701]: <info>  [1614355199.5869] manager: NetworkManager state is now CONNECTED_LOCAL
Feb 26 10:59:59 SERVER0 NetworkManager[3701]: <info>  [1614355199.5982] manager: NetworkManager state is now CONNECTED_SITE
Feb 26 10:59:59 SERVER0 NetworkManager[3701]: <info>  [1614355199.5988] policy: set 'enp96s0f0' (enp96s0f0) as default for IPv6 routing and DNS
Feb 26 10:59:59 SERVER0 NetworkManager[3701]: <info>  [1614355199.5992] device (enp96s0f0): Activation: successful, device activated.
Feb 26 10:59:59 SERVER0 NetworkManager[3701]: <info>  [1614355199.6003] manager: NetworkManager state is now CONNECTED_GLOBAL
Feb 26 10:59:59 SERVER0 systemd: Starting Network Manager Script Dispatcher Service...
Feb 26 10:59:59 SERVER0 systemd: Started Network Manager Script Dispatcher Service.
Feb 26 10:59:59 SERVER0 nm-dispatcher: req:1 'up' [enp96s0f0]: new request (4 scripts)
Feb 26 10:59:59 SERVER0 nm-dispatcher: req:1 'up' [enp96s0f0]: start running ordered scripts...
Feb 26 10:59:59 SERVER0 nm-dispatcher: req:2 'connectivity-change': new request (4 scripts)
Feb 26 10:59:59 SERVER0 nm-dispatcher: req:2 'connectivity-change': start running ordered scripts...
Feb 26 11:00:01 SERVER0 dhclient[44712]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 14 (xid=0x3ffbeab4)
Feb 26 11:00:15 SERVER0 dhclient[44712]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 21 (xid=0x3ffbeab4)
-- NetworkManager cancels and times out and does nothing anymore
Feb 26 11:00:16 SERVER0 NetworkManager[3701]: <warn>  [1614355216.8456] dhcp4 (enp96s0f0): request timed out
Feb 26 11:00:16 SERVER0 NetworkManager[3701]: <info>  [1614355216.8463] dhcp4 (enp96s0f0): state changed unknown -> timeout
Feb 26 11:00:16 SERVER0 NetworkManager[3701]: <info>  [1614355216.8649] dhcp4 (enp96s0f0): canceled DHCP transaction, DHCP client pid 44712
Feb 26 11:00:16 SERVER0 NetworkManager[3701]: <info>  [1614355216.8650] dhcp4 (enp96s0f0): state changed timeout -> done

Server #1:

-- Last regular DHCP renew:
Feb 26 10:34:00 server1 dhclient[5252]: DHCPREQUEST on enp96s0f0 to 10.20.20.131 port 67 (xid=0x71bfdb34)
Feb 26 10:34:00 server1 dhclient[5252]: DHCPACK from 10.20.20.131 (xid=0x71bfdb34)
Feb 26 10:34:02 server1 dhclient[5252]: bound to 10.20.20.224 -- renewal in 8195 seconds.
-- Random switch outage:
Feb 26 10:49:10 server1 kernel: i40e 0000:60:00.0 enp96s0f0: NIC Link is Down
Feb 26 10:58:46 server1 kernel: i40e 0000:60:00.0 enp96s0f0: NIC Link is Up, 1000 Mbps Full Duplex, Flow Control: None
-- Machine is accessible during this time!
-- Next regular DHCP renew:
Feb 26 12:50:37 server1 dhclient[5252]: DHCPREQUEST on enp96s0f0 to 10.20.20.131 port 67 (xid=0x71bfdb34)
Feb 26 12:50:37 server1 dhclient[5252]: DHCPACK from 10.20.20.131 (xid=0x71bfdb34)
Feb 26 12:50:39 server1 dhclient[5252]: bound to 10.20.20.224 -- renewal in 8611 seconds.

Answer 1

In NetworkManager, a device has an overall logical state. That is what you see in nmcli device.

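While a device is connected (activated), it may fail to get an address from DHCP (or a DHCP timeout may happen later on). Depending on ipv4.dhcp-timeout (which you can set to infinity), DHCP is considered failed after a while. When that happens, the device may go down entirely. That depends on the ipv4.may-fail setting. If ipv4.may-fail=no, a DHCP failure is fatal for the activation and the device goes down. Otherwise, as long as you have an IPv6 address, the overall state is still considered good. In that case DHCP should be retried indefinitely while the device stays activated/up.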
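As a minimal sketch of the two settings above, assuming the connection profile is named enp96s0f0 (the name that appears in the question's log) and that your nm-settings man page lists 2147483647 (MAXINT32) as the value meaning "infinity" for ipv4.dhcp-timeout. The two options are alternatives, not steps to combine, and a modified profile typically only takes effect the next time it is activated:

```shell
# Assumption: profile name taken from the question's log; adjust to your system.
CON=enp96s0f0

# Show the current values before changing anything.
nmcli -f ipv4.dhcp-timeout,ipv4.may-fail connection show "$CON"

# Option A: never consider DHCP failed (2147483647 stands for "infinity").
nmcli connection modify "$CON" ipv4.dhcp-timeout 2147483647

# Option B: make a DHCP failure fatal, so the device goes down
# and becomes eligible for autoconnect again.
nmcli connection modify "$CON" ipv4.may-fail no
```

On a remote machine it is worth trying this on a test host first, since reactivating the profile briefly interrupts connectivity.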

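On the other hand, if the device goes down because of a failure, it becomes eligible to autoconnect again (at least if you set connection.autoconnect=yes). This autoconnect loop is repeated up to connection.autoconnect-retries times; after that, autoconnect is blocked for 5 minutes and then starts over.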
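The autoconnect settings can be inspected and changed in the same way. This is only a sketch, again assuming the profile name enp96s0f0 from the log. According to the nm-settings documentation, a value of 0 for connection.autoconnect-retries means "retry forever" and -1 means "use the global default":

```shell
# Assumption: profile name taken from the question's log; adjust to your system.
CON=enp96s0f0

# Show the current autoconnect settings.
nmcli -f connection.autoconnect,connection.autoconnect-retries connection show "$CON"

# Enable autoconnect and let it retry without an upper limit (0 = forever).
nmcli connection modify "$CON" connection.autoconnect yes connection.autoconnect-retries 0
```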

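That is how it is supposed to work. With CentOS 7.8, however, I am not sure it all works the way I described. You say "then it does nothing until NetworkManager is restarted". Are you sure? Did you wait long enough? After a DHCP failure it may back off for a while. The log you pasted ends right at that point.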

When debugging NetworkManager, debug logs are much more useful. Configure logging with level=TRACE in NetworkManager.conf.
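A possible configuration fragment for this is shown below. The file name is only an illustration (any *.conf file under /etc/NetworkManager/conf.d/ should be read), and domains=ALL is an assumption on my part to capture everything; NetworkManager has to be restarted, or its configuration reloaded, for the change to apply:

```ini
# /etc/NetworkManager/conf.d/95-trace-logging.conf  (hypothetical file name)
[logging]
# Most verbose level; expect a large volume of log output.
level=TRACE
domains=ALL
```

Remember to remove the fragment again after debugging, because trace logging is very noisy.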

Maybe ipv4.may-fail=no would help? Then at least the device would go down, and the autoconnect loop would start again.


By the way, if you want NetworkManager to keep the device up when the cable is unplugged (since you seem to like what dhclient does), configure "ignore-carrier"; see man NetworkManager.conf.
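A sketch of what such a fragment might look like, using the per-device [device] section described in man NetworkManager.conf. The file name and section suffix are made up for illustration, and the interface name is the one from the question's log:

```ini
# /etc/NetworkManager/conf.d/90-ignore-carrier.conf  (hypothetical file name)
[device-enp96s0f0-ignore-carrier]
# Apply this section only to the interface seen in the question's log.
match-device=interface-name:enp96s0f0
# Do not act on carrier loss for this device.
ignore-carrier=yes
```

Whether this fully reproduces the plain dhclient behavior on CentOS 7.8 is not something I have verified, so treat it as a starting point and check the man page for your NetworkManager version.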
