This is my first time setting up DRBD (drbd-utils 8.9.10-2) on Ubuntu 18.04 with systemd-networkd (configured through netplan).
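Overall, the setup seems to work. I have one resource syncing between the two hosts over a dedicated interface. The replication NICs of node 1 and node 2 are connected directly (no switch). The other NIC carries the heartbeat and resources such as the web server.
Now for the part that does not work:
When I unplug the cable of the dedicated DRBD link, the nodes go into StandAlone instead of WFConnection. The log shows that DRBD tries to enter WFConnection, fails, and then falls back to StandAlone: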
Apr 17 09:08:01 gts-test-node2 systemd-networkd[284]: ens192: Flags change: -UP -LOWER_UP -RUNNING
Apr 17 09:08:01 gts-test-node2 systemd-networkd[284]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/network1/link/_33 interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=21 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a
Apr 17 09:08:01 gts-test-node2 systemd-networkd[284]: LLDP: Stopping LLDP client
Apr 17 09:08:01 gts-test-node2 systemd-networkd[284]: ens192: Stopped LLDP.
Apr 17 09:08:01 gts-test-node2 systemd-networkd[284]: ens192: Lost carrier
Apr 17 09:08:01 gts-test-node2 systemd-networkd[284]: ens192: State is configured, dropping config
Apr 17 09:08:01 gts-test-node2 systemd-networkd[284]: ens192: Removing address: 192.168.0.2/24 (valid forever)
Apr 17 09:08:01 gts-test-node2 systemd-timesyncd[388]: Network configuration changed, trying to establish connection.
Apr 17 09:08:01 gts-test-node2 systemd-timesyncd[388]: Synchronized to time server 91.189.94.4:123 (ntp.ubuntu.com).
Apr 17 09:08:02 gts-test-node2 corosync[480]: Apr 17 09:08:02 warning [TOTEM ] Incrementing problem counter for seqid 7082 iface 192.168.0.2 to [1 of 10]
Apr 17 09:08:02 gts-test-node2 corosync[480]: [TOTEM ] Incrementing problem counter for seqid 7082 iface 192.168.0.2 to [1 of 10]
Apr 17 09:08:02 gts-test-node2 corosync[480]: Apr 17 09:08:02 warning [TOTEM ] Incrementing problem counter for seqid 7084 iface 192.168.0.2 to [2 of 10]
Apr 17 09:08:02 gts-test-node2 corosync[480]: [TOTEM ] Incrementing problem counter for seqid 7084 iface 192.168.0.2 to [2 of 10]
Apr 17 09:08:02 gts-test-node2 kernel: drbd storage1: PingAck did not arrive in time.
Apr 17 09:08:02 gts-test-node2 kernel: drbd storage1: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Apr 17 09:08:02 gts-test-node2 kernel: drbd storage1: ack_receiver terminated
Apr 17 09:08:02 gts-test-node2 kernel: drbd storage1: Terminating drbd_a_storage1
Apr 17 09:08:03 gts-test-node2 kernel: drbd storage1: Connection closed
Apr 17 09:08:03 gts-test-node2 kernel: drbd storage1: conn( NetworkFailure -> Unconnected )
Apr 17 09:08:03 gts-test-node2 kernel: drbd storage1: receiver terminated
Apr 17 09:08:03 gts-test-node2 kernel: drbd storage1: Restarting receiver thread
Apr 17 09:08:03 gts-test-node2 kernel: drbd storage1: receiver (re)started
Apr 17 09:08:03 gts-test-node2 kernel: drbd storage1: conn( Unconnected -> WFConnection )
Apr 17 09:08:03 gts-test-node2 kernel: drbd storage1: bind before listen failed, err = -99
Apr 17 09:08:03 gts-test-node2 kernel: drbd storage1: conn( WFConnection -> Disconnecting )
Apr 17 09:08:03 gts-test-node2 kernel: drbd storage1: Connection closed
Apr 17 09:08:03 gts-test-node2 kernel: drbd storage1: conn( Disconnecting -> StandAlone )
Apr 17 09:08:03 gts-test-node2 kernel: drbd storage1: State change failed: Need a connection to start verify or resync
Apr 17 09:08:03 gts-test-node2 kernel: drbd storage1: mask = 0x1f0 val = 0x80
Apr 17 09:08:03 gts-test-node2 kernel: drbd storage1: old_conn:StandAlone wanted_conn:WFConnection
Apr 17 09:08:03 gts-test-node2 kernel: drbd storage1: receiver terminated
Apr 17 09:08:03 gts-test-node2 kernel: drbd storage1: Terminating drbd_r_storage1
This error is new to me. It never happened on Ubuntu 16.04, where ifupdown was used instead of systemd-networkd:
bind before listen failed, err = -99
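If I run systemctl stop systemd-networkd before unplugging the cable, the behavior is correct: the resource stays in WFConnection. But I would like to keep using systemd-networkd rather than reconfigure everything to go back to the "old way".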
I simulated the cable disconnect in two different ways, with the same result:
ip link set ens190 down
or simply unplugging the interface physically.
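To repeat the test and read the resulting connection state in one step, something like the following sketch could be used. It assumes the replication NIC is ens192 and the resource is storage1, as in the log above (the ip command in the question names ens190, so pass whichever interface is actually used for replication). The IP, DRBDADM and WAIT_SECONDS overrides are hypothetical, added only so the function can be dry-run.

```shell
#!/bin/sh
# Sketch: reproduce the carrier loss and report DRBD's connection state.
# Assumptions: NIC and resource default to ens192/storage1 (from the log above);
# IP, DRBDADM and WAIT_SECONDS are hypothetical overrides for dry runs.
reproduce_link_loss() {
    iface="${1:-ens192}"
    resource="${2:-storage1}"
    ip_cmd="${IP:-ip}"
    drbdadm_cmd="${DRBDADM:-drbdadm}"

    # Take the link down, as with "ip link set ... down" in the question.
    "$ip_cmd" link set "$iface" down
    # Give DRBD time to miss its PingAck and settle into a new state.
    sleep "${WAIT_SECONDS:-5}"
    # drbdadm cstate prints the connection state, e.g. WFConnection or StandAlone.
    state="$("$drbdadm_cmd" cstate "$resource")"
    # Bring the link back up before reporting.
    "$ip_cmd" link set "$iface" up
    echo "$state"
}
```

Run it as root on one node, for example with sudo sh -c '. ./reproduce.sh && reproduce_link_loss ens192 storage1' (the file name is arbitrary). Per the description above, it should report StandAlone while systemd-networkd is running and WFConnection when it has been stopped.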
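Does anyone know which part of systemd-networkd (or maybe networkd-dispatcher, or some other process?) causes this faulty behavior? I could not find anything on this topic on the internet.
Thanks a lot for your help.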
Answer 1
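I have run into this too. Logging the interface state with networkd-dispatcher, I found that when there is no carrier, the IP configuration is stripped from the interface, which makes DRBD put all resources into StandAlone. This also explains the error: on Linux, -99 is -EADDRNOTAVAIL ("Cannot assign requested address"), so DRBD is trying to bind to an address that no longer exists on the host. This behavior of systemd seems to unhelpfully subvert all sorts of established ideas about what should happen in this situation.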
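An option for this was added in https://github.com/systemd/systemd/commit/a9cc0189aa69a5fa1bcbacbdc69740fa3e5353db, but it is not included in the bionic systemd package. The option would probably also have to be exposed through netplan, so that the generated units set the desired behavior.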
Possible workarounds:
- Assign the DRBD IP to a bridge and add the replication interface to the bridge
- Run drbdadm connect all from networkd-dispatcher at the appropriate stage
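The networkd-dispatcher workaround could look roughly like the hook below. This is a sketch under several assumptions: that networkd-dispatcher runs executable, root-owned scripts from per-state directories such as /etc/networkd-dispatcher/routable.d/ and passes the interface name in $IFACE; that "routable" (address configured again) is the appropriate stage; and that the resource is storage1 on ens192, as in the question's log. Instead of drbdadm connect all, it only reconnects one resource and only when drbdadm cstate reports StandAlone. The file name and the DRBD_IFACE, DRBD_RESOURCE and DRBDADM overrides are hypothetical.

```shell
#!/bin/sh
# Sketch of a networkd-dispatcher hook, e.g. saved (hypothetical name) as
# /etc/networkd-dispatcher/routable.d/50-drbd-reconnect, root-owned and executable.
# Assumptions: replication NIC ens192 and resource storage1 (from the log above);
# DRBD_IFACE, DRBD_RESOURCE and DRBDADM are hypothetical overrides.
drbd_reconnect_hook() {
    drbd_iface="${DRBD_IFACE:-ens192}"
    drbd_resource="${DRBD_RESOURCE:-storage1}"
    drbdadm_cmd="${DRBDADM:-drbdadm}"

    # networkd-dispatcher passes the interface that changed state in $IFACE;
    # ignore events for every other interface.
    [ "${IFACE:-}" = "$drbd_iface" ] || return 0

    # Only reconnect if DRBD gave up and dropped to StandAlone; a resource
    # that is still WFConnection or Connected is left alone.
    if [ "$("$drbdadm_cmd" cstate "$drbd_resource" 2>/dev/null)" = "StandAlone" ]; then
        "$drbdadm_cmd" connect "$drbd_resource"
    fi
}

drbd_reconnect_hook
```

Checking the state first avoids calling drbdadm connect on a resource that is already waiting for or connected to its peer; with several resources, the check would have to be repeated per resource.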
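(I think systemd's behavior here is simply wrong. When an application can be bound to a specific address, I don't want that address to disappear because the peer's network device restarts, a cable is pulled, and so on.)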