Linux network device drops its link and restarts in a loop


I'm working with a fairly unusual stack, so any diagnostic or debugging strategies are very welcome.

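Setup: Ubuntu 18.04 desktop with a 2-port PCIe network card. Two Ethernet cameras are plugged directly into the card. This gives them 169.254.x.y link-local addresses, and this worked fine for a while.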

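I experimented with using dnsmasq as a DHCP server on these ports so that I could assign the cameras static IPs (for docker macvlan reasons). That worked too, but it later turned out to be unnecessary, so I disabled dnsmasq.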

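Now, when a device is plugged in, the system gets into a loop, repeatedly bringing the interface up and down. The journalctl output looks like this: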

Nov  4 12:12:36 hostname NetworkManager[28979]: <info>  [1572887556.0123] device (ethn0): carrier: link connected
Nov  4 12:12:36 hostname kernel: [ 5705.109602] ixgbe 0000:65:00.0 ethn0: NIC Link is Up 1 Gbps, Flow Control: None
Nov  4 12:12:36 hostname NetworkManager[28979]: <info>  [1572887556.0130] device (ethn0): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')
Nov  4 12:12:36 hostname NetworkManager[28979]: <info>  [1572887556.0160] policy: auto-activating connection 'Wired connection 3'
Nov  4 12:12:36 hostname NetworkManager[28979]: <info>  [1572887556.0200] device (ethn0): Activation: starting connection 'Wired connection 3' (0ae083fb-03b4-3782-a069-7aa48780f65b)
Nov  4 12:12:36 hostname NetworkManager[28979]: <info>  [1572887556.0207] device (ethn0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Nov  4 12:12:36 hostname NetworkManager[28979]: <info>  [1572887556.0223] device (ethn0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Nov  4 12:12:36 hostname NetworkManager[28979]: <info>  [1572887556.0306] device (ethn0): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Nov  4 12:12:36 hostname NetworkManager[28979]: <info>  [1572887556.0313] dhcp4 (ethn0): activation: beginning transaction (timeout in 45 seconds)
Nov  4 12:12:36 hostname NetworkManager[28979]: <info>  [1572887556.0347] dhcp4 (ethn0): dhclient started with pid 3304
Nov  4 12:12:36 hostname dhclient[3304]: DHCPDISCOVER on ethn0 to 255.255.255.255 port 67 interval 3 (xid=0x16ff5155)
Nov  4 12:12:37 hostname kernel: [ 5706.148368] ixgbe 0000:65:00.0 ethn0: NIC Link is Down
Nov  4 12:12:37 hostname avahi-daemon[492]: Joining mDNS multicast group on interface ethn0.IPv6 with address fe80::7e02:198c:dd14:f845.
Nov  4 12:12:37 hostname avahi-daemon[492]: New relevant interface ethn0.IPv6 for mDNS.
Nov  4 12:12:37 hostname avahi-daemon[492]: Registering new address record for fe80::7e02:198c:dd14:f845 on ethn0.*.
Nov  4 12:12:39 hostname dhclient[3304]: DHCPDISCOVER on ethn0 to 255.255.255.255 port 67 interval 4 (xid=0x16ff5155)
Nov  4 12:12:43 hostname NetworkManager[28979]: <info>  [1572887563.0543] device (ethn0): state change: ip-config -> unavailable (reason 'carrier-changed', sys-iface-state: 'managed')
Nov  4 12:12:43 hostname NetworkManager[28979]: <info>  [1572887563.0707] dhcp4 (ethn0): canceled DHCP transaction, DHCP client pid 3304
Nov  4 12:12:43 hostname NetworkManager[28979]: <info>  [1572887563.0707] dhcp4 (ethn0): state changed unknown -> done
Nov  4 12:12:43 hostname avahi-daemon[492]: Withdrawing address record for fe80::7e02:198c:dd14:f845 on ethn0.
Nov  4 12:12:43 hostname avahi-daemon[492]: Leaving mDNS multicast group on interface ethn0.IPv6 with address fe80::7e02:198c:dd14:f845.
Nov  4 12:12:43 hostname avahi-daemon[492]: Interface ethn0.IPv6 no longer relevant for mDNS.
Nov  4 12:12:43 hostname NetworkManager[28979]: <info>  [1572887563.3914] device (ethn0): carrier: link connected
Nov  4 12:12:43 hostname NetworkManager[28979]: <info>  [1572887563.3922] device (ethn0): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')
Nov  4 12:12:43 hostname kernel: [ 5712.488688] ixgbe 0000:65:00.0 ethn0: NIC Link is Up 1 Gbps, Flow Control: None
Nov  4 12:12:43 hostname NetworkManager[28979]: <info>  [1572887563.3954] policy: auto-activating connection 'Wired connection 3'

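And then it keeps looping. I can't tell which service is at fault, though it looks like NetworkManager is doing something silly. Rebooting, restarting NetworkManager, and re-enabling dnsmasq have not fixed it so far, and I don't know where to look next.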
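One way to read a log like this is to pull out only the kernel's link messages and measure how long the link stays up each time. The sketch below does that for the three kernel lines quoted above; the helper names `link_events` and `up_durations` are made up for illustration, and the regular expression assumes the ixgbe message format shown in this log.

```python
import re

# Matches kernel link messages in the format seen in the log above, e.g.
#   kernel: [ 5705.109602] ixgbe 0000:65:00.0 ethn0: NIC Link is Up 1 Gbps, ...
LINK_RE = re.compile(
    r"kernel: \[\s*(?P<ts>\d+\.\d+)\] \S+ \S+ (?P<iface>\S+): "
    r"NIC Link is (?P<state>Up|Down)"
)

def link_events(lines):
    """Return (kernel_timestamp, iface, state) for every link message."""
    events = []
    for line in lines:
        m = LINK_RE.search(line)
        if m:
            events.append((float(m.group("ts")), m.group("iface"), m.group("state")))
    return events

def up_durations(events):
    """Seconds each link stayed up before the next Down on the same interface."""
    last_up = {}
    durations = []
    for ts, iface, state in events:
        if state == "Up":
            last_up[iface] = ts
        elif iface in last_up:
            durations.append(round(ts - last_up.pop(iface), 3))
    return durations

# The three kernel lines from the journal excerpt above.
SAMPLE = [
    "Nov  4 12:12:36 hostname kernel: [ 5705.109602] ixgbe 0000:65:00.0 ethn0: NIC Link is Up 1 Gbps, Flow Control: None",
    "Nov  4 12:12:37 hostname kernel: [ 5706.148368] ixgbe 0000:65:00.0 ethn0: NIC Link is Down",
    "Nov  4 12:12:43 hostname kernel: [ 5712.488688] ixgbe 0000:65:00.0 ethn0: NIC Link is Up 1 Gbps, Flow Control: None",
]

if __name__ == "__main__":
    print(up_durations(link_events(SAMPLE)))
```

In this excerpt the link goes down roughly one second after it comes up, while dhclient is still sending DHCPDISCOVER. The "NIC Link is Down" message comes from the ixgbe driver, so NetworkManager's state changes are more likely a reaction to the carrier loss than its cause.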

Answer 1

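So this turned out to be a fun one for the "weird bugs" collection. The problem was caused by an inadequate power supply. When the camera is plugged in, the kernel sees the link and brings the interface up. That tells the camera to start talking to the computer, and doing so draws enough extra power to sag the supply voltage and reset the camera. The camera resetting makes the kernel think the interface has been unplugged. The camera then tries to reconnect, and the cycle repeats.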
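A physical-layer cause like this can be checked independently of NetworkManager, because the kernel keeps a per-interface counter of carrier transitions in sysfs (`/sys/class/net/<iface>/carrier_changes`). The following is a minimal sketch, not part of the original diagnosis; the function names and the `sysfs_root` parameter are hypothetical, and the latter exists only so the code can be pointed at a test directory.

```python
import time
from pathlib import Path

def carrier_changes(iface, sysfs_root="/sys/class/net"):
    """Read the kernel's count of carrier up/down transitions for iface."""
    path = Path(sysfs_root) / iface / "carrier_changes"
    return int(path.read_text().strip())

def changes_during(iface, seconds, sysfs_root="/sys/class/net"):
    """Return how many carrier transitions occurred over a time window."""
    before = carrier_changes(iface, sysfs_root)
    time.sleep(seconds)
    return carrier_changes(iface, sysfs_root) - before

if __name__ == "__main__":
    # "ethn0" is the interface name from the log above.
    print(changes_during("ethn0", 30))
```

If the count keeps climbing while NetworkManager and dnsmasq are stopped, the flapping is happening below the service layer, which would point toward cabling, the NIC, or (as here) the device's power supply.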

Moral of the story: abstractions leak. Rule in every possibility first, however silly it seems, and then rule them out one by one.
