KVM/QEMU LAN bridged networking dropped packets issue (solved)

Summary: In a bridged KVM/QEMU configuration, network packets destined for the guest VM never get there.

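Configuration: The host is an up-to-date Ubuntu 20.04.2 LTS server. The guest is any one of 3 VMs: a very old Ubuntu 16.04 server, an old Ubuntu 20.04 desktop, and a brand-new Ubuntu 21.04 desktop. The first two VMs were converted from non-bridged NAT; the third was created with bridged networking specified. Eventually the VMs will get their IP addresses via DHCP from the main LAN, but for now, to get better debugging information, they use static IP addresses.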

Host bridge definition, /etc/netplan/01-netcfg.yaml (this is one of many attempts):

# This file describes the network interfaces available on your system
# For more information, see netplan(5).
network:
  version: 2
  renderer: networkd
  ethernets:
    enp3s0:
      dhcp4: no
  bridges:
    br0:
#      interfaces: [ enp3s0 ]
      dhcp4: yes
#      dhcp6: no
#      link-local: [ ]
      interfaces:
        - enp3s0
#      parameters:
#        stp: true
#        forward-delay: 4

The libvirt side, /etc/libvirt/qemu/networks/br0.xml:

<!--
WARNING: THIS IS AN AUTO-GENERATED FILE. CHANGES TO IT ARE LIKELY TO BE
OVERWRITTEN AND LOST. Changes to this xml configuration should be made using:
  virsh net-edit br0
or other application using the libvirt API.
-->

<network>
  <name>br0</name>
  <uuid>40a8752c-d074-4802-bae8-b0aef95d9c99</uuid>
  <forward mode='bridge'/>
  <bridge name='br0'/>
</network>

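Note: Many versions of the bridge .xml file have been tried, including different names, because different references use different techniques. The Ubuntu Serverguide reference says the network name and the bridge name must be the same, but other references do not. After creating the bare file with nano, the following commands were run: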

virsh net-define br0.xml
virsh net-autostart br0
virsh net-start br0

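These commands add and configure the network. The default NAT network was first unlinked from the autostart directory so that it would not start, and was eventually undefined. Result: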

$ virsh net-list --all
 Name      State      Autostart   Persistent
----------------------------------------------
 br0       active     yes         yes

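At this point, after a reboot, there are no iptables rules at all. Yet the VMs cannot reach the network. Note that some references mention special iptables rules and special settings for the br_netfilter module, all of which have been tried. This question is long enough already, so the variations tried are not all detailed here.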
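
For readers who want to check the br_netfilter settings mentioned above, here is a minimal sketch. It assumes the usual /proc/sys/net/bridge location for the bridge-nf-call-* switches, which normally exists only while the br_netfilter module is loaded. The function name show_bridge_nf is made up for illustration. It only reports values and is not claimed to fix this particular problem.

```shell
# Print the bridge netfilter switches, or report that the directory is missing.
# The optional argument overrides the directory (an assumption, handy for testing).
show_bridge_nf() (
    dir=${1:-/proc/sys/net/bridge}
    if [ ! -d "$dir" ]; then
        printf 'not found: %s (br_netfilter probably not loaded)\n' "$dir"
        exit 0   # leaves only the subshell that forms the function body
    fi
    for key in bridge-nf-call-iptables bridge-nf-call-ip6tables bridge-nf-call-arptables; do
        if [ -r "$dir/$key" ]; then
            printf '%s=%s\n' "$key" "$(cat "$dir/$key")"
        fi
    done
)
```

Run it as show_bridge_nf with no argument on the host. A value of 0 means bridged frames are not handed to that table; a value of 1 means they are.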

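Debug details: Regardless of configuration, the basic problem is always the same: packets destined for the VM do not seem to reach the host, at least from tcpdump's point of view. However, broadcast packets do arrive and do reach the guest VM.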

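This example uses 192.168.111.59 (MAC 52:54:00:60:ea:0e), the 16.04 server VM, and 192.168.111.132 (a Raspberry Pi) on the LAN. The 20.04 host server is at 192.168.111.136. The netmask is 24 bits, i.e. 255.255.255.0. The gateway and DHCP server is a Debian server (on which, by the way, bridged guest VMs work fine).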

First, tcpdump as seen from the Raspberry Pi during a ping:

doug@rpi2:~ $ sudo tcpdump -n -tttt -i eth0 ether host 52:54:00:60:ea:0e
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
2021-04-23 08:33:19.363553 ARP, Request who-has 192.168.111.1 tell 192.168.111.59, length 46
2021-04-23 08:33:19.487239 IP 192.168.111.132 > 192.168.111.59: ICMP echo request, id 27848, seq 14, length 64
2021-04-23 08:33:20.363542 ARP, Request who-has 192.168.111.1 tell 192.168.111.59, length 46
2021-04-23 08:33:20.527250 IP 192.168.111.132 > 192.168.111.59: ICMP echo request, id 27848, seq 15, length 64
2021-04-23 08:33:21.567215 IP 192.168.111.132 > 192.168.111.59: ICMP echo request, id 27848, seq 16, length 64
2021-04-23 08:33:22.607228 IP 192.168.111.132 > 192.168.111.59: ICMP echo request, id 27848, seq 17, length 64
2021-04-23 08:33:23.372351 ARP, Request who-has 192.168.111.1 tell 192.168.111.59, length 46
2021-04-23 08:33:23.647228 IP 192.168.111.132 > 192.168.111.59: ICMP echo request, id 27848, seq 18, length 64
2021-04-23 08:33:24.371431 ARP, Request who-has 192.168.111.1 tell 192.168.111.59, length 46

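From all the ARP activity it can be seen that the VM can send packets fine. However, it never replies to anything. Now let's observe the same activity from the host, noting that the tcpdump output is the same on any of the interfaces br0, enp3s0, or vnet0.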

$ sudo tcpdump -n -tttt -i br0 ether host 52:54:00:60:ea:0e
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br0, link-type EN10MB (Ethernet), capture size 262144 bytes
2021-04-23 08:40:38.837608 ARP, Request who-has 192.168.111.1 tell 192.168.111.59, length 28
2021-04-23 08:40:39.837159 ARP, Request who-has 192.168.111.1 tell 192.168.111.59, length 28
2021-04-23 08:40:40.837122 ARP, Request who-has 192.168.111.1 tell 192.168.111.59, length 28
2021-04-23 08:40:43.842985 ARP, Request who-has 192.168.111.1 tell 192.168.111.59, length 28
2021-04-23 08:40:44.840895 ARP, Request who-has 192.168.111.1 tell 192.168.111.59, length 28
2021-04-23 08:40:45.840991 ARP, Request who-has 192.168.111.1 tell 192.168.111.59, length 28
2021-04-23 08:40:48.848508 ARP, Request who-has 192.168.111.1 tell 192.168.111.59, length 28
2021-04-23 08:40:49.848895 ARP, Request who-has 192.168.111.1 tell 192.168.111.59, length 28
2021-04-23 08:40:50.848871 ARP, Request who-has 192.168.111.1 tell 192.168.111.59, length 28
2021-04-23 08:40:51.514011 ARP, Reply 192.168.111.59 is-at 52:54:00:60:ea:0e, length 28
2021-04-23 08:40:52.928400 ARP, Reply 192.168.111.59 is-at 52:54:00:60:ea:0e, length 28
2021-04-23 08:40:53.853881 ARP, Request who-has 192.168.111.1 tell 192.168.111.59, length 28
2021-04-23 08:40:54.852472 ARP, Request who-has 192.168.111.1 tell 192.168.111.59, length 28

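Observe that, occasionally, the VM does respond, although we will see later that it is responding to broadcast packets. It also looks as though 192.168.111.1 is not responding. It is, but for some reason its packets are not seen at the tcpdump level. Also note that there are no ICMP packets from the Raspberry Pi. Now, to show the gateway responding (this is "br0" on the other computer. EDIT: replaced with a better capture example, so the timestamps differ):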

$ sudo tcpdump -n -tttt -e -i br0 ether host 52:54:00:60:ea:0e
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on br0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
2021-04-23 22:25:17.434415 52:54:00:60:ea:0e > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 192.168.111.1 tell 192.168.111.59, length 46
2021-04-23 22:25:17.434432 xx:xx:xx:xx:xx:xx > 52:54:00:60:ea:0e, ethertype ARP (0x0806), length 42: Reply 192.168.111.1 is-at xx:xx:xx:xx:xx:xx, length 28
2021-04-23 22:25:20.440843 52:54:00:60:ea:0e > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 192.168.111.1 tell 192.168.111.59, length 46
2021-04-23 22:25:20.440859 xx:xx:xx:xx:xx:xx > 52:54:00:60:ea:0e, ethertype ARP (0x0806), length 42: Reply 192.168.111.1 is-at xx:xx:xx:xx:xx:xx, length 28
2021-04-23 22:25:21.438316 52:54:00:60:ea:0e > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 192.168.111.1 tell 192.168.111.59, length 46
2021-04-23 22:25:21.438332 xx:xx:xx:xx:xx:xx > 52:54:00:60:ea:0e, ethertype ARP (0x0806), length 42: Reply 192.168.111.1 is-at xx:xx:xx:xx:xx:xx, length 28
2021-04-23 22:25:22.438266 52:54:00:60:ea:0e > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 192.168.111.1 tell 192.168.111.59, length 46
2021-04-23 22:25:22.438283 xx:xx:xx:xx:xx:xx > 52:54:00:60:ea:0e, ethertype ARP (0x0806), length 42: Reply 192.168.111.1 is-at xx:xx:xx:xx:xx:xx, length 28
2021-04-23 22:25:25.446312 52:54:00:60:ea:0e > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 192.168.111.1 tell 192.168.111.59, length 46
2021-04-23 22:25:25.446329 xx:xx:xx:xx:xx:xx > 52:54:00:60:ea:0e, ethertype ARP (0x0806), length 42: Reply 192.168.111.1 is-at xx:xx:xx:xx:xx:xx, length 28
2021-04-23 22:25:26.446195 52:54:00:60:ea:0e > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 192.168.111.1 tell 192.168.111.59, length 46
2021-04-23 22:25:26.446211 xx:xx:xx:xx:xx:xx > 52:54:00:60:ea:0e, ethertype ARP (0x0806), length 42: Reply 192.168.111.1 is-at xx:xx:xx:xx:xx:xx, length 28

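Observing the packets going out from the VM, ARP does complete. I don't know how to copy and paste from the VM, which I communicate with via VNC, but the ip neigh command shows some completed but stale ARP entries, and tcpdump shows some ARP and broadcast packets from the LAN.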

Other information (MAC addresses not relevant to this question have been masked):

$ brctl show br0
bridge name     bridge id               STP enabled     interfaces
br0             8000.3c7c3f0d9983       no              enp3s0
                                                        vnet0
$ brctl showmacs br0
port no mac addr                is local?       ageing timer
  1     xx:xx:xx:xx:xx:xx       no                 0.00
  1     3c:7c:3f:0d:99:83       yes                0.00
  1     3c:7c:3f:0d:99:83       yes                0.00
  2     52:54:00:60:ea:0e       no                 1.68
  1     xx:xx:xx:xx:xx:xx       no                 2.14
  1     xx:xx:xx:xx:xx:xx       no                36.84
  1     xx:xx:xx:xx:xx:xx       no                89.57
  1     xx:xx:xx:xx:xx:xx       no               226.51
  1     xx:xx:xx:xx:xx:xx       no                13.28
  1     xx:xx:xx:xx:xx:xx       no               165.68
  1     xx:xx:xx:xx:xx:xx       no               165.68
  1     xx:xx:xx:xx:xx:xx       no               265.02
  1     xx:xx:xx:xx:xx:xx       no                27.62
  2     fe:54:00:60:ea:0e       yes                0.00
  2     fe:54:00:60:ea:0e       yes                0.00

$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master br0 state UP group default qlen 1000
    link/ether 3c:7c:3f:0d:99:83 brd ff:ff:ff:ff:ff:ff
3: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 3c:7c:3f:0d:99:83 brd ff:ff:ff:ff:ff:ff
    inet 192.168.111.136/24 brd 192.168.111.255 scope global dynamic br0
       valid_lft 51547sec preferred_lft 51547sec
7: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master br0 state UNKNOWN group default qlen 1000
    link/ether fe:54:00:60:ea:0e brd ff:ff:ff:ff:ff:ff

EDIT: Interestingly, all ARP packets from my D-Link AC2600 Wi-Fi gigabit router (configured as a switch) always appear on the host, reach the VM, and get a reply:

$ sudo tcpdump -n -tttt -e -i br0 ether host aa:aa:aa:aa:aa:aa
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br0, link-type EN10MB (Ethernet), capture size 262144 bytes
2021-04-23 22:45:51.463524 aa:aa:aa:aa:aa:aa > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 192.168.111.59 (ff:ff:ff:ff:ff:ff) tell 192.168.111.58, length 46
2021-04-23 22:45:51.463631 52:54:00:60:ea:0e > aa:aa:aa:aa:aa:aa, ethertype ARP (0x0806), length 42: Reply 192.168.111.59 is-at 52:54:00:60:ea:0e, length 28
2021-04-23 22:46:51.466955 aa:aa:aa:aa:aa:aa > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 192.168.111.59 (ff:ff:ff:ff:ff:ff) tell 192.168.111.58, length 46
2021-04-23 22:46:51.467030 52:54:00:60:ea:0e > aa:aa:aa:aa:aa:aa, ethertype ARP (0x0806), length 42: Reply 192.168.111.59 is-at 52:54:00:60:ea:0e, length 28
2021-04-23 22:47:51.466889 aa:aa:aa:aa:aa:aa > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 192.168.111.59 (ff:ff:ff:ff:ff:ff) tell 192.168.111.58, length 46
2021-04-23 22:47:51.466965 52:54:00:60:ea:0e > aa:aa:aa:aa:aa:aa, ethertype ARP (0x0806), length 42: Reply 192.168.111.59 is-at 52:54:00:60:ea:0e, length 28
2021-04-23 22:48:51.479096 aa:aa:aa:aa:aa:aa > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 192.168.111.59 (ff:ff:ff:ff:ff:ff) tell 192.168.111.58, length 46
2021-04-23 22:48:51.479178 52:54:00:60:ea:0e > aa:aa:aa:aa:aa:aa, ethertype ARP (0x0806), length 42: Reply 192.168.111.59 is-at 52:54:00:60:ea:0e, length 28

EDIT 3 - New configuration test: To reduce the number of variables, the following was done:

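  • A soon-to-be-retired Ubuntu 16.04 server was brought up to provide a new, isolated LAN.
  • The host Ubuntu 20.04 server was connected directly to the LAN-side NIC of the 16.04 server. No switch at all, just one long Ethernet cable.
  • Everything was tested and seems to work fine. Everything is accessed via ssh from my main LAN, going out via my main static WAN IP and coming back in via my test static WAN IP to the old 16.04 server, and from there to the 20.04 host server via a chained ssh session.
  • The Ubuntu 16.04 VM guest was started on the host.
  • A ping to the guest was attempted from the old 16.04 gateway server.
  • The result was the same as with the original configuration.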

tcpdump on the old 16.04 gateway server:

doug@DOUG-64:~$ sudo tcpdump -n -tttt -e -i enp2s0 ether host 52:54:00:60:ea:0e
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on enp2s0, link-type EN10MB (Ethernet), capture size 262144 bytes
2021-04-26 15:10:00.701941 52:54:00:60:ea:0e > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 192.168.111.1 tell 192.168.111.59, length 46
2021-04-26 15:10:00.701965 00:19:b9:0d:af:fa > 52:54:00:60:ea:0e, ethertype ARP (0x0806), length 42: Reply 192.168.111.1 is-at 00:19:b9:0d:af:fa, length 28
2021-04-26 15:10:01.699156 52:54:00:60:ea:0e > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 192.168.111.1 tell 192.168.111.59, length 46
2021-04-26 15:10:01.699169 00:19:b9:0d:af:fa > 52:54:00:60:ea:0e, ethertype ARP (0x0806), length 42: Reply 192.168.111.1 is-at 00:19:b9:0d:af:fa, length 28
2021-04-26 15:10:02.699141 52:54:00:60:ea:0e > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 192.168.111.1 tell 192.168.111.59, length 46
2021-04-26 15:10:02.699154 00:19:b9:0d:af:fa > 52:54:00:60:ea:0e, ethertype ARP (0x0806), length 42: Reply 192.168.111.1 is-at 00:19:b9:0d:af:fa, length 28
2021-04-26 15:10:05.707404 52:54:00:60:ea:0e > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 192.168.111.1 tell 192.168.111.59, length 46
2021-04-26 15:10:05.707417 00:19:b9:0d:af:fa > 52:54:00:60:ea:0e, ethertype ARP (0x0806), length 42: Reply 192.168.111.1 is-at 00:19:b9:0d:af:fa, length 28
2021-04-26 15:10:06.707097 52:54:00:60:ea:0e > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 192.168.111.1 tell 192.168.111.59, length 46
2021-04-26 15:10:06.707110 00:19:b9:0d:af:fa > 52:54:00:60:ea:0e, ethertype ARP (0x0806), length 42: Reply 192.168.111.1 is-at 00:19:b9:0d:af:fa, length 28
2021-04-26 15:10:07.707094 52:54:00:60:ea:0e > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 192.168.111.1 tell 192.168.111.59, length 46
2021-04-26 15:10:07.707107 00:19:b9:0d:af:fa > 52:54:00:60:ea:0e, ethertype ARP (0x0806), length 42: Reply 192.168.111.1 is-at 00:19:b9:0d:af:fa, length 28

tcpdump on the 20.04 host server:

doug@s19:~$ sudo tcpdump -n -tttt -e -i br0 ether host 52:54:00:60:ea:0e
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br0, link-type EN10MB (Ethernet), capture size 262144 bytes
2021-04-26 15:11:35.801771 52:54:00:60:ea:0e > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.111.1 tell 192.168.111.59, length 28
2021-04-26 15:11:36.800497 52:54:00:60:ea:0e > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.111.1 tell 192.168.111.59, length 28
2021-04-26 15:11:37.800491 52:54:00:60:ea:0e > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.111.1 tell 192.168.111.59, length 28
2021-04-26 15:11:40.807062 52:54:00:60:ea:0e > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.111.1 tell 192.168.111.59, length 28
2021-04-26 15:11:41.804469 52:54:00:60:ea:0e > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.111.1 tell 192.168.111.59, length 28
2021-04-26 15:11:42.804444 52:54:00:60:ea:0e > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.111.1 tell 192.168.111.59, length 28
2021-04-26 15:11:45.812553 52:54:00:60:ea:0e > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.111.1 tell 192.168.111.59, length 28
2021-04-26 15:11:46.812405 52:54:00:60:ea:0e > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.111.1 tell 192.168.111.59, length 28
2021-04-26 15:11:47.812398 52:54:00:60:ea:0e > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.111.1 tell 192.168.111.59, length 28

Side note: From my chained ssh session to the 20.04 host server, I can chain once more and ssh to the VM guest without any problem.

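Conclusion: Something is wrong at the link layer on the Ubuntu 20.04 server, to the point that tcpdump cannot even "see" the incoming packets, and they never reach the VM guest. [Diagram]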
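
The comparison between the gateway capture and the host capture can be made mechanical. The following is a minimal sketch, assuming the tcpdump -n -tttt -e output format shown above (date, time, source MAC, ">", destination MAC followed by a comma). The helper name count_frames is hypothetical. It counts frames sent by a MAC and unicast frames addressed to it.

```shell
# Read `tcpdump -n -tttt -e` text on stdin and count, for the MAC given as $1,
# frames it sent and unicast frames addressed to it. The MAC must be written
# in lowercase, as tcpdump prints it.
count_frames() {
    awk -v mac="$1" '
        { src = $3; dst = $5; sub(/,$/, "", dst) }   # drop the comma after the destination
        src == mac { sent++ }
        dst == mac { received++ }
        END { printf "sent=%d received=%d\n", sent, received }
    '
}
```

For example, saving each capture to a file and running count_frames 52:54:00:60:ea:0e < capture.txt on both sides would, for the captures above, show replies addressed to the VM on the gateway and none on the host.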

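EDIT 4: Compared with the information Christian Ehrhardt provided, a potential difference on my system is that the br0 MAC list may be incorrect, with the first byte substituted. Note: unrelated MACs have been removed, and 3 VMs are running: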

doug@s19:~$ brctl showmacs br0
port no mac addr                is local?       ageing timer
  1     3c:7c:3f:0d:99:83       yes                0.00  <<< enp3s0, br0
  1     3c:7c:3f:0d:99:83       yes                0.00  <<< enp3s0, br0
  4     52:54:00:22:2f:dc       no                 5.15  <<< VM 3
  2     52:54:00:60:ea:0e       no                 3.29  <<< VM 1
  3     52:54:00:60:ea:3e       no                12.67  <<< VM 2
  4     fe:54:00:22:2f:dc       yes                0.00  <<< vnet2
  4     fe:54:00:22:2f:dc       yes                0.00  <<< vnet2
  2     fe:54:00:60:ea:0e       yes                0.00  <<< vnet0
  2     fe:54:00:60:ea:0e       yes                0.00  <<< vnet0
  3     fe:54:00:60:ea:3e       yes                0.00  <<< vnet1
  3     fe:54:00:60:ea:3e       yes                0.00  <<< vnet1
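
The fe: entries in this listing are not necessarily a fault. libvirt is known to give each host-side tap device (vnetN) the guest's MAC with the first octet replaced by fe, so that the bridge does not adopt a tap MAC as its own. A minimal sketch of that mapping, with a hypothetical helper name, so the "local" entries can be checked against the guest MACs:

```shell
# Print the tap-device MAC expected for a guest MAC under the fe: convention:
# the same address with the first octet replaced by fe.
tap_mac_for_guest() {
    printf 'fe:%s\n' "${1#*:}"   # strip everything up to the first colon
}
```

For VM 1, tap_mac_for_guest 52:54:00:60:ea:0e prints fe:54:00:60:ea:0e, which is the address listed for vnet0.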

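For whatever reason, Christian's listing does not show vnet, or whatever it is called on his system, as connected to the bridge (I do not know whether this is relevant):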

$ brctl showmacs br0
port no mac addr        is local?   ageing timer
  2 52:54:00:48:40:69   no         2.36   <- Guest
  1 52:54:00:95:e4:2a   no         0.00   <- outside system
  1 52:54:00:9b:9b:0e   yes        0.00   <- Host
  1 52:54:00:9b:9b:0e   yes        0.00   <- Host

EDIT 5: Data similar to EDIT 4, but from the Debian server, which runs 2 VMs and works fine:

doug@s15:~$ sudo brctl showmacs br0
port no mac addr                is local?       ageing timer
  1     52:54:00:22:2f:dc       no                17.85
  2     52:54:00:27:1b:5e       no                18.48  <<< VM 1
  3     52:54:00:27:1b:ae       no                 2.14  <<< VM 2
  1     f4:8c:eb:c8:08:a0       no                18.48
  2     fe:71:fa:75:16:93       yes                0.00  <<< tap0 (VM1)
  2     fe:71:fa:75:16:93       yes                0.00  <<< tap0
  3     fe:e1:c5:2a:c7:e3       yes                0.00  <<< tap1 (VM2)
  3     fe:e1:c5:2a:c7:e3       yes                0.00  <<< tap1

EDIT 6: Information from networkctl. Note that the Debian server shows "Master: br0", whereas the Ubuntu server does not:

Ubuntu:

doug@s19:~$ networkctl
IDX LINK   TYPE     OPERATIONAL SETUP
  1 lo     loopback carrier     unmanaged
  2 enp3s0 ether    enslaved    configured
  3 br0    bridge   routable    configured
  4 vnet0  ether    carrier     unmanaged

4 links listed.
doug@s19:~$ networkctl status vnet0
● 4: vnet0
             Link File: /usr/lib/systemd/network/99-default.link
          Network File: n/a
                  Type: ether
                 State: carrier (unmanaged)
                Driver: tun
            HW Address: fe:54:00:60:ea:0e
                   MTU: 1500 (min: 68, max: 65521)
  Queue Length (Tx/Rx): 1/1
      Auto negotiation: no
                 Speed: 10Mbps
                Duplex: full
                  Port: tp

Apr 30 07:40:51 s19 systemd-networkd[530]: vnet0: Link UP
Apr 30 07:40:51 s19 systemd-networkd[530]: vnet0: Gained carrier

Debian:

doug@s15:~$ networkctl
IDX LINK   TYPE     OPERATIONAL SETUP
  1 lo     loopback carrier     unmanaged
  2 enp3s0 ether    enslaved    configured
  3 enp1s0 ether    routable    configured
  4 br0    bridge   routable    configured
 10 tap0   ether    carrier     unmanaged

5 links listed.
doug@s15:~$ networkctl status tap0
● 10: tap0
             Link File: /usr/lib/systemd/network/99-default.link
          Network File: n/a
                  Type: ether
                 State: carrier (unmanaged)
                Driver: tun
            HW Address: fe:8a:6a:ce:18:9c
                   MTU: 1500 (min: 68, max: 65521)
                 QDisc: pfifo_fast
                Master: br0   <<<<< Different than Ubuntu
  Queue Length (Tx/Rx): 1/1
      Auto negotiation: no
                 Speed: 10Mbps
                Duplex: full
                  Port: tp

Question: What is wrong? How can I get bridged VMs to work?

Answer 1

We hit the same KVM VLAN bridging problem on a PowerEdge R740 with a Broadcom 10Gb NIC. We found:

  Broadcom Adv. Dual 10Gb Ethernet 21.65.33.33 with kernel 5.4 -> bad
  Broadcom Adv. Dual 10Gb Ethernet 21.65.33.33 with kernel 5.5 -> good
  Broadcom Adv. Dual 10Gb Ethernet 21.80.16.95 with kernel 5.5 -> bad
  Broadcom Adv. Dual 10Gb Ethernet 21.80.16.95 with kernel 5.12 -> bad

However, rolling the firmware back does not work, so be careful.

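Answer 2

The problem is the kernel. The host 20.04 server was not using the HWE stack, so the current kernel was 5.4.0.72.

However, all kernels tested from mainline 5.5-rc1 onward (up to and including 5.12-rc6) work fine (I have not installed 5.12 yet). Some other 5.4-series kernels were also tested, including mainline 5.4.0 and 5.4.117, and all of them failed.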
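
A quick way to tell whether a host is still on the series that failed here is to look at its kernel release string. This is a minimal sketch with a hypothetical function name. It only recognises the 5.4 series, since that is the only series reported as failing in this answer, and it says nothing about other kernels.

```shell
# Succeed if the given kernel release string belongs to the 5.4 series.
kernel_is_5_4_series() {
    case "$1" in
        5.4|5.4.*|5.4-*) return 0 ;;
        *) return 1 ;;
    esac
}
```

On the host it could be used as: kernel_is_5_4_series "$(uname -r)" && echo "5.4-series kernel, consider the HWE stack".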

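It seems that a kernel bisection cannot be done in reverse, i.e. when the good kernel comes after the bad kernel. However, if we simply reverse the definitions of good and bad, then maybe it can. [Reference]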
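
Git supports this reversal directly through alternate terms (git bisect start --term-old=... --term-new=...), so the words "good" and "bad" need not be swapped by hand. The sketch below shows the idea on a throwaway repository rather than the kernel tree. The file name state.txt, the commit messages, and the terms broken and fixed are all made up for illustration.

```shell
# Build a six-commit scratch repository in which the "fix" lands in commit 4,
# then let git bisect find the first fixed commit using custom terms.
demo_reverse_bisect() (
    set -e
    repo=$(mktemp -d)
    trap 'rm -rf "$repo"' EXIT
    cd "$repo"
    git init -q .
    for i in 1 2 3 4 5 6; do
        if [ "$i" -ge 4 ]; then echo fixed > state.txt; else echo broken > state.txt; fi
        echo "$i" > counter.txt
        git add state.txt counter.txt
        git -c user.name=demo -c user.email=demo@example.invalid \
            -c commit.gpgsign=false commit -q -m "commit $i"
    done
    first=$(git rev-list --max-parents=0 HEAD)
    git bisect start --term-old=broken --term-new=fixed >/dev/null
    git bisect fixed HEAD >/dev/null
    git bisect broken "$first" >/dev/null
    # For `git bisect run`, exit status 0 marks the old term (broken here)
    # and 1..124 marks the new term (fixed here).
    git bisect run sh -c '! grep -q fixed state.txt' >/dev/null 2>&1
    git log -1 --format=%s refs/bisect/fixed
)
```

Applied to the kernel tree, the same idea would mean starting with those terms and then marking v5.4 as broken and v5.5-rc1 as fixed, after which every tested build is marked with one of the two terms.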

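EDIT 1: This 20.04 server needs the HWE stack, because the hardware is too new for the 5.4-series kernel. I was unable to bisect the kernel: too many git bisect skip steps were needed, with too many kernels that did not boot or did not even compile. The whole thing wasted a huge amount of time.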
