Is eBPF compatible with network namespaces on Ubuntu 22.04?

I am trying to use eBPF with network namespaces, because ultimately I want to filter traffic between Kubernetes containers, but I find that it does not work. I am using the eBPF and user-space code from the test case I usually work with, https://github.com/tjcw/xdp-tutorial/tree/master/ebpf-filter, and when I run https://github.com/tjcw/xdp-tutorial/blob/master/ebpf-filter/runns.sh I get a failure like this:

libbpf: elf: skipping unrecognized data section(7) xdp_metadata
libxdp: No bpffs found at /sys/fs/bpf
libxdp: Compatibility check for dispatcher program failed: No such file or directory
libxdp: Falling back to loading single prog without dispatcher
libbpf: specified path /sys/fs/bpf/accept_map is not on BPF FS
libbpf: map 'accept_map': failed to auto-pin at '/sys/fs/bpf/accept_map': -22
libbpf: map 'accept_map': failed to create: Invalid argument(-22)
libbpf: failed to load object './af_xdp_kern.o'
ERROR:xdp_program__attach returns -22

This seems to be because eBPF is not compatible with network namespaces when using the kernel in Ubuntu 22.04; investigating the filesystem type shows

tjcw@tjcw-Standard-PC-Q35-ICH9-2009:~/workspace/xdp-tutorial/ebpf-filter$ sudo ip netns exec ns1 bash
root@tjcw-Standard-PC-Q35-ICH9-2009:/home/tjcw/workspace/xdp-tutorial/ebpf-filter# cd /sys/fs
root@tjcw-Standard-PC-Q35-ICH9-2009:/sys/fs# ls
bpf  cgroup  ecryptfs  ext4  fuse  pstore
root@tjcw-Standard-PC-Q35-ICH9-2009:/sys/fs# df .
Filesystem     1K-blocks  Used Available Use% Mounted on
ns1                    0     0         0    - /sys
root@tjcw-Standard-PC-Q35-ICH9-2009:/sys/fs# cd bpf
root@tjcw-Standard-PC-Q35-ICH9-2009:/sys/fs/bpf# df .
Filesystem     1K-blocks  Used Available Use% Mounted on
ns1                    0     0         0    - /sys
root@tjcw-Standard-PC-Q35-ICH9-2009:/sys/fs/bpf# ls
root@tjcw-Standard-PC-Q35-ICH9-2009:/sys/fs/bpf#

inside the network namespace, and

tjcw@tjcw-Standard-PC-Q35-ICH9-2009:~/workspace/xdp-tutorial/ebpf-filter$ sudo bash
root@tjcw-Standard-PC-Q35-ICH9-2009:/home/tjcw/workspace/xdp-tutorial/ebpf-filter# df /sys/fs
Filesystem     1K-blocks  Used Available Use% Mounted on
sysfs                  0     0         0    - /sys
root@tjcw-Standard-PC-Q35-ICH9-2009:/home/tjcw/workspace/xdp-tutorial/ebpf-filter# df /sys/fs/bpf
Filesystem     1K-blocks  Used Available Use% Mounted on
bpf                    0     0         0    - /sys/fs/bpf
root@tjcw-Standard-PC-Q35-ICH9-2009:/home/tjcw/workspace/xdp-tutorial/ebpf-filter# ls -l /sys/fs/bpf
total 0
drwx------ 2 root root 0 Nov 11 12:30 snap
drwx------ 3 root root 0 Nov 11 15:17 xdp
root@tjcw-Standard-PC-Q35-ICH9-2009:/home/tjcw/workspace/xdp-tutorial/ebpf-filter#

in the root namespace. Is eBPF really incompatible with network namespaces, or have I misconfigured or misunderstood something? I am running in a virtual machine, which in turn runs on my laptop. The Ubuntu 22.04 kernel is 5.15.0-52-generic.
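As a quick check (not from the original post), coreutils `stat` can report the mounted filesystem type directly, which distinguishes a real bpffs from the bare sysfs directory seen inside the namespace above:

```shell
# If a bpffs is mounted at /sys/fs/bpf, coreutils stat reports "bpf_fs";
# if only the parent sysfs is visible (as inside the namespace above),
# it reports "sysfs" instead.
stat -fc %T /sys/fs/bpf

# findmnt prints a line only when something is actually mounted there:
findmnt -t bpf /sys/fs/bpf
```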

It was suggested that I should mount the bpf filesystem inside the namespace, so the start of my test script looks like

#!/bin/bash -x
ip netns add ns1
ip netns exec ns1 mount -t bpf bpf /sys/fs/bpf
ip netns exec ns1 df /sys/fs/bpf

ip netns add ns2
ip netns exec ns2 mount -t bpf bpf /sys/fs/bpf
ip netns exec ns2 df /sys/fs/bpf

but this does not work for me; I get

+ ip netns add ns1
+ ip netns exec ns1 mount -t bpf bpf /sys/fs/bpf
+ ip netns exec ns1 df /sys/fs/bpf
Filesystem     1K-blocks  Used Available Use% Mounted on
ns1                    0     0         0    - /sys
+

It was also suggested that I look at https://github.com/cilium/cilium and https://isovalent.com/, and at the tests in tools/testing/selftests/bpf in the Linux source tree for inspiration. That is what I will try next.

After trying the first answer (below), I got many Destination Host Unreachable messages. My AF_XDP program loads, but the pings do not get through. This is my test case script:

ip netns delete ns1
ip netns delete ns2
sleep 2

ip netns add ns1
ip netns add ns2

ip link add veth1 type veth peer name vpeer1
ip link add veth2 type veth peer name vpeer2

ip link set veth1 up
ip link set veth2 up

ip link set vpeer1 netns ns1
ip link set vpeer2 netns ns2

ip link add br0 type bridge
ip link set br0 up

ip link set veth1 master br0
ip link set veth2 master br0

ip addr add 10.10.0.1/16 dev br0

iptables -P FORWARD ACCEPT
iptables -F FORWARD


ip netns exec ns2 ./runns2.sh &
ip netns exec ns1 ./runns1.sh

wait
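An aside on the script above: `br0`, `veth1` and `veth2` live in the root namespace, so `ip netns delete` never removes them, and a second run reports `RTNETLINK answers: File exists` (visible in the trace further down). A hedged sketch of a cleanup that would make the script idempotent:

```shell
# Delete leftovers from a previous run; the '2>/dev/null || true' pattern
# keeps the script going on the first run, when the links do not exist yet.
ip link del br0 2>/dev/null || true
ip link del veth1 2>/dev/null || true
ip link del veth2 2>/dev/null || true
```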

The helper script runns1.sh is

#!/bin/bash -x

ip netns exec ns1 ip link set lo up

ip netns exec ns1 ip link set vpeer1 up

ip netns exec ns1 ip addr add 10.10.0.10/16 dev vpeer1
sleep 6
ip netns exec ns1 ping -c 10 10.10.0.20

and the helper script runns2.sh is

#!/bin/bash -x

ip link set lo up
ip link set vpeer2 up
ip addr add 10.10.0.20/16 dev vpeer2
ip link set dev vpeer2 xdpgeneric off
ip tuntap add mode tun tun0
ip link set dev tun0 down
ip link set dev tun0 addr 10.10.0.30/24
ip link set dev tun0 up

mount -t bpf bpf /sys/fs/bpf
df /sys/fs/bpf
ls -l /sys/fs/bpf
rm -f /sys/fs/bpf/accept_map /sys/fs/bpf/xdp_stats_map
if [[ -z "${LEAVE}" ]]
then 
  export LD_LIBRARY_PATH=/usr/local/lib
  ./af_xdp_user -S -d vpeer2 -Q 0 --filename ./af_xdp_kern.o &
  ns2_pid=$!
  sleep 20
  kill -INT ${ns2_pid}
fi 
wait

which gives the output

+ ip netns delete ns1
+ ip netns delete ns2
+ sleep 2
+ ip netns add ns1
+ ip netns add ns2
+ ip link add veth1 type veth peer name vpeer1
+ ip link add veth2 type veth peer name vpeer2
+ ip link set veth1 up
+ ip link set veth2 up
+ ip link set vpeer1 netns ns1
+ ip link set vpeer2 netns ns2
+ ip link add br0 type bridge
RTNETLINK answers: File exists
+ ip link set br0 up
+ ip link set veth1 master br0
+ ip link set veth2 master br0
+ ip addr add 10.10.0.1/16 dev br0
RTNETLINK answers: File exists
+ iptables -P FORWARD ACCEPT
+ iptables -F FORWARD
+ ip netns exec ns1 ./runns1.sh
+ ip netns exec ns2 ./runns2.sh
+ ip netns exec ns1 ip link set lo up
+ ip link set lo up
+ ip netns exec ns1 ip link set vpeer1 up
+ ip link set vpeer2 up
+ ip addr add 10.10.0.20/16 dev vpeer2
+ ip netns exec ns1 ip addr add 10.10.0.10/16 dev vpeer1
+ ip link set dev vpeer2 xdpgeneric off
+ ip tuntap add mode tun tun0
+ sleep 6
+ ip link set dev tun0 down
+ ip link set dev tun0 addr 10.10.0.30/24
"10.10.0.30/24" is invalid lladdr.
+ ip link set dev tun0 up
+ mount -t bpf bpf /sys/fs/bpf
+ df /sys/fs/bpf
Filesystem     1K-blocks  Used Available Use% Mounted on
bpf                    0     0         0    - /sys/fs/bpf
+ ls -l /sys/fs/bpf
total 0
+ rm -f /sys/fs/bpf/accept_map /sys/fs/bpf/xdp_stats_map
+ [[ -z '' ]]
+ export LD_LIBRARY_PATH=/usr/local/lib
+ LD_LIBRARY_PATH=/usr/local/lib
+ ns2_pid=3266
+ sleep 20
+ ./af_xdp_user -S -d vpeer2 -Q 0 --filename ./af_xdp_kern.o
main cfg.filename=./af_xdp_kern.o
main Opening program file ./af_xdp_kern.o
libbpf: elf: skipping unrecognized data section(8) .xdp_run_config
libbpf: elf: skipping unrecognized data section(9) xdp_metadata
main xdp_prog=0x56161aa476b0
main bpf_object=0x56161aa44490
libbpf: elf: skipping unrecognized data section(7) xdp_metadata
libbpf: elf: skipping unrecognized data section(7) xdp_metadata
+ ip netns exec ns1 ping -c 10 10.10.0.20
xsk_socket__create_shared_named_prog returns 0
bpf_map_update_elem(9,0x7ffef63436d0,0x7ffef63436e4,0)
bpf_map_update_elem returns 0
xsk_ring_prod__reserve returns 2048, XSK_RING_PROD__DEFAULT_NUM_DESCS is 2048
tun_read thread running
tun_read

0x0000 60 00 00 00 00 08 3a ff fe 80 00 00 00 00 00 00
0x0010 4c 45 17 e6 11 7c b7 4e ff 02 00 00 00 00 00 00
0x0020 00 00 00 00 00 00 00 02 85 00 50 41 00 00 00 00
addr=0x1fff100 len=90 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2049 free_count=0 frame=0x1fff100
addr=0x1ffe100 len=86 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2050 free_count=1 frame=0x1ffe100
addr=0x1ffd100 len=90 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2051 free_count=2 frame=0x1ffd100
addr=0x1ffc100 len=86 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2052 free_count=3 frame=0x1ffc100
addr=0x1ffb100 len=130 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2053 free_count=4 frame=0x1ffb100
addr=0x1ffa100 len=90 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2055 free_count=5 frame=0x1ffa100
addr=0x1ff9100 len=70 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2055 free_count=6 frame=0x1ff9100
addr=0x1ff8100 len=90 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2056 free_count=7 frame=0x1ff8100
addr=0x1ff7100 len=107 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2057 free_count=8 frame=0x1ff7100
addr=0x1ff6100 len=110 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2058 free_count=9 frame=0x1ff6100
addr=0x1ff5100 len=90 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2059 free_count=10 frame=0x1ff5100
addr=0x1ff4100 len=214 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2060 free_count=11 frame=0x1ff4100
addr=0x1ff3100 len=214 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2061 free_count=12 frame=0x1ff3100
addr=0x1ff2100 len=214 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2062 free_count=13 frame=0x1ff2100
addr=0x1ff1100 len=90 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2064 free_count=14 frame=0x1ff1100
addr=0x1ff0100 len=70 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2064 free_count=15 frame=0x1ff0100
addr=0x1fef100 len=202 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2065 free_count=16 frame=0x1fef100
addr=0x1fee100 len=107 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2066 free_count=17 frame=0x1fee100
addr=0x1fed100 len=90 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2067 free_count=18 frame=0x1fed100
addr=0x1fec100 len=202 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2068 free_count=19 frame=0x1fec100
tun_read

0x0000 60 00 00 00 00 08 3a ff fe 80 00 00 00 00 00 00
0x0010 4c 45 17 e6 11 7c b7 4e ff 02 00 00 00 00 00 00
0x0020 00 00 00 00 00 00 00 02 85 00 50 41 00 00 00 00
addr=0x1feb100 len=107 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2069 free_count=20 frame=0x1feb100
addr=0x1fea100 len=70 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2070 free_count=21 frame=0x1fea100
addr=0x1fe9100 len=202 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2071 free_count=22 frame=0x1fe9100
addr=0x1fe8100 len=70 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2072 free_count=23 frame=0x1fe8100
addr=0x1fe7100 len=42 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2073 free_count=24 frame=0x1fe7100
addr=0x1fe6100 len=42 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2074 free_count=25 frame=0x1fe6100
addr=0x1fe5100 len=42 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2075 free_count=26 frame=0x1fe5100
addr=0x1fe4100 len=107 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocaPING 10.10.0.20 (10.10.0.20) 56(84) bytes of data.
From 10.10.0.10 icmp_seq=1 Destination Host Unreachable
From 10.10.0.10 icmp_seq=2 Destination Host Unreachable
From 10.10.0.10 icmp_seq=3 Destination Host Unreachable
From 10.10.0.10 icmp_seq=4 Destination Host Unreachable
From 10.10.0.10 icmp_seq=5 Destination Host Unreachable
From 10.10.0.10 icmp_seq=6 Destination Host Unreachable
From 10.10.0.10 icmp_seq=7 Destination Host Unreachable
From 10.10.0.10 icmp_seq=8 Destination Host Unreachable
From 10.10.0.10 icmp_seq=9 Destination Host Unreachable
From 10.10.0.10 icmp_seq=10 Destination Host Unreachable

--- 10.10.0.20 ping statistics ---
10 packets transmitted, 0 received, +10 errors, 100% packet loss, time 9209ms
pipe 4
+ wait
+ kill -INT 3266
+ wait
tion_count=2076 free_count=27 frame=0x1fe4100
addr=0x1fe3100 len=42 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2077 free_count=28 frame=0x1fe3100
addr=0x1fe2100 len=42 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2078 free_count=29 frame=0x1fe2100
tun_read

0x0000 60 00 00 00 00 08 3a ff fe 80 00 00 00 00 00 00
0x0010 4c 45 17 e6 11 7c b7 4e ff 02 00 00 00 00 00 00
0x0020 00 00 00 00 00 00 00 02 85 00 50 41 00 00 00 00
addr=0x1fe1100 len=42 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2079 free_count=30 frame=0x1fe1100
addr=0x1fe0100 len=42 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2080 free_count=31 frame=0x1fe0100
addr=0x1fdf100 len=70 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2082 free_count=32 frame=0x1fdf100
addr=0x1fde100 len=70 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2082 free_count=33 frame=0x1fde100
addr=0x1fdd100 len=42 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2083 free_count=34 frame=0x1fdd100
addr=0x1fdc100 len=42 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2084 free_count=35 frame=0x1fdc100
addr=0x1fdb100 len=42 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2085 free_count=36 frame=0x1fdb100
addr=0x1fda100 len=107 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2086 free_count=37 frame=0x1fda100
addr=0x1fd9100 len=42 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2087 free_count=38 frame=0x1fd9100
addr=0x1fd8100 len=42 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2088 free_count=39 frame=0x1fd8100
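One line of the trace is worth decoding: `"10.10.0.30/24" is invalid lladdr.` comes from `ip link set dev tun0 addr 10.10.0.30/24` in runns2.sh, where `ip link set ... addr` expects a link-layer (MAC) address. If the intent was to give tun0 an IP address (my reading, not stated in the post), the usual command would be:

```shell
# Assign an IPv4 address to tun0 with 'ip addr add'; 'ip link set ... addr'
# sets the hardware address and rejects an IP/prefix argument.
ip addr add 10.10.0.30/24 dev tun0
```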

I think I have seen the script run as intended once (the first packet was lost because it was treated as a martian, and subsequent packets got through), but I cannot reproduce that behaviour, even when running immediately after a reboot.

Thanks for all your help,

Answer 1

An answer from a colleague:

> Something isn't working as I expect; it looks like the bpf file system does not mount in the network namespace. The start of my script is
> #!/bin/bash -x
> ip netns add ns1
> ip netns exec ns1 mount -t bpf bpf /sys/fs/bpf
> ip netns exec ns1 df /sys/fs/bpf
>
> and I think I should expect to see 'bpf' as the filesystem type for the 'df' command. However what I actually get is
> + ip netns add ns1
> + ip netns exec ns1 mount -t bpf bpf /sys/fs/bpf
> + ip netns exec ns1 df /sys/fs/bpf
> Filesystem     1K-blocks  Used Available Use% Mounted on
> ns1                    0     0         0    - /sys
> +
> and then my attempt to run the afxdp test case process fails as before. Any idea what I am doing wrong ?

Well, the problem is that 'ip' sets up a new mount namespace every time
you do 'ip netns exec'. So the BPF mount doesn't stay across different
'exec' invocations.

This is a bit of an impedance mismatch between libxdp and 'ip netns'.
You can get around it by having multiple commands in a single script and
executing that script with 'ip netns exec', instead of doing multiple
'exec' commands.
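Concretely, the single-invocation workaround might look like this (a sketch: the device name and program flags are copied from the question's scripts):

```shell
# Everything runs inside ONE 'ip netns exec', i.e. one mount namespace,
# so the bpffs mounted here is still visible when af_xdp_user pins its maps.
ip netns exec ns1 sh -c '
  mountpoint -q /sys/fs/bpf || mount -t bpf bpf /sys/fs/bpf
  df /sys/fs/bpf        # now reports filesystem type "bpf"
  ./af_xdp_user -S -d vpeer1 -Q 0 --filename ./af_xdp_kern.o
'
```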

One thing to be aware of here is that the fact that the mount goes away
also means all the pinned programs disappear; so if you load an XDP
program with libxdp, then exit the netns, and go back in, libxdp may
have trouble unloading the program. If you're running a single
application that uses AF_XDP, this shouldn't be much of an issue, though.

I guess we could also teach libxdp to try to mount the bpffs if it's not
already there...

Answer 2

The "Destination Host Unreachable" messages were because I was not handling ARP packets in my eBPF kernel code. After fixing that, my code works (the first packet of a flow is dropped, and all subsequent packets get through). My run script and helper scripts are

#!/bin/bash -x
ip netns delete ns1
ip netns delete ns2
sleep 2

ip netns add ns1
ip netns add ns2

ip link add veth1 type veth peer name vpeer1
ip link add veth2 type veth peer name vpeer2

ip link set veth1 up
ip link set veth2 up

ip link set vpeer1 netns ns1
ip link set vpeer2 netns ns2

ip link add br0 type bridge
ip link set br0 up

ip link set veth1 master br0
ip link set veth2 master br0

ip addr add 10.10.0.1/16 dev br0

iptables -P FORWARD ACCEPT
iptables -F FORWARD


ip netns exec ns2 ./runns2.sh &
ip netns exec ns1 ./runns1.sh

wait

runns1.sh:

#!/bin/bash -x

ip netns exec ns1 ip link set lo up

ip netns exec ns1 ip link set vpeer1 up

ip netns exec ns1 ip addr add 10.10.0.10/16 dev vpeer1
sleep 6
ip netns exec ns1 ping -c 10 10.10.0.20

runns2.sh:

#!/bin/bash -x

ip link set lo up
ip link set vpeer2 up
ip addr add 10.10.0.20/16 dev vpeer2
ip link set dev vpeer2 xdpgeneric off
ip tuntap add mode tun tun0
ip link set dev tun0 down
ip link set dev tun0 addr 10.10.0.30/24
ip link set dev tun0 up

mount -t bpf bpf /sys/fs/bpf
df /sys/fs/bpf
ls -l /sys/fs/bpf
rm -f /sys/fs/bpf/accept_map /sys/fs/bpf/xdp_stats_map
if [[ -z "${LEAVE}" ]]
then
  export LD_LIBRARY_PATH=/usr/local/lib
  ./af_xdp_user -S -d vpeer2 -Q 0 --filename ./af_xdp_kern.o &
  ns2_pid=$!
  sleep 20
  kill -INT ${ns2_pid}
fi
wait
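The symptom described above can also be confirmed from outside the eBPF code; a hedged diagnostic sketch, using the device and address names from these scripts:

```shell
# While the failing ping runs, the neighbour entry in ns1 never resolves,
# because the XDP program consumed the ARP request before the kernel stack
# in ns2 could answer it.
ip netns exec ns1 ip neigh show 10.10.0.20   # stays INCOMPLETE / FAILED

# ARP requests with no replies are visible on the bridge:
tcpdump -ni br0 arp -c 4

# The kernel-side fix is to return XDP_PASS for frames whose EtherType is
# 0x0806 (ARP) so the kernel stack can answer them.
```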
