I am trying to debug Calico's iptables rules and found something odd: some packets disappear after nat:PREROUTING.
I have three pods on three different nodes:
edge1/net-tool-edge1:10.22.46.41/192.168.0.16
node1/net-tool-node1:10.22.46.16/10.234.102.161
master/net-tool-master:10.22.46.11/10.234.79.169
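I should point out that edge1 does not run calico-node; it runs fabedge-agent (another CNI) instead. Something is wrong on edge1, and as a result packets from net-tool-edge1 are lost on node1. That is why I am debugging the Calico iptables rules: I would not expect an error in FabEdge to affect Calico.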
I debugged iptables with the TRACE target, as suggested in this article: https://www.opsist.com/blog/2015/08/11/how-do-i-see-what-iptables-is-doing.html
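For readers who want to reproduce this kind of trace, here is a minimal sketch of how a TRACE rule can be added and removed on the receiving node. It is not the exact rule used above, which the post does not show. The `SRC_IP` default (taken to be the pod IP of net-tool-edge1), the `run` helper and the `APPLY` switch are assumptions made for illustration. The script only prints the commands unless `APPLY=1` is set. On hosts where iptables is backed by nftables, the trace is usually read with `xtables-monitor --trace` rather than from the kernel log.

```shell
#!/bin/sh
# Sketch: trace one flow through iptables on the receiving node (node1).
# SRC_IP is an assumption: replace it with the pod IP of the sender.
SRC_IP="${SRC_IP:-192.168.0.16}"

# Hypothetical helper: print commands by default; set APPLY=1 (as root) to run them.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

trace_on() {
    # TRACE is only valid in the raw table; PREROUTING sees packets arriving from outside.
    run iptables -t raw -I PREROUTING 1 -s "$SRC_IP" -j TRACE
}

trace_off() {
    # Delete the same rule again so the kernel log is not flooded.
    run iptables -t raw -D PREROUTING -s "$SRC_IP" -j TRACE
}

trace_on
# ... reproduce the traffic here, then read the trace
# (dmesg, journalctl -k or /var/log/messages, depending on the system) ...
trace_off
```

Each logged line has the form table:chain:type:rulenum, which is the format of the traces in this question.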
When Calico works properly (net-tool-master -> net-tool-node1), I get this:
raw:PREROUTING:policy:3
mangle:PREROUTING:rule:1
mangle:cali-PREROUTING:rule:3
mangle:cali-from-host-endpoint:return:1
mangle:cali-PREROUTING:return:5
mangle:PREROUTING:policy:2
nat:PREROUTING:rule:1
nat:cali-PREROUTING:rule:1
nat:cali-fip-dnat:return:1
nat:cali-PREROUTING:return:2
nat:PREROUTING:rule:2
nat:KUBE-SERVICES:return:18
nat:PREROUTING:policy:4
mangle:FORWARD:policy:1
filter:FORWARD:rule:1
filter:cali-FORWARD:rule:1
filter:cali-FORWARD:rule:2
filter:cali-from-hep-forward:return:1
filter:cali-FORWARD:rule:4
filter:cali-to-wl-dispatch:rule:3
filter:cali-tw-cali20fd069ebc8:rule:3
filter:cali-tw-cali20fd069ebc8:rule:4
filter:cali-pri-_zbxMTbNMDRyfczFBup:rule:1
filter:cali-pri-_zbxMTbNMDRyfczFBup:rule:2
filter:cali-tw-cali20fd069ebc8:rule:5
filter:cali-FORWARD:rule:5
filter:cali-to-hep-forward:return:1
filter:cali-FORWARD:rule:6
filter:cali-cidr-block:return:1
filter:cali-FORWARD:return:7
filter:FORWARD:rule:2
filter:FABEDGE-FORWARD:return:4
filter:FORWARD:rule:3
filter:KUBE-FORWARD:return:5
filter:FORWARD:rule:4
filter:KUBE-SERVICES:return:1
filter:FORWARD:rule:5
filter:KUBE-EXTERNAL-SERVICES:return:1
filter:FORWARD:rule:6
filter:DOCKER-USER:return:1
filter:FORWARD:rule:7
filter:DOCKER-ISOLATION-STAGE-1:return:2
filter:FORWARD:rule:12
mangle:POSTROUTING:rule:1
mangle:cali-POSTROUTING:rule:1
mangle:POSTROUTING:policy:2
nat:POSTROUTING:rule:1
nat:cali-POSTROUTING:rule:1
nat:cali-fip-snat:return:1
nat:cali-POSTROUTING:rule:2
nat:cali-nat-outgoing:return:2
nat:cali-POSTROUTING:return:4
nat:POSTROUTING:rule:2
nat:FABEDGE-POSTROUTING:return:3
nat:POSTROUTING:rule:3
nat:KUBE-POSTROUTING:rule:1
nat:POSTROUTING:policy:5
When packets are lost (net-tool-edge1 -> net-tool-node1), the trace looks like this:
raw:PREROUTING:policy:4
mangle:PREROUTING:rule:1
mangle:cali-PREROUTING:rule:3
mangle:cali-from-host-endpoint:return:1
mangle:cali-PREROUTING:return:5
mangle:PREROUTING:policy:2
nat:PREROUTING:rule:1
nat:cali-PREROUTING:rule:1
nat:cali-fip-dnat:return:1
nat:cali-PREROUTING:return:2
nat:PREROUTING:rule:2
nat:KUBE-SERVICES:return:18
nat:PREROUTING:policy:4
It looks as if the default policy dropped the packet, but the default policy of PREROUTING in the nat table is ACCEPT:
[root@node1 ~]# iptables -t nat -L PREROUTING --line-numbers
Chain PREROUTING (policy ACCEPT)
num target prot opt source destination
1 cali-PREROUTING all -- anywhere anywhere /* cali:6gwbT8clXdHdC1b1 */
2 KUBE-SERVICES all -- anywhere anywhere /* kubernetes service portals */
3 DOCKER all -- anywhere anywhere ADDRTYPE match dst-type LOCAL
In fact, nat:PREROUTING:policy:4 also appears in the normal trace, so I don't think it is the cause.
I really can't figure out what is going on. I could not find any conntrack entry either.
Any help is welcome, thanks in advance.
Answer 1
The packets were dropped because there was no route and rp_filter was enabled. After turning rp_filter off, the packets finally reach net-tool-node1.
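This also explains why the trace stops right after nat:PREROUTING: the routing decision happens after PREROUTING, and reverse-path filtering is applied there, so a packet that fails the check never reaches another iptables chain. Below is a rough sketch of how this could be checked on node1. `IFACE` and `SRC_IP` are assumptions for illustration; substitute the interface the packets arrive on and the sender's pod IP.

```shell
#!/bin/sh
# Sketch: inspect reverse-path filtering on the node that drops the packets.
# IFACE and SRC_IP are assumptions, not values confirmed by the post.
IFACE="${IFACE:-eth0}"
SRC_IP="${SRC_IP:-192.168.0.16}"

# The kernel validates sources using the larger of conf/all and conf/<iface>
# (0 = off, 1 = strict, 2 = loose).
effective_rp_filter() {
    if [ "$1" -gt "$2" ]; then echo "$1"; else echo "$2"; fi
}

conf=/proc/sys/net/ipv4/conf
if [ -r "$conf/all/rp_filter" ] && [ -r "$conf/$IFACE/rp_filter" ]; then
    all=$(cat "$conf/all/rp_filter")
    ifc=$(cat "$conf/$IFACE/rp_filter")
    echo "effective rp_filter on $IFACE: $(effective_rp_filter "$all" "$ifc")"
fi

# Where would a reply to the sender go? In strict mode the packet is dropped
# unless the best route back to the source uses the interface it arrived on.
if command -v ip >/dev/null 2>&1; then
    ip route get "$SRC_IP" || echo "no route to $SRC_IP"
fi

# To switch the check off (needs root), both values must be 0:
#   sysctl -w net.ipv4.conf.all.rp_filter=0
#   sysctl -w "net.ipv4.conf.$IFACE.rp_filter=0"
```

Adding the missing route back to the sender's pod network is an alternative to disabling the check. Which of the two fits better depends on the cluster setup.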