Why can't DNS query responses from Technitium get back to my Kubernetes container?

I've explored every corner I can think of and still can't find the problem.

I have k3s set up on a server and host Technitium in it. Technitium runs with hostNetwork, is currently serving requests to my network, and handles DHCP without any problems.

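I want to set up WireGuard for remote access to my services. To resolve those services, I decided to set Technitium as the upstream for CoreDNS and use split-horizon resolution, so that the experience over WireGuard is as indistinguishable as possible from that of devices on the network.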

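I tried setting the upstream to my host IP (10.1.2.3 for the purposes of this post), but containers could not resolve DNS. So I reverted the change, spun up a throwaway netshoot container instead, and tried resolving DNS against the host manually. This is where things get strange.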

netshoot$ dig google.com # works fine
host$ dig @10.1.2.3 google.com # works fine from the host
netshoot$ dig @10.1.2.3 google.com # times out as follows:

;; communications error to 10.1.2.3#53: timed out
;; communications error to 10.1.2.3#53: timed out
;; communications error to 10.1.2.3#53: timed out

; <<>> DiG 9.18.13 <<>> @10.1.2.3 google.com
; (1 server found)
;; global options: +cmd
;; no servers could be reached

OK, let's see what tcpdump has to say. Note that, for the purposes of this post, 10.4.5.6 is the container IP and node-host is the name of the node host.

20:10:02.884944 veth640b06a5 P   IP (tos 0x0, ttl 64, id 35863, offset 0, flags [none], proto UDP (17), length 79)
    10.4.5.6.48995 > node-host.domain: 33766+ [1au] A? google.com. (51)
20:10:02.884956 cni0  In  IP (tos 0x0, ttl 64, id 35863, offset 0, flags [none], proto UDP (17), length 79)
    10.4.5.6.48995 > node-host.domain: 33766+ [1au] A? google.com. (51)
20:10:02.885180 cni0  Out IP (tos 0x0, ttl 64, id 6664, offset 0, flags [DF], proto UDP (17), length 83)
    node-host.domain > 10.4.5.6.48995: 33766 1/0/1 google.com. A 142.251.46.206 (55)
20:10:02.885182 veth640b06a5 Out IP (tos 0x0, ttl 64, id 6664, offset 0, flags [DF], proto UDP (17), length 83)
    node-host.domain > 10.4.5.6.48995: 33766 1/0/1 google.com. A 142.251.46.206 (55)
20:10:02.885192 veth640b06a5 P   IP (tos 0xc0, ttl 64, id 57398, offset 0, flags [none], proto ICMP (1), length 111)
    10.4.5.6 > node-host: ICMP 10.4.5.6 udp port 48995 unreachable, length 91
        IP (tos 0x0, ttl 64, id 6664, offset 0, flags [DF], proto UDP (17), length 83)
    node-host.domain > 10.4.5.6.48995: 33766 1/0/1 google.com. A 142.251.46.206 (55)
20:10:02.885195 cni0  In  IP (tos 0xc0, ttl 64, id 57398, offset 0, flags [none], proto ICMP (1), length 111)
    10.4.5.6 > node-host: ICMP 10.4.5.6 udp port 48995 unreachable, length 91
        IP (tos 0x0, ttl 64, id 6664, offset 0, flags [DF], proto UDP (17), length 83)
    node-host.domain > 10.4.5.6.48995: 33766 1/0/1 google.com. A 142.251.46.206 (55)
20:10:07.890371 veth640b06a5 P   IP (tos 0x0, ttl 64, id 48767, offset 0, flags [none], proto UDP (17), length 79)
    10.4.5.6.49549 > node-host.domain: 33766+ [1au] A? google.com. (51)
20:10:07.890398 cni0  In  IP (tos 0x0, ttl 64, id 48767, offset 0, flags [none], proto UDP (17), length 79)
    10.4.5.6.49549 > node-host.domain: 33766+ [1au] A? google.com. (51)
20:10:07.890867 cni0  Out IP (tos 0x0, ttl 64, id 7896, offset 0, flags [DF], proto UDP (17), length 83)
    node-host.domain > 10.4.5.6.49549: 33766 1/0/1 google.com. A 142.251.46.206 (55)
20:10:07.890874 veth640b06a5 Out IP (tos 0x0, ttl 64, id 7896, offset 0, flags [DF], proto UDP (17), length 83)
    node-host.domain > 10.4.5.6.49549: 33766 1/0/1 google.com. A 142.251.46.206 (55)
20:10:07.890921 veth640b06a5 P   IP (tos 0xc0, ttl 64, id 58748, offset 0, flags [none], proto ICMP (1), length 111)
    10.4.5.6 > node-host: ICMP 10.4.5.6 udp port 49549 unreachable, length 91
        IP (tos 0x0, ttl 64, id 7896, offset 0, flags [DF], proto UDP (17), length 83)
    node-host.domain > 10.4.5.6.49549: 33766 1/0/1 google.com. A 142.251.46.206 (55)
20:10:07.890935 cni0  In  IP (tos 0xc0, ttl 64, id 58748, offset 0, flags [none], proto ICMP (1), length 111)
    10.4.5.6 > node-host: ICMP 10.4.5.6 udp port 49549 unreachable, length 91
        IP (tos 0x0, ttl 64, id 7896, offset 0, flags [DF], proto UDP (17), length 83)
    node-host.domain > 10.4.5.6.49549: 33766 1/0/1 google.com. A 142.251.46.206 (55)
20:10:07.932176 cni0  Out ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.4.5.6 tell node-host, length 28
20:10:07.932185 veth640b06a5 Out ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.4.5.6 tell node-host, length 28
20:10:07.932228 veth640b06a5 P   ARP, Ethernet (len 6), IPv4 (len 4), Request who-has node-host tell 10.4.5.6, length 28
20:10:07.932234 veth640b06a5 P   ARP, Ethernet (len 6), IPv4 (len 4), Reply 10.4.5.6 is-at 01:23:45:67:89:0a (oui Unknown), length 28
20:10:07.932239 cni0  In  ARP, Ethernet (len 6), IPv4 (len 4), Reply 10.4.5.6 is-at 01:23:45:67:89:0a (oui Unknown), length 28
20:10:07.932240 cni0  In  ARP, Ethernet (len 6), IPv4 (len 4), Request who-has node-host tell 10.4.5.6, length 28
20:10:07.932253 cni0  Out ARP, Ethernet (len 6), IPv4 (len 4), Reply node-host is-at bc:de:f1:23:45:67 (oui Unknown), length 28
20:10:07.932258 veth640b06a5 Out ARP, Ethernet (len 6), IPv4 (len 4), Reply node-host is-at bc:de:f1:23:45:67 (oui Unknown), length 28
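
Because tcpdump resolved addresses to names here, the capture does not show which of the host's addresses the replies were actually sent from. As a rough sketch (not a diagnosis), the helper below reads verbose `tcpdump -n` text on stdin and reports whether each DNS reply came from the same address the query was sent to. The function name `check_reply_source` is made up for illustration, and it assumes the two-line `-v` layout shown above with numeric IPv4 addresses.

```shell
#!/bin/bash
# check_reply_source: compare the address each DNS query was sent to
# with the address its reply came from. Reads `tcpdump -nv` text on stdin.
# Hypothetical helper; assumes IPv4 and port 53.
check_reply_source() {
    awk '
    # Query line: "<client>.<port> > <server>.53: ..."
    $2 == ">" && $3 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\.53:$/ {
        server = $3
        sub(/\.53:$/, "", server)
        asked[$1] = server
        next
    }
    # Reply line: "<source>.53 > <client>.<port>: ..."
    $2 == ">" && $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\.53$/ {
        client = $3
        sub(/:$/, "", client)
        if (!(client in asked)) next      # also skips copies quoted inside ICMP errors
        src = $1
        sub(/\.53$/, "", src)
        status = (src == asked[client]) ? "OK" : "MISMATCH"
        print status ": asked " asked[client] ", reply came from " src
        delete asked[client]              # report each exchange once
    }'
}
```

It could be fed with something like `tcpdump -l -nvi any 'host 10.4.5.6 and (udp port 53 or icmp)' | check_reply_source` while running dig in the pod. A MISMATCH would only be a lead worth following up, since a client that sent its query to one address may not accept a reply from another.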

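The DNS IDs look like they match up fine, and the server is at least responding. Yet somehow I'm getting an ICMP unreachable, code 3, port unreachable. OK, what about iptables?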

Chain KUBE-ROUTER-INPUT (1 references)
   6   570 KUBE-POD-<ID>  0    --  *      *       10.4.5.6         0.0.0.0/0            /* rule to jump traffic from POD name:netshoot namespace: default to chain KUBE-POD-<ID> */

Chain KUBE-NWPLCY-DEFAULT (18 references)
 pkts bytes target     prot opt in     out     source               destination
    3   237 MARK       0    --  *      *       0.0.0.0/0            0.0.0.0/0            /* rule to mark traffic matching a network policy */ MARK or 0x10000

Chain KUBE-POD-<ID> (7 references)
 pkts bytes target     prot opt in     out     source               destination
    3   333 ACCEPT     0    --  *      *       0.0.0.0/0            0.0.0.0/0            /* rule for stateful firewall for pod */ ctstate RELATED,ESTABLISHED
    0     0 DROP       0    --  *      *       0.0.0.0/0            0.0.0.0/0            /* rule to drop invalid state for pod */ ctstate INVALID
    3   249 ACCEPT     0    --  *      *       0.0.0.0/0            10.4.5.6         /* rule to permit the traffic traffic to pods when source is the pod's local node */ ADDRTYPE match src-type LOCAL
    3   237 KUBE-NWPLCY-DEFAULT  0    --  *      *       10.4.5.6         0.0.0.0/0            /* run through default egress network policy chain */
    0     0 KUBE-NWPLCY-DEFAULT  0    --  *      *       0.0.0.0/0            10.4.5.6         /* run through default ingress network policy chain */
    0     0 NFLOG      0    --  *      *       0.0.0.0/0            0.0.0.0/0            /* rule to log dropped traffic POD name:netshoot namespace: default */ mark match ! 0x10000/0x10000 limit: avg 10/min burst 10 nflog-group 100
    0     0 REJECT     0    --  *      *       0.0.0.0/0            0.0.0.0/0            /* rule to REJECT traffic destined for POD name:netshoot namespace: default */ mark match ! 0x10000/0x10000 reject-with icmp-port-unreachable
    3   237 MARK       0    --  *      *       0.0.0.0/0            0.0.0.0/0            MARK and 0xfffeffff
    3   237 MARK       0    --  *      *       0.0.0.0/0            0.0.0.0/0            /* set mark to ACCEPT traffic that comply to network policies */ MARK or 0x20000

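Are these rules being hit? watch iptables -vn -L KUBE-POD-<ID> nicely shows that the top ACCEPT rule for ESTABLISHED connections increments appropriately whenever a dig query is sent. It increments 3 times over the course of the dig request and then stops. The REJECT rule never increments. In fact, not a single REJECT rule increases its packet count at any point during the dig request (verified with watch "iptables -vn -L | grep REJECT" and watch "iptables -vn -L | grep icmp-port-unreachable").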
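
One limitation of the check above is that iptables -vn -L without -t lists only the filter table. A minimal sketch of a before/after comparison across all tables, using the counters printed by iptables-save -c, might look like this. The function name `changed_rules` and the snapshot file names are hypothetical.

```shell
#!/bin/bash
# changed_rules BEFORE AFTER: print the lines of the AFTER snapshot that
# differ from BEFORE, i.e. rules and chain policies whose [packets:bytes]
# counters moved between two `iptables-save -c` dumps.
changed_rules() {
    # Keep only lines unique to the newer snapshot and drop the
    # "# Generated ..." / "# Completed ..." comments, whose timestamps always differ.
    diff "$1" "$2" | sed -n 's/^> //p' | grep -v '^#'
}
```

Usage would be roughly: run `iptables-save -c > before.txt` on the host, run the dig query in the pod, run `iptables-save -c > after.txt`, then `changed_rules before.txt after.txt`. Unrelated traffic will also move counters, so the output still needs reading by eye.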

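Hmm, maybe the iptables rules differ per namespace.

So I ran this fun little script and looked through the iptables rules of the other namespaces. Some of my services have a few rules, but none related to the netshoot container or the Technitium container. (Technitium runs in host mode anyway.) There's probably an easier way to do this, haha, but whatever. I wouldn't mind seeing how someone else would approach it.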

#!/bin/bash

# numbers.txt contains the PID of every running process

while IFS= read -r num; do
    echo "Output for PID $num:" >> output.txt
    nsenter --net="/proc/$num/ns/net" iptables -nv -L >> output.txt
    echo "" >> output.txt # Optional: adds an extra newline for readability
done < numbers.txt
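
Many processes share a network namespace, so looping over every PID repeats the same rules many times. As a sketch of one way to cut that down, the function below prints one PID per distinct network namespace. The name `list_netns` and its optional proc-root argument are made up for illustration.

```shell
#!/bin/bash
# list_netns [PROC_ROOT]: print "<pid> <namespace>" once per distinct
# network namespace, based on the /proc/<pid>/ns/net symlinks.
list_netns() {
    local proc_root=${1:-/proc} link ns pid
    for link in "$proc_root"/[0-9]*/ns/net; do
        # Processes may exit or be unreadable; skip those quietly.
        ns=$(readlink "$link" 2>/dev/null) || continue
        pid=${link#"$proc_root"/}
        pid=${pid%%/*}
        echo "$pid $ns"
    done | sort -k2,2 -u | sort -n   # one line per namespace, ordered by PID
}
```

For example, `list_netns | cut -d' ' -f1 > numbers.txt` (run as root) would produce a much shorter input file for the loop above.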

I also filtered this output for REJECT rules, and none of them incremented while dig was running.

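I dug through the source code looking for the error message (here and here; these are the only places where the search term for that error appears).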

On a hunch, ip route shows:

debug:~# ip route
default via 10.4.0.1 dev eth0 
10.4.0.0/24 dev eth0 proto kernel scope link src 10.4.5.6
10.4.0.0/16 via 10.4.0.1 dev eth0 

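There is clearly no explicit route to 10.1.2.3, only the default route.

However, the container does seem able to reach the host somehow, as this series of commands shows: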

node-host$ nc -l -p 12345
netshoot$ echo "hello" | nc 10.1.2.3 12345

# the above works fine

netshoot$ nc -l -p 12345
node-host$ echo "hello" | nc 10.4.5.6 12345

# Also works fine
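
The nc checks above go over TCP, while the failing dig query goes over UDP, so they do not exercise quite the same path. A small sketch for comparing the two transports with the same query follows. The function name `compare_transports` and the `DIG` override variable are hypothetical; `+tcp`, `+notcp`, `+time`, `+tries` and `+short` are standard dig options.

```shell
#!/bin/bash
# compare_transports SERVER NAME: send the same query over UDP (+notcp)
# and TCP (+tcp) and report which transport got an answer.
# Set DIG to use a different dig binary (hypothetical convenience).
compare_transports() {
    local server=$1 name=$2 dig_cmd=${DIG:-dig} flag
    for flag in +notcp +tcp; do
        # dig exits non-zero when no server could be reached.
        if "$dig_cmd" "$flag" +time=2 +tries=1 +short "@$server" "$name" >/dev/null 2>&1; then
            echo "$flag: answered"
        else
            echo "$flag: failed"
        fi
    done
}
```

Running `compare_transports 10.1.2.3 google.com` inside the netshoot pod would show whether the problem is specific to UDP.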

Adding -d to the initial dig command results in:

(10.6.7.8 is the CoreDNS IP)

debug:~# dig -d @10.1.2.3 google.com
setup_libs()
setup_system()
create_search_list()
ndots is 5.
timeout is 0.
retries is 3.
get_server_list()
make_server(10.6.7.8)
dig_query_setup
parse_args()
making new lookup
make_empty_lookup()
make_empty_lookup() = 0x7f230e3dd050->references = 1
digrc (open)
main parsing -d
main parsing @10.1.2.3
make_server(10.1.2.3)
main parsing google.com
clone_lookup()
make_empty_lookup()
make_empty_lookup() = 0x7f230e3de590->references = 1
clone_server_list()
make_server(10.1.2.3)
looking up google.com
dig_startup()
lock_lookup dighost.c:4659
success
start_lookup()
setup_lookup(0x7f230e3de590)
resetting lookup counter.
using root origin
recursive query
AD query
add_question()
starting to render the message
add_opt()
done rendering
create query 0x7f230e64ccc0 linked to lookup 0x7f230e3de590
dighost.c:2177:lookup_attach(0x7f230e3de590) = 2
dighost.c:2690:new_query(0x7f230e64ccc0) = 1
do_lookup()
start_udp(0x7f230e64ccc0)
dighost.c:3301:query_attach(0x7f230e64ccc0) = 2
working on lookup 0x7f230e3de590, query 0x7f230e64ccc0
dighost.c:3346:query_attach(0x7f230e64ccc0) = 3
unlock_lookup dighost.c:4661
udp_ready()
udp_ready(0x7f230e64ce60, success, 0x7f230e64ccc0)
lock_lookup dighost.c:3188
success
dighost.c:3189:lookup_attach(0x7f230e3de590) = 3
dighost.c:3261:query_attach(0x7f230e64ccc0) = 4
recving with lookup=0x7f230e3de590, query=0x7f230e64ccc0, handle=0x7f230e64ce60
recvcount=1
have local timeout of 5000
dighost.c:3135:query_attach(0x7f230e64ccc0) = 5
sending a request
sendcount=1
dighost.c:1761:query_detach(0x7f230e64ccc0) = 4
dighost.c:3281:query_detach(0x7f230e64ccc0) = 3
dighost.c:3282:lookup_detach(0x7f230e3de590) = 2
unlock_lookup dighost.c:3283
send_done(0x7f230e64ce60, success, 0x7f230e64ccc0)
sendcount=0
lock_lookup dighost.c:2765
success
dighost.c:2769:lookup_attach(0x7f230e3de590) = 3
dighost.c:2787:query_detach(0x7f230e64ccc0) = 2
dighost.c:2788:lookup_detach(0x7f230e3de590) = 2
check_if_done()
list empty
unlock_lookup dighost.c:2792
recv_done(0x7f230e64ce60, timed out, 0x7f230e4f78d8, 0x7f230e64ccc0)
lock_lookup dighost.c:3955
success
recvcount=0
dighost.c:3960:lookup_attach(0x7f230e3de590) = 3
;; communications error to 10.1.2.3#53: timed out

I'm really lost. iptables seems like the right place to look, since it's the only source I can think of for an icmp-port-unreachable message.
