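Problem statement: After upgrading/rebooting the KVM host, the gitlab VM can no longer be reached over HTTP.
Topology: [topology diagram]
Troubleshooting notes:
The gitlab VM is reachable over SSH from every host in the topology. It is reachable over HTTP from itself and from several interfaces on the KVM host (wspbm), but not from the PC in the topology. iptables rules were added to the INPUT and FORWARD chains to accept everything, with no change.
SSH from the PC to the gitlab VM: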
[root@los-alamos]$ ssh gitlab ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:52:92:86 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.86/24 brd 192.168.1.255 scope global ens3
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:fe52:9286/64 scope link
valid_lft forever preferred_lft forever
[root@los-alamos]$
Netcat commands from the KVM host wspbm:
[root@wspbm]# nc -s 192.168.1.17 192.168.1.86 80
get
HTTP/1.1 400 Bad Request
Server: nginx
Date: Sat, 11 Aug 2018 14:13:36 GMT
Content-Type: text/html
Content-Length: 166
Connection: close
<html>
<head><title>400 Bad Request</title></head>
<body bgcolor="white">
<center><h1>400 Bad Request</h1></center>
<hr><center>nginx</center>
</body>
</html>
[root@wspbm]# nc -s 192.168.5.1 gitlab 80
get
HTTP/1.1 400 Bad Request
Server: nginx
Date: Sat, 11 Aug 2018 14:13:41 GMT
Content-Type: text/html
Content-Length: 166
Connection: close
<html>
<head><title>400 Bad Request</title></head>
<body bgcolor="white">
<center><h1>400 Bad Request</h1></center>
<hr><center>nginx</center>
</body>
</html>
[root@wspbm]#
Telnet from the gitlab host itself:
root@gitlab:~# telnet 192.168.1.86 80
Trying 192.168.1.86...
Connected to 192.168.1.86.
Escape character is '^]'.
get
HTTP/1.1 400 Bad Request
Server: nginx
Date: Sat, 11 Aug 2018 16:32:27 GMT
Content-Type: text/html
Content-Length: 166
Connection: close
<html>
<head><title>400 Bad Request</title></head>
<body bgcolor="white">
<center><h1>400 Bad Request</h1></center>
<hr><center>nginx</center>
</body>
</html>
Connection closed by foreign host.
root@gitlab:~#
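Although these are 400 responses, they prove that the application is up and responding.
Running tcpdump shows the KVM host wspbm seeing resets that appear to come from the gitlab KVM guest: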
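The 400 is expected here because a bare "get" is not a valid HTTP request line. A minimal sketch of a well-formed request follows, in case a real response is wanted instead. The helper name build_http_get is hypothetical and not part of the original troubleshooting; only printf is used.

```shell
# Hypothetical helper (not from the original post): print a minimal,
# well-formed HTTP/1.1 GET request for the given host and path.
# $1 = value for the Host header, $2 = request path (defaults to /).
build_http_get() {
    host="$1"
    path="${2:-/}"
    # HTTP lines end in CRLF, and a blank line terminates the headers.
    printf 'GET %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n' "$path" "$host"
}
```

The output can be piped into netcat, for example: build_http_get gitlab / | nc 192.168.1.86 80. Whether netcat waits for the reply after stdin closes depends on the netcat variant in use.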
[root@wspbm]# date ; tcpdump -vvv -i any host 192.168.1.86 and host 192.168.1.33 and port 80
Sat Aug 11 10:07:45 MDT 2018
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
10:07:47.940406 IP (tos 0x10, ttl 64, id 3351, offset 0, flags [DF], proto TCP (6), length 60)
192.168.1.33.39392 > gitlab.wsp.local.http: Flags [S], cksum 0x49b5 (correct), seq 475015689, win 29200, options [mss 1460,sackOK,TS val 13313822 ecr 0,nop,wscale 7], length 0
10:07:47.940492 IP (tos 0x10, ttl 64, id 36097, offset 0, flags [DF], proto TCP (6), length 40)
**gitlab.wsp.local.http > 192.168.1.33.39392: Flags [R.], cksum 0x4b7e (correct), seq 0, ack 475015690, win 0, length 0**
10:07:47.940501 IP (tos 0x10, ttl 64, id 36097, offset 0, flags [DF], proto TCP (6), length 40)
gitlab.wsp.local.http > 192.168.1.33.39392: Flags [R.], cksum 0x4b7e (correct), seq 0, ack 1, win 0, length 0 (NOT SURE WHY A SECOND RESET IS SEEN)
^C
3 packets captured
3 packets received by filter
0 packets dropped by kernel
[root@wspbm]#
The host that sent the SYN sees only one reset:
root@los-alamos:~# date ; tcpdump -vvv -i enp3s0 host 192.168.1.86 and port 80
Sat Aug 11 10:07:47 MDT 2018
tcpdump: listening on enp3s0, link-type EN10MB (Ethernet), capture size 262144 bytes
10:07:47.919351 IP (tos 0x10, ttl 64, id 3351, offset 0, flags [DF], proto TCP (6), length 60)
192.168.1.33.39392 > gitlab.wsp.local.http: Flags [S], cksum 0x49b5 (correct), seq 475015689, win 29200, options [mss 1460,sackOK,TS val 13313822 ecr 0,nop,wscale 7], length 0
10:07:47.919838 IP (tos 0x10, ttl 64, id 36097, offset 0, flags [DF], proto TCP (6), length 40)
**gitlab.wsp.local.http > 192.168.1.33.39392: Flags [R.], cksum 0x4b7e (correct), seq 0, ack 475015690, win 0, length 0**
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel
root@los-alamos:~#
But the gitlab KVM guest never sees any traffic:
root@gitlab:~# date ; tcpdump -vvv -i ens3 host 192.168.1.33 and dst port 80
Sat Aug 11 10:07:46 MDT 2018
tcpdump: listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel
root@gitlab:~#
After adding blanket accept rules to the INPUT and FORWARD iptables chains, the results above are unchanged:
[root@wspbm]# iptables -L | egrep '(INPUT|FORWARD)' -A 2
Chain INPUT (policy ACCEPT)
target prot opt source destination
ACCEPT all -- anywhere anywhere
--
Chain FORWARD (policy ACCEPT)
target prot opt source destination
ACCEPT all -- anywhere anywhere
[root@wspbm]#
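Thanks in advance for any suggestions. I'm happy to provide any other output.
EDIT: 8/16/18
I started an nginx docker container on the KVM host OS (not inside the gitlab VM), listening on port 80, purely as a test. The service is reachable from the KVM host: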
[root@wspbm]# docker run --name nginx -d -p 80:80 nginx
f8e91cfec019e42354b4c3d7dac09947bb3b7f6ba6f75c2965b1524d6dc69e4a
[root@wspbm]# telnet 192.168.1.17 80
Trying 192.168.1.17...
Connected to 192.168.1.17.
Escape character is '^]'.
^]
telnet> quit
Connection closed.
[root@wspbm]#
But it is not reachable from the desktop in the topology:
[root@los-alamos]$ telnet 192.168.1.17 80
Trying 192.168.1.17...
telnet: Unable to connect to remote host: Connection refused
[root@los-alamos]$
EDIT: 8/17/18
I changed the port used on the gitlab VM from 80 to 8888, and the service is now reachable from everywhere.
Gitlab is listening on 8888:
root@gitlab:~# lsof -i :8888
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
nginx 3255 root 7u IPv4 5570784 0t0 TCP *:8888 (LISTEN)
nginx 3256 gitlab-www 7u IPv4 5570784 0t0 TCP *:8888 (LISTEN)
nginx 3257 gitlab-www 7u IPv4 5570784 0t0 TCP *:8888 (LISTEN)
root@gitlab:~#
Connection attempt from the desktop:
root@los-alamos:~# telnet gitlab 8888
Trying 192.168.1.86...
Connected to gitlab.wsp.local.
Escape character is '^]'.
get
HTTP/1.1 400 Bad Request
Server: nginx
Date: Fri, 17 Aug 2018 11:36:42 GMT
Content-Type: text/html
Content-Length: 166
Connection: close
<html>
<head><title>400 Bad Request</title></head>
<body bgcolor="white">
<center><h1>400 Bad Request</h1></center>
<hr><center>nginx</center>
</body>
</html>
Connection closed by foreign host.
root@los-alamos:~#
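While this unblocks the side project I'm working on, I don't feel it really addresses the underlying problem. If anyone has suggestions on how to make port 80 on the VM reachable, I'm happy to keep this question open and continue troubleshooting, in case the solution is useful to someone else in the future.
Answer 1
Figured out what was going on here. There was a redirect in iptables, in the nat table: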
[root@wspbm]# iptables -L --line-numbers -t nat
Chain PREROUTING (policy ACCEPT)
num target prot opt source destination
1 REDIRECT tcp -- anywhere anywhere tcp dpt:http redir ports 3129
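After deleting the rule above, everything started working. I think what happened is that the side project using squid was shelved and the live rule was removed, but iptables-persistent had not forgotten my redirect rule, so my reboot brought it back and caused me a headache!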
[root@wspbm]# cat /etc/iptables/rules.v4 | grep 3129
-A PREROUTING -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 3129
[root@wspbm]#
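A likely reading of the symptoms: a REDIRECT rule in nat PREROUTING rewrites every inbound TCP connection to port 80, including traffic headed for the guest, to local port 3129 on the KVM host. With nothing listening there, the host itself answers with a reset, so the guest never sees the SYN, while connections originating on the host skip PREROUTING and keep working. The sketch below generalizes the grep above. The function name find_port_redirects is hypothetical, and it assumes input in iptables-save format with single-port --dport matches (multiport --dports rules are not handled).

```shell
# Hypothetical helper (not from the original post): print nat-table rules,
# in iptables-save format, that REDIRECT or DNAT a given destination port.
# Reads iptables-save output on stdin; $1 is the port number.
find_port_redirects() {
    port="$1"
    awk -v port="$port" '
        # A line such as "*nat" or "*filter" starts a new table section.
        /^\*/ { table = substr($0, 2) }
        # Inside the nat table, keep appended rules that match the port
        # exactly and jump to REDIRECT or DNAT.
        table == "nat" && /^-A / &&
        $0 ~ ("--dport " port "( |$)") &&
        $0 ~ /-j (REDIRECT|DNAT)( |$)/ { print }
    '
}
```

For the live ruleset this could be run as: iptables-save | find_port_redirects 80. For the persisted rules: find_port_redirects 80 < /etc/iptables/rules.v4. Checking both is the point, since the two can disagree, as they did here.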
To fix this, I deleted the active rule:
[root@wspbm]# iptables -t nat -D PREROUTING -p tcp --dport 80 -j REDIRECT --to 3129
[root@wspbm]#
Then I overwrote the stored rules with the active ruleset, which no longer contains the offending redirect rule:
[root@wspbm]# cat /etc/iptables/rules.v4 | grep 3129
-A PREROUTING -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 3129
[root@wspbm]# cd /etc/iptables/
[root@wspbm]# cp rules.v4 rules.v4.08212018
[root@wspbm]# iptables-save > rules.v4
[root@wspbm]# cat rules.v4 | grep 3129
[root@wspbm]#
All connections now behave as expected. Marking this one as PEBCAK.