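My Elastic-based monitoring system frequently reports timeouts on an HTTP check that targets an AWS EC2 instance. The check runs from another EC2 instance in the same region, the same availability zone, and the same subnet (but it reaches the target via its public IP rather than its private IP).
I can't figure out where exactly the problem is. According to the WildFly logs, the traffic never even reaches the application, yet when I run tcpdump I can see the packets arriving at the destination and the destination trying to reply.
Here is a snippet of the tcpdump log from the moment the traffic suddenly stops (captured on the receiving endpoint):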
02:29:21.468733 IP (tos 0x0, ttl 63, id 50556, offset 0, flags [DF], proto TCP (6), length 60)
ec2-pu-bl-ic-ip.eu-central-1.compute.amazonaws.com.54138 > ip-172-31-17-107.eu-central-1.compute.internal.https: Flags [S], cksum 0x9bcd (correct), seq 110066568, win 62727, options [mss 1460,sackOK,TS val 2736748378 ecr 0,nop,wscale 7], length 0
02:29:21.468771 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
ip-172-31-17-107.eu-central-1.compute.internal.https > ec2-pu-bl-ic-ip.eu-central-1.compute.amazonaws.com.54138: Flags [S.], cksum 0x4592 (incorrect -> 0x6ade), seq 2350778599, ack 110066569, win 26847, options [mss 8961,sackOK,TS val 638903455 ecr 2736748378,nop,wscale 7], length 0
02:29:21.468931 IP (tos 0x0, ttl 63, id 50557, offset 0, flags [DF], proto TCP (6), length 52)
ec2-pu-bl-ic-ip.eu-central-1.compute.amazonaws.com.54138 > ip-172-31-17-107.eu-central-1.compute.internal.https: Flags [.], cksum 0x1dec (correct), seq 1, ack 1, win 491, options [nop,nop,TS val 2736748378 ecr 638903455], length 0
02:29:21.475997 IP (tos 0x0, ttl 63, id 50558, offset 0, flags [DF], proto TCP (6), length 569)
ec2-pu-bl-ic-ip.eu-central-1.compute.amazonaws.com.54138 > ip-172-31-17-107.eu-central-1.compute.internal.https: Flags [P.], cksum 0x1a5e (correct), seq 1:518, ack 1, win 491, options [nop,nop,TS val 2736748385 ecr 638903455], length 517
02:29:21.476022 IP (tos 0x0, ttl 64, id 55076, offset 0, flags [DF], proto TCP (6), length 52)
ip-172-31-17-107.eu-central-1.compute.internal.https > ec2-pu-bl-ic-ip.eu-central-1.compute.amazonaws.com.54138: Flags [.], cksum 0x458a (incorrect -> 0x1cee), seq 1, ack 518, win 219, options [nop,nop,TS val 638903457 ecr 2736748385], length 0
The last line shows that the server replied to the host performing the check, and tcpdump on the checking host also shows that packet arriving:
02:42:37.742465 IP ec2-pu-bl-ic-ip.eu-central-1.compute.amazonaws.com.https > elk-lb.57714: Flags [.], ack 518, win 219, options [nop,nop,TS val 639102523 ecr 2737544651], length 0
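But then nothing else happens, and the check ends in a timeout.
I am going to map the hostname to the private IP address and see whether the check is more reliable that way, but I would prefer to check the public endpoint so that I can be sure the security group settings and the network are always working.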
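To narrow down where the check stalls, it might help to time each phase of a single request separately. In the capture above, the server acknowledges the 517-byte segment and then sends nothing more. On the https port that segment is probably a TLS ClientHello, but that is an assumption. The following is a minimal sketch, assuming Python 3 is available on the checking host. `probe_https` and its parameters are hypothetical names chosen for illustration and are not part of the monitoring system.

```python
import socket
import ssl
import time


def probe_https(host, port=443, timeout=5.0, path="/", verify=True):
    """Run one HTTPS request and report which phase failed, with per-phase timings.

    Phases: tcp_connect -> tls_handshake -> http_response.
    Hypothetical helper for troubleshooting, not a replacement for the real check.
    """
    timings = {}
    phase = "tcp_connect"
    started = time.monotonic()
    sock = None
    try:
        # Phase 1: TCP three-way handshake (SYN, SYN-ACK, ACK).
        sock = socket.create_connection((host, port), timeout=timeout)
        timings[phase] = time.monotonic() - started

        # Phase 2: TLS handshake (ClientHello, ServerHello, ...).
        phase = "tls_handshake"
        started = time.monotonic()
        context = ssl.create_default_context()
        if not verify:
            # Only for tests against hosts without a valid certificate.
            context.check_hostname = False
            context.verify_mode = ssl.CERT_NONE
        sock = context.wrap_socket(sock, server_hostname=host)
        timings[phase] = time.monotonic() - started

        # Phase 3: send a minimal request and wait for the first response bytes.
        phase = "http_response"
        started = time.monotonic()
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n\r\n"
        )
        sock.sendall(request.encode("ascii"))
        first_bytes = sock.recv(1024)
        timings[phase] = time.monotonic() - started

        status_line = first_bytes.split(b"\r\n", 1)[0].decode("latin-1")
        return {"ok": True, "failed_phase": None,
                "timings": timings, "detail": status_line}
    except OSError as exc:
        # socket.timeout and ssl.SSLError are both subclasses of OSError.
        timings[phase] = time.monotonic() - started
        return {"ok": False, "failed_phase": phase,
                "timings": timings, "detail": repr(exc)}
    finally:
        if sock is not None:
            sock.close()
```

Calling `probe_https(...)` in a loop against the public hostname and against the private IP, and logging `failed_phase` for each run, would show whether the stall is always in the same phase and whether it only happens on the public path. A result of `tls_handshake` would match the capture, where the TCP connection is established but no reply to the first data segment ever arrives.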