TCP connections freezing/dropping in AWS

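We have a number of AWS EC2 instances in the same availability zone that exchange a large amount of network traffic with each other. In a small fraction of connections, when a client on host A connects to a server on host B and sends a large amount of data (e.g. 20 GB) from A to B at a high rate, the TCP connection freezes or times out. I have investigated the problem; the symptoms are not always identical, but typically, when a connection is affected, the sender (on host A) at some point stops receiving the ACKs that the receiver (host B) sends back to A. In other words, all ACKs get through at first, and then, partway through the connection, they start being blocked. In addition, the VPC Flow Logs show that some packets going from host B (the receiver) back to host A (the sender) are rejected.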

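This happens on many EC2 instances (typically r5a.xlarge) running Debian Linux 10 with Linux kernel 5.3.9 and the ENA AWS network driver that ships with the Debian kernel. They run Docker 18.09.1, installed from the docker.io Debian Buster package. Interestingly, I cannot reproduce the problem on Amazon Linux 2 (with Docker installed).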

I have been able to reproduce it by running the following simple experiment in a loop for a while:

# Host B (server receiving data)
docker run -it --rm -p 20098:20098 debian:buster bash
apt-get update && apt-get -y install netcat-openbsd
while true; do date; nc -l -p 20098 | dd of=/dev/null bs=1M; done

# Host A (client sending data)
docker run -it --rm debian:buster bash
apt-get update && apt-get -y install netcat-openbsd
while sleep 1; do date; dd if=/dev/zero bs=1M count=20480 | nc -q 1 <server> 20098; done
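
The nc/dd loop above can also be sketched in Python with plain sockets, which makes it easy to attach an explicit stall detector to each transfer. This is a minimal sketch, not the setup used in the post: it assumes that a per-operation timeout (the hypothetical `idle_timeout` parameter) is an acceptable way to flag a frozen transfer, and it does not run inside Docker, so it exercises neither the Docker bridge nor its NAT rules. `CHUNK` (1 MiB) mirrors `bs=1M`, and `loopback_trial` exists only so the code can be exercised on a single machine.

```python
import socket
import threading

CHUNK = 1 << 20  # 1 MiB per write/read, mirroring dd bs=1M


def receive_all(server: socket.socket, idle_timeout: float) -> int:
    """Accept one connection and discard everything it sends (like nc -l | dd of=/dev/null).

    Returns the number of bytes received. Raises socket.timeout if no data
    arrives for idle_timeout seconds, i.e. the transfer has stalled.
    """
    conn, _ = server.accept()
    with conn:
        conn.settimeout(idle_timeout)
        total = 0
        while True:
            data = conn.recv(CHUNK)
            if not data:  # peer shut down its sending side: transfer complete
                return total
            total += len(data)


def send_zeros(host: str, port: int, total_bytes: int, idle_timeout: float) -> None:
    """Stream total_bytes of zeros to host:port (like dd if=/dev/zero | nc)."""
    view = memoryview(bytes(CHUNK))  # slicing a memoryview avoids copying the buffer
    with socket.create_connection((host, port), timeout=idle_timeout) as sock:
        remaining = total_bytes
        while remaining > 0:
            n = min(remaining, CHUNK)
            # sendall raises socket.timeout if this chunk cannot be sent in time
            sock.sendall(view[:n])
            remaining -= n
        sock.shutdown(socket.SHUT_WR)  # send FIN so the receiver sees EOF


def loopback_trial(total_bytes: int, idle_timeout: float = 30.0) -> int:
    """Run one sender/receiver pair over 127.0.0.1 and return the bytes received."""
    with socket.create_server(("127.0.0.1", 0)) as server:
        port = server.getsockname()[1]
        result = {}

        def serve() -> None:
            result["received"] = receive_all(server, idle_timeout)

        receiver = threading.Thread(target=serve)
        receiver.start()
        send_zeros("127.0.0.1", port, total_bytes, idle_timeout)
        receiver.join()
    return result["received"]
```

On the real hosts, host B would create the listening socket once with `socket.create_server(("0.0.0.0", 20098))` and call `receive_all` on it in a loop, while host A would loop over `send_zeros("<server>", 20098, 20 * 1024**3, 60.0)`; a `socket.timeout` on either side marks a stalled transfer.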

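The vast majority of the time, the experiment successfully sends 20 GB over the network, but occasionally (sometimes within a few minutes, sometimes only after hours or even days) the transfer stalls or is interrupted by an unexpected disconnect/timeout. On some hosts I can reproduce the problem more easily than on others. The hosts where it reproduces faster tend to run more Docker containers and have more network activity, but I am not yet sure whether there is a causal relationship. I have also been able to reproduce the problem when running the netcat experiment above directly on the host instead of inside a Docker container, although reproducing it that way seems to be much harder. It happens between hosts in the same VPC, the same AZ, and even the same subnet, so we can rule out cross-region, cross-AZ, or cross-subnet connectivity issues as the cause.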

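Below is sample tcpdump output showing the network activity when this happens. I have omitted many TCP packets from the same connection that were transmitted and acknowledged successfully. It was captured with tcpdump -i eth0 -p -G 600 -s 80 -w ... host ... and port 20098. The capture was taken on the host's network interface, not inside the Docker network, so network address translation has already been applied.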

tcpdump output on host A (172.20.3.188, the sending client):

08:00:03.615061 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322435576:322444525, ack 1, win 491, options [nop,nop,TS val 4223113896 ecr 683441101], length 8949
08:00:03.615064 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322444525:322453474, ack 1, win 491, options [nop,nop,TS val 4223113896 ecr 683441101], length 8949
08:00:03.615066 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223113896 ecr 683441101], length 8949
08:00:03.615069 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322462423:322471372, ack 1, win 491, options [nop,nop,TS val 4223113896 ecr 683441101], length 8949
08:00:03.615071 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322471372:322480321, ack 1, win 491, options [nop,nop,TS val 4223113896 ecr 683441101], length 8949
08:00:03.615073 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322480321:322489270, ack 1, win 491, options [nop,nop,TS val 4223113896 ecr 683441101], length 8949
08:00:03.615076 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322489270:322498219, ack 1, win 491, options [nop,nop,TS val 4223113896 ecr 683441101], length 8949
08:00:03.615140 IP 172.20.3.89.20098 > 172.20.3.188.35506: Flags [.], ack 322435576, win 256, options [nop,nop,TS val 683441101 ecr 4223113896], length 0
08:00:03.615178 IP 172.20.3.89.20098 > 172.20.3.188.35506: Flags [.], ack 322453474, win 117, options [nop,nop,TS val 683441101 ecr 4223113896], length 0
08:00:03.824740 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223114105 ecr 683441101], length 8949
08:00:04.256748 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223114537 ecr 683441101], length 8949
08:00:05.084733 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223115365 ecr 683441101], length 8949
08:00:06.748724 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223117029 ecr 683441101], length 8949
08:00:10.108720 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223120389 ecr 683441101], length 8949
08:00:16.764722 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223127045 ecr 683441101], length 8949
08:00:30.076723 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223140357 ecr 683441101], length 8949
08:00:57.724718 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223168005 ecr 683441101], length 8949
08:01:50.972736 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223221253 ecr 683441101], length 8949
08:03:37.468722 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223327749 ecr 683441101], length 8949
08:05:38.304715 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223448585 ecr 683441101], length 8949
08:07:39.132913 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223569414 ecr 683441101], length 8949
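
The retransmission timestamps in the host A capture follow TCP's exponential retransmission-timeout backoff: each retransmission of seq 322453474 waits roughly twice as long as the previous one, until the gap levels off at about 120 s, which matches the 120-second `TCP_RTO_MAX` cap in Linux. The sketch below only does that arithmetic on timestamps copied from the capture; `to_seconds` and `retransmit_intervals` are illustrative helpers, not part of any tool. The pattern is consistent with host A's TCP stack behaving normally while never seeing any ACK beyond 322453474.

```python
# Original send of seq 322453474 (host A, 08:00:03.615066) followed by every
# retransmission of that same segment, copied from the host A capture.
SEND_TIMES = [
    "08:00:03.615066",
    "08:00:03.824740",
    "08:00:04.256748",
    "08:00:05.084733",
    "08:00:06.748724",
    "08:00:10.108720",
    "08:00:16.764722",
    "08:00:30.076723",
    "08:00:57.724718",
    "08:01:50.972736",
    "08:03:37.468722",
    "08:05:38.304715",
    "08:07:39.132913",
]


def to_seconds(ts: str) -> float:
    """Convert a tcpdump 'HH:MM:SS.ffffff' timestamp to seconds since midnight."""
    hh, mm, ss = ts.split(":")
    return int(hh) * 3600 + int(mm) * 60 + float(ss)


def retransmit_intervals(times: list) -> list:
    """Gaps in seconds between consecutive transmissions of the same segment."""
    secs = [to_seconds(t) for t in times]
    return [later - earlier for earlier, later in zip(secs, secs[1:])]


if __name__ == "__main__":
    previous = None
    for i, gap in enumerate(retransmit_intervals(SEND_TIMES), start=1):
        ratio = f"x{gap / previous:.2f}" if previous else ""
        print(f"retransmission {i:2d}: {gap:8.3f} s after previous send {ratio}")
        previous = gap
```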

tcpdump output on host B (172.20.3.89, the receiving server):

08:00:03.615206 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322435576:322453474, ack 1, win 491, options [nop,nop,TS val 4223113896 ecr 683441101], length 17898
08:00:03.615225 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322498219, ack 1, win 491, options [nop,nop,TS val 4223113896 ecr 683441101], length 44745
08:00:03.615228 IP 172.20.3.89.20098 > 172.20.3.188.35506: Flags [.], ack 322453474, win 117, options [nop,nop,TS val 683441101 ecr 4223113896], length 0
08:00:03.615256 IP 172.20.3.89.20098 > 172.20.3.188.35506: Flags [.], ack 322498219, win 0, options [nop,nop,TS val 683441101 ecr 4223113896], length 0
08:00:03.615908 IP 172.20.3.89.20098 > 172.20.3.188.35506: Flags [.], ack 322498219, win 1642, options [nop,nop,TS val 683441102 ecr 4223113896], length 0
08:00:03.616389 IP 172.20.3.89.20098 > 172.20.3.188.35506: Flags [.], ack 322498219, win 3373, options [nop,nop,TS val 683441102 ecr 4223113896], length 0
08:00:03.618742 IP 172.20.3.89.20098 > 172.20.3.188.35506: Flags [.], ack 322498219, win 6862, options [nop,nop,TS val 683441105 ecr 4223113896], length 0
08:00:03.621737 IP 172.20.3.89.20098 > 172.20.3.188.35506: Flags [.], ack 322498219, win 13913, options [nop,nop,TS val 683441108 ecr 4223113896], length 0
08:00:03.824879 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223114105 ecr 683441101], length 8949
08:00:03.824905 IP 172.20.3.89.20098 > 172.20.3.188.35506: Flags [.], ack 322498219, win 24576, options [nop,nop,TS val 683441311 ecr 4223113896,nop,nop,sack 1 {322453474:322462423}], length 0
08:00:04.256895 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223114537 ecr 683441101], length 8949
08:00:04.256929 IP 172.20.3.89.20098 > 172.20.3.188.35506: Flags [.], ack 322498219, win 24576, options [nop,nop,TS val 683441743 ecr 4223113896,nop,nop,sack 1 {322453474:322462423}], length 0
08:00:05.084873 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223115365 ecr 683441101], length 8949
08:00:05.084908 IP 172.20.3.89.20098 > 172.20.3.188.35506: Flags [.], ack 322498219, win 24576, options [nop,nop,TS val 683442571 ecr 4223113896,nop,nop,sack 1 {322453474:322462423}], length 0
08:00:06.748872 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223117029 ecr 683441101], length 8949
08:00:06.748901 IP 172.20.3.89.20098 > 172.20.3.188.35506: Flags [.], ack 322498219, win 24576, options [nop,nop,TS val 683444235 ecr 4223113896,nop,nop,sack 1 {322453474:322462423}], length 0
08:00:10.108863 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223120389 ecr 683441101], length 8949
08:00:10.108889 IP 172.20.3.89.20098 > 172.20.3.188.35506: Flags [.], ack 322498219, win 24576, options [nop,nop,TS val 683447595 ecr 4223113896,nop,nop,sack 1 {322453474:322462423}], length 0
08:00:16.764877 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223127045 ecr 683441101], length 8949
08:00:16.764905 IP 172.20.3.89.20098 > 172.20.3.188.35506: Flags [.], ack 322498219, win 24576, options [nop,nop,TS val 683454251 ecr 4223113896,nop,nop,sack 1 {322453474:322462423}], length 0
08:00:30.076864 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223140357 ecr 683441101], length 8949
08:00:30.076881 IP 172.20.3.89.20098 > 172.20.3.188.35506: Flags [.], ack 322498219, win 24576, options [nop,nop,TS val 683467563 ecr 4223113896,nop,nop,sack 1 {322453474:322462423}], length 0
08:00:57.724863 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223168005 ecr 683441101], length 8949
08:00:57.724877 IP 172.20.3.89.20098 > 172.20.3.188.35506: Flags [.], ack 322498219, win 24576, options [nop,nop,TS val 683495211 ecr 4223113896,nop,nop,sack 1 {322453474:322462423}], length 0
08:01:50.972908 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223221253 ecr 683441101], length 8949
08:01:50.972922 IP 172.20.3.89.20098 > 172.20.3.188.35506: Flags [.], ack 322498219, win 24576, options [nop,nop,TS val 683548459 ecr 4223113896,nop,nop,sack 1 {322453474:322462423}], length 0
08:03:37.468882 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223327749 ecr 683441101], length 8949
08:03:37.468902 IP 172.20.3.89.20098 > 172.20.3.188.35506: Flags [.], ack 322498219, win 24576, options [nop,nop,TS val 683654955 ecr 4223113896,nop,nop,sack 1 {322453474:322462423}], length 0
08:05:38.304895 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223448585 ecr 683441101], length 8949
08:05:38.304942 IP 172.20.3.89.20098 > 172.20.3.188.35506: Flags [.], ack 322498219, win 24576, options [nop,nop,TS val 683775791 ecr 4223113896,nop,nop,sack 1 {322453474:322462423}], length 0
08:07:39.133073 IP 172.20.3.188.35506 > 172.20.3.89.20098: Flags [.], seq 322453474:322462423, ack 1, win 491, options [nop,nop,TS val 4223569414 ecr 683441101], length 8949
08:07:39.133092 IP 172.20.3.89.20098 > 172.20.3.188.35506: Flags [.], ack 322498219, win 24576, options [nop,nop,TS val 683896619 ecr 4223113896,nop,nop,sack 1 {322453474:322462423}], length 0
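
Two details of the host B capture can be checked with plain sequence-number arithmetic. First, B logs two large segments (17898 and 44745 bytes) where A logged seven 8949-byte segments; both lengths are exact multiples of 8949, which is most likely receive-side coalescing (GRO), since tcpdump on Linux sees received packets after GRO. 8949 also equals 9001 - 20 - 20 - 12, i.e. a 9001-byte jumbo MTU minus the IPv4 header, the TCP header and the timestamp option; the 9001 MTU is an assumption here, since the post does not state it. Second, B's repeated ACKs carry `sack 1 {322453474:322462423}`, a block lying below the cumulative ACK 322498219, which RFC 2883 treats as a duplicate SACK (D-SACK): B is reporting that it already has that data. `coalesced_count` and `is_dsack` are hypothetical helpers.

```python
# Assumed 9001-byte MTU minus IPv4 header (20), TCP header (20) and the
# 12-byte timestamp option (nop,nop,TS): the per-segment payload seen on host A.
ASSUMED_MTU = 9001
SEGMENT_PAYLOAD = ASSUMED_MTU - 20 - 20 - 12  # 8949


def coalesced_count(start: int, end: int, seg_len: int = SEGMENT_PAYLOAD) -> tuple:
    """How many full-size segments a captured seq range spans, plus any remainder."""
    return divmod(end - start, seg_len)


def is_dsack(cumulative_ack: int, sack_start: int, sack_end: int) -> bool:
    """RFC 2883 D-SACK case: the SACK block lies entirely at or below the cumulative ACK.

    Ignores 32-bit sequence wraparound, which does not occur in this excerpt.
    """
    return sack_start < sack_end <= cumulative_ack


if __name__ == "__main__":
    # Large segments captured on host B at 08:00:03.615206 and 08:00:03.615225
    for start, end in [(322435576, 322453474), (322453474, 322498219)]:
        count, rest = coalesced_count(start, end)
        print(f"{start}:{end} = {count} x {SEGMENT_PAYLOAD} bytes + {rest}")
    # ACK from host B: ack 322498219 with sack 1 {322453474:322462423}
    print("D-SACK:", is_dsack(322498219, 322453474, 322462423))
```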

Note how host A stops receiving packets from host B after it receives the 08:00:03.615178 ... ack 322453474 packet.
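
Comparing the two captures line by line makes this concrete: every pure ACK that B logged from 08:00:03.615256 onward (starting with `ack 322498219, win 0`) is absent from A's capture. A minimal sketch of that comparison, assuming both captures are available as tcpdump text lines in the format shown above. Keying each ACK on (ack, window, TS val) is a choice made for this sketch, and `pure_acks`/`missing_acks` are hypothetical helpers; the tcpdump timestamps are not used because the two hosts' clocks differ.

```python
import re
from typing import Iterable, List, Tuple

# Matches pure ACKs ("Flags [.], ack N, win W, options [nop,nop,TS val T ...")
# and skips data segments, which carry "seq" before "ack".
PURE_ACK = re.compile(
    r"IP (?P<src>\S+) > (?P<dst>\S+): Flags \[[^\]]*\], "
    r"ack (?P<ack>\d+), win (?P<win>\d+), "
    r"options \[nop,nop,TS val (?P<tsval>\d+)"
)

AckKey = Tuple[int, int, int]  # (ack number, advertised window, TS val)


def pure_acks(lines: Iterable[str], src: str) -> List[AckKey]:
    """Pure ACKs sent by src (an 'ip.port' string as printed by tcpdump)."""
    keys = []
    for line in lines:
        m = PURE_ACK.search(line)
        if m and m["src"] == src:
            keys.append((int(m["ack"]), int(m["win"]), int(m["tsval"])))
    return keys


def missing_acks(capture_at_sender: Iterable[str],
                 capture_at_peer: Iterable[str], src: str) -> List[AckKey]:
    """ACKs logged on the host that sent them but absent from the peer's capture.

    Host clocks differ, so ACKs are matched on header fields, not tcpdump timestamps.
    """
    seen = set(pure_acks(capture_at_peer, src))
    return [key for key in pure_acks(capture_at_sender, src) if key not in seen]
```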

Here is the VPC Flow Logs output during a failed connection (captured at a different time from the tcpdump output above):

[Image: VPC Flow Logs output]
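
To isolate the rejected records for a single connection, flow log lines can be filtered by the connection's addresses and server port. This sketch assumes the default (version 2) record format, `version account-id interface-id srcaddr dstaddr srcport dstport protocol packets bytes start end action log-status`; a custom log format would need a different field list. `parse_record` and `rejected_for_connection` are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Default (version 2) VPC Flow Logs record layout, space-separated.
FIELDS = ("version account_id interface_id srcaddr dstaddr srcport dstport "
          "protocol packets bytes start end action log_status").split()
TCP = 6  # IANA protocol number for TCP


@dataclass
class FlowRecord:
    interface_id: str
    srcaddr: str
    dstaddr: str
    srcport: int
    dstport: int
    protocol: int
    packets: int
    action: str


def parse_record(line: str) -> Optional[FlowRecord]:
    """Parse one default-format record; return None for headers and NODATA/SKIPDATA lines."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        return None
    raw = dict(zip(FIELDS, parts))
    if not raw["srcport"].isdigit():  # header line, or "-" placeholders in NODATA/SKIPDATA
        return None
    return FlowRecord(
        interface_id=raw["interface_id"],
        srcaddr=raw["srcaddr"],
        dstaddr=raw["dstaddr"],
        srcport=int(raw["srcport"]),
        dstport=int(raw["dstport"]),
        protocol=int(raw["protocol"]),
        packets=int(raw["packets"]),
        action=raw["action"],
    )


def rejected_for_connection(lines: Iterable[str], host_a: str, host_b: str,
                            server_port: int) -> List[FlowRecord]:
    """REJECT records for TCP between host_a and host_b on server_port, either direction."""
    matches = []
    for line in lines:
        rec = parse_record(line)
        if (rec is not None and rec.action == "REJECT" and rec.protocol == TCP
                and {rec.srcaddr, rec.dstaddr} == {host_a, host_b}
                and server_port in (rec.srcport, rec.dstport)):
            matches.append(rec)
    return matches
```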

Since Amazon Linux 2 does not seem to exhibit this problem, I tried to bring the network stack on Debian closer to that of Amazon Linux. I tried the following on the Debian instances:

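  • Applied some networking sysctl settings from Amazon Linux to Debian
  • Upgraded the Linux kernel to 5.8.10
  • Upgraded the ena driver to 2.2.11
  • Upgraded Docker to 19.03.13
  • In the security groups used by these hosts, explicitly allowed inbound and outbound traffic on ephemeral ports (32768-65535) between all IPs within the VPC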
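
When carrying settings over from Amazon Linux 2, it can help to dump the same network sysctls on both systems and diff them, so it is clear exactly which values changed. A minimal sketch, assuming it runs on each host and reads `/proc/sys` directly; the names in `EXAMPLE_NAMES` are an illustrative selection, not the set of settings applied in the post, and `read_sysctls`/`diff_sysctls` are hypothetical helpers.

```python
import json
import sys
from pathlib import Path
from typing import Dict, Iterable, Optional, Tuple

# Illustrative selection only; extend with whatever settings are being compared.
EXAMPLE_NAMES = [
    "net.ipv4.tcp_sack",
    "net.ipv4.tcp_timestamps",
    "net.ipv4.tcp_window_scaling",
    "net.ipv4.tcp_rmem",
    "net.ipv4.tcp_wmem",
    "net.ipv4.ip_local_port_range",
]


def read_sysctls(names: Iterable[str], root: str = "/proc/sys") -> Dict[str, Optional[str]]:
    """Read sysctl values; None if a setting does not exist on this kernel.

    Dots map to path separators, so names containing dotted interface names
    (e.g. VLAN devices such as eth0.100) are not handled by this sketch.
    """
    values = {}
    for name in names:
        path = Path(root, *name.split("."))
        try:
            # Multi-value settings use tabs; normalise to single spaces.
            values[name] = " ".join(path.read_text().split())
        except OSError:
            values[name] = None
    return values


def diff_sysctls(a: Dict[str, Optional[str]],
                 b: Dict[str, Optional[str]]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Settings whose values differ between two dumps (missing keys count as None)."""
    def norm(v: Optional[str]) -> Optional[str]:
        return None if v is None else " ".join(v.split())

    return {name: (norm(a.get(name)), norm(b.get(name)))
            for name in sorted(set(a) | set(b))
            if norm(a.get(name)) != norm(b.get(name))}


if __name__ == "__main__":
    # Run on each host and save the JSON; later json.load both files and
    # pass the two dicts to diff_sysctls.
    json.dump(read_sysctls(EXAMPLE_NAMES), sys.stdout, indent=2, sort_keys=True)
```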

None of these seems to fix the problem. What could be causing these packets to be dropped/rejected?