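(Note: I am not a network engineer.) We are sending files to an external vendor and are seeing random timeouts across different services. The timeouts seem to happen most often with larger files. We took a packet capture, which shows our window shrinking, and we suspect that small payloads complete before the window reaches 0, while large payloads end in a RST.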
11369 > su-mit-tg [ACK] Seq=677231 Ack=253694 Win=32768 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=256614 Win=29848 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=259534 Win=26928 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=262454 Win=24008 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=265374 Win=21088 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=268294 Win=18168 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=271214 Win=15248 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=274134 Win=12328 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=277054 Win=9408 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=279974 Win=6488 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=282894 Win=3568 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=285814 Win=648 Len=0
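Edit: I am referring to the different web services we call from our application. The timeouts do not consistently fail on one particular service; they affect all of the services at different times. I am not able to send the files from another network.
Answer 1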
I think this is related to an I/O problem or an application problem, and for some reason the socket buffer has run out of space.
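The capture in the question can be checked against this idea with a little arithmetic. The sketch below is only an illustration: the helper names are made up, and it assumes the Win values shown are the effective window in bytes (no unaccounted window scaling). If every ACK acknowledges N new bytes and the advertised window drops by the same N, then Ack + Win (the right edge of the window) never moves, which is what you would expect when the receiving application is not reading anything out of the socket buffer.

```python
import re

# ACK lines copied from the capture in the question.
CAPTURE = """\
11369 > su-mit-tg [ACK] Seq=677231 Ack=253694 Win=32768 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=256614 Win=29848 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=259534 Win=26928 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=262454 Win=24008 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=265374 Win=21088 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=268294 Win=18168 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=271214 Win=15248 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=274134 Win=12328 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=277054 Win=9408 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=279974 Win=6488 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=282894 Win=3568 Len=0
11369 > su-mit-tg [ACK] Seq=677231 Ack=285814 Win=648 Len=0
"""

ACK_RE = re.compile(r"Ack=(\d+) Win=(\d+)")


def parse_acks(text):
    """Return (ack, win) integer pairs for every line carrying Ack= and Win=."""
    return [(int(ack), int(win)) for ack, win in ACK_RE.findall(text)]


def right_edges(pairs):
    """Ack + Win: the highest sequence number the peer is allowed to send up to."""
    return [ack + win for ack, win in pairs]


pairs = parse_acks(CAPTURE)
edges = right_edges(pairs)
# Bytes newly acknowledged between consecutive ACKs.
steps = [later[0] - earlier[0] for earlier, later in zip(pairs, pairs[1:])]

print("bytes acknowledged per ACK:", sorted(set(steps)))
print("distinct right edges:", sorted(set(edges)))
print("window left after the last ACK:", pairs[-1][1])
```

For these twelve lines each ACK covers 2920 bytes, the window shrinks by 2920 bytes each time, and the right edge stays at 286462. The last advertised window (648 bytes) is smaller than one such 2920-byte step, so the sender is about to stall unless the receiver starts draining its buffer.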
I did something like this to reproduce an I/O-related version of the problem on Linux:
/dev/vdb 2.0G 1.6G 470M 77% /brick1
[root@nod01 ~]# ls -l /dev/vdb
brw-rw---- 1 root disk 252, 16 Apr 19 22:46 /dev/vdb
echo "252:16 $((1024*250))" > /sys/fs/cgroup/blkio/blkio.throttle.write_bps_device ## limit writes to this device to 250 KB per second
cd /brick1 ## change to the directory where the CentOS ISO will be downloaded
wget ftp://mirror.fdcservers.net/centos/6.4/isos/x86_64/CentOS-6.4-x86_64-bin-DVD2.iso
00:19:58.992042 IP mirror.50966 > nod01.example.com.46637: Flags [.], ack 1, win 46, options [nop,nop,TS val 2662018758 ecr 5131800], length 0
00:19:58.992107 IP nod01.example.com.46637 > mirror.50966: Flags [.], ack 11256736, win 0, options [nop,nop,TS val 5144749 ecr 2661992655], length 0 ## win 0: I'm telling the sender not to send any more data, because my socket buffer is full
[root@nod01 ~]# netstat -tunap | grep wget
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 5264896 0 192.168.122.244:46637 208.53.158.34:50966 ESTABLISHED 15574/wget #### note the Recv-Q: about 5 MB of received data is sitting in the socket buffer, because wget cannot write to /brick1 as fast as the data arrives
tcp 0 0 192.168.122.244:51331 208.53.158.34:21 ESTABLISHED 15574/wget
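To put the numbers from this reproduction side by side, here is a rough sketch (the function names are hypothetical). It reads the Recv-Q column from the netstat row above and estimates how long that backlog would take to drain, assuming the 250 KB/s write throttle is the only bottleneck and ignoring any page-cache buffering.

```python
# Values taken from the transcript above.
THROTTLE_BPS = 1024 * 250  # bytes/s written to blkio.throttle.write_bps_device
NETSTAT_ROW = (
    "tcp 5264896 0 192.168.122.244:46637 208.53.158.34:50966 "
    "ESTABLISHED 15574/wget"
)


def recv_q_bytes(row):
    """Recv-Q is the second whitespace-separated column of a netstat TCP row."""
    return int(row.split()[1])


def drain_seconds(backlog_bytes, rate_bps):
    """Time needed to empty the backlog at a fixed drain rate (simplified model)."""
    return backlog_bytes / rate_bps


backlog = recv_q_bytes(NETSTAT_ROW)
print(f"Recv-Q backlog: {backlog} bytes ({backlog / 2**20:.2f} MiB)")
print(f"throttle: {THROTTLE_BPS} bytes/s")
print(f"time to drain at that rate: {drain_seconds(backlog, THROTTLE_BPS):.1f} s")
```

Under those assumptions the backlog is roughly 20 seconds of writes, so the receiver keeps advertising a zero window for a long time. If the sender or something in between gives up before then, the transfer fails even though the network itself is fine.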