FreeBSD machine does not respond to the first few packets of a flow

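I have two machines running as reverse proxy caches/load balancers in front of a site I manage. Recently, during peak hours, I noticed a problem: the first few packets of any packet flow (ICMP, UDP, TCP, etc.) reach the machine but get no response.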

Here are the symptoms as seen by someone pinging the machine:

PING X.X.X.X (X.X.X.X): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3
Request timeout for icmp_seq 4
Request timeout for icmp_seq 5
Request timeout for icmp_seq 6
Request timeout for icmp_seq 7
Request timeout for icmp_seq 8
Request timeout for icmp_seq 9
Request timeout for icmp_seq 10
64 bytes from X.X.X.X: icmp_seq=11 ttl=62 time=47.515 ms
64 bytes from X.X.X.X: icmp_seq=12 ttl=62 time=46.108 ms
64 bytes from X.X.X.X: icmp_seq=13 ttl=62 time=48.893 ms
64 bytes from X.X.X.X: icmp_seq=14 ttl=62 time=47.466 ms
64 bytes from X.X.X.X: icmp_seq=15 ttl=62 time=49.679 ms
64 bytes from X.X.X.X: icmp_seq=16 ttl=62 time=50.011 ms
64 bytes from X.X.X.X: icmp_seq=17 ttl=62 time=49.324 ms
64 bytes from X.X.X.X: icmp_seq=18 ttl=62 time=48.989 ms
64 bytes from X.X.X.X: icmp_seq=19 ttl=62 time=51.003 ms
64 bytes from X.X.X.X: icmp_seq=20 ttl=62 time=48.612 ms

Here is an HTTP session as seen via tcpdump on the affected machine (X.X.X.X is the affected machine, C.C.C.C is the client making the request):

21:46:27.105396 IP C.C.C.C.62425 > X.X.X.X.80: Flags [S], seq 139436485, win 65535, options [mss 1380,nop,wscale 3,nop,nop,TS val 398010008 ecr 0,sackOK,eol], length 0
21:46:28.032300 IP C.C.C.C.62425 > X.X.X.X.80: Flags [S], seq 139436485, win 65535, options [mss 1380,nop,wscale 3,nop,nop,TS val 398010017 ecr 0,sackOK,eol], length 0
21:46:28.032337 IP X.X.X.X.80 > C.C.C.C.62425: Flags [S.], seq 1108838018, ack 139436486, win 65535, options [mss 1380,nop,wscale 9,sackOK,TS val 1918451162 ecr 398010017], length 0
21:46:28.064417 IP C.C.C.C.62425 > X.X.X.X.80: Flags [.], ack 1, win 65535, options [nop,nop,TS val 398010018 ecr 1918451162], length 0
21:46:28.064438 IP C.C.C.C.62425 > X.X.X.X.80: Flags [P.], ack 1, win 65535, options [nop,nop,TS val 398010018 ecr 1918451162], length 160
21:46:28.165372 IP X.X.X.X.80 > C.C.C.C.62425: Flags [.], ack 161, win 128, options [nop,nop,TS val 1918451296 ecr 398010018], length 0
21:46:28.165933 IP X.X.X.X.80 > C.C.C.C.62425: Flags [.], ack 161, win 128, options [nop,nop,TS val 1918451296 ecr 398010018], length 1368
21:46:28.219978 IP C.C.C.C.62425 > X.X.X.X.80: Flags [.], ack 1369, win 65535, options [nop,nop,TS val 398010019 ecr 1918451296], length 0
21:46:28.220001 IP X.X.X.X.80 > C.C.C.C.62425: Flags [.], ack 161, win 128, options [nop,nop,TS val 1918451350 ecr 398010019], length 1368
21:46:28.220011 IP X.X.X.X.80 > C.C.C.C.62425: Flags [.], ack 161, win 128, options [nop,nop,TS val 1918451350 ecr 398010019], length 1368
21:46:28.288178 IP C.C.C.C.62425 > X.X.X.X.80: Flags [.], ack 4105, win 65493, options [nop,nop,TS val 398010020 ecr 1918451350], length 0
21:46:28.288196 IP X.X.X.X.80 > C.C.C.C.62425: Flags [.], ack 161, win 128, options [nop,nop,TS val 1918451418 ecr 398010020], length 1368
21:46:28.288203 IP X.X.X.X.80 > C.C.C.C.62425: Flags [.], ack 161, win 128, options [nop,nop,TS val 1918451418 ecr 398010020], length 1368
21:46:28.288210 IP X.X.X.X.80 > C.C.C.C.62425: Flags [.], ack 161, win 128, options [nop,nop,TS val 1918451418 ecr 398010020], length 1368
21:46:28.288217 IP X.X.X.X.80 > C.C.C.C.62425: Flags [.], ack 161, win 128, options [nop,nop,TS val 1918451418 ecr 398010020], length 1368
21:46:28.333968 IP C.C.C.C.62425 > X.X.X.X.80: Flags [.], ack 6841, win 65493, options [nop,nop,TS val 398010020 ecr 1918451418], length 0
21:46:28.333986 IP X.X.X.X.80 > C.C.C.C.62425: Flags [.], ack 161, win 128, options [nop,nop,TS val 1918451464 ecr 398010020], length 1368
21:46:28.333994 IP X.X.X.X.80 > C.C.C.C.62425: Flags [.], ack 161, win 128, options [nop,nop,TS val 1918451464 ecr 398010020], length 1368
21:46:28.338939 IP C.C.C.C.62425 > X.X.X.X.80: Flags [.], ack 9577, win 65493, options [nop,nop,TS val 398010020 ecr 1918451418], length 0
21:46:28.338955 IP X.X.X.X.80 > C.C.C.C.62425: Flags [.], ack 161, win 128, options [nop,nop,TS val 1918451469 ecr 398010020], length 1368
21:46:28.338962 IP X.X.X.X.80 > C.C.C.C.62425: Flags [.], ack 161, win 128, options [nop,nop,TS val 1918451469 ecr 398010020], length 1368
21:46:28.349943 IP C.C.C.C.62425 > X.X.X.X.80: Flags [.], ack 12313, win 65535, options [nop,nop,TS val 398010021 ecr 1918451464], length 0
21:46:28.354190 IP C.C.C.C.62425 > X.X.X.X.80: Flags [.], ack 15049, win 65535, options [nop,nop,TS val 398010021 ecr 1918451469], length 0
21:46:28.354206 IP X.X.X.X.80 > C.C.C.C.62425: Flags [P.], ack 161, win 128, options [nop,nop,TS val 1918451484 ecr 398010021], length 8
21:46:28.393441 IP C.C.C.C.62425 > X.X.X.X.80: Flags [.], ack 15057, win 65535, options [nop,nop,TS val 398010021 ecr 1918451484], length 0
21:46:28.393452 IP C.C.C.C.62425 > X.X.X.X.80: Flags [F.], seq 161, ack 15057, win 65535, options [nop,nop,TS val 398010021 ecr 1918451484], length 0
21:46:28.393467 IP X.X.X.X.80 > C.C.C.C.62425: Flags [.], ack 162, win 128, options [nop,nop,TS val 1918451524 ecr 398010021], length 0
21:46:28.393481 IP X.X.X.X.80 > C.C.C.C.62425: Flags [F.], seq 15057, ack 162, win 128, options [nop,nop,TS val 1918451524 ecr 398010021], length 0
21:46:28.445126 IP C.C.C.C.62425 > X.X.X.X.80: Flags [.], ack 15057, win 65535, options [nop,nop,TS val 398010021 ecr 1918451524], length 0
21:46:28.445138 IP C.C.C.C.62425 > X.X.X.X.80: Flags [.], ack 15058, win 65535, options [nop,nop,TS val 398010021 ecr 1918451524], length 0

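The two machines are in a CARP pool, with one active and the other standby. Their hardware and configuration are identical. The problem affects only the active machine, and it is visible for traffic to both the machine's dedicated IP address and the CARPed floating IPs. Swapping the active and standby roles moves the problem to whichever machine is currently active, so I'm fairly sure it is not a hardware issue.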

They use pf as the firewall and NAT traffic for the machines behind them.

They run FreeBSD 8.0-RELEASE-p5. Their kernel is custom, but only to add the bits needed for CARP. The kernel config is:

include GENERIC

ident LOADBALANCER

device pf
device pflog
device pfsync
device carp

The NICs are Intel 82574L, using the em driver.

Any clues?

Answer 1

The problem turned out to be pf.

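pf on FreeBSD limits the state table to 10,000 entries by default. Adaptive timeouts keep the table under that limit most of the time, but they could not keep up during peak hours. Once the table is full, pf cannot create state for a new flow, so that flow's first packets are dropped until an entry frees up, which matches the symptoms above.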
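The answer does not say exactly what was changed, but the usual remedy for a full state table is to raise the limit in pf.conf. The fragment below is a minimal sketch of that; the value 100000 is purely illustrative (it is not from the original post) and should be sized to the peak number of concurrent states and the memory available.

```
# /etc/pf.conf (fragment)
# Raise the hard limit on state-table entries from the 10,000 default.
# 100000 is an assumed example value, not a recommendation from the post.
set limit states 100000
```

After reloading the ruleset, `pfctl -s memory` should show the new hard limit, and `pfctl -s info` reports the current number of state entries, which can be compared against that limit during peak hours.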
