I have a game server that runs over TCP connections. The server randomly disconnects users, and I suspect it is related to the server's TCP settings.
In the local development environment, the same code handles more than 8000 concurrent users without any disconnects or errors (on localhost).
But on the actual deployed CentOS 5 64-bit server, these disconnects occur regardless of the number of concurrent TCP connections.
The server does not seem to be able to handle the throughput.
netstat -s -t
IcmpMsg:
InType0: 31
InType3: 87717
InType4: 699
InType5: 2
InType8: 1023781
InType11: 7211
OutType0: 1023781
OutType3: 603
Tcp:
8612766 active connections openings
14255236 passive connection openings
12174 failed connection attempts
319225 connection resets received
723 connections established
6351090913 segments received
6180297746 segments send out
45791634 segments retransmited
0 bad segments received.
1664280 resets sent
TcpExt:
46244 invalid SYN cookies received
3745 resets received for embryonic SYN_RECV sockets
327 ICMP packets dropped because they were out-of-window
1 ICMP packets dropped because socket was locked
11475281 TCP sockets finished time wait in fast timer
140 time wait sockets recycled by time stamp
1569 packets rejects in established connections because of timestamp
103783714 delayed acks sent
6929 delayed acks further delayed because of locked socket
Quick ack mode was activated 6210096 times
1806 times the listen queue of a socket overflowed
1806 SYNs to LISTEN sockets ignored
1080380601 packets directly queued to recvmsg prequeue.
31441059 packets directly received from backlog
5272599307 packets directly received from prequeue
324498008 packets header predicted
1143146 packets header predicted and directly queued to user
3217838883 acknowledgments not containing data received
1027969883 predicted acknowledgments
395 times recovered from packet loss due to fast retransmit
257420 times recovered from packet loss due to SACK data
5843 bad SACKs received
Detected reordering 29 times using FACK
Detected reordering 12 times using SACK
Detected reordering 1 times using reno fast retransmit
Detected reordering 809 times using time stamp
1602 congestion windows fully recovered
1917 congestion windows partially recovered using Hoe heuristic
TCPDSACKUndo: 8196226
7850525 congestion windows recovered after partial ack
139681 TCP data loss events
TCPLostRetransmit: 26
10139 timeouts after reno fast retransmit
2802678 timeouts after SACK recovery
86212 timeouts in loss state
273698 fast retransmits
19494 forward retransmits
2637236 retransmits in slow start
33381883 other TCP timeouts
TCPRenoRecoveryFail: 92
19488 sack retransmits failed
7 times receiver scheduled too late for direct processing
6354641 DSACKs sent for old packets
333 DSACKs sent for out of order packets
20615579 DSACKs received
2724 DSACKs for out of order packets received
123034 connections reset due to unexpected data
91876 connections reset due to early user close
169244 connections aborted due to timeout
28736 times unabled to send RST due to no memory
IpExt:
InMcastPkts: 2
What worries me are these lines, which look problematic:
123034 connections reset due to unexpected data
91876 connections reset due to early user close
28736 times unabled to send RST due to no memory
How can I fix these errors? Do I need to do TCP tuning?
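As a first step, it may help to watch how fast these counters grow under load rather than looking at the absolute totals; a minimal sketch, matching the netstat output shown above (the 10-second interval is arbitrary):

# Sample the suspicious counters every 10 seconds to see their growth rate
while true; do
    date
    netstat -s | grep -E 'unexpected data|early user close|no memory'
    sleep 10
done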
Edit: some sysctl information:
sysctl -A | grep net | grep mem
net.ipv4.udp_wmem_min = 4096
net.ipv4.udp_rmem_min = 4096
net.ipv4.udp_mem = 772704 1030272 1545408
net.ipv4.tcp_rmem = 4096 87380 4194304
net.ipv4.tcp_wmem = 4096 16384 4194304
net.ipv4.tcp_mem = 196608 262144 393216
net.ipv4.igmp_max_memberships = 20
net.core.optmem_max = 20480
net.core.rmem_default = 129024
net.core.wmem_default = 129024
net.core.rmem_max = 131071
net.core.wmem_max = 131071
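Note that net.ipv4.tcp_mem is measured in pages (4 KiB here), so the hard limit above works out to roughly 393216 × 4 KiB ≈ 1.5 GB of TCP buffer memory. Given the "unabled to send RST due to no memory" counter, one quick check (a sketch; the "mem" field in sockstat is also in pages) is:

# Compare current TCP buffer usage against the tcp_mem thresholds
cat /proc/net/sockstat            # "TCP: ... mem <pages>" is current usage
cat /proc/sys/net/ipv4/tcp_mem    # min / pressure / max, in pages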
Edit: ethtool information for the two detected Ethernet cards:
Settings for eth0:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: g
Wake-on: d
Link detected: yes
Settings for eth1:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Advertised auto-negotiation: Yes
Speed: Unknown!
Duplex: Half
Port: Twisted Pair
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: g
Wake-on: d
Link detected: no
Answer 1
Would you raise the FD limits? You can find some information here: http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/
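A minimal sketch of what checking and raising those limits might look like (the username and values below are placeholders for illustration, not recommendations from the answer):

# Check current limits
ulimit -n                     # per-process soft limit for open files
cat /proc/sys/fs/file-max     # system-wide limit

# Raise the per-user limit in /etc/security/limits.conf, e.g.:
#   gameuser  soft  nofile  65535
#   gameuser  hard  nofile  65535

# Raise the system-wide limit (persist it in /etc/sysctl.conf):
sysctl -w fs.file-max=200000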
Answer 2
If by "the server randomly disconnects users" you mean clients are dropped without the expected FIN/ACK or RST exchange, then I would first deal with the half-duplex interface, especially if both NICs in your development environment are at full duplex. The eth1 interface sitting at half duplex while Auto-negotiation=on is usually caused by one of the following:
- Auto-negotiation is failing between the switch and the server.
- Auto-negotiation is disabled on the switch, with the port's speed and duplex set explicitly.
I see case #2 far more often, though that may just be because it has been over a decade since I last ran into a failed auto-negotiation that needed investigating. When one side is set to auto and the other is hard-coded (or cannot respond), the Ethernet auto-negotiation behavior is for the auto side to fall back to half duplex.
Put simply, with eth1 at half duplex, the server can only send or receive on that interface at any given moment, not both at once. The hard-coded side will still run at full duplex and will try to send data to the server at the same time as it receives data from the server. The server, however, treats this as a collision, because it assumes a collision domain that full duplex would have eliminated. The server then uses a backoff algorithm to schedule a retransmission. If it keeps experiencing what it believes are collisions, it keeps increasing the time it waits before retransmitting.
So a half-duplex interface talking to a full-duplex partner can easily cause client disconnects, throughput and performance problems, latency spikes, and all sorts of other trouble.
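A minimal sketch of checking for and fixing the mismatch (the interface name comes from the ethtool output above; the 100 Mb/s value is only an example and must mirror whatever the switch port is actually configured for):

# Error/collision counters usually climb on a duplex mismatch
ethtool -S eth1 | grep -iE 'collision|error'

# If the switch port is hard-coded, hard-code the server side to match:
ethtool -s eth1 speed 100 duplex full autoneg off

# Otherwise, re-enable auto-negotiation on both ends:
ethtool -s eth1 autoneg on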