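We are currently running performance tests on Solaris 11 (SPARC) on some large hardware. The tests send SOAP requests (50 KB each) and run fine until we reach thousands of users (i.e. 30,000 users); about 2 minutes in, we start seeing many connection timeout errors in the logs. CPU and memory usage are low and never exceed 15%. We are using WebLogic 11g and Oracle HTTP Server.
I have tuned the following TCP parameters, but they do not seem to make any significant difference: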
_conn_req_max_q = 262144 (also tried 16384)
_conn_req_max_q0 = 16384 (also tried 4096 - increased to stop tcpListenDropQ0 from being above 0)
_time_wait_interval = 15000
I have also added the following to /etc/system:
set ip:ipcl_conn_hash_size=16834
netstat -sP tcp
run at the end of the test (the server was rebooted before the test started) gives:
TCP tcpRtoAlgorithm = 4 tcpRtoMin = 200
tcpRtoMax = 60000 tcpMaxConn = -1
tcpActiveOpens =133886 tcpPassiveOpens =584461
tcpAttemptFails =102899 tcpEstabResets =553474
tcpCurrEstab = 339 tcpOutSegs =35235864
tcpOutDataSegs =20302930 tcpOutDataBytes =842489656
tcpRetransSegs = 92070 tcpRetransBytes =337976
tcpOutAck =2044606 tcpOutAckDelayed =252534
tcpOutUrg = 0 tcpOutWinUpdate = 0
tcpOutWinProbe = 0 tcpOutControl =901262
tcpOutRsts = 29486 tcpOutFastRetrans = 0
tcpInSegs =39352489
tcpInAckSegs = 0 tcpInAckBytes =2742139410
tcpInDupAck = 32470 tcpInAckUnsent = 0
tcpInInorderSegs =15010534 tcpInInorderBytes =1321218448
tcpInUnorderSegs = 1515 tcpInUnorderBytes =2008280
tcpInDupSegs = 47362 tcpInDupBytes =160101
tcpInPartDupSegs = 0 tcpInPartDupBytes = 0
tcpInPastWinSegs = 0 tcpInPastWinBytes = 0
tcpInWinProbe = 0 tcpInWinUpdate = 0
tcpInClosed = 1099 tcpRttNoUpdate = 425
tcpRttUpdate =11258426 tcpTimRetrans =194800
tcpTimRetransDrop = 4 tcpTimKeepalive = 0
tcpTimKeepaliveProbe= 0 tcpTimKeepaliveDrop = 0
tcpListenDrop =300269 tcpListenDropQ0 = 0
tcpHalfOpenDrop = 0 tcpOutSackRetrans = 7
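To put the counters above in proportion, here is a minimal Python sketch that parses `netstat -sP tcp` output and computes a few rough ratios. The function names (`parse_netstat_tcp`, `summarize`) and the choice of ratios are my own assumptions, not anything Solaris provides; the ratios are only meant as a quick sanity check, read according to the standard TCP MIB meanings of the counters.

```python
import re

def parse_netstat_tcp(text):
    """Parse `netstat -sP tcp` output into a {counter_name: int} dict."""
    # Every counter is printed as "name = value"; spacing around "=" varies.
    return {name: int(value)
            for name, value in re.findall(r"(\w+)\s*=\s*(-?\d+)", text)}

def summarize(counters):
    """Compute rough ratios (hypothetical helper, not a Solaris tool)."""
    active = counters["tcpActiveOpens"]
    passive = counters["tcpPassiveOpens"]
    opens = active + passive
    return {
        # Listen-queue drops relative to accepted (passive) connections.
        "listen_drops_per_passive_open": counters["tcpListenDrop"] / passive,
        # Failed connection attempts relative to all opens.
        "attempt_fails_per_open": counters["tcpAttemptFails"] / opens,
        # Resets of established connections relative to all opens.
        "estab_resets_per_open": counters["tcpEstabResets"] / opens,
        # Share of outgoing data segments that were retransmissions.
        "retransmitted_segment_ratio":
            counters["tcpRetransSegs"] / counters["tcpOutDataSegs"],
    }

# A few lines copied from the output above, used as sample input.
SAMPLE = """
tcpActiveOpens =133886 tcpPassiveOpens =584461
tcpAttemptFails =102899 tcpEstabResets =553474
tcpOutDataSegs =20302930 tcpOutDataBytes =842489656
tcpRetransSegs = 92070 tcpRetransBytes =337976
tcpListenDrop =300269 tcpListenDropQ0 = 0
"""

for name, value in summarize(parse_netstat_tcp(SAMPLE)).items():
    print(f"{name}: {value:.4f}")
```

With these numbers, listen drops come out at roughly half the number of passive opens, while the retransmission share stays well under 1%.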
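The tcpListenDrop value is still high, but it starts increasing before we begin seeing errors in the logs, so it may be unrelated; I'm not sure. Are there any other (TCP) parameters worth tuning to try to reduce the number of errors we are seeing? If not, what is the recommended way to diagnose this kind of problem?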
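One way I could pin down the timing is to sample the counters during the run and print how much each one grows per interval, then line the timestamps up against the first timeout errors in the logs. The sketch below assumes Python 3.7+ is available on the server; the helper names, the 5-second interval, and the list of watched counters are my own choices for illustration.

```python
import re
import subprocess
import time

# Counters watched here; the selection is an assumption, adjust as needed.
WATCHED = ("tcpListenDrop", "tcpListenDropQ0", "tcpHalfOpenDrop",
           "tcpAttemptFails", "tcpEstabResets")

def parse_counters(text):
    """Parse "name = value" pairs from `netstat -sP tcp` output."""
    return {name: int(value)
            for name, value in re.findall(r"(\w+)\s*=\s*(-?\d+)", text)}

def deltas(previous, current, names=WATCHED):
    """Return how much each watched counter grew between two snapshots."""
    return {n: current.get(n, 0) - previous.get(n, 0) for n in names}

def read_netstat():
    """Run the same command used above and return its text output."""
    result = subprocess.run(["netstat", "-sP", "tcp"],
                            capture_output=True, text=True, check=True)
    return result.stdout

def watch(interval_seconds=5, read=read_netstat):
    """Print timestamped counter growth until interrupted (Ctrl-C)."""
    previous = parse_counters(read())
    while True:
        time.sleep(interval_seconds)
        current = parse_counters(read())
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(stamp, deltas(previous, current))
        previous = current
```

Calling `watch()` in a terminal during the test would then show whether tcpListenDrop is already climbing before the first errors appear or only jumps once they start.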