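I have a syslog-ng 2.0.9 instance running on Solaris 10. It's a bit old, but... this is enterprise IT, and upgrading is... interesting. I'm hitting a strange problem where some clients can't stay connected to the server over TCP.
When a client is working, I can start syslog-ng on the client and it connects, sends data, and stays connected...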
12:20:13.200547 IP (tos 0x0, ttl 64, id 13064, offset 0, flags [DF], proto: TCP (6), length: 60) 10.37.128.185.35765 > 10.37.141.31.shell: S, cksum 0xade4 (correct), 1572869826:1572869826(0) win 5840 <mss 1460,sackOK,timestamp 958735818 0,nop,wscale 7>
12:20:13.202279 IP (tos 0x0, ttl 63, id 27707, offset 0, flags [DF], proto: TCP (6), length: 64) 10.37.141.31.shell > 10.37.128.185.35765: S, cksum 0x434d (correct), 3180100791:3180100791(0) ack 1572869827 win 32942 <nop,nop,timestamp 2210148518 958735818,mss 1460,nop,wscale 2,nop,nop,sackOK>
12:20:13.202327 IP (tos 0x0, ttl 64, id 13065, offset 0, flags [DF], proto: TCP (6), length: 52) 10.37.128.185.35765 > 10.37.141.31.shell: ., cksum 0x0499 (correct), ack 1 win 46 <nop,nop,timestamp 958735820 2210148518>
12:20:13.202823 IP (tos 0x0, ttl 64, id 13066, offset 0, flags [DF], proto: TCP (6), length: 140) 10.37.128.185.35765 > 10.37.141.31.shell: P, cksum 0x179d (correct), 1:89(88) ack 1 win 46 <nop,nop,timestamp 958735820 2210148518>
12:20:13.204061 IP (tos 0x0, ttl 63, id 27708, offset 0, flags [DF], proto: TCP (6), length: 52) 10.37.141.31.shell > 10.37.128.185.35765: ., cksum 0x83d6 (correct), ack 89 win 32920 <nop,nop,timestamp 2210148518 958735820>
12:20:13.205558 IP (tos 0x0, ttl 64, id 13067, offset 0, flags [DF], proto: TCP (6), length: 124) 10.37.128.185.35765 > 10.37.141.31.shell: P, cksum 0xc071 (correct), 89:161(72) ack 1 win 46 <nop,nop,timestamp 958735823 2210148518>
12:20:13.206247 IP (tos 0x0, ttl 63, id 27709, offset 0, flags [DF], proto: TCP (6), length: 52) 10.37.141.31.shell > 10.37.128.185.35765: ., cksum 0x839d (correct), ack 161 win 32902 <nop,nop,timestamp 2210148518 958735823>
When a client fails to stay connected, I see the server drop the connection immediately with a FIN...
12:20:02.441949 IP (tos 0x10, ttl 64, id 8231, offset 0, flags [DF], proto: TCP (6), length: 60) 10.37.128.185.46121 > 10.37.141.31.shell: S, cksum 0xeb7e (correct), 1553390564:1553390564(0) win 5840 <mss 1460,sackOK,timestamp 958725059 0,nop,wscale 7>
12:20:02.443817 IP (tos 0x0, ttl 63, id 27678, offset 0, flags [DF], proto: TCP (6), length: 64) 10.37.141.31.shell > 10.37.128.185.46121: S, cksum 0xe379 (correct), 3007391908:3007391908(0) ack 1553390565 win 32942 <nop,nop,timestamp 2210147442 958725059,mss 1460,nop,wscale 2,nop,nop,sackOK>
12:20:02.443840 IP (tos 0x10, ttl 64, id 8232, offset 0, flags [DF], proto: TCP (6), length: 52) 10.37.128.185.46121 > 10.37.141.31.shell: ., cksum 0xa4c5 (correct), ack 1 win 46 <nop,nop,timestamp 958725061 2210147442>
12:20:02.445689 IP (tos 0x0, ttl 63, id 27679, offset 0, flags [DF], proto: TCP (6), length: 52) 10.37.141.31.shell > 10.37.128.185.46121: F, cksum 0x2444 (correct), 1:1(0) ack 1 win 32942 <nop,nop,timestamp 2210147442 958725061>
12:20:02.445737 IP (tos 0x10, ttl 64, id 8233, offset 0, flags [DF], proto: TCP (6), length: 52) 10.37.128.185.46121 > 10.37.141.31.shell: F, cksum 0xa4c1 (correct), 1:1(0) ack 2 win 46 <nop,nop,timestamp 958725063 2210147442>
12:20:02.447244 IP (tos 0x0, ttl 63, id 27680, offset 0, flags [DF], proto: TCP (6), length: 52) 10.37.141.31.shell > 10.37.128.185.46121: ., cksum 0x2441 (correct), ack 2 win 32942 <nop,nop,timestamp 2210147442 958725063>
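The problem originally showed up on various clients, but in this case both traces come from the same box. I generated the successful trace by restarting the client's syslog-ng service, and the unsuccessful one by telnetting to the server port (514, which tcpdump labels "shell").
I also started a new instance of the syslog-ng server on a different port. On localhost, telnet to 514 connects and is then disconnected...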
$ telnet localhost 514
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Connection to localhost closed by foreign host
But on the other port, in a fresh process, the connection stays open...
$ telnet localhost 1140
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
^]
telnet
quit
Connection to localhost closed.
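To repeat the telnet check from a script (for example from several client boxes at once), here is a minimal sketch. It assumes Python 3 is available on the machine doing the testing; the function name connection_stays_open and the default 2-second wait are illustrative choices, not anything syslog-ng provides. An orderly close by the server (the FIN in the failing trace) shows up as recv() returning an empty bytes object.

```python
import socket


def connection_stays_open(host: str, port: int, wait: float = 2.0) -> bool:
    """Connect over TCP and report whether the server keeps the connection open.

    Returns False if the server closes the connection (FIN, so recv() returns
    b"") or resets it within `wait` seconds. Returns True if nothing arrives
    before the timeout, i.e. the connection is still open like in the
    working trace. A refused connect raises ConnectionRefusedError.
    """
    with socket.create_connection((host, port), timeout=wait) as sock:
        sock.settimeout(wait)
        try:
            data = sock.recv(1)
        except socket.timeout:
            return True    # no FIN/RST within `wait`: connection still open
        except ConnectionResetError:
            return False   # server answered with RST
        return data != b""  # b"" means an orderly close (FIN) by the server
```

For example, connection_stays_open("10.37.141.31", 514) should return True against a listener that behaves like the fresh instance on port 1140, and False against the instance that hangs up right after the handshake.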
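So it looks like, after the process has been running for some indeterminate amount of time, something in syslog-ng or Solaris 10 starts rejecting some of these connections. tcpwrappers is involved ("syslog-ng: ALL" is defined in hosts.allow), and the behaviour I'm seeing resembles what happens when tcpwrappers blocks a connection, but I don't think that's the part of the system that's failing, since the problem seems to be universal.
The "localhost to the new process" behaviour looks the same as the remote connections, so it doesn't look like a firewall doing something weird. I'm lost.
Any guesses or pointers are welcome!
Answer 1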
Check the max-connections setting in syslog.conf; it defaults to 10, which may be too low for you.
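If the answer is right, the symptom fits: the kernel completes the three-way handshake, syslog-ng presumably accepts the socket, sees that the tcp() source already has max-connections() clients, and closes it straight away, which the client sees as the FIN in the failing trace. A freshly started instance on another port has no clients yet, so it keeps the connection open. The fix itself is a config change on the tcp() source, along the lines of tcp(ip(0.0.0.0) port(514) max-connections(100)), where 100 is only an example value. The Python below is a toy model of the symptom for experimenting, not syslog-ng's implementation; the class name CappedListener is hypothetical.

```python
import socket
import threading


class CappedListener:
    """Toy TCP listener that mimics a per-source connection cap.

    Connections beyond `max_connections` are still accepted (so the
    three-way handshake completes) and then closed at once, which the
    client sees as a FIN right after its ACK.
    """

    def __init__(self, max_connections: int = 10):
        self.max_connections = max_connections
        self.active = []   # accepted connections that are kept open
        self.rejected = 0  # connections closed because the cap was reached
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.sock.bind(("127.0.0.1", 0))  # ephemeral port for the demo
        self.sock.listen(64)
        self.port = self.sock.getsockname()[1]
        threading.Thread(target=self._accept_loop, daemon=True).start()

    def _accept_loop(self):
        while True:
            conn, _addr = self.sock.accept()
            if len(self.active) >= self.max_connections:
                self.rejected += 1
                conn.close()  # over the cap: handshake done, then FIN
            else:
                self.active.append(conn)
```

With max_connections=10, ten clients stay connected and an eleventh gets an immediate close, mirroring the "Connection to localhost closed by foreign host" output from telnet.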