syslog-ng 2.0.9 immediately closes TCP connections from some clients?

2024-6-2

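I have a syslog-ng instance at version 2.0.9 running on Solaris 10. It is a bit old, but this is enterprise IT, and upgrading versions is... fun. I've run into a strange problem: some clients cannot keep a TCP connection to the server open.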

When a client works, I can start syslog-ng on the client and it connects, sends data, and stays connected...

12:20:13.200547 IP (tos 0x0, ttl  64, id 13064, offset 0, flags [DF], proto: TCP (6), length: 60) 10.37.128.185.35765 > 10.37.141.31.shell: S, cksum 0xade4 (correct), 1572869826:1572869826(0) win 5840 <mss 1460,sackOK,timestamp 958735818 0,nop,wscale 7>
12:20:13.202279 IP (tos 0x0, ttl  63, id 27707, offset 0, flags [DF], proto: TCP (6), length: 64) 10.37.141.31.shell > 10.37.128.185.35765: S, cksum 0x434d (correct), 3180100791:3180100791(0) ack 1572869827 win 32942 <nop,nop,timestamp 2210148518 958735818,mss 1460,nop,wscale 2,nop,nop,sackOK>
12:20:13.202327 IP (tos 0x0, ttl  64, id 13065, offset 0, flags [DF], proto: TCP (6), length: 52) 10.37.128.185.35765 > 10.37.141.31.shell: ., cksum 0x0499 (correct), ack 1 win 46 <nop,nop,timestamp 958735820 2210148518>
12:20:13.202823 IP (tos 0x0, ttl  64, id 13066, offset 0, flags [DF], proto: TCP (6), length: 140) 10.37.128.185.35765 > 10.37.141.31.shell: P, cksum 0x179d (correct), 1:89(88) ack 1 win 46 <nop,nop,timestamp 958735820 2210148518>
12:20:13.204061 IP (tos 0x0, ttl  63, id 27708, offset 0, flags [DF], proto: TCP (6), length: 52) 10.37.141.31.shell > 10.37.128.185.35765: ., cksum 0x83d6 (correct), ack 89 win 32920 <nop,nop,timestamp 2210148518 958735820>
12:20:13.205558 IP (tos 0x0, ttl  64, id 13067, offset 0, flags [DF], proto: TCP (6), length: 124) 10.37.128.185.35765 > 10.37.141.31.shell: P, cksum 0xc071 (correct), 89:161(72) ack 1 win 46 <nop,nop,timestamp 958735823 2210148518>
12:20:13.206247 IP (tos 0x0, ttl  63, id 27709, offset 0, flags [DF], proto: TCP (6), length: 52) 10.37.141.31.shell > 10.37.128.185.35765: ., cksum 0x839d (correct), ack 161 win 32902 <nop,nop,timestamp 2210148518 958735823>

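When a client fails to stay connected, I see the server close the connection immediately with a FIN, right after the three-way handshake...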

12:20:02.441949 IP (tos 0x10, ttl  64, id 8231, offset 0, flags [DF], proto: TCP (6), length: 60) 10.37.128.185.46121 > 10.37.141.31.shell: S, cksum 0xeb7e (correct), 1553390564:1553390564(0) win 5840 <mss 1460,sackOK,timestamp 958725059 0,nop,wscale 7>
12:20:02.443817 IP (tos 0x0, ttl  63, id 27678, offset 0, flags [DF], proto: TCP (6), length: 64) 10.37.141.31.shell > 10.37.128.185.46121: S, cksum 0xe379 (correct), 3007391908:3007391908(0) ack 1553390565 win 32942 <nop,nop,timestamp 2210147442 958725059,mss 1460,nop,wscale 2,nop,nop,sackOK>
12:20:02.443840 IP (tos 0x10, ttl  64, id 8232, offset 0, flags [DF], proto: TCP (6), length: 52) 10.37.128.185.46121 > 10.37.141.31.shell: ., cksum 0xa4c5 (correct), ack 1 win 46 <nop,nop,timestamp 958725061 2210147442>
12:20:02.445689 IP (tos 0x0, ttl  63, id 27679, offset 0, flags [DF], proto: TCP (6), length: 52) 10.37.141.31.shell > 10.37.128.185.46121: F, cksum 0x2444 (correct), 1:1(0) ack 1 win 32942 <nop,nop,timestamp 2210147442 958725061>
12:20:02.445737 IP (tos 0x10, ttl  64, id 8233, offset 0, flags [DF], proto: TCP (6), length: 52) 10.37.128.185.46121 > 10.37.141.31.shell: F, cksum 0xa4c1 (correct), 1:1(0) ack 2 win 46 <nop,nop,timestamp 958725063 2210147442>
12:20:02.447244 IP (tos 0x0, ttl  63, id 27680, offset 0, flags [DF], proto: TCP (6), length: 52) 10.37.141.31.shell > 10.37.128.185.46121: ., cksum 0x2441 (correct), ack 2 win 32942 <nop,nop,timestamp 2210147442 958725063>

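Now, this problem first showed up on different clients, but in this case both traces come from the same box. I generated the successful trace by restarting the client's syslog-ng service, and the unsuccessful one by telnetting to the server port.

I also started a new instance of the syslog-ng server on a different port. A telnet to port 514 on localhost connects and is then disconnected straight away...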

 $ telnet localhost 514
 Trying 127.0.0.1...
 Connected to localhost.
 Escape character is '^]'.
 Connection to localhost closed by foreign host

But on another port, served by a fresh process, we get a connection that stays open...

 $ telnet localhost 1140
 Trying 127.0.0.1...
 Connected to localhost.
 Escape character is '^]'.
 ^]
 telnet
 quit
 Connection to localhost closed.

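So, once the process has been running for some undefined length of time, something in syslog-ng or Solaris 10 seems to dislike some of these connections. tcpwrappers is involved here, with "syslog-ng: ALL" defined in hosts.allow, and the behaviour I'm seeing resembles what happens when tcpwrappers blocks a connection. I don't think that is the part of the system that's failing, though, since the rule appears to be generic.

The "localhost to a new process" behaviour looks the same as the remote connections, so it doesn't look like a firewall doing something odd. I'm lost.

Any guesses or pointers are welcome!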

Answer 1

Check the max-connections setting in syslog-ng.conf. It defaults to 10, which may be too low for you.
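As a rough illustration of that suggestion, the limit can be raised on the TCP source with the max-connections() option. This is only a sketch: the source name s_tcp, the bind address, and the value 100 are assumptions for the example, not taken from the asker's configuration. Port 514 matches the port used in the telnet test above.

```
# Hypothetical source definition; adjust the name, address and limit to your setup.
source s_tcp {
    tcp(
        ip("0.0.0.0")          # listen on all interfaces (assumed)
        port(514)              # the TCP port the clients connect to
        max-connections(100)   # raise the per-source limit from the default of 10
    );
};
```

If the limit is the cause, the symptoms in the question would fit: a long-running server that already holds 10 client connections turns away any further one, while a freshly started instance on another port has free slots and keeps the connection open. syslog-ng has to be restarted or reloaded for the new value to take effect.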
