我们有一台 CentOS 机器作为我们的系统日志服务器。我最近注意到机器有时会丢弃大量系统日志消息。我们有超过 250 个路由器和交换机,以及记录每个连接打开/关闭的防火墙。
我可以看到消息到达以太网网卡...例如,假设我进入其中一台路由器的配置模式然后退出...这会生成一条系统日志消息,我嗅探以证明它到达了系统日志服务器...
[mpenning@uglylogger net]$ sudo tshark -V udp and port 514 and host 214.110.12.246
Running as user "root" and group "root". This could be dangerous.
Capturing on eth0
Frame 1 (151 bytes on wire, 151 bytes captured)
Arrival Time: Apr 26, 2013 06:05:33.490721000
[Time delta from previous captured frame: 0.000000000 seconds]
[Time delta from previous displayed frame: 0.000000000 seconds]
[Time since reference or first frame: 0.000000000 seconds]
Frame Number: 1
Frame Length: 151 bytes
Capture Length: 151 bytes
[Frame is marked: False]
[Protocols in frame: eth:ip:udp:syslog]
Ethernet II, Src: 50:57:a8:9f:68:3c (50:57:a8:9f:68:3c), Dst: Supermic_30:4a:07 (00:30:48:30:4a:07)
Destination: Supermic_30:4a:07 (00:30:48:30:4a:07)
Address: Supermic_30:4a:07 (00:30:48:30:4a:07)
.... ...0 .... .... .... .... = IG bit: Individual address (unicast)
.... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
Source: 50:57:a8:9f:68:3c (50:57:a8:9f:68:3c)
Address: 50:57:a8:9f:68:3c (50:57:a8:9f:68:3c)
.... ...0 .... .... .... .... = IG bit: Individual address (unicast)
.... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
Type: IP (0x0800)
Internet Protocol, Src: 214.110.12.246 (214.110.12.246), Dst: 214.110.16.4 (214.110.16.4)
Version: 4
Header length: 20 bytes
Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00)
0000 00.. = Differentiated Services Codepoint: Default (0x00)
.... ..0. = ECN-Capable Transport (ECT): 0
.... ...0 = ECN-CE: 0
Total Length: 137
Identification: 0x0096 (150)
Flags: 0x00
0.. = Reserved bit: Not Set
.0. = Don't fragment: Not Set
..0 = More fragments: Not Set
Fragment offset: 0
Time to live: 250
Protocol: UDP (0x11)
Header checksum: 0xcf52 [correct]
[Good: True]
[Bad : False]
Source: 214.110.12.246 (214.110.12.246)
Destination: 214.110.16.4 (214.110.16.4)
User Datagram Protocol, Src Port: 63873 (63873), Dst Port: syslog (514)
Source port: 63873 (63873)
Destination port: syslog (514)
Length: 117
Checksum: 0x1ed8 [validation disabled]
[Good Checksum: False]
[Bad Checksum: False]
Syslog message: LOCAL7.NOTICE: 187: .Apr 26 06:05:32.476 CDT: %SYS-5-CONFIG_I: Configured from console by admin on vty0 (214.110.26.29)
1011 1... = Facility: LOCAL7 - reserved for local use (23)
.... .101 = Level: NOTICE - normal but significant condition (5)
Message: 187: .Apr 26 06:05:32.476 CDT: %SYS-5-CONFIG_I: Configured from console by admin on vty0 (214.110.26.29)
^C1 packet captured
[mpenning@uglylogger net]$
但是,该消息不会显示在系统日志中(即使在 05:37:05 发送的另一条消息显示)...这些消息中只有七八条消息显示在日志文件中...
[mpenning@uglylogger net]$ pwd
/var/log/net
[mpenning@uglylogger net]$
[mpenning@uglylogger net]$ grep CONFIG network.log
Apr 26 05:08:59 somedevice.company.local 779476: Apr 26 05:08:58.604
CDT: %SYS-5-CONFIG_I: Configured from console by cbutler on vty0 (214.110.31.65)
Apr 26 05:37:05 w9-idf-lab 182: .Apr 26 05:37:04.452 CDT:
%SYS-5-CONFIG_I: Configured from console by admin on vty0 (214.110.26.29)
[mpenning@uglylogger net]$
我已经尝试了很多方法来使该服务器正确记录...
- 从分区中删除所有无关文件
- 检查 NIC 是否有丢弃/错误
- 重新加载 rsyslogd
- 重新加载了系统
yum update
- 启动缓存名称服务器
- 检查了iostat(负载相当合理,见下文)
- 检查了vmstat(再次加载,相当合理)
我已经没有主意了......我怎样才能阻止这台服务器泄露系统日志?
杂项输出
目录列表
[mpenning@uglylogger net]$ ls -al
total 3292356
drwxr-xr-x 3 root root 20480 Apr 26 05:17 .
drwxr-xr-x. 13 root root 4096 Apr 21 03:39 ..
drwxr-xr-x 2 root root 4096 Apr 19 04:22 daily
-rw-r--r-- 1 root root 185081020 Apr 26 06:21 fw1.log
-rw-r--r-- 1 root root 1936049300 Apr 26 04:15 fw1.log-20130426
-rw-r--r-- 1 root root 7629110 Apr 26 06:21 fw2.log
-rw-r--r-- 1 root root 160792210 Apr 26 04:17 fw2.log-20130426
-rw-r--r-- 1 root root 118327 Apr 26 06:20 network.log
-rw-r--r-- 1 root root 1572210 Apr 26 04:17 network.log-20130426
-rw-r--r-- 1 root root 73108 Apr 26 06:21 wireless.log
-rw-r--r-- 1 root root 686104 Apr 26 04:17 wireless.log-20130426
[mpenning@uglylogger net]$
rsyslog.conf
$ModLoad imudp
$UDPServerRun 514
$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
$IncludeConfig /etc/rsyslog.d/*.conf
if $msg contains 'APF-3-RCV_UNSUPP_MSG' then /dev/null
&~
if $msg contains 'FWSM-2-106007' then /dev/null
&~
if $fromhost-ip=='214.110.225.201' then /var/log/net/fw1.log
&~
if $fromhost-ip=='214.110.225.202' then /var/log/net/fw1.log
&~
if $fromhost-ip=='214.110.19.246' then /var/log/net/fw2.log
&~
if $fromhost-ip=='214.110.19.253' then /var/log/net/fw2.log
&~
if $fromhost-ip=='214.110.5.10' then /var/log/net/fw_other.log
&~
if $fromhost-ip=='214.110.6.254' then /var/log/net/fw_other.log
&~
if $fromhost-ip=='214.110.6.253' then /var/log/net/fw_other.log
&~
if $fromhost-ip=='214.110.46.254' then /var/log/net/fw_other.log
&~
if $fromhost-ip=='214.110.46.253' then /var/log/net/fw_other.log
&~
if $fromhost-ip=='214.110.19.42' then /var/log/net/fw_other.log
&~
if $fromhost-ip=='214.110.3.203' then /var/log/net/fw_other.log
&~
if $fromhost-ip=='214.110.84.2' then @214.110.16.6
if $fromhost-ip=='214.110.84.2' then /var/log/net/wireless.log
&~
if $fromhost-ip=='214.110.84.4' then @214.110.16.6
if $fromhost-ip=='214.110.84.4' then /var/log/net/wireless.log
&~
if $fromhost-ip=='214.110.70.8' then @214.110.16.6
if $fromhost-ip=='214.110.70.8' then /var/log/net/wireless.log
&~
if $fromhost-ip=='214.110.70.12' then @214.110.16.6
if $fromhost-ip=='214.110.70.12' then /var/log/net/wireless.log
&~
if $fromhost-ip=='214.110.70.16' then @214.110.16.6
if $fromhost-ip=='214.110.70.16' then /var/log/net/wireless.log
&~
if $fromhost-ip=='214.110.16.6' then /var/log/net/wireless.log
&~
if $fromhost-ip startswith '214.110' then /var/log/net/network.log
&~
if $fromhost-ip startswith '214.111' then /var/log/net/network.log
&~
if $fromhost-ip startswith '214.112' then /var/log/net/network.log
&~
*.info;mail.none;authpriv.none;cron.none /var/log/messages
authpriv.* /var/log/secure
mail.* -/var/log/maillog
cron.* /var/log/cron
*.emerg *
uucp,news.crit /var/log/spooler
local7.* /var/log/boot.log
$template SpiceTmpl,"%TIMESTAMP%.%TIMESTAMP:::date-subseconds% %syslogtag% %syslogseverity-text%:%msg:::sp-if-no-1st-sp%%msg:::drop-last-lf%\n"
:programname, startswith, "spice-vdagent" /var/log/spice-vdagent.log;SpiceTmpl
iostat 5
...这些数字是典型的...负载不会激增那么多...
[mpenning@uglylogger net]$ iostat 5
Linux 2.6.32-358.2.1.el6.i686 (uglylogger.local) 04/26/2013 _i686_ (4 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
2.25 0.05 0.80 0.37 0.00 96.52
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 10.67 364.09 536.25 995468936 1466172104
dm-0 33.82 15.35 268.50 41981146 734110616
dm-1 0.00 0.00 0.00 3056 0
dm-2 2.21 0.05 17.60 132282 48114080
dm-3 33.07 348.68 250.14 953334658 683895008
avg-cpu: %user %nice %system %iowait %steal %idle
0.05 0.00 0.00 0.15 0.00 99.80
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 1.00 0.00 8.00 0 40
dm-0 0.80 0.00 6.40 0 32
dm-1 0.00 0.00 0.00 0 0
dm-2 0.00 0.00 0.00 0 0
dm-3 0.20 0.00 1.60 0 8
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 0.05 0.30 0.00 99.65
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 1.00 0.00 33.60 0 168
dm-0 0.00 0.00 0.00 0 0
dm-1 0.00 0.00 0.00 0 0
dm-2 0.00 0.00 0.00 0 0
dm-3 4.20 0.00 33.60 0 168
avg-cpu: %user %nice %system %iowait %steal %idle
1.05 0.00 0.30 0.20 0.00 98.44
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 1.20 0.00 8.00 0 40
dm-0 0.80 0.00 6.40 0 32
dm-1 0.00 0.00 0.00 0 0
dm-2 0.00 0.00 0.00 0 0
dm-3 0.20 0.00 1.60 0 8
avg-cpu: %user %nice %system %iowait %steal %idle
1.40 0.00 0.30 0.15 0.00 98.14
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.80 0.00 6.40 0 32
dm-0 0.80 0.00 6.40 0 32
dm-1 0.00 0.00 0.00 0 0
dm-2 0.00 0.00 0.00 0 0
dm-3 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
0.05 0.00 0.05 0.25 0.00 99.65
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 5.40 0.00 46.40 0 232
dm-0 0.00 0.00 0.00 0 0
dm-1 0.00 0.00 0.00 0 0
dm-2 0.00 0.00 0.00 0 0
dm-3 5.80 0.00 46.40 0 232
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 0.05 0.15 0.00 99.80
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.80 0.00 4.80 0 24
dm-0 0.60 0.00 4.80 0 24
dm-1 0.00 0.00 0.00 0 0
dm-2 0.00 0.00 0.00 0 0
dm-3 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
3.71 0.00 1.05 0.10 0.00 95.14
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 1.20 0.00 19.20 0 96
dm-0 1.40 0.00 11.20 0 56
dm-1 0.00 0.00 0.00 0 0
dm-2 0.00 0.00 0.00 0 0
dm-3 1.00 0.00 8.00 0 40
avg-cpu: %user %nice %system %iowait %steal %idle
0.05 0.00 0.00 0.35 0.00 99.60
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 3.40 0.00 102.40 0 512
dm-0 1.40 0.00 11.20 0 56
dm-1 0.00 0.00 0.00 0 0
dm-2 0.00 0.00 0.00 0 0
dm-3 11.40 0.00 91.20 0 456
^C
[mpenning@uglylogger net]$
vmstat 5
...
[mpenning@uglylogger net]$ vmstat 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 1 0 240408 50068 3506120 0 0 46 67 2 2 2 1 97 0 0
0 0 0 240400 50068 3506152 0 0 0 19 1312 37 0 0 100 0 0
1 0 0 239160 50084 3508648 0 0 0 18 3943 2320 20 6 74 0 0
1 0 0 242632 50100 3510084 0 0 0 212 3561 2718 14 5 81 1 0
0 0 0 238168 50116 3510336 0 0 0 58 1612 2537 1 1 98 0 0
0 0 0 235440 50124 3510340 0 0 0 510 1340 1333 0 0 100 0 0
0 0 0 235564 50132 3510340 0 0 0 19 1470 37 0 0 100 0 0
0 0 0 235564 50140 3510344 0 0 0 28 1348 40 0 0 100 0 0
0 0 0 235440 50156 3510340 0 0 0 17 1497 37 0 0 100 0 0
0 0 0 235564 50156 3510348 0 0 0 70 1522 38 0 0 100 0 0
df -h
...
[mpenning@sasmars net]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_uglylogger-LogVol01
73G 17G 56G 23% /
tmpfs 2.0G 0 2.0G 0% /dev/shm
/dev/sda1 985M 126M 809M 14% /boot
/dev/mapper/vg_uglylogger-LogVol00
20G 655M 18G 4% /home
/dev/mapper/vg_uglylogger-LogVol03
592G 4.9G 557G 1% /var
[mpenning@uglylogger net]$
netstat -s
...
[mpenning@uglylogger net]$ netstat -s | head
Ip:
452407434 total packets received
0 forwarded
0 incoming packets discarded
449256489 incoming packets delivered
156718754 requests sent out
150 reassemblies required
75 packets reassembled ok
Icmp:
2423436 ICMP messages received
[mpenning@uglylogger net]$
ifconfig eth0
...
[mpenning@uglylogger net]$ ifconfig eth0 | grep -E "dropped|collisions"
RX packets:466065257 errors:0 dropped:0 overruns:0 frame:0
TX packets:252751047 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
[mpenning@uglylogger net]$
ethtool -S eth0|grep -vw 0
...
[mpenning@uglylogger net]$ sudo ethtool -S eth0|grep -vw 0
NIC statistics:
rx_packets: 5206242266
tx_packets: 255859804
rx_bytes: 1205016281115
tx_bytes: 180592917566
rx_broadcast: 15001736
tx_broadcast: 2648
rx_multicast: 7518499
tx_multicast: 50236
multicast: 7518499
rx_no_buffer_count: 103
tx_restart_queue: 1
tx_tcp_seg_good: 10107428
rx_long_byte_count: 1205016281115
rx_csum_offload_good: 5189768707
rx_csum_offload_errors: 5
[mpenning@uglylogger net]$
lspci
...
[mpenning@uglylogger net]$ sudo lspci
00:00.0 Host bridge: Intel Corporation E7501 Memory Controller Hub (rev 01)
00:00.1 Unassigned class [ff00]: Intel Corporation E7500/E7501 Host RASUM Controller (rev 01)
00:02.0 PCI bridge: Intel Corporation E7500/E7501 Hub Interface B PCI-to-PCI Bridge (rev 01)
00:03.0 PCI bridge: Intel Corporation E7500/E7501 Hub Interface C PCI-to-PCI Bridge (rev 01)
00:1d.0 USB controller: Intel Corporation 82801CA/CAM USB Controller #1 (rev 02)
00:1d.1 USB controller: Intel Corporation 82801CA/CAM USB Controller #2 (rev 02)
00:1d.2 USB controller: Intel Corporation 82801CA/CAM USB Controller #3 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 42)
00:1f.0 ISA bridge: Intel Corporation 82801CA LPC Interface Controller (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801CA Ultra ATA Storage Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801CA/CAM SMBus Controller (rev 02)
01:1c.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04)
01:1d.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge (rev 04)
01:1e.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04)
01:1f.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge (rev 04)
02:01.0 RAID bus controller: 3ware Inc 7xxx/8xxx-series PATA/SATA-RAID (rev 01)
03:02.0 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 01)
03:02.1 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 01)
04:1c.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04)
04:1d.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge (rev 04)
04:1e.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04)
04:1f.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge (rev 04)
05:02.0 Serial controller: 3Com Corp, Modem Division 56K FaxModem Model 5610 (rev 01)
07:01.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Rage XL (rev 27)
[mpenning@uglylogger net]$
uname -a
...
[mpenning@uglylogger net]$ uname -a
Linux uglylogger.local 2.6.32-358.2.1.el6.i686 #1 SMP Tue Mar 12 21:42:46 UTC 2013 i686 i686 i386 GNU/Linux
[mpenning@uglylogger net]$
答案1
更新到最新版本的 rsyslog。我们在工作中遇到了这个确切的问题,这是解决它的唯一方法。早期版本存在名称解析问题,即使将其关闭也无法解决问题。 7.x 分支解决了这个问题。我看看能不能找到具体的链接。
答案2
我假设 RedHat 确实将所有与性能相关的问题向后移植到操作系统中包含的 rsyslog 版本。
rx_no_buffer_count: 103
似乎是这里的核心问题。
这表示有 103 个 TCP 数据包在传递到操作系统之前被丢弃。
该 drop 已在 NIC 中注册(并且未传递到操作系统,因此netstat
未显示这一点)。要解决该问题,您可能必须增加 NIC 设置中的接收环缓冲区大小。
执行 aethtool -g eth0
查看您当前和可能的 RX 设置。
根据我的经验,将 RX-ring-buffer 设置为 2048 或 3172 非常好。
这将使 NIC 有时间缓冲传入的 TCP 数据包,直到硬件(这需要 PCI 中断)和操作系统有时间处理数据包。
这是来自的解释红帽, 这里发生了什么。
要使此更改持久,请按所述更改 ETHTOOL_OPS这里。