Here we are trying to split a syslog file into smaller chunks. To do that, we use the script below.
#!/bin/bash
date=$(date +%Y%m%d_%H%M)
cp /path/to/sys.log /path/to/chuck/file.log
cat /dev/null > /path/to/sys.log
cp /path/to/chuck/file.log /path/to/chuck/file_"$date".log
cat /dev/null > /path/to/chuck/file.log
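The script runs every 5 minutes and breaks the syslog down into smaller logs, which we then process further. The problem is that some records are missing from both the original syslog file and the chunked files.
Is there a way to work around this?
The chunked file here is about 2GB, and the syslog is growing continuously.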
Answer 1
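Warning: this did not work perfectly for me on version 8.2002 with a small maximum size and the imfile input module. Some data was lost from the log file, although the documentation does warn in general that small buffer sizes are not supported.
If you are using rsyslog, it provides a built-in mechanism to rotate a log file when it exceeds a certain size. You simply create an output channel, specifying the filename, the maximum size, and the name of a script to run when that size is exceeded. You then use the channel name in the filter rule instead of the filename.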
For example, to run the script /home/me/rotatescript when the file /var/log/mylog grows beyond 50 MiB, create a channel named mylogrotatechan:
$outchannel mylogrotatechan,/var/log/mylog,52428800,/home/me/rotatescript
and replace *.* /var/log/mylog with
*.* :omfile:$mylogrotatechan
The latest sources (as of January 2023) also contain a test script that uses this undocumented RainerScript syntax:
action(type="omfile" file="/var/log/mylog" rotation.sizeLimit="52428800"
rotation.sizeLimitCommand="/home/me/rotatescript")
A typical script might be
#!/bin/bash
mv /var/log/mylog /var/log/mylog.$(date +%Y%m%d.%H%M%S)
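Since the one-line script above would overwrite an earlier rotated file if two rotations happened within the same second, a slightly more defensive variant can be sketched as follows. This is only an illustration, not something taken from rsyslog: the function name rotate_log and the numeric suffix scheme are assumptions, and the default path is the one from the example above.

```shell
#!/bin/bash
# Sketch of a rotation script. rotate_log is a hypothetical helper name.
# It renames the log to <log>.<timestamp> and prints the new name.
rotate_log() {
    local log target stamp n
    log=$1
    # Nothing to do if the log file does not exist
    [ -f "$log" ] || return 0
    stamp=$(date +%Y%m%d.%H%M%S)
    target="$log.$stamp"
    n=0
    # If a file with this timestamp already exists, add a counter suffix
    while [ -e "$target" ]; do
        n=$((n + 1))
        target="$log.$stamp.$n"
    done
    mv -- "$log" "$target" && printf '%s\n' "$target"
}

# Use the first argument if one is given, otherwise the example path
rotate_log "${1:-/var/log/mylog}"
```

As with the plain mv version, the rename stays on the same filesystem, so no data is copied. The check-then-rename is not atomic, which should be acceptable as long as only one rotation runs at a time.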
Answer 2
I would use Perl's File::Tail module for this (I use it for almost everything that needs to watch a log file, i.e. tail it continuously).
#!/usr/bin/perl
# File::Tail needs to be installed from a distro package e.g. 'apt
# install libfile-tail-perl' on debian, ubuntu, mint etc or from
# CPAN https://metacpan.org/pod/File::Tail
# BTW, File::Tail has several useful options, run `man File::Tail`
# for details.
use File::Tail;
# These two need to be installed from a distro package e.g. 'apt
# install libtimedate-perl' on debian, ubuntu, mint etc or from
# CPAN https://metacpan.org/release/TimeDate
use Date::Parse;
use Date::Format;
# These two modules are included with perl
use File::Basename;
use Scalar::Util qw(openhandle);
use strict;
# $logfile is hard-coded here, but you can get it from the command line
# e.g. with something as simple as `my $logfile = shift` or use one of
# the command-line option processing modules like Getopt::Std or
# Getopt::Long
my $logfile = '/var/log/syslog';
# the output dir is hard-coded here to `chunk/` in the current dir.
# set it to whatever you want, or get it from the command line.
my $basename = './chunk/' . basename($logfile);
# open a handle to the log file. File::Tail will automatically
# re-open the log file if it gets rotated and re-created.
my $logref=tie(*LOG,"File::Tail", (name => $logfile, tail => -1));
my ($d, $t, $t2, $outfile, $chunk);
while(<LOG>) {
    # Example of handling two different common rsyslog logfile date
    # formats. Adjust the regex(es) to suit YOUR log file.
    if (/^([[:alpha:]]{3} \d+ \d\d:\d\d):/i) {
        # Jul 25 00:00:02 ....
        $d = $1;
    } elsif (/^(\d{4}-\d\d-\d\d[ T]\d\d:\d\d):/) {
        # 2023-07-25T00:00:01.737457+10:00 ....
        $d = $1;
    } else {
        die "Couldn't find a known date format in:\n$_";
    };
    $t = str2time($d);
    # Start a new chunk once at least 300 seconds have passed since
    # the current chunk was started.
    if ($t - $t2 >= 300) {
        close($chunk) if openhandle($chunk);
        # Alternatively, you could run your chunk processing
        # program from here:
        # (this is really basic & untested but it should work...but
        # there are better ways of handling child processes.)
        # if (openhandle($chunk)) {
        #     close($chunk);
        #
        #     $SIG{CHLD} = "IGNORE";
        #     # exec only in the child (fork returns 0 there), so the
        #     # parent keeps reading the log
        #     my $pid = fork;
        #     if (defined $pid && $pid == 0) {
        #         exec("myprogram", $outfile) or
        #             die "Couldn't exec 'myprogram $outfile'\n";
        #     }
        # };
        $t2 = $t;
        $d = time2str("%Y%m%d_%H%M", $t);
        $outfile = "${basename}_$d";
        # Ignore output files that already exist, so that we can
        # just re-run this script if it gets killed for some reason.
        if (! -e $outfile) {
            print "opening new output file $outfile\n";
            open($chunk, ">", $outfile) or
                die "couldn't open $outfile for write: $!\n";
        }
    };
    print $chunk $_ if openhandle($chunk);
}
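Save it as, e.g., split-log-5min.pl or similar, make it executable with chmod +x split-log-5min.pl, and run it. It will keep running and processing data from the log file until it is killed (e.g. with Ctrl-C or kill), or until there is an error opening an output file for writing.
I ran this on an excerpt of /var/log/syslog and ended up with a ./chunk directory containing lots of five-minute chunks: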
$ ls chunk/
syslog_20230725_0000 syslog_20230725_0519 syslog_20230725_1037
syslog_20230725_0007 syslog_20230725_0525 syslog_20230725_1043
syslog_20230725_0013 syslog_20230725_0531 syslog_20230725_1049
syslog_20230725_0019 syslog_20230725_0537 syslog_20230725_1055
[...many more deleted]
syslog_20230725_0455 syslog_20230725_1013 syslog_20230725_1531
syslog_20230725_0501 syslog_20230725_1019 syslog_20230725_1537
syslog_20230725_0507 syslog_20230725_1025
syslog_20230725_0513 syslog_20230725_1031
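By the way, depending on exactly what you are doing with the five-minute chunks, you may be able to process them within this script, without ever copying the chunks to separate files. For example, instead of writing to a file, append each line to an array, then process and clear the array every five minutes.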
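The buffer-then-process idea can be sketched in shell as well, for readers who would rather stay with the language of the original script. This is a rough illustration under several assumptions: every line starts with an ISO timestamp such as 2023-07-25T00:00:01.737457+10:00, a plain shell variable stands in for the array, windows are aligned to the clock (00, 05, 10, ...) rather than to the first line of each chunk as in the Perl script, and process_chunk and chunk_stream are hypothetical names.

```shell
#!/bin/bash
# Hypothetical handler: replace the body with the real per-chunk work.
# $1 = window label, $2 = number of lines, $3 = the buffered lines.
process_chunk() {
    printf '%s %s lines\n' "$1" "$2"
}

# Read log lines from stdin, buffer them in memory, and hand the buffer
# to process_chunk each time a line falls into a new five-minute window.
chunk_stream() {
    local line rest day hh mm b key cur buf count
    cur='' buf='' count=0
    while IFS= read -r line; do
        day=${line%%T*}      # 2023-07-25
        rest=${line#*T}      # 00:00:01.737457+10:00 ...
        hh=${rest%%:*}       # hour, e.g. 00
        rest=${rest#*:}
        mm=${rest%%:*}       # minute, e.g. 08
        # Skip lines without a recognizable minute field
        case $mm in ''|*[!0-9]*) continue ;; esac
        mm=${mm#0}           # drop a leading zero so 08 is not read as octal
        b=$((mm / 5 * 5))    # start minute of the five-minute window
        if [ "$b" -lt 10 ]; then b="0$b"; fi
        key="${day}_${hh}${b}"
        if [ "$key" != "$cur" ]; then
            # A new window has started: process and clear the buffer
            if [ "$count" -gt 0 ]; then
                process_chunk "$cur" "$count" "$buf"
            fi
            cur=$key buf='' count=0
        fi
        buf="$buf$line
"
        count=$((count + 1))
    done
    # Flush whatever is left when the input ends
    if [ "$count" -gt 0 ]; then
        process_chunk "$cur" "$count" "$buf"
    fi
}
```

It could be fed with something like tail -F /var/log/syslog | chunk_stream, assuming a tail that supports -F. Note that a window is only processed once the first line of the next window arrives, so a quiet log delays processing.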