How to split syslog into smaller files without losing data


We are trying to split a syslog file into smaller chunks. To do that, we use the following script.

#!/bin/bash
date=$(date +%Y%m%d_%H%M)
 
cp /path/to/sys.log /path/to/chuck/file.log
cat /dev/null > /path/to/sys.log

cp /path/to/chuck/file.log /path/to/chuck/file_"$date".log
cat /dev/null > /path/to/chuck/file.log

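The script runs every 5 minutes and splits the syslog into smaller logs, which we then process further. The problem is that some records are missing from both the original file and the chunk files. Is there a way to solve this in syslog?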

The chunk files here are about 2 GB each. The syslog is growing continuously.

Answer 1

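Warning: this did not work perfectly for me on version 8.2002 with some small maximum sizes and an imfile input module. Some data was lost from the log file, but the documentation does generally warn that small buffer sizes are not supported.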


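If you are using rsyslog, it provides a built-in mechanism for rotating a log file when it exceeds a certain size. You simply create an output channel, giving the file name, the maximum size, and the name of a script to run when that size is exceeded. You then use the channel name in your filter instead of the file name.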

For example, to run the script /home/me/rotatescript when the file /var/log/mylog grows beyond 50 MiB, create a channel named mylogrotatechan:

$outchannel mylogrotatechan,/var/log/mylog,52428800,/home/me/rotatescript

and replace *.* /var/log/mylog with

*.* :omfile:$mylogrotatechan
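
The size field of the channel is given in bytes, which is easy to get wrong by hand. As a rough sketch, a small helper (outchannel_line is a made-up name, not part of rsyslog) can build the line from a limit given in MiB:

```shell
#!/bin/bash
# Sketch only: outchannel_line is a hypothetical helper, not an rsyslog tool.
# It prints an $outchannel line, converting a limit in MiB to bytes.
outchannel_line() {
  local name=$1 file=$2 mib=$3 script=$4
  # 1 MiB = 1024 * 1024 bytes
  printf '$outchannel %s,%s,%d,%s\n' \
    "$name" "$file" "$(( mib * 1024 * 1024 ))" "$script"
}
```

Calling outchannel_line mylogrotatechan /var/log/mylog 50 /home/me/rotatescript reproduces the channel definition shown above.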

The latest sources (as of January 2023) also contain a test script that uses an undocumented RainerScript syntax:

action(type="omfile" file="/var/log/mylog" rotation.sizeLimit="52428800"
       rotation.sizeLimitCommand="/home/me/rotatescript")

A typical script might be

#!/bin/bash
mv /var/log/mylog /var/log/mylog.$(date +%Y%m%d.%H%M%S)
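
If the rotated files should land in a separate chunk directory for later processing, the same idea can be written a little more defensively. This is only a sketch: rotate_log and the chunk directory are names I made up, and it assumes the chunk directory is on the same filesystem as the log, so that mv is a plain rename rather than a copy.

```shell
#!/bin/bash
# Sketch of a rotation script; rotate_log and the chunk directory are
# illustrative names (assumptions), not anything rsyslog requires.
rotate_log() {
  local log=$1 chunk_dir=$2 stamp target n=0
  [ -s "$log" ] || return 0            # missing or empty: nothing to rotate
  mkdir -p "$chunk_dir" || return 1
  stamp=$(date +%Y%m%d.%H%M%S)
  target="$chunk_dir/$(basename "$log").$stamp"
  # Do not overwrite a chunk created earlier in the same second.
  while [ -e "$target" ]; do
    n=$((n + 1))
    target="$chunk_dir/$(basename "$log").$stamp.$n"
  done
  # On a single filesystem mv is a rename: nothing is copied or truncated.
  mv -- "$log" "$target"
}
```

The script installed as /home/me/rotatescript would then end with a call such as rotate_log /var/log/mylog /var/log/mylog-chunks, where the second path is a hypothetical location.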

Answer 2

I would use Perl's File::Tail for this (I use it for almost everything that needs to monitor, i.e. continuously tail, a log file).

#!/usr/bin/perl

# File::Tail needs to be installed from a distro package e.g. 'apt
# install libfile-tail-perl' on debian, ubuntu, mint etc or from
# CPAN https://metacpan.org/pod/File::Tail
# BTW, File::Tail has several useful options, run `man File::Tail`
# for details.
use File::Tail;

# These two need to be installed from a distro package e.g. 'apt
# install libtimedate-perl' on debian, ubuntu, mint etc or from
# CPAN https://metacpan.org/release/TimeDate
use Date::Parse;
use Date::Format;

# These two modules are included with perl
use File::Basename;
use Scalar::Util qw(openhandle);

use strict;

# $logfile is hard-coded here, but you can get it from the command line
# e.g. with something as simple as `my $logfile = shift` or use one of
# the command-line option processing modules like Getopt::Std or
# Getopt::Long
my $logfile = '/var/log/syslog';

# the output dir is hard-coded here to `chunk/` in the current dir.
# set it to whatever you want, or get it from the command line.
my $basename = './chunk/' . basename($logfile);

# open a handle to the log file.  File::Tail will automatically
# re-open the log file if it gets rotated and re-created.
my $logref=tie(*LOG,"File::Tail", (name => $logfile, tail => -1));

my ($d, $t, $t2, $outfile, $chunk);

while(<LOG>) {
  # Example of handling two different common rsyslog logfile date
  # formats.  Adjust the regex(es) to suit YOUR log file.
  if (/^([[:alpha:]]{3} \d+ \d\d:\d\d):/i) {
    # Jul 25 00:00:02 ....
    $d = $1;
  } elsif (/^(\d{4}-\d\d-\d\d[ T]\d\d:\d\d):/) {
    # 2023-07-25T00:00:01.737457+10:00 ....
    $d = $1;
  } else {
    die "Couldn't find a known date format in:\n$_";
  };
  $t = str2time($d);

  if ($t - $t2 >= 300) {
    close($chunk) if openhandle($chunk);
    # Alternatively, you could run your chunk processing
    # program from here:
    # (this is really basic & untested but it should work...but
    # there are better ways of handling child processes.)
    # if (openhandle($chunk)) {
    #   close($chunk);
    #
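    #   $SIG{CHLD} = "IGNORE";
    #   my $pid = fork;
    #   if (defined($pid) && $pid == 0) {
    #     # child only: replace this process with the processing program
    #     exec("myprogram", $outfile) or
    #       die "Couldn't exec 'myprogram $outfile'\n";
    #   }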
    # };


    $t2 = $t;
    $d = time2str("%Y%m%d_%H%M", $t);
    $outfile = "${basename}_$d";

    # Ignore output files that already exist, so that we can
    # just re-run this script if it gets killed for some reason.
    if (! -e $outfile) {
      print "opening new output file $outfile\n";
      open($chunk, ">", $outfile) or
        die "couldn't open $outfile for write: $!\n";
    }
  };

  print $chunk $_ if openhandle($chunk);
}

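Save it as, e.g., split-log-5min.pl or similar, make it executable with chmod +x split-log-5min.pl, and run it. It will keep running and processing data from the log file until it is killed (e.g. with Ctrl-C or kill), or until it hits an error while trying to open an output file for writing.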

I ran this on an excerpt of /var/log/syslog and ended up with a ./chunk directory containing lots of 5-minute chunks:

$ ls chunk/
syslog_20230725_0000  syslog_20230725_0519  syslog_20230725_1037
syslog_20230725_0007  syslog_20230725_0525  syslog_20230725_1043
syslog_20230725_0013  syslog_20230725_0531  syslog_20230725_1049
syslog_20230725_0019  syslog_20230725_0537  syslog_20230725_1055
[...many more deleted]
syslog_20230725_0455  syslog_20230725_1013  syslog_20230725_1531
syslog_20230725_0501  syslog_20230725_1019  syslog_20230725_1537
syslog_20230725_0507  syslog_20230725_1025
syslog_20230725_0513  syslog_20230725_1031
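
Since the whole point is not to lose data, it may be worth checking a run like this by comparing line counts. The following is a rough sketch (verify_chunks is a made-up name). It is only meaningful for a static excerpt that is no longer being written to, and it assumes every line ended up in some chunk, i.e. no pre-existing output files were skipped.

```shell
#!/bin/bash
# Sketch only: verify_chunks is a hypothetical helper.
# Compares the number of lines in the source file with the total
# number of lines across all files in the chunk directory.
verify_chunks() {
  local src=$1 dir=$2 src_lines chunk_lines
  # $(( ... )) also strips the padding some wc implementations add
  src_lines=$(( $(wc -l < "$src") ))
  chunk_lines=$(( $(cat "$dir"/* | wc -l) ))
  echo "source=$src_lines chunks=$chunk_lines"
  [ "$src_lines" -eq "$chunk_lines" ]   # non-zero exit status on mismatch
}
```

For example, verify_chunks syslog-excerpt ./chunk prints both counts and returns a failure status if they differ (syslog-excerpt being whatever file the splitter was pointed at).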

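By the way, depending on exactly what you are doing with the five-minute chunks, you may be able to process them inside this script without copying the chunks to separate files. For example, instead of writing to a file, append each line to an array, then process and clear the array every five minutes.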
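
To make the array idea concrete, here is a rough sketch in bash rather than Perl. It rests on several assumptions: each record starts with an ISO 8601 timestamp such as 2023-07-25T00:07:01, windows are aligned to wall-clock five-minute boundaries (unlike the Perl script, which counts 300 seconds from the first line of a chunk), and process_chunk is a hypothetical stand-in for the real processing step.

```shell
#!/bin/bash
# Sketch only: batches log lines read from stdin into five-minute windows
# held in memory. process_chunk and split_in_memory are illustrative names.

# Placeholder for real processing: prints the window and its line count.
process_chunk() {
  local window=$1; shift
  printf '%s %d\n' "$window" "$#"
}

split_in_memory() {
  local line window current="" mm
  local -a buf=()
  local re='^([0-9]{4}-[0-9]{2}-[0-9]{2})[ T]([0-9]{2}):([0-9]{2}):'
  while IFS= read -r line; do
    if [[ $line =~ $re ]]; then
      # Round the minute down to a multiple of 5 (10# avoids octal parsing).
      mm=$(( 10#${BASH_REMATCH[3]} / 5 * 5 ))
      printf -v window '%s_%s%02d' \
        "${BASH_REMATCH[1]//-/}" "${BASH_REMATCH[2]}" "$mm"
      if [[ -n $current && $window != "$current" ]]; then
        process_chunk "$current" "${buf[@]}"   # window closed: process it
        buf=()                                 # then clear the array
      fi
      current=$window
    fi
    # Lines without a timestamp stay with the current window.
    buf+=("$line")
  done
  # Flush the final, possibly partial, window.
  if (( ${#buf[@]} )); then process_chunk "$current" "${buf[@]}"; fi
}
```

It could be fed from something like tail -F /var/log/syslog | split_in_memory, though with a live feed the last window is only processed once a line from the next window arrives.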
