按日期范围搜索日志

按日期范围搜索日志

我的日志文件类似于以下示例:

10.434.22.334 - unauthenticated 10/Aug/2020:23:45:45 +0800 "GET /eai/random.jsp HTTP/1.1" 200 74
10.434.22.334 - unauthenticated 10/Aug/2020:23:45:45 +0800 "GET /eai/random.jsp HTTP/1.1" 200 74
10.434.22.334 - unauthenticated 11/Aug/2020:23:34:45 +0800 "GET /eai/random.jsp HTTP/1.1" 200 74
10.434.22.334 - unauthenticated 12/Aug/2020:23:45:43 +0800 "GET /eai/random.jsp HTTP/1.1" 200 74
10.434.22.334 - unauthenticated 13/Aug/2020:23:43:45 +0800 "GET /eai/random.jsp HTTP/1.1" 200 74
10.434.22.334 - unauthenticated 14/Aug/2020:23:33:45 +0800 "GET /eai/random.jsp HTTP/1.1" 200 74

我想通过指定日期范围来搜索上述条目,如下所示:

./Logsearch.sh 10/Aug/2020 13/Aug/2020

预期结果:

10.434.22.334 - unauthenticated 10/Aug/2020:23:45:45 +0800 "GET /eai/random.jsp HTTP/1.1" 200 74
10.434.22.334 - unauthenticated 10/Aug/2020:23:45:45 +0800 "GET /eai/random.jsp HTTP/1.1" 200 74
10.434.22.334 - unauthenticated 11/Aug/2020:23:34:45 +0800 "GET /eai/random.jsp HTTP/1.1" 200 74
10.434.22.334 - unauthenticated 12/Aug/2020:23:45:43 +0800 "GET /eai/random.jsp HTTP/1.1" 200 74
10.434.22.334 - unauthenticated 13/Aug/2020:23:43:45 +0800 "GET /eai/random.jsp HTTP/1.1" 200 74

我怎样才能做到这一点?


知道如何为我的查询编写脚本。可能的操作系统是solaris 11。请提供一些示例脚本。

答案1

这看起来像一个标准的 HTTP 访问日志,那么为什么不使用它grep来匹配您想要的日期模式呢?

$ grep '1[0-3]/Aug/2020' access_log

10.434.22.334 - unauthenticated 10/Aug/2020:23:45:45 +0800 "GET /eai/random.jsp HTTP/1.1" 200 74
10.434.22.334 - unauthenticated 10/Aug/2020:23:45:45 +0800 "GET /eai/random.jsp HTTP/1.1" 200 74
10.434.22.334 - unauthenticated 11/Aug/2020:23:34:45 +0800 "GET /eai/random.jsp HTTP/1.1" 200 74
10.434.22.334 - unauthenticated 12/Aug/2020:23:45:43 +0800 "GET /eai/random.jsp HTTP/1.1" 200 74
10.434.22.334 - unauthenticated 13/Aug/2020:23:43:45 +0800 "GET /eai/random.jsp HTTP/1.1" 200 74

grep 模式 '1[0-3]/Aug/2020' 使用范围表达式 [0-3]。此表达式匹配可以取值 0、1、2、3 的单个字符。将其与表达式的其余部分相结合,您将得到 10/Aug/2020、11/Aug/2020、12/Aug/2020 和 13/Aug/2020 作为可能的模式。grep将从日志中打印出与这些模式匹配的行。

答案2

您可以使用专门的结构化文本工具,如 Miller (https://github.com/johnkerl/miller)并运行

mlr --nidx then filter 'strftime(strptime($4,"%d/%b/%Y:%H:%M:%S"),"%Y-%m-%d") >="2020-08-11" && strftime(strptime($4,"%d/%b/%Y:%H:%M:%S"),"%Y-%m-%d") <="2020-08-13"' input.txt

具有

10.434.22.334 - unauthenticated 11/Aug/2020:23:34:45 +0800 "GET /eai/random.jsp HTTP/1.1" 200 74
10.434.22.334 - unauthenticated 12/Aug/2020:23:45:43 +0800 "GET /eai/random.jsp HTTP/1.1" 200 74
10.434.22.334 - unauthenticated 13/Aug/2020:23:43:45 +0800 "GET /eai/random.jsp HTTP/1.1" 200 74

我已经应用了一个过滤器来获取2020-08-11和之间的所有内容2020-08-13

一些注意事项:

  • --nidx设置输入和输出格式(https://bit.ly/3h3UvN3
  • filter应用过滤器;
  • strftime(strptime($4,"%d/%b/%Y:%H:%M:%S"),"%Y-%m-%d") >="2020-08-11"是过滤器之一。使用I 设置第四个字段 ( ) 的strptime输入日期格式 ( ) 。使用我更改日期格式%d/%b/%Y:%H:%M:%S$4strftime%Y-%m-%d

答案3

使用(以前称为 Perl_6)

~$ raku -e 'my $start_date = DateTime.new("2020-08-11").in-timezone(28800);   \
            my $stop_date  = DateTime.new("2020-08-13").in-timezone(28800);   \
            my @a = lines.map: *.words; my @b = do for @a {  \
                  .[0..2],   \
                  .[3..4].join.subst(/^ (\d**2) \/ (Aug) \/ (\d**4) \: /, {"$2-08-$0T"} )   \
                  .subst(/ (\+\d**2) (\d**2) $/, {"$0:$1"} ).DateTime,  \
                  .[5..*] }; \
            .put if .[1] ~~ $start_date .. $stop_date for @b;'   file

#或者:

~$ raku -e 'my $start_date = DateTime.new("2020-08-11").in-timezone(28800);  \
            my $stop_date  = DateTime.new("2020-08-13").in-timezone(28800);  \
            my @a = lines.map(*.words).map({   \
                  .[0..2], (   \
                  .[3].subst(/^ (\d**2) \/ (Aug) \/ (\d**4) \: /, {"$2-08-$0T"} ),  \
                  .[4].subst(/ (\+\d**2) (\d**2) $ /, {"$0:$1"} )).join.DateTime,   \ 
                  .[5..*] });  \ 
            .put if .[1] ~~ $start_date .. $stop_date for @a;'   file 

以上是用 Raku(Perl 编程语言家族的成员)编写的答案。 RakuISO 8601内置了 DateTime 对象。

简而言之,每行上的日期/时间都转换为ISO 8601DateTime 对象。当每个lines被拆分为以空格分隔的时words,日期/时间元素可在零索引列.[3]和中找到.[4]。这些使用substitute 命令进行转换并join创建ISO 8601DateTime 对象。然后测试每一行以查看该 DateTime 对象是否落在所需的$start .. $stop范围内。

输入示例:

10.434.22.334 - unauthenticated 10/Aug/2020:23:45:45 +0800 "GET /eai/random.jsp HTTP/1.1" 200 74
10.434.22.334 - unauthenticated 10/Aug/2020:23:45:45 +0800 "GET /eai/random.jsp HTTP/1.1" 200 74
10.434.22.334 - unauthenticated 11/Aug/2020:23:34:45 +0800 "GET /eai/random.jsp HTTP/1.1" 200 74
10.434.22.334 - unauthenticated 12/Aug/2020:23:45:43 +0800 "GET /eai/random.jsp HTTP/1.1" 200 74
10.434.22.334 - unauthenticated 13/Aug/2020:23:43:45 +0800 "GET /eai/random.jsp HTTP/1.1" 200 74
10.434.22.334 - unauthenticated 14/Aug/2020:23:33:45 +0800 "GET /eai/random.jsp HTTP/1.1" 200 74

示例输出(两个代码示例):

10.434.22.334 - unauthenticated 2020-08-11T23:34:45+08:00 "GET /eai/random.jsp HTTP/1.1" 200 74
10.434.22.334 - unauthenticated 2020-08-12T23:45:43+08:00 "GET /eai/random.jsp HTTP/1.1" 200 74

https://www.iso.org/iso-8601-date-and-time-format.html
https://docs.raku.org
https://raku.org

相关内容