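I need to filter a log file for bot activity.
The solution should display only the records that meet the following conditions:
- The user logged in, changed password, and logged off within the same second.
- These actions (log in, change password, log off) occur one after another, with no other entries in between.
Sample input data: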
[a lot of data]
Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|fxsciaqulmlk| - |user logged in| -
Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|fxsciaqulmlk| - |user changed password| -
Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|fxsciaqulmlk| - |user logged off| -
Mon, 22 Aug 2016 13:15:42 +0200|178.57.66.225|faaaaaa11111| - |user logged in| -
Mon, 22 Aug 2016 13:15:40 +0200|178.57.66.215|terdsfsdfsdf| - |user logged in| -
Mon, 22 Aug 2016 13:15:49 +0200|178.57.66.215|terdsfsdfsdf| - |user changed password| -
Mon, 22 Aug 2016 13:15:49 +0200|178.57.66.215|terdsfsdfsdf| - |user logged off| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|erdsfsdfsdf| - |user logged in| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|erdsfsdfsdf| - |user changed password| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|erdsfsdfsdf| - |user logged off| -
Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|abcbbabab| - |user logged in| -
Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|abcbbabab| - |user changed password| -
Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|abcbbabab| - |user changed profile| -
Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|abcbbabab| - |user logged off| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|fxsciaqulmla| - |user logged in| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|fxsciaqulmla| - |user changed password| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|fxsciaqulmla| - |user logged off| -
Mon, 22 Aug 2016 13:20:42 +0200|178.57.67.225|faaaa0a11111| - |user logged in| -
[a lot of data]
I wrote the following code to accomplish the task:
awk 'BEGIN { FS=" " } { c[$5]++; l[$5,c[$5]]=$0 } END { for (i in c) { if (c[i] == 3) for (j = 1 ; j <= c[i]; j++) print l[i,j] } }' "$1"
Usage:
./parse_log.sh logfile.log
Output:
Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|fxsciaqulmlk| - |user logged in| -
Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|fxsciaqulmlk| - |user changed password| -
Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|fxsciaqulmlk| - |user logged off| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|erdsfsdfsdf| - |user logged in| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|erdsfsdfsdf| - |user changed password| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|erdsfsdfsdf| - |user logged off| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|fxsciaqulmla| - |user logged in| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|fxsciaqulmla| - |user changed password| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|fxsciaqulmla| - |user logged off| -
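But I wonder what an alternative written in Perl or Python (with minimal use of external libraries) would look like?
Answer 1
This isn't an answer, but it's too big for a comment and needs formatting. To address your comment that "Python code is much easier to read and understand what it does": FYI, an awk script with sensible variable names that does what I think your Python script does looks a lot like your Python script, only briefer, because for manipulating text awk already does for you the common things you have to write code for in Python: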
awk -v column=5 '
    { records[$column] = records[$column] $0 ORS }
    END {
        for ( timestamp in records ) {
            if ( gsub(ORS,"&",records[timestamp]) > 2 ) {
                printf "%s", records[timestamp]
            }
        }
    }
' logfile.log
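But reading the whole file into memory before processing it is a very inefficient way to solve this problem. Instead, you should test and print every time the timestamp changes: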
awk -v column=5 '
    $column != prev {
        prt()
        records = ""
        prev = $column
    }
    { records = records $0 ORS }
    END { prt() }
    function prt() {
        if ( gsub(ORS,"&",records) > 2 ) {
            printf "%s", records
        }
    }
' logfile.log
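For comparison, the same streaming idea can be sketched in Python with itertools.groupby, which groups consecutive items that share a key, so only one group is held in memory at a time. This is only a sketch under assumptions: the names time_field and bursts, and the defaults column=5 and min_lines=3, are made up here to mirror the awk script above (whitespace-separated fields, groups of more than two lines). Like that script, it keys on the time field only and does not look at the user or the action.

```python
import sys
from itertools import groupby


def time_field(line, column=5):
    """Return the whitespace-separated field used as grouping key (1-based), or None."""
    fields = line.split()
    return fields[column - 1] if len(fields) >= column else None


def bursts(lines, column=5, min_lines=3):
    """Yield each run of consecutive lines sharing the key, if it has at least min_lines lines."""
    for _, group in groupby(lines, key=lambda line: time_field(line, column)):
        group = list(group)  # only the current run is materialised
        if len(group) >= min_lines:
            yield group


# Hypothetical command-line use: python3 bursts.py logfile.log
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as logfile:
        for group in bursts(logfile):
            sys.stdout.write("".join(group))
```

Because groupby only merges neighbouring lines, a timestamp that reappears later in the file starts a new group, which matches the behaviour of the prev comparison in awk.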
Answer 2
The solution itself is written in Python 3.
#!/usr/bin/env python3
import sys
import re
from collections import defaultdict

column_delimiter = sys.argv[1]
column = int(sys.argv[2]) - 1
records = defaultdict(list)

with open(sys.argv[3]) as inputfile:
    for lines in inputfile:
        line = lines.rstrip('\n')
        row_record = line.split(column_delimiter)
        records[row_record[column]].append(line)

for timestamps in records.values():
    if len(timestamps) == 3:
        for i in range(len(timestamps)):
            if (re.search('logged in|changed password|logged off', timestamps[i])):
                print(timestamps[i])
Usage: parse_log.py ' ' 5 logfile.log
Python code is much easier to read and understand what it does.
Answer 3
With any awk:
#!/usr/bin/awk -f
BEGIN { FS = "[|]" }
# same timestamp and same user as the previous line: extend the current group
prvHour == $1 && prvUsr == $3 {
    if ($(NF-1) == "user logged in" ||
        $(NF-1) == "user changed password" ||
        $(NF-1) == "user logged off" )
        actions[++actionCnt] = $0
    else actionCnt = 0
}
# timestamp or user changed (|| rather than &&, so that a different user
# in the same second also closes the group): print and start a new group
prvHour != $1 || prvUsr != $3 {
    if (prvHour && actionCnt == 3)
        for (i = 1; i <= actionCnt ; i++)
            print actions[i]
    prvHour = $1; prvUsr = $3
    actionCnt = 0 ; actions[++actionCnt] = $0
}
END {
    if (actionCnt == 3)
        for (i = 1; i <= actionCnt; i++)
            print actions[i]
}
With Perl, without using external libraries:
/bin/perl -e '
while (1) {
    $uli = $uli // <>;
    $ucp = <> if $uli =~ /^([^|]*)[|][^|]*[|]([^|]*)[|] - [|]user logged in[|] -$/;
    last if tell() < 0 ;
    if (!defined $ucp) { $uli = undef ; next; }
    $ulo = <> if $ucp =~ /^(\Q$1\E)[|][^|]*[|]($2)[|] - [|]user changed password[|] -$/;
    last if tell() < 0;
    if (!defined $ulo) { $uli = $ucp ; $ucp = undef ; next ; }
    if ($ulo !~ /^\Q$1\E[|][^|]*[|]$2[|] - [|]user logged off[|] -$/) {
        $uli = $ulo ; $ucp = $ulo = undef ; next ;
    }
    print "$uli$ucp$ulo";
    $uli = $ucp = $ulo = undef;
}
' sample
With python3, using only sys, to read the file and call exit with a meaningful value:
#!/bin/python3
import sys

try:
    # two separate lists (a chained "a = b = []" would make both names share one list)
    fullLine = []
    actions = []
    prvHour = prvUsr = None
    chk_act = lambda x: x == "user logged in" or \
                        x == "user changed password" or \
                        x == "user logged off"
    with open(sys.argv[1]) as logFile:
        for line in logFile:
            hour, _, user, _, action, _ = line.split('|')
            if prvHour == hour and prvUsr == user:
                fullLine.append(line.strip())
                actions.append(action.strip())
            else:
                # timestamp or user changed: print the finished group if it qualifies
                if len(actions) == 3 and all(map(chk_act, actions)):
                    print("\n".join(fullLine))
                prvUsr = user
                prvHour = hour
                actions = [action.strip()]
                fullLine = [line.strip()]
    # the last group is still pending when the input ends
    if len(actions) == 3 and all(map(chk_act, actions)):
        print("\n".join(fullLine))
except IndexError:
    print("usage {} logfile".format(sys.argv[0]))
    sys.exit(1)
except (FileNotFoundError, PermissionError):
    print("{} not found or permission denied".format(sys.argv[1]))
    sys.exit(1)
With any sed:
#!/bin/sed -nf
N;/^\([^|]*\)|[^|]*|\([^|]*\)| - |user logged in| -\n\1|[^|]*|\2| - |user changed password| -$/{
N;/\n\([^|]*\)|[^|]*|\([^|]*\)| - |user changed password| -\n\1|[^|]*|\2| - |user logged off| -$/{
p;b
}
s/.*\n/\n/g;D
}
D
All of these solutions avoid storing the whole data set in memory.
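To make the constant-memory point concrete, here is a rough Python sketch that keeps only the last three lines in a collections.deque and checks them against the conditions stated in the question: same timestamp, same user, and the three actions in exactly that order. The names EXPECTED_ACTIONS, parse and bot_triples are invented for this sketch, and it assumes every well-formed line has exactly six '|'-separated fields, as in the sample.

```python
from collections import deque

EXPECTED_ACTIONS = ("user logged in", "user changed password", "user logged off")


def parse(line):
    """Split a log line into (timestamp, user, action); return None if malformed."""
    fields = line.rstrip("\n").split("|")
    if len(fields) != 6:
        return None
    return fields[0], fields[2], fields[4]


def bot_triples(lines):
    """Yield every three consecutive lines that match the login/password/logoff pattern."""
    window = deque(maxlen=3)  # never holds more than the last three lines
    for line in lines:
        window.append((line, parse(line)))
        if len(window) < 3 or any(parsed is None for _, parsed in window):
            continue
        stamps = {parsed[0] for _, parsed in window}
        users = {parsed[1] for _, parsed in window}
        actions = tuple(parsed[2] for _, parsed in window)
        if len(stamps) == 1 and len(users) == 1 and actions == EXPECTED_ACTIONS:
            yield [text for text, _ in window]
```

Passing an open file object as lines (for example, for triple in bot_triples(open("logfile.log")): print("".join(triple), end="")) streams the log without loading it. A four-line burst such as the abcbbabab one in the sample is rejected, because no three-line window in it has the actions in the expected order.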
Answer 4
In Perl, this can be written as a one-liner, though it feels a bit messy:
perl -MTime::Piece -F'\|' -ae '$epoch=Time::Piece->strptime($F[0], "%a, %d %b %Y %H:%M:%S %z")->epoch; $diff=$epochlast2 - $epoch; $last =~ /user changed password/ && $last2 =~ /user logged in/ && $_ =~ /user logged off/ && $diff==0 && print $last2, $last, $_; $epochlast2=$epochlast; $epochlast=$epoch; $last2=$last; $last=$_' <<< "$data"
Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|fxsciaqulmlk| - |user logged in| -
Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|fxsciaqulmlk| - |user changed password| -
Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|fxsciaqulmlk| - |user logged off| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|erdsfsdfsdf| - |user logged in| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|erdsfsdfsdf| - |user changed password| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|erdsfsdfsdf| - |user logged off| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|fxsciaqulmla| - |user logged in| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|fxsciaqulmla| - |user changed password| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|fxsciaqulmla| - |user logged off| -
Or, as a script:
use warnings;
use Time::Piece;
my $epochlast2=0;
my $epochlast=0;
my $last2="";
my $last="";
while($line = <STDIN>){
    $date=(split(/\|/, $line))[0];
    $epoch=Time::Piece->strptime($date, "%a, %d %b %Y %H:%M:%S %z")->epoch;
    $diff=$epochlast2 - $epoch;
    if ($last =~ /user changed password/ && $last2 =~ /user logged in/ && $line =~ /user logged off/ && $diff==0) {
        print $last2, $last, $line;
    }
    $epochlast2=$epochlast;
    $epochlast=$epoch;
    $last2=$last;
    $last=$line;
}
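The script above parses each timestamp into an epoch value instead of comparing the date strings. A rough Python counterpart, using only the standard library, could lean on email.utils.parsedate_to_datetime, which understands this RFC 2822 style date format including the numeric UTC offset. The helper names log_epoch and same_second are made up for this sketch.

```python
from email.utils import parsedate_to_datetime


def log_epoch(line):
    """Return the Unix epoch (whole seconds) of the timestamp in the first '|' field."""
    stamp = line.split("|", 1)[0]
    return int(parsedate_to_datetime(stamp).timestamp())


def same_second(*lines):
    """True if all given log lines carry timestamps that denote the same instant."""
    return len({log_epoch(line) for line in lines}) == 1
```

Comparing epochs rather than strings means that two lines written with different UTC offsets but denoting the same instant still compare equal, which a plain string comparison would miss.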