I have a multi-line log file that I want to convert into single-line log entries.
Multi-line example:
6/13/2015 12:00:47 AM - { 562} START Web
6/13/2015 12:00:47 AM - Requested Web connection from 123.125.71.103 [123.125.71.103], ID=562
6/13/2015 12:01:24 AM - { 563} START POP3
6/13/2015 12:01:24 AM - Requested POP3 connection from 10.127.251.37 [10.127.251.37], ID=563
6/13/2015 12:01:24 AM - ( 563) USER [email protected]
6/13/2015 12:01:24 AM - POP3 connection with 10.127.251.37 [10.127.251.37] ended. ID=563
6/13/2015 12:01:24 AM - { 563} END POP3
6/13/2015 12:01:24 AM - { 564} START POP3
6/13/2015 12:01:24 AM - Requested POP3 connection from 10.127.251.37 [10.127.251.37], ID=564
6/13/2015 12:01:24 AM - ( 564) USER [email protected]
6/13/2015 12:01:24 AM - POP3 connection with 10.127.251.37 [10.127.251.37] ended. ID=564
6/13/2015 12:01:24 AM - { 564} END POP3
6/13/2015 12:01:40 AM - Web connection with 123.125.71.103 [123.125.71.103] ended. ID=562
6/13/2015 12:01:40 AM - { 562} END Web
First, I want single-line output like this, where lines with the same log ID (for example "562") are merged:
6/13/2015 12:00:47 AM - { 562} START Web 6/13/2015 12:00:47 AM - Requested Web connection from 123.125.71.103 [123.125.71.103], ID=562 6/13/2015 12:01:40 AM - Web connection with 123.125.71.103 [123.125.71.103] ended. ID=562 6/13/2015 12:01:40 AM - { 562} END Web
6/13/2015 12:01:24 AM - { 563} START POP3 6/13/2015 12:01:24 AM - Requested POP3 connection from 10.127.251.37 [10.127.251.37], ID=563 6/13/2015 12:01:24 AM - ( 563) USER [email protected] 6/13/2015 12:01:24 AM - POP3 connection with 10.127.251.37 [10.127.251.37] ended. ID=563 6/13/2015 12:01:24 AM - { 563} END POP3
6/13/2015 12:01:24 AM - { 564} START POP3 6/13/2015 12:01:24 AM - Requested POP3 connection from 10.127.251.37 [10.127.251.37], ID=564 6/13/2015 12:01:24 AM - ( 564) USER [email protected] 6/13/2015 12:01:24 AM - POP3 connection with 10.127.251.37 [10.127.251.37] ended. ID=564 6/13/2015 12:01:24 AM - { 564} END POP3
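I have written the following bash script, but it does not work as expected: it merges all the "POP3" or "Web" messages into a single line instead of separating them by message ID.
Script: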
#!/bin/bash
HOME=/var/tmp/test.txt
ID=`((awk '$6 ~/[0-9]\W/ {print $6}' $HOME | awk '{gsub (/)/, ""); print}' | awk '{gsub (/}/, ""); print}') && (awk '$11 ~/[0-9]/ {print $11}' $HOME | awk '{gsub ("ID=", ""); print}'))`
for ID in $HOME
do
awk '!/Web/' $HOME | xargs >> final.txt
awk '/Web/' $HOME | xargs >> final.txt
done
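Any suggestions on how I should write the loop so that only lines with the same ID are merged?
Answer 1
You can do the whole thing in awk. The script below groups lines by the ID read from each one.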
awk '{
    line = $0;
    # ID is { XXX } or ( XXX )
    if ( /[{(] *[0-9]+[})]/ ) {
        id = $0;
        sub(/ *[})].*/,"", id);
        sub(/.*[({] */,"", id);
    }
    # ID is ID=XXX
    else if ( $NF ~ /ID=/ ) {
        id = $NF;
        sub(/[^=]*=/,"",id);
    }
    # else ID = previous value
    # save the line into an associative array keyed by ID
    final[id] = final[id]""line" "; # add a space between lines
}
END {
    # print the merged line for each ID
    for ( id in final ) {
        print final[id];
    }
}
' /var/tmp/text.txt
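One caveat: awk visits the keys of for ( id in final ) in an unspecified order, so the merged lines may not come out in log order. A minimal sketch of the same grouping that also remembers the order in which each ID first appears. The names merge_by_id, order and n are illustrative and not part of the original answer.

```shell
#!/bin/sh
# merge_by_id [FILE...]: join all lines that share a session ID and print
# the groups in the order in which each ID was first seen.
# (merge_by_id, order and n are hypothetical names used for this sketch.)
merge_by_id() {
    awk '
    {
        line = $0
        if (/[{(] *[0-9]+[})]/) {            # ID written as { 562} or ( 563)
            id = $0
            sub(/ *[})].*/, "", id)
            sub(/.*[({] */, "", id)
        } else if ($NF ~ /^ID=/) {           # ID written as ID=562
            id = $NF
            sub(/[^=]*=/, "", id)
        }                                    # otherwise keep the previous ID
        if (!(id in final)) order[++n] = id  # remember first appearance
        final[id] = final[id] line " "
    }
    END {
        for (i = 1; i <= n; i++) print final[order[i]]
    }' "$@"
}
```

It reads the named files, or standard input when none is given, for example: merge_by_id /var/tmp/text.txt > final.txt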
You can cut down on redundant information, such as the printed ID, and use only a prefix instead, for example:
# remove the ID from the saved line (before the final[id] assignment)
sub(/ID=[0-9]+/,"",line);
sub(/[({] *[0-9]+[})]/,"",line);
END {
    # print each ID
    for ( id in final ) {
        # print the ID, then the rest of the merged line
        printf("[ID=%d]: %s\n", id, final[id]);
    }
}
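To see what the two sub() calls remove, here is a minimal sketch that applies them to a copy of each input line and prints the result. It assumes the substitutions are meant to act on the saved line and that the ID may have more than one digit. strip_id is a hypothetical helper name.

```shell
#!/bin/sh
# strip_id: read log lines on stdin and print them with the ID markers removed.
# (strip_id is a hypothetical helper name for this sketch.)
strip_id() {
    awk '{
        line = $0
        sub(/ID=[0-9]+/, "", line)           # drops the first "ID=562"
        sub(/[({] *[0-9]+[})]/, "", line)    # drops "{ 562}" or "( 563)"
        print line
    }'
}

printf '%s\n' '6/13/2015 12:00:47 AM - { 562} START Web' | strip_id
```

The surrounding spaces and commas are left in place, so a stricter pattern would be needed to tidy those up as well.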
Answer 2
Based on @mikeserv's approach, I get the following output.
Script:
( sed -e'y/)},={/(((((/' \
-e's/-\([^(I]*\)[^0-9]*\([0-9]*\)[( ]*/- \2 -\1/;=' |
paste -d- - - |
sort -t- -nk3,3 -nk1,1 |
sed -e's/^[^-]*-//;:n' -e'h;$!N' \
-e's/\(-\([^-]*-\).*[^ ]\) *\n\([^-]*-\)\{2\}\2/\1 - \3/;tn' \
-ex\;:t -e's/\(\([^-]*-\)[^/]*\) - *\2/\1,/;tt' -e'p;g;D'
) < in.txt > out.txt
6/13/2015 12:00:47 AM - 562 - START Web, Requested Web connection from 123.125.71.103 [123.125.71.103] - 6/13/2015 12:01:40 AM - Web connection with 123.125.71.103 [123.125.71.103] ended., END Web
6/13/2015 12:01:24 AM - 563 - START POP3, Requested POP3 connection from 10.127.251.37 [10.127.251.37], +OK ArGoSoft Mail Server Pro for WinNT/2000/XP( Version 1.8 (1.8.9.6( - 6/13/2015 12:01:24 AM - CAPA, -ERR Unknown command, USER [email protected], +OK Password required for [email protected], PASS XXXXXXXXX, +OK Mailbox locked and ready, Adding address to POP Before SMTP manager, STAT, +OK 178 97537344, UIDL, +OK, ., LIST, +OK, ., QUIT, +OK Aba he, POP3 connection with 10.127.251.37 [10.127.251.37] ended., END POP3
6/13/2015 12:04:25 AM - 564 - START POP3, Requested POP3 connection from 10.127.251.37 [10.127.251.37], +OK ArGoSoft Mail Server Pro for WinNT/2000/XP( Version 1.8 (1.8.9.6( - 6/13/2015 12:04:25 AM - CAPA, -ERR Unknown command, USER [email protected], +OK Password required for [email protected], PASS XXXXXXXXX, +OK Mailbox locked and ready, Adding address to POP Before SMTP manager, STAT, +OK 178 97537344, UIDL, +OK, ., LIST, +OK, . - 6/13/2015 12:04:26 AM - QUIT, +OK Aba he, POP3 connection with 10.127.251.37 [10.127.251.37] ended., END POP3
6/13/2015 12:04:36 AM - 565 - START Web, Requested Web connection from 31.133.9.16 [31.133.9.16], Web connection with 31.133.9.16 [31.133.9.16] ended., END Web
6/13/2015 12:07:26 AM - 566 - START POP3, Requested POP3 connection from 10.127.251.37 [10.127.251.37], +OK ArGoSoft Mail Server Pro for WinNT/2000/XP( Version 1.8 (1.8.9.6( - 6/13/2015 12:04:25 AM - CAPA, -ERR Unknown command, USER [email protected], +OK Password required for [email protected], PASS XXXXXXXXX, +OK Mailbox locked and ready, Adding address to POP Before SMTP manager, STAT, +OK 178 97537344, UIDL, +OK, ., LIST, +OK, . - 6/13/2015 12:04:26 AM - QUIT, +OK Aba he, POP3 connection with 10.127.251.37 [10.127.251.37] ended., END POP3
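As you can see, line 4 of this example is missing the timestamp before "Web connection with 31.133.9.16 [31.133.9.16] ended." The same problem occurs for every similar entry that begins with "Web connection ...". For all the other entries, which contain POP3 messages, everything works fine.
How should I modify the sed command so that it includes the timestamp for all the remaining "Web connection ..." messages, not just the first one?
Answer 3
If relying on the timestamp alone is good enough for you, then the following will suffice: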
sed -e:n -e'$!N;s/^\(\([^-]*-\).*\)\n *\2/\1:::/;tn' -eP\;D <in >out
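It recursively appends the Next input line (N) to the current line and, if every character at the head of the current line up to and including the first - dash can also be matched at the head of the appended line, joins the two with the appended timestamp removed. If the s/// substitution succeeds, the t command makes sed branch back to the :n label to fetch another line with N; otherwise the merged line is printed to standard output with P and deleted with D, and sed starts over from the top with whatever remains.
Given your sample data, it prints: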
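The N / t / P / D loop is easier to follow on a toy input. A minimal sketch using the same sed script on three made-up lines, where the text before the first dash plays the role of the timestamp. join_same_prefix and the sample lines are illustrative only.

```shell
#!/bin/sh
# join_same_prefix: merge consecutive lines whose text up to and including
# the first "-" is identical, separating the merged parts with ":::".
# (join_same_prefix is a hypothetical name; the sed script is the one above.)
join_same_prefix() {
    sed -e :n -e '$!N;s/^\(\([^-]*-\).*\)\n *\2/\1:::/;tn' -e 'P;D'
}

printf '%s\n' 'A - one' 'A - two' 'B - three' | join_same_prefix
# prints:
# A - one::: two
# B - three
```

The first two lines share the prefix "A -", so the second is folded into the first; the third has a different prefix, so the merged line is printed and the loop restarts with "B - three".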
6/13/2015 12:00:47 AM - { 562} START Web ::: Requested Web connection from 123.125.71.103 [123.125.71.103], ID=562
6/13/2015 12:01:24 AM - { 563} START POP3 ::: Requested POP3 connection from 10.127.251.37 [10.127.251.37], ID=563 ::: ( 563) USER [email protected] ::: POP3 connection with 10.127.251.37 [10.127.251.37] ended. ID=563 ::: { 563} END POP3::: { 564} START POP3 ::: Requested POP3 connection from 10.127.251.37 [10.127.251.37], ID=564 ::: ( 564) USER [email protected] ::: POP3 connection with 10.127.251.37 [10.127.251.37] ended. ID=564 ::: { 564} END POP3
6/13/2015 12:01:40 AM - Web connection with 123.125.71.103 [123.125.71.103] ended. ID=562 ::: { 562} END Web
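But that is obviously not quite right. It looks like you want to merge on ID; sorry about that. The following does work, and it also strips the duplicate timestamps and IDs that appear in the input.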
sed -e'y/)},={/(((((/' \
-e's/-\([^(I]*\)[^0-9]*\([0-9]*\)[( ]*/- \2 -\1/;=' |
paste -d- - - |
sort -t- -nk3,3 -nk1,1 |
sed -e's/^[^-]*-//;:n' -e'h;$!N' \
-e's/\(-\([^-]*-\).*[^ ]\) *\n\([^-]*-\)\{2\}\2/\1 - \3/;tn' \
-ex\;:t -e's/\(\([^-]*-\)[^/]*\)- *\2/\1:::/;tt' -e'p;g;D'
6/13/2015 12:00:47 AM - 562 - START Web ::: Requested Web connection from 123.125.71.103 [123.125.71.103] - 6/13/2015 12:01:40 AM - Web connection with 123.125.71.103 [123.125.71.103] ended. ::: END Web
6/13/2015 12:01:24 AM - 563 - START POP3 ::: Requested POP3 connection from 10.127.251.37 [10.127.251.37] ::: USER [email protected] ::: POP3 connection with 10.127.251.37 [10.127.251.37] ended. ::: END POP3
6/13/2015 12:01:24 AM - 564 - START POP3 ::: Requested POP3 connection from 10.127.251.37 [10.127.251.37] ::: USER [email protected] ::: POP3 connection with 10.127.251.37 [10.127.251.37] ended. ::: END POP3
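The middle of that pipeline relies on a numbering trick: sed = prints each line's number on a line of its own, paste -d- - - glues that number onto the front of the line with a dash, and sort then orders by ID (the third dash-separated field) with the original line number as the tie-breaker. A minimal sketch on made-up records that already have the ID in the third field. number_lines and the sample records are illustrative only.

```shell
#!/bin/sh
# number_lines: prefix every input line with "<line number>-".
# (number_lines is a hypothetical helper name for this sketch.)
number_lines() {
    sed = | paste -d- - -
}

# Sort by the numeric ID in field 3, keeping the original order within an ID.
printf '%s\n' 'ts - 563 - b' 'ts - 562 - a' 'ts - 563 - c' |
    number_lines |
    sort -t- -nk3,3 -nk1,1
# prints:
# 2-ts - 562 - a
# 1-ts - 563 - b
# 3-ts - 563 - c
```

Keeping the line number as a secondary key is what stops lines with equal IDs from being reordered relative to each other.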
Answer 4
Since the ID on each line is either in the 6th field or at the end of the line (after ID=), you can collect all lines by ID without any sub calls:
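awk -F"[ }=)]+" '
NF{
    # FS splits on spaces, "}", "=" and ")", so "{ 562}" and "( 563)" leave
    # the ID in field 6; otherwise the ID follows "ID=" and is the last
    # field, or the next-to-last one when trailing whitespace leaves an
    # empty last field
    if($6 ~ /^[0-9]+$/)
        ids=$6
    else if($NF == "")
        ids=$(NF-1)
    else
        ids=$NF
    # append the line to whatever is already stored for this ID
    if(!M[ids])
        M[ids]=$0
    else
        M[ids]=M[ids] " " $0
}
END{
    for(i in M)
        print M[i]
}' /var/tmp/text.txt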
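To check which field the ID lands in for a given line, it can help to print the fields that this separator produces. A minimal sketch; show_fields is a hypothetical helper name, and the second sample line is made up to show the effect of a trailing space.

```shell
#!/bin/sh
# show_fields: print every field of each input line, numbered, using the
# same field separator as the awk command above.
# (show_fields is a hypothetical helper name for this sketch.)
show_fields() {
    awk -F'[ }=)]+' '{ for (i = 1; i <= NF; i++) printf "%d:<%s>\n", i, $i }'
}

# Brace form: the ID is field 6.
printf '%s\n' '6/13/2015 12:00:47 AM - { 562} START Web' | show_fields

# ID= form with a trailing space: the last field is empty and the ID sits
# in the next-to-last field.
printf '%s\n' 'x y, ID=562 ' | show_fields
```

Because the separator is a regular expression rather than a single space, a trailing separator produces an empty final field, which is why the position of the ID depends on whether the line ends in whitespace.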