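I have a large file containing logs like the ones shown below. I want to find all the transactions (TR#) affected by this error, so I need to extract one occurrence of each TR# ID.
How can I do this?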
Apr 30 16:51:29.574 application.crit: [6104]:TR#14. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:29.574 application.crit: [6104]:TR#14. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:29.574 application.crit: [6104]:TR#14. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:29.574 application.crit: [6104]:TR#14. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:29.574 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:29.574 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:29.574 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:29.574 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:29.574 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:29.574 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704
Desired output:
Apr 30 16:51:29.574 application.crit: [6104]:TR#14. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:29.574 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704
Answer 1
This is simple with awk:
$ awk '!c[$5]++' file
Apr 30 16:51:29.574 application.crit: [6104]:TR#14. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:29.574 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704
Or, in Perl:
$ perl -ane '!$k{$F[4]}++ && print' file
Apr 30 16:51:29.574 application.crit: [6104]:TR#14. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:29.574 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704
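The above assumes that the number in front of each TR# ID (the [6104] part) is part of the ID. If that number can vary and you want only one line per TR# regardless, use this instead: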
$ awk -F'[:.]' '!c[$7]++' file
Or
$ perl -F'[:.]' -ane '!$k{$F[6]}++ && print' file
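To make the difference between the two keys concrete, here is a minimal sketch. The sample lines are hypothetical (the second PID, 6200, and the staggered timestamps are invented for illustration), and it uses the `!seen[key]++` idiom, which is true only the first time a key is seen.

```shell
# Hypothetical sample: TR#14 is logged under two different PIDs (6104 and 6200).
log=$(mktemp)
cat > "$log" <<'EOF'
Apr 30 16:51:29.574 application.crit: [6104]:TR#14. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:30.574 application.crit: [6200]:TR#14. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:31.574 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704
EOF

# Key on the whole 5th whitespace-separated field, e.g. "[6104]:TR#14."
by_field=$(awk '!seen[$5]++' "$log")

# Key on the TR# part only: with ':' and '.' as separators it is field 7
by_trid=$(awk -F'[:.]' '!seen[$7]++' "$log")

printf '%s\n' "$by_field" | wc -l   # 3 lines: each PID/TR# pair is a distinct key
printf '%s\n' "$by_trid"  | wc -l   # 2 lines: one per TR# ID
rm -f "$log"
```

With the default field splitting, the same transaction reported by two processes counts twice; splitting on `:` and `.` isolates the `TR#14` part so it counts once.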
Answer 2
To capture and print the first occurrence of each message, try:
awk '! m[$5] {m[$5]=$0} END{for (e in m) print m[e]}'
I made the timestamps in the example sequential in order to test it (and also corrected the truncated error value at the end):
$ awk '! m[$5] {m[$5]=$0} END{for (e in m) print m[e]}' tr2.log
Apr 30 16:51:27.574 application.crit: [6104]:TR#14. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:31.574 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704
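One caveat: awk does not specify the order in which `for (e in m)` visits the keys, so the saved lines are not guaranteed to come out in log order. A minimal sketch of one way to keep first-seen order follows; the extra `order` array, the counter `n`, and the sample lines are introduced here purely for illustration.

```shell
# Hypothetical sample in which TR#238 appears before TR#14.
log=$(mktemp)
cat > "$log" <<'EOF'
Apr 30 16:51:27.574 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:28.574 application.crit: [6104]:TR#14. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:29.574 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704
EOF

# Save the first line for each key and remember the order the keys arrived in,
# then replay them by index instead of relying on "for (e in m)".
ordered=$(awk '!($5 in m) { m[$5] = $0; order[++n] = $5 }
               END { for (i = 1; i <= n; i++) print m[order[i]] }' "$log")

printf '%s\n' "$ordered"
rm -f "$log"
```

This prints the 16:51:27.574 line for TR#238 first and the 16:51:28.574 line for TR#14 second, matching their order in the file.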
Thanks to @terdon.
Answer 3
Here is a Perl script that does what you want:
#!/usr/bin/perl
# Read each line
while ($line = <>) {
    # Extract the transaction ID by looking for the text TR followed by digits
    ($trid) = $line =~ /.*(TR#\d+).*/;
    # If we've not seen the ID before, print it out
    unless ($trids{$trid}) {
        print $line;
    }
    # Remember the ID so we don't print it out again
    $trids{$trid} = 1;
}
This is what I get when I run it on your input:
temeraire:ul jenny$ ./extract.pl in.txt
Apr 30 16:51:29.574 application.crit: [6104]:TR#14. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:29.574 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704
Answer 4
With GNU sed, taken from this SO answer:
sed '$!N; /^\(.*\)\n\1$/!P; D' file
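Note that this sed command compares each line only with the one directly after it, so it removes a repeat only when the two lines are adjacent and completely identical (timestamps included). The sketch below, run on a hypothetical interleaved sample, contrasts it with an awk array keyed on the whole line; the awk variant is a substitute shown for comparison and is not part of the original sed answer.

```shell
# Hypothetical sample: the two identical TR#14 lines are separated by a TR#238 line.
log=$(mktemp)
cat > "$log" <<'EOF'
Apr 30 16:51:29.574 application.crit: [6104]:TR#14. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:29.574 application.crit: [6104]:TR#238. Transaction send can not be sent. Error Code: 704
Apr 30 16:51:29.574 application.crit: [6104]:TR#14. Transaction send can not be sent. Error Code: 704
EOF

# Adjacent-only comparison: the TR#14 lines are not neighbours, so both survive
sed_out=$(sed '$!N; /^\(.*\)\n\1$/!P; D' "$log")

# Whole-line key in an awk array: repeats are dropped wherever they occur
awk_out=$(awk '!seen[$0]++' "$log")

printf '%s\n' "$sed_out" | wc -l   # 3
printf '%s\n' "$awk_out" | wc -l   # 2
rm -f "$log"
```

If the duplicates in the real log are always consecutive and byte-for-byte identical, as in the sample from the question, the sed version is sufficient.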