Getting unique lines from a log using grep and tail

I have the following log file. I want to extract the last 10 unique entries from it. Can this be done with grep and tail?

2016-04-18 10:13:11,925 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.7088348025036650 on 711b3fb7d875:80
2016-04-18 10:13:12,383 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.9277403071419588 on 711b3fb7d875:80
2016-04-18 10:13:14,000 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.5617050735043505 on 711b3fb7d875:80
2016-04-18 10:13:18,305 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.3502119403604215 on 711b3fb7d875:80
2016-04-18 10:13:25,571 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.1448386101904803 on 711b3fb7d875:80
2016-04-18 10:13:42,529 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.6017618280263232 on 711b3fb7d875:80
2016-04-18 10:21:20,257 (glastopf.glastopf) 150.70.188.165 requested GET / on 711b3fb7d875:80
2016-04-18 10:35:27,775 (glastopf.glastopf) 150.70.173.55 requested GET / on 711b3fb7d875:80
2016-04-18 10:44:21,799 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.8457383350172993 on 711b3fb7d875:80
2016-04-18 10:44:23,550 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.2291251627482913 on 711b3fb7d875:80
2016-04-18 10:44:24,885 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.9121516725350658 on 711b3fb7d875:80
2016-04-18 10:44:28,611 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.6517709326810913 on 711b3fb7d875:80
2016-04-18 10:44:36,656 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.3339893597346100 on 711b3fb7d875:80
2016-04-18 10:44:52,579 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.9653746532564283 on 711b3fb7d875:80
2016-04-18 11:07:15,576 (glastopf.glastopf) 204.12.196.236 requested GET / on 711b3fb7d875:80
2016-04-18 11:14:46,990 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.6323574164650954 on 711b3fb7d875:80
2016-04-18 11:14:49,798 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.1343994230148844 on 711b3fb7d875:80
2016-04-18 11:14:50,923 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.2092851733275502 on 711b3fb7d875:80
2016-04-18 11:14:54,015 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.6364011485956100 on 711b3fb7d875:80
2016-04-18 11:15:02,021 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.2105667716533854 on 711b3fb7d875:80
2016-04-18 11:15:17,763 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.5330510476532333 on 711b3fb7d875:80
2016-04-18 11:45:51,204 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.7162577798366348 on 711b3fb7d875:80
2016-04-18 11:45:51,456 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.4097472747050946 on 711b3fb7d875:80
2016-04-18 11:45:53,562 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.0435891326571879 on 711b3fb7d875:80
2016-04-18 11:45:57,368 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.9764200678378154 on 711b3fb7d875:80
2016-04-18 11:46:05,598 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.2539390798717596 on 711b3fb7d875:80
2016-04-18 11:53:59,103 (glastopf.glastopf) 150.70.173.9 requested GET / on 711b3fb7d875:80
2016-04-18 12:16:07,343 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.0022258971071879 on 711b3fb7d875:80
2016-04-18 12:16:07,411 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.6543056525672964 on 711b3fb7d875:80
2016-04-18 12:16:09,210 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.0771392409002968 on 711b3fb7d875:80
2016-04-18 12:16:21,475 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.4621648610735409 on 711b3fb7d875:80
2016-04-18 12:16:37,413 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.1810763849106982 on 711b3fb7d875:80
2016-04-18 12:46:31,160 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.0759114015016254 on 711b3fb7d875:80
2016-04-18 12:46:33,023 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.9823929541441208 on 711b3fb7d875:80
2016-04-18 12:46:42,262 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.1670975464416704 on 711b3fb7d875:80
2016-04-18 12:46:44,977 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.3061602425336546 on 711b3fb7d875:80
2016-04-18 12:47:00,555 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.5692431772822398 on 711b3fb7d875:80
2016-04-18 12:50:34,078 (glastopf.glastopf) 150.70.188.178 requested GET / on 711b3fb7d875:80

So basically I want the last 10 unique log entries, where uniqueness is determined by the IP address.

Edit: example of the last two unique entries:

2016-04-18 12:47:00,555 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.5692431772822398 on 711b3fb7d875:80
2016-04-18 12:50:34,078 (glastopf.glastopf) 150.70.188.178 requested GET / on 711b3fb7d875:80

Answer 1

Use sort, with the help of tac:

sort -k4,4 file.log | tac | sort -uk4,4 | sort -k1,2

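To get only the last 10 entries, pipe the result to tail -n 10 at the end:

sort -k4,4 file.log | tac | sort -uk4,4 | sort -k1,2 | tail -n 10

  • sort's -k option lets us sort using whitespace-separated field numbers as the key

  • tac reverses the order of its input lines, so the last line comes first and the first line comes last; this is necessary because sort -u, when used with a key, outputs the first entry of each group as the unique one, i.e. the lines are not identical as a whole but match on the specified field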
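As a small illustration of why tac is needed, the following sketch runs the pipeline with and without it on a made-up three-line sample. The sample lines and the variable names are hypothetical, not taken from the log above. It assumes GNU coreutils (tac is not part of POSIX) and sets LC_ALL=C so that the sort order does not depend on the locale.

```shell
# Hypothetical sample: two entries for 10.0.0.1 and one for 10.0.0.2.
sample='2016-04-18 10:00:00,000 (app) 10.0.0.1 requested GET /first
2016-04-18 10:05:00,000 (app) 10.0.0.2 requested GET /only
2016-04-18 10:10:00,000 (app) 10.0.0.1 requested GET /last'

# Without tac: sort -u keeps the earliest line of each IP group.
earliest=$(printf '%s\n' "$sample" \
  | LC_ALL=C sort -k4,4 \
  | LC_ALL=C sort -u -k4,4)

# With tac: the newest line of each group comes first, so it is the one kept.
# The final sort on fields 1-2 restores chronological order.
latest=$(printf '%s\n' "$sample" \
  | LC_ALL=C sort -k4,4 \
  | tac \
  | LC_ALL=C sort -u -k4,4 \
  | LC_ALL=C sort -k1,2)

printf 'without tac:\n%s\n' "$earliest"
printf 'with tac:\n%s\n' "$latest"
```

Without tac the /first request is what survives for 10.0.0.1; with tac it is the /last request.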

Example:

$ sort -k4,4 file.log | tac | sort -uk4,4 | sort -k1,2
2016-04-18 10:21:20,257 (glastopf.glastopf) 150.70.188.165 requested GET / on 711b3fb7d875:80
2016-04-18 10:35:27,775 (glastopf.glastopf) 150.70.173.55 requested GET / on 711b3fb7d875:80
2016-04-18 11:07:15,576 (glastopf.glastopf) 204.12.196.236 requested GET / on 711b3fb7d875:80
2016-04-18 11:53:59,103 (glastopf.glastopf) 150.70.173.9 requested GET / on 711b3fb7d875:80
2016-04-18 12:47:00,555 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.5692431772822398 on 711b3fb7d875:80
2016-04-18 12:50:34,078 (glastopf.glastopf) 150.70.188.178 requested GET / on 711b3fb7d875:80

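Answer 2

The uniq command can be used to eliminate consecutive lines that are entirely or partially identical. By default it operates only on whole lines: if a file contains several identical consecutive lines, uniq removes the duplicates.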

$ cat foo.txt 
foo
foo
foo
bar
baz
baz
foo
foo
$ uniq foo.txt 
foo
bar
baz
foo

To remove all duplicate lines, even non-consecutive ones, run uniq after sort:

$ sort foo.txt | uniq
bar
baz
foo

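A few flags make uniq consider only part of each line when identifying duplicates. Here we only want to consider the IP address in the fourth column, so first we need to tell uniq to ignore the first three columns, which is done with the -f flag. After that we need to tell it to consider only the IP address. This is slightly tricky, because we can only tell it to consider a fixed number of characters (with the -w flag), while IP addresses vary in length. Fortunately this is not a problem: the IP address is always followed by requested, so including the first few characters of that word in the comparison does not change whether a line is correctly detected as a duplicate. In the end, applying uniq -f 3 -w 15 to the input appears to produce the expected result.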
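The effect of the width passed to -w can be checked on a small sample. The sketch below uses made-up log lines and variable names (all hypothetical) and assumes GNU uniq, since -w is a GNU extension. One detail worth keeping in mind: after -f skips three fields, the compared text begins at the blank in front of the IP address, so that blank counts toward the -w limit. With a full 15-character address, a width of 15 therefore stops one character short, and 16 may be the safer choice.

```shell
# Hypothetical sample; the first three lines are adjacent, and the two
# 115.239.248.x addresses differ only in their last digit.
sample='2016-04-18 10:00:00,000 (app) 115.239.248.245 requested GET /a
2016-04-18 10:00:01,000 (app) 115.239.248.245 requested GET /b
2016-04-18 10:00:02,000 (app) 115.239.248.246 requested GET /c
2016-04-18 10:00:03,000 (app) 150.70.173.9 requested GET /d'

# -f 3 skips date, time and logger name. With -w 15 the compared text is
# " 115.239.248.24" (leading blank included), so .245 and .246 collapse.
narrow=$(printf '%s\n' "$sample" | uniq -f 3 -w 15)

# With -w 16 the blank plus a full 15-character address is compared,
# so .245 and .246 are kept apart.
wide=$(printf '%s\n' "$sample" | uniq -f 3 -w 16)

printf 'width 15:\n%s\n' "$narrow"
printf 'width 16:\n%s\n' "$wide"
```

On this sample the width of 15 leaves two lines (/a and /d), while 16 leaves three (/a, /c and /d).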

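Another thing to note is that when only part of each line is considered for duplicate detection, the lines in a group of "duplicates" need not be identical, so we have to decide which of them to print. uniq prints the first one, but it can be made to print the last one by running the input through tac first.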
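Putting these pieces together, one possible pipeline is sketched below. It is an assumption about how the steps might be combined, not a command given in the answer. Because uniq only compares adjacent lines, the input is first grouped by IP with a stable sort (-s, available in GNU sort), which keeps each group in chronological order; tac then puts the newest line of each group first. The sample data and variable names are hypothetical, and the width of 16 (the blank before the IP plus up to 15 address characters) is used here instead of the 15 mentioned above.

```shell
# Hypothetical sample: two entries for 10.0.0.1 and one for 10.0.0.2.
sample='2016-04-18 10:00:00,000 (app) 10.0.0.1 requested GET /first
2016-04-18 10:05:00,000 (app) 10.0.0.2 requested GET /only
2016-04-18 10:10:00,000 (app) 10.0.0.1 requested GET /last'

# 1. group lines by IP (field 4), keeping input order inside each group
# 2. reverse, so the newest line of each group comes first
# 3. keep the first line of each group, comparing from field 4 onwards
# 4. restore chronological order and keep at most the last 10 entries
last_per_ip=$(printf '%s\n' "$sample" \
  | LC_ALL=C sort -s -k4,4 \
  | tac \
  | uniq -f 3 -w 16 \
  | LC_ALL=C sort -k1,2 \
  | tail -n 10)

printf '%s\n' "$last_per_ip"
```

For a real log, the printf of the sample would be replaced by reading the file, for example by passing file.log to the first sort.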
