我有一个按 IP 地址排序的日志文件,我想查找每个唯一 IP 地址出现的次数。如何使用 bash 执行此操作?可能在 IP 旁边列出出现次数,例如:
5.135.134.16 count: 5
13.57.220.172: count 30
18.206.226 count:2
等等。
以下是日志示例:
5.135.134.16 - - [23/Mar/2019:08:42:54 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
5.135.134.16 - - [23/Mar/2019:08:42:55 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
5.135.134.16 - - [23/Mar/2019:08:42:55 -0400] "POST /wp-login.php HTTP/1.1" 200 3836 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
5.135.134.16 - - [23/Mar/2019:08:42:55 -0400] "POST /wp-login.php HTTP/1.1" 200 3988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
5.135.134.16 - - [23/Mar/2019:08:42:56 -0400] "POST /xmlrpc.php HTTP/1.1" 200 413 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:05 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:06 -0400] "POST /wp-login.php HTTP/1.1" 200 3985 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:07 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:08 -0400] "POST /wp-login.php HTTP/1.1" 200 3833 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:09 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:11 -0400] "POST /wp-login.php HTTP/1.1" 200 3836 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:12 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:15 -0400] "POST /wp-login.php HTTP/1.1" 200 3837 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:17 -0400] "POST /xmlrpc.php HTTP/1.1" 200 413 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.233.99 - - [23/Mar/2019:04:17:45 -0400] "GET / HTTP/1.1" 200 25160 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
18.206.226.75 - - [23/Mar/2019:21:58:07 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "https://www.google.com/url?3a622303df89920683e4421b2cf28977" "Mozilla/5.0 (Windows NT 6.2; rv:33.0) Gecko/20100101 Firefox/33.0"
18.206.226.75 - - [23/Mar/2019:21:58:07 -0400] "POST /wp-login.php HTTP/1.1" 200 3988 "https://www.google.com/url?3a622303df89920683e4421b2cf28977" "Mozilla/5.0 (Windows NT 6.2; rv:33.0) Gecko/20100101 Firefox/33.0"
18.213.10.181 - - [23/Mar/2019:14:45:42 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
18.213.10.181 - - [23/Mar/2019:14:45:42 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
18.213.10.181 - - [23/Mar/2019:14:45:42 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
答案1
您可以使用cut
和uniq
工具:
cut -d ' ' -f1 test.txt | uniq -c
5 5.135.134.16
9 13.57.220.172
1 13.57.233.99
2 18.206.226.75
3 18.213.10.181
解释 :
cut -d ' ' -f1
:提取第一个字段(IP地址)uniq -c
:报告重复的行并显示出现的次数
答案2
如果你没有特别要求给定的输出格式,那么我建议使用已经发布的cut
+uniq
根据答案
如果你真的需要给定的输出格式,在 Awk 中执行此操作的单次方式是
awk '{c[$1]++} END{for(i in c) print i, "count: " c[i]}' log
当输入已经排序时,这有点不理想,因为它不必要将所有 IP 存储到内存中 - 在预排序的情况下执行此操作的更好(虽然更复杂uniq -c
)的方法是:
awk '
NR==1 {last=$1}
$1 != last {print last, "count: " c[last]; last = $1}
{c[$1]++}
END {print last, "count: " c[last]}
'
前任。
$ awk 'NR==1 {last=$1} $1 != last {print last, "count: " c[last]; last = $1} {c[$1]++} END{print last, "count: " c[last]}' log
5.135.134.16 count: 5
13.57.220.172 count: 9
13.57.233.99 count: 1
18.206.226.75 count: 2
18.213.10.181 count: 3
答案3
您可以使用grep
和uniq
获取地址列表,循环遍历它们并grep
再次进行计数:
for i in $(<log grep -o '^[^ ]*' | uniq); do
printf '%s count %d\n' "$i" $(<log grep -c "$i")
done
grep -o '^[^ ]*'
输出从开头 ( ^
) 到每行第一个空格的每个字符,uniq
删除重复的行,从而为您留下 IP 地址列表。由于命令替换,循环for
会循环遍历此列表,打印当前处理的 IP,然后打印“count”和计数。后者由计算得出grep -c
,它计算至少有一个匹配项的行数。
示例运行
$ for i in $(<log grep -o '^[^ ]*'|uniq);do printf '%s count %d\n' "$i" $(<log grep -c "$i");done
5.135.134.16 count 5
13.57.220.172 count 9
13.57.233.99 count 1
18.206.226.75 count 2
18.213.10.181 count 3
答案4
一些 Perl:
$ perl -lae '$k{$F[0]}++; }{ print "$_ count: $k{$_}" for keys(%k)' log
13.57.233.99 count: 1
18.206.226.75 count: 2
13.57.220.172 count: 9
5.135.134.16 count: 5
18.213.10.181 count: 3
这与Steeldriver 的 awk 方法,但在 Perl 中。-a
导致 perl 自动将每个输入行拆分为数组@F
,其第一个元素(IP)为$F[0]
。因此,$k{$F[0]}++
将创建哈希%k
,其键是 IP,其值是每个 IP 被看到的次数。}{
是 perlspeak 的时髦说法,表示“在处理完所有输入后,在最后做剩下的事情”。因此,最后,脚本将迭代哈希的键并打印当前键($_
)及其值($k{$_}
)。
并且,为了让人们不认为 perl 强迫您编写看起来像神秘涂鸦的脚本,以下是同样的事情,但形式不太简洁:
perl -e '
while (my $line=<STDIN>){
@fields = split(/ /, $line);
$ip = $fields[0];
$counts{$ip}++;
}
foreach $ip (keys(%counts)){
print "$ip count: $counts{$ip}\n"
}' < log