I want to grep some information out of raw Apache logs in combined log format:
51.254.56.62 - - [01/Jun/2016:20:49:28 +0500] "GET /vendors/jquery.slimscroll.min.js HTTP/1.1" 404 - "http://networkconfig.net/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0"
51.254.56.62 - - [01/Jun/2016:20:49:28 +0500] "GET /jquery.fullPage.js HTTP/1.1" 304 - "http://networkconfig.net/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0"
51.254.56.62 - - [01/Jun/2016:20:49:29 +0500] "GET /js/TweenLite.min.js HTTP/1.1" 304 - "http://networkconfig.net/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0"
51.254.56.62 - - [01/Jun/2016:20:49:29 +0500] "GET /js/EasePack.min.js HTTP/1.1" 304 - "http://networkconfig.com/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0"
51.254.56.62 - - [01/Jun/2016:20:49:29 +0500] "GET /js/rAF.js HTTP/1.1" 304 - "http://networkconfig.com/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0"
51.254.56.62 - - [01/Jun/2016:20:49:29 +0500] "GET /js/networkconfig.js HTTP/1.1" 304 - "http://networkconfig.com/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0"
182.180.10.40 - - [01/Jun/2016:20:49:29 +0500] "GET /js/rAF.js HTTP/1.1" 304 - "http://networkconfig.com/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0"
182.180.10.40 - - [01/Jun/2016:20:49:29 +0500] "GET /js/networkconfig.js HTTP/1.1" 304 - "http://networkconfig.com/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0"
182.180.10.40 - - [01/Jun/2016:20:49:28 +0500] "GET /vendors/jquery.slimscroll.min.js HTTP/1.1" 404 - "http://networkconfig.net/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0"
182.180.10.40 - - [01/Jun/2016:20:49:28 +0500] "GET /jquery.fullPage.js HTTP/1.1" 304 - "http://networkconfig.net/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0"
This is what I did:
awk '{ print $1,$11}' accesslog | sort | uniq -c | sort -nr | head -n 10
3 51.254.56.62 "http://networkconfig.net/"
3 51.254.56.62 "http://networkconfig.com/"
2 182.180.10.40 "http://networkconfig.net/"
2 182.180.10.40 "http://networkconfig.com/"
What I want to get is:
Domains Hits By IP
networkconfig.net 3 hits 51.254.56.62 | 2 hits 182.180.10.40 and so on
networkconfig.com 3 hits 51.254.56.62 | 2 hits 182.180.10.40 and so on
Answer 1
A modified version (3) of the ugly sh script:
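#!/bin/bash
# Usage: ./ugly.sh accesslog
{ echo "Domains Hits by IP"
  # Reduce the referer (field 11) to its bare domain, then print "IP<TAB>domain".
  awk '{ gsub(/^.*:\/\/|"|\/.*$/, "", $11); print $1 "\t" $11 }' "$1" |
  sort |
  uniq -c |
  sort -k3,3 -k1,1nr |   # group by domain, highest count first
  while read -r a b c; do
    [ "$a" = 1 ] && p='' || p=s   # singular "hit" for a count of 1
    if [ "$n" = "$c" ]; then
      echo -n " | $a hit$p $b"    # same domain: append to the current line
    else
      echo                        # new domain: end the previous line
      echo -n "$c $a hit$p $b"
    fi
    n="$c"
  done
  echo
} |
while read -r a b; do
  printf "%-30s %s\n" "$a" "$b"   # pad the first column to 30 characters
done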
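To see what the first stage of the pipeline hands to sort, here is that awk step on its own, as a small sketch. The function name extract_ip_domain is made up for illustration, and it assumes the combined log format shown in the question, where whitespace-separated field 11 is the quoted referer.

```shell
#!/bin/bash
# Hypothetical helper: print "IP<TAB>referer-domain" for each log line.
# Assumes combined log format, where field 11 is the quoted referer.
extract_ip_domain() {
    awk '{
        # Strip the scheme (everything up to "://"), any double quotes,
        # and everything from the first remaining slash onward.
        gsub(/^.*:\/\/|"|\/.*$/, "", $11)
        print $1 "\t" $11
    }' "$@"
}
```

Called as extract_ip_domain accesslog, it should print one IP and domain pair per request; a request without a referer comes out with "-" as its domain.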
Output of ./ugly.sh accesslog:
Domains Hits by IP
networkconfig.com 3 hits 51.254.56.62 | 2 hits 182.180.10.40
networkconfig.net 3 hits 51.254.56.62 | 2 hits 182.180.10.40
Output of ./ugly.sh log.txt (the OP's data URL: log.txt):
Domains Hits by IP
- 1 hit 180.76.15.138 | 1 hit 192.243.55.136
www.google.com.pk 3 hits 122.129.73.92
www.networkconfigorchard.com 2 hits 39.46.59.57 | 8 hits 39.46.6.0
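The "-" row above comes from requests that carry no referer. If those are not wanted, one possible variation is to do the counting inside awk and skip them there. This is only a sketch, not the script above: the function name hits_by_domain is hypothetical, it assumes the same combined log format (field 11 is the quoted referer), and it breaks ties between equal counts by IP address.

```shell
#!/bin/bash
# Hypothetical alternative: count hits per (domain, IP) in awk, let sort
# order the pairs, then join the pairs of each domain onto one line.
hits_by_domain() {
    awk '{
        ref = $11
        # Reduce the quoted referer to its bare domain.
        gsub(/^.*:\/\/|"|\/.*$/, "", ref)
        if (ref != "-") count[ref "\t" $1]++   # skip requests without a referer
    }
    END { for (k in count) print count[k] "\t" k }' "$@" |
    # Columns are count, domain, IP: sort by domain, then count descending.
    LC_ALL=C sort -t "$(printf '\t')" -k2,2 -k1,1nr -k3,3 |
    awk -F '\t' '
    BEGIN { printf "%-30s %s\n", "Domains", "Hits by IP" }
    {
        entry = $1 " hit" ($1 == 1 ? "" : "s") " " $3
        if ($2 == domain) {
            line = line " | " entry            # same domain: extend the line
        } else {
            if (NR > 1) print line             # new domain: flush the old line
            domain = $2
            line = sprintf("%-30s %s", $2, entry)
        }
    }
    END { if (NR > 0) print line }'
}
```

It would be called as hits_by_domain accesslog and prints one line per referer domain in the same layout as the output above.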