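I just want to write a shell script that implements the basic functions SARG provides:
- URLs sorted by number of hits (top 100 out of 10k in 10 minutes)
- totals per status/error code
- URLs sorted by the bandwidth they use
- and a few more sorting functions
Unfortunately, I have a problem with sorting URLs by bandwidth. I have tried various approaches, but I always hit the same problem: either it does not work at all, or it adds the total byte count onto the second byte count... Does anyone know how best to do this?
Original access.log (common style):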
> tail /var/log/squid3/access.log
192.168.1.208 - - [10/Jan/2016:19:01:44 -0100] "CONNECT i.ytimg.com:443 HTTP/1.1" 200 143903 TCP_MISS:HIER_DIRECT
192.168.1.208 - - [10/Jan/2016:19:02:02 -0100] "CONNECT www.youtube.com:443 HTTP/1.1" 200 87392 TCP_MISS:HIER_DIRECT
192.168.1.208 - - [10/Jan/2016:19:02:12 -0100] "CONNECT s.ytimg.com:443 HTTP/1.1" 200 32718 TCP_MISS:HIER_DIRECT
192.168.1.208 - - [10/Jan/2016:19:03:00 -0100] "CONNECT s.youtube.com:443 HTTP/1.1" 200 6376 TCP_MISS:HIER_DIRECT
192.168.1.208 - - [10/Jan/2016:19:03:39 -0100] "CONNECT r2---sn-h0j7snel.googlevideo.com:443 HTTP/1.1" 200 13740382 TCP_MISS:HIER_DIRECT
192.168.1.208 - - [10/Jan/2016:19:03:40 -0100] "CONNECT r2---sn-h0j7snel.googlevideo.com:443 HTTP/1.1" 200 18250979 TCP_MISS:HIER_DIRECT
192.168.1.208 - - [10/Jan/2016:19:06:57 -0100] "CONNECT token.services.mozilla.com:443 HTTP/1.1" 200 4138 TCP_MISS:HIER_DIRECT
192.168.1.208 - - [10/Jan/2016:19:07:53 -0100] "CONNECT sync-285-us-west-2.sync.services.mozilla.com:443 HTTP/1.1" 200 4749 TCP_MISS:HIER_DIRECT
192.168.1.208 - - [10/Jan/2016:19:41:48 -0100] "CONNECT sync-285-us-west-2.sync.services.mozilla.com:443 HTTP/1.1" 200 4118 TCP_MISS:HIER_DIRECT
192.168.1.208 - - [10/Jan/2016:19:51:49 -0100] "CONNECT sync-285-us-west-2.sync.services.mozilla.com:443 HTTP/1.1" 200 4118 TCP_MISS:HIER_DIRECT
Processed and saved in a temporary file:
cat /tmp/bandwith.tmp
anonymousstats.keefox.org 5128
anonymousstats.keefox.org 3438
api.accounts.firefox.com:443 5509
api.flattr.com:443 4418
api.flattr.com:443 10397
blocklist.addons.mozilla.org:443 24118
button.flattr.com 4180
clients1.google.com 861
clients1.google.com 861
clients1.google.com 861
clients1.google.com 861
clients1.google.com 861
clients1.google.com 861
clients1.google.com 861
clients1.google.com 861
clients1.google.com 861
cm.g.doubleclick.net 4437
content.googleapis.com:443 4317
content.googleapis.com:443 4914
Desired form:
anonymousstats.keefox.org 8566
api.accounts.firefox.com:443 5509
api.flattr.com:443 14815
blocklist.addons.mozilla.org:443 24118
button.flattr.com:443 4180
clients1.google.com 7749
cm.g.doubleclick.net:443 4437
content.googleapis.com:443 8754
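The desired form above amounts to summing the second column for every distinct value of the first column. A minimal sketch of that step, assuming the temporary file always has exactly two whitespace-separated columns (host, bytes); the function name `sum_bytes_per_host` is hypothetical and not part of my script:

```shell
# Hypothetical helper: add up the byte counts (column 2) per host (column 1).
# $1 is a file with lines of the form "host bytes", e.g. /tmp/bandwith.tmp.
sum_bytes_per_host() {
    awk '
        { bytes[$1] += $2 }                 # accumulate bytes under the host key
        END {
            for (host in bytes)
                # %.0f prints large totals as plain integers
                printf "%s %.0f\n", host, bytes[host]
        }
    ' "$1" | LC_ALL=C sort                  # awk array order is unspecified, so sort by host
}
```

Called as `sum_bytes_per_host /tmp/bandwith.tmp`, this should print one line per host. Replacing the final `LC_ALL=C sort` with `sort -k2,2nr` would order the lines by total bytes instead of by host name.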
My function at the moment:
bandwith() {
#First idea: awk '{print $10, $7}' "$LOGDATEI" | grep -vE "(^\"-\"$|/www.$HOST|/$HOST)" | sort | uniq -c | sort -rn | head -$HITS > /tmp/bandwith.tmp
cat "$LOGDATEI" | awk '{print $10, $7}' | awk '{ sub(/http\:\/\//, ""); sub(/\//, " " ); print $2, $1 } ' | sort -d | head -$HITS > /tmp/bandwith.tmp
I tried:
while read LINE
do
cut -d' ' -f2 /tmp/bandwith.tmp { while read NR
do
x=$(($x+$NR))
echo $x
}
or
awk '{sum+=$1}END{print sum}' foo.txt
rule1=`head -1 /tmp/bandwith.tmp | awk '{print $1}'`
rule2=`head -2 /tmp/bandwith.tmp | awk '{print $1}'`
for word in `cat /tmp/bandwith.tmp`
cat /tmp/bandwith.tmp | while read line
do
echo "Processing new line" >/dev/tty
$sum = $zeile1 + $zeile2
done
}
until [ "$rule1" != "$rule2" ]
do
echo "$1"
echo "$2"
break
echo "Only to test"
done
done
}
Does anyone have an idea about this problem?
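For reference, this is roughly the behaviour I am after, written as a sketch that reads the log directly instead of going through a temporary file. It assumes the common-log layout shown above, where field 7 is the URL and field 10 is the byte count. The function name `bandwidth_by_host` and its two arguments are hypothetical:

```shell
# Hypothetical sketch: total bytes per host, largest first.
# $1: access log in the common style shown above
# $2: number of result lines to keep
bandwidth_by_host() {
    awk '
        {
            host = $7
            sub(/^[a-z]+:\/\//, "", host)   # drop a scheme such as http://
            sub(/\/.*$/, "", host)          # drop the path, keep host[:port]
            bytes[host] += $10              # field 10 holds the byte count
        }
        END {
            for (host in bytes)
                printf "%s %.0f\n", host, bytes[host]
        }
    ' "$1" | sort -k2,2nr | head -n "$2"    # numeric sort on column 2, descending
}
```

Something like `bandwidth_by_host "$LOGDATEI" "$HITS" > /tmp/bandwith.tmp` would then replace the pipeline in my function. Note that the port stays part of the key, so `host` and `host:443` are counted separately.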