In access.log, for each line that matches the pattern /mypattern, such as:
www.example.com:80 192.0.2.17 - - [29/Sep/2017:13:49:02 +0200] "GET /mypattern?foo=bar&iptosearch=198.51.100.5
I want to extract the iptosearch parameter and display all lines of access.log that contain that IP and also contain blah. Example:
[29/Sep/2017:13:49:02 +0200] "GET /mypattern?foo=bar&iptosearch=198.51.100.5:
www.example3.com:80 198.51.100.5 - - [27/Sep/2017:00:00:00 +0200] "GET /hello/blah" ...
www.example2.com:80 198.51.100.5 - - [25/Sep/2017:00:00:00 +0200] "GET /blah.html" ...
www.example7.com:80 198.51.100.5 - - [12/Sep/2017:00:00:00 +0200] "GET /index.htm?i=blah" ...
[27/Sep/2017:00:00:00 +0200] "GET /mypattern?iptosearch=203.0.113.2&foo2=bar5:
www.example32.com:80 203.0.113.2 - - [15/Sep/2017:00:00:00 +0200] "GET /hello/blah" ...
www.example215.com:80 203.0.113.2 - - [14/Sep/2017:00:00:00 +0200] "GET /blah.html" ...
I am trying this:
grep -f <(grep -o 'mypattern.*iptosearch=(.*)' access.log) access.log |grep blah
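But:
- the output would probably not be grouped as in my example above, with a header followed by the list of lines for the corresponding iptosearch value;
- the header from my example,
[29/Sep/2017:13:49:02 +0200] "GET /mypattern?foo=bar&iptosearch=198.51.100.5:
would not be displayed, because it does not contain blah.
How can I get the output displayed as in the example? Should I use a loop in this case?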
Answer 1
Extended bash + grep + awk approach:
Sample access.log contents:
www.example3.com:80 198.51.100.5 - - [27/Sep/2017:00:00:00 +0200] "GET /hello/blah" ...
www.example2.com:80 198.51.100.5 - - [25/Sep/2017:00:00:00 +0200] "GET /blah.html" ...
[29/Sep/2017:13:49:02 +0200] "GET /mypattern?foo=bar&iptosearch=198.51.100.5:
www.example7.com:80 198.51.100.5 - - [12/Sep/2017:00:00:00 +0200] "GET /index.htm?i=blah" ...
www.example32.com:80 203.0.113.2 - - [15/Sep/2017:00:00:00 +0200] "GET /hello/blah" ...
[27/Sep/2017:00:00:00 +0200] "GET /mypattern?iptosearch=203.0.113.2&foo2=bar5:
www.example215.com:80 203.0.113.2 - - [14/Sep/2017:00:00:00 +0200] "GET /blah.html" ...
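The job:
# Loop over every request line that hits /mypattern.
grep '/mypattern' access.log | while read -r l; do
    # Capture the IPv4 address passed in the iptosearch parameter.
    if [[ $l =~ iptosearch=(([0-9]+\.){3}[0-9]+) ]]; then
        echo "$l"   # header line for this group
        # Print the log lines containing this IP and "blah", then an empty separator line.
        awk -v ip="${BASH_REMATCH[1]}" '$0~ip && /blah/;END{ print "" }' access.log
    fi
done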
The output:
[29/Sep/2017:13:49:02 +0200] "GET /mypattern?foo=bar&iptosearch=198.51.100.5:
www.example3.com:80 198.51.100.5 - - [27/Sep/2017:00:00:00 +0200] "GET /hello/blah" ...
www.example2.com:80 198.51.100.5 - - [25/Sep/2017:00:00:00 +0200] "GET /blah.html" ...
www.example7.com:80 198.51.100.5 - - [12/Sep/2017:00:00:00 +0200] "GET /index.htm?i=blah" ...
[27/Sep/2017:00:00:00 +0200] "GET /mypattern?iptosearch=203.0.113.2&foo2=bar5:
www.example32.com:80 203.0.113.2 - - [15/Sep/2017:00:00:00 +0200] "GET /hello/blah" ...
www.example215.com:80 203.0.113.2 - - [14/Sep/2017:00:00:00 +0200] "GET /blah.html" ...
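Details:
while read -r l ... - iterates over the lines containing /mypattern, as returned by the grep command
[[ $l =~ iptosearch=(([0-9]+\.){3}[0-9]+) ]] - matches each line $l against the regular expression iptosearch=(([0-9]+\.){3}[0-9]+).
BASH_REMATCH
An array variable whose members are assigned by the "=~" binary operator to the [[ conditional command. The element with index 0 is the portion of the string matching the entire regular expression. The element with index n is the portion of the string matching the nth parenthesized subexpression. This variable is read-only.
(...)
-v ip="${BASH_REMATCH[1]}" - passes the variable ip into the awk script
$0~ip && /blah/ - outputs only the lines that contain the current ip value and the keyword blah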
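To see what the =~ match leaves in BASH_REMATCH, here is a small sketch run against one of the sample header lines. The variable names line, whole and ip are only illustrative, and the snippet assumes bash rather than plain sh, since [[ and BASH_REMATCH are bash features.

```shell
#!/usr/bin/env bash
# Illustrative only: 'line' is a made-up variable holding one sample header line.
line='[29/Sep/2017:13:49:02 +0200] "GET /mypattern?foo=bar&iptosearch=198.51.100.5'

if [[ $line =~ iptosearch=(([0-9]+\.){3}[0-9]+) ]]; then
    whole=${BASH_REMATCH[0]}   # text matched by the entire regular expression
    ip=${BASH_REMATCH[1]}      # first (outer) parenthesized group: the IP itself
    echo "whole match: $whole"
    echo "group 1:     $ip"
fi
```

Index 1 is the one the loop needs, because the outer pair of parentheses wraps the complete address while index 0 still carries the iptosearch= prefix.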
Answer 2
awk '/blah/ && $2 == "198.51.100.5" { print }' access.log
This searches for all lines that contain the text blah. If the second space-separated field is also "198.51.100.5", the line is printed.
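The command above hard-codes the address. One possible generalization, offered as a sketch rather than as part of the original answer: pass the IP in with -v and compare it to field 2 as a string, so the dots in the address are not treated as regex wildcards. The function name lines_for_ip, the temporary sample file, and the third sample line (a request without blah) are made up for illustration; the comparison assumes the client IP is the second space-separated field, as in the sample log.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the lines of a log file whose 2nd field equals
# the given IP and which contain the given keyword as a literal substring.
lines_for_ip() {
    local ip=$1 keyword=$2 logfile=$3
    awk -v ip="$ip" -v kw="$keyword" '$2 == ip && index($0, kw)' "$logfile"
}

# Small sample log in a temporary file. The first two lines come from the
# example above; the third is invented to show a line that gets filtered out.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
www.example3.com:80 198.51.100.5 - - [27/Sep/2017:00:00:00 +0200] "GET /hello/blah" ...
www.example32.com:80 203.0.113.2 - - [15/Sep/2017:00:00:00 +0200] "GET /hello/blah" ...
www.example9.com:80 198.51.100.5 - - [26/Sep/2017:00:00:00 +0200] "GET /other" ...
EOF

result=$(lines_for_ip 198.51.100.5 blah "$sample_log")
echo "$result"
rm -f "$sample_log"
```

Inside the loop from Answer 1, such a helper could be called with "${BASH_REMATCH[1]}" as its first argument in place of the inline awk command.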