Extract all traffic corresponding to requests that carry a parameter

For each line of access.log that matches the pattern /mypattern, such as:

www.example.com:80 192.0.2.17 - - [29/Sep/2017:13:49:02 +0200] "GET /mypattern?foo=bar&iptosearch=198.51.100.5  

I want to extract the iptosearch parameter and display all lines of access.log that have that IP and contain blah. Example:

 [29/Sep/2017:13:49:02 +0200] "GET /mypattern?foo=bar&iptosearch=198.51.100.5: 
    www.example3.com:80 198.51.100.5 - - [27/Sep/2017:00:00:00 +0200] "GET /hello/blah" ...
    www.example2.com:80 198.51.100.5 - - [25/Sep/2017:00:00:00 +0200] "GET /blah.html" ...
    www.example7.com:80 198.51.100.5 - - [12/Sep/2017:00:00:00 +0200] "GET /index.htm?i=blah" ...

 [27/Sep/2017:00:00:00 +0200] "GET /mypattern?iptosearch=203.0.113.2&foo2=bar5: 
    www.example32.com:80 203.0.113.2 - - [15/Sep/2017:00:00:00 +0200] "GET /hello/blah" ...
    www.example215.com:80 203.0.113.2 - - [14/Sep/2017:00:00:00 +0200] "GET /blah.html" ...

I am trying this:

grep -f <(grep -o 'mypattern.*iptosearch=(.*)' access.log) access.log |grep blah

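But:

  • The output will probably not be grouped as in my example above: a header line, with the list of lines for the corresponding iptosearch below it

  • The header from my example ([29/Sep/2017:13:49:02 +0200] "GET /test?foo=bar&iptosearch=198.51.100.5:) is not displayed, because it does not contain blah

How can I get the output displayed as shown above? Should a loop be used in this case?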

Answer 1

Extended bash + grep + awk approach:

Sample access.log contents:

www.example3.com:80 198.51.100.5 - - [27/Sep/2017:00:00:00 +0200] "GET /hello/blah" ...
www.example2.com:80 198.51.100.5 - - [25/Sep/2017:00:00:00 +0200] "GET /blah.html" ...
[29/Sep/2017:13:49:02 +0200] "GET /mypattern?foo=bar&iptosearch=198.51.100.5: 
www.example7.com:80 198.51.100.5 - - [12/Sep/2017:00:00:00 +0200] "GET /index.htm?i=blah" ...
www.example32.com:80 203.0.113.2 - - [15/Sep/2017:00:00:00 +0200] "GET /hello/blah" ...
[27/Sep/2017:00:00:00 +0200] "GET /mypattern?iptosearch=203.0.113.2&foo2=bar5: 
www.example215.com:80 203.0.113.2 - - [14/Sep/2017:00:00:00 +0200] "GET /blah.html" ...

The job:

grep '/mypattern' access.log | while read -r l; do 
    if [[ $l =~ iptosearch=(([0-9]+\.){3}[0-9]+) ]]; then 
        echo "$l"
        awk -v ip="${BASH_REMATCH[1]}" '$0~ip && /blah/;END{ print "" }' access.log
    fi
done

Output:

[29/Sep/2017:13:49:02 +0200] "GET /mypattern?foo=bar&iptosearch=198.51.100.5:
www.example3.com:80 198.51.100.5 - - [27/Sep/2017:00:00:00 +0200] "GET /hello/blah" ...
www.example2.com:80 198.51.100.5 - - [25/Sep/2017:00:00:00 +0200] "GET /blah.html" ...
www.example7.com:80 198.51.100.5 - - [12/Sep/2017:00:00:00 +0200] "GET /index.htm?i=blah" ...

[27/Sep/2017:00:00:00 +0200] "GET /mypattern?iptosearch=203.0.113.2&foo2=bar5:
www.example32.com:80 203.0.113.2 - - [15/Sep/2017:00:00:00 +0200] "GET /hello/blah" ...
www.example215.com:80 203.0.113.2 - - [14/Sep/2017:00:00:00 +0200] "GET /blah.html" ...

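Details:

  • while read -r l ... - iterates over the lines containing /mypattern that the grep command returns

  • [[ $l =~ iptosearch=(([0-9]+\.){3}[0-9]+) ]] - matches each line $l against the regular expression iptosearch=(([0-9]+\.){3}[0-9]+)
    BASH_REMATCH is an array variable whose members are assigned by the =~ binary operator of the [[ conditional command. The element with index 0 is the portion of the string matching the entire regular expression. The element with index n is the portion of the string matching the nth parenthesized subexpression (...). This variable is read-only.

  • -v ip="${BASH_REMATCH[1]}" - passes the variable ip into the awk script

  • $0~ip && /blah/ - outputs only the lines that contain the current ip value and the keyword blah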
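The capture step described above can be tried on its own. The following is a minimal sketch, assuming bash (not plain sh) and a single sample line modelled on the question's log format; the variable names line, re, whole and ip are chosen here for illustration and are not part of the answer.

```shell
#!/usr/bin/env bash
# Sample line modelled on the log format in the question (illustrative only).
line='[29/Sep/2017:13:49:02 +0200] "GET /mypattern?foo=bar&iptosearch=198.51.100.5'

# Keeping the regex in a variable avoids quoting surprises inside [[ ]].
re='iptosearch=(([0-9]+\.){3}[0-9]+)'

if [[ $line =~ $re ]]; then
    whole=${BASH_REMATCH[0]}   # text matched by the entire regex
    ip=${BASH_REMATCH[1]}      # first parenthesized group: the full address
    printf 'whole=%s\nip=%s\n' "$whole" "$ip"
fi
```

Storing the pattern in a variable and expanding it unquoted on the right-hand side of =~ is a common way to keep it treated as a regular expression rather than a literal string.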

Answer 2

awk '/blah/ && $2 == "198.51.100.5" { print }' access.log

This searches for all lines containing the text blah. If the second whitespace-separated field is also "198.51.100.5", the line is printed.
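The command above hard-codes one address. As a rough sketch, the same filter can take the address through awk's -v option, assuming (as in the sample log) that the client IP is always the second field. The temporary file, the example9 line, and the names log, ip and matches are made up here so the snippet runs on its own.

```shell
# Hypothetical sample file so the sketch is self-contained.
log=$(mktemp)
cat > "$log" <<'EOF'
www.example3.com:80 198.51.100.5 - - [27/Sep/2017:00:00:00 +0200] "GET /hello/blah" ...
www.example9.com:80 198.51.100.5 - - [26/Sep/2017:00:00:00 +0200] "GET /other" ...
www.example32.com:80 203.0.113.2 - - [15/Sep/2017:00:00:00 +0200] "GET /hello/blah" ...
EOF

# Pass the address in with -v; $2 == ip is an exact string comparison,
# so the dots in the address are not treated as regex wildcards.
ip=198.51.100.5
matches=$(awk -v ip="$ip" '/blah/ && $2 == ip' "$log")
printf '%s\n' "$matches"
rm -f "$log"
```

Only the first sample line is printed: the second has the right address but no blah, and the third has blah but a different address.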
