Cut a field twice

I have a log entry like this:

192.168.28.168  user82  [08/May/2010:09:52:52]  "GET /NoAuth/js/titlebox-state.js HTTP/1.1"     "http://www.example.com/index.html"     "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 GTB7.0" 

I want the final output to show only:

   /NoAuth/js/titlebox-state.js HTTP/1.1

With this command I get the following:

cut -f4 example.log

"GET /NoAuth/js/titlebox-state.js HTTP/1.1"

But I still need to remove the "GET prefix (and the closing quote). How can I do this with cut, awk, or sed?
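Since the question asks about cut specifically, here is a minimal cut-only sketch. It assumes, as the working cut -f4 example.log command suggests, that the log fields are separated by tabs, and that inside the request field the method, path and protocol are separated by single spaces. The function name extract_request is hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Minimal sketch using only cut and tr.
# ASSUMPTION: log fields are TAB-separated (which is why `cut -f4` returns the
# quoted request in the question) and the request is "METHOD PATH PROTOCOL".
# extract_request is a hypothetical helper name.
extract_request() {
  # 1. cut -f4          -> "GET /NoAuth/js/titlebox-state.js HTTP/1.1"
  # 2. cut -d' ' -f2-3  -> /NoAuth/js/titlebox-state.js HTTP/1.1"
  # 3. tr -d '"'        -> /NoAuth/js/titlebox-state.js HTTP/1.1
  # With no file arguments it reads standard input.
  cut -f4 "$@" | cut -d' ' -f2-3 | tr -d '"'
}

# Usage:
#   extract_request example.log
```

The second cut only works as long as the requested path itself contains no spaces.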

Answer 1

Awk approach:

awk '{ sub(/"/, "", $6); print $5, $6 }' file

Output:

/NoAuth/js/titlebox-state.js HTTP/1.1
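The awk command above counts whitespace-separated fields, so it relies on every earlier field being free of spaces. In Apache's common log format, for example, the timestamp is usually written as [08/May/2010:09:52:52 +0000], and the space before the offset would shift $5 and $6. A minimal sketch that instead works on the fourth tab-separated field, again assuming the log is tab-delimited as cut -f4 implies, and strips whatever upper-case method starts the request (extract_request_awk is a hypothetical name):

```shell
#!/bin/sh
# Sketch: operate only on the 4th TAB-separated field.
# ASSUMPTION: the log is tab-delimited and the request field looks like
# "METHOD PATH PROTOCOL" with an upper-case method (GET, POST, ...).
# extract_request_awk is a hypothetical helper name.
extract_request_awk() {
  awk -F'\t' '{
    req = $4
    sub(/^"[A-Z]+ /, "", req)   # drop the opening quote and the method, e.g. "GET
    sub(/"$/, "", req)          # drop the closing quote
    print req
  }' "$@"
}

# Usage:
#   extract_request_awk example.log
```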

Answer 2

Sed approach:

sed -n 's/.*"GET \([^ ]* HTTP\/[0-9\.]*\)".*/\1/p' example.log

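It searches for *"GET (<non-whitespace> HTTP/<digits-and-dots>)"* and prints only the part that matched inside the parentheses (the \( ... \) group, referenced as \1).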
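The sed expression above only matches requests that start with GET. A minimal variant, assuming the method is always an upper-case word such as POST or HEAD, generalizes that part with the POSIX class [[:upper:]] (used instead of [A-Z] so the range does not depend on the locale's collation order). extract_request_sed is a hypothetical name.

```shell
#!/bin/sh
# Sketch: same capture group as the answer, but any upper-case method is accepted.
# ASSUMPTION: the method is one or more upper-case letters followed by a space.
# extract_request_sed is a hypothetical helper name.
extract_request_sed() {
  # [[:upper:]][[:upper:]]*  one or more upper-case letters (POSIX BRE has no +)
  # [0-9.]*                  protocol version; a dot is literal inside brackets
  sed -n 's/.*"[[:upper:]][[:upper:]]* \([^ ]* HTTP\/[0-9.]*\)".*/\1/p' "$@"
}

# Usage:
#   extract_request_sed example.log
```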

Answer 3

An alternative using GNU grep with Perl regular expressions (-P):

$ echo "$a"
192.168.28.168  user82  [08/May/2010:09:52:52]  "GET /NoAuth/js/titlebox-state.js HTTP/1.1"     "http://www.example.com/index.html"     "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 GTB7.0"

$ echo "$a" |grep -Po '(?<=GET ).*(?=".*"http)'
/NoAuth/js/titlebox-state.js HTTP/1.1
$ # or
$ echo "$a" |grep -Po '(?<=GET).*(?=".*"http)'
 /NoAuth/js/titlebox-state.js HTTP/1.1   # note: leading space preserved

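(?<=GET )     == lookbehind: the match must be preceded by the word GET and a space
.*            == match any character zero or more times after the lookbehind, up to the lookahead
(?=".*"http)  == lookahead: the match must be followed by ", then any characters zero or more times, then "http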
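A variation on the same idea swaps the lookbehind for PCRE's \K, which discards everything matched so far from the reported match, and replaces the lookahead with a negated character class that simply stops at the closing quote. Because it no longer looks for "http, it also works when the referrer field is "-". Like the answer above, this sketch assumes GNU grep built with PCRE support for -P; extract_request_grep is a hypothetical name.

```shell
#!/bin/sh
# Sketch assuming GNU grep with PCRE support (-P).
#   "GET    literal opening quote, method and space
#   \K      drop everything matched so far from the printed match
#   [^"]*   keep everything up to (not including) the closing quote
# extract_request_grep is a hypothetical helper name.
extract_request_grep() {
  grep -oP '"GET \K[^"]*' "$@"
}

# Usage:
#   extract_request_grep example.log
```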
