I have a log record like this:
192.168.28.168 user82 [08/May/2010:09:52:52] "GET /NoAuth/js/titlebox-state.js HTTP/1.1" "http://www.example.com/index.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 GTB7.0"
I want the final output to show only:
/NoAuth/js/titlebox-state.js HTTP/1.1
With this command I get the following:
cut -f4 example.log
"GET /NoAuth/js/titlebox-state.js HTTP/1.1"
However, I also need to remove the leading ["GET] (and the closing quote). How can I do this with cut, awk, or sed?
Answer 1
Awk
Method:
awk '{ sub(/"/, "", $6); print $5, $6 }' file
Output:
/NoAuth/js/titlebox-state.js HTTP/1.1
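The command above depends on whitespace field positions ($5 and $6). As a hedged variant, the sketch below splits on double quotes instead, so the whole request string is field 2 regardless of how many space-separated fields precede it. The variable names line and result are only for illustration, and the pattern assumes the request method is a run of uppercase letters (GET, POST, and so on).

```shell
# Sample record copied from the question.
line='192.168.28.168 user82 [08/May/2010:09:52:52] "GET /NoAuth/js/titlebox-state.js HTTP/1.1" "http://www.example.com/index.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 GTB7.0"'

# With '"' as the field separator, $2 is the full request: GET /path HTTP/1.1
# sub() strips the leading method word and the space that follows it.
result=$(printf '%s\n' "$line" | awk -F'"' '{ sub(/^[A-Z]+ /, "", $2); print $2 }')

printf '%s\n' "$result"
# prints: /NoAuth/js/titlebox-state.js HTTP/1.1
```

To run it on a file, the same awk program can be given example.log as its argument in place of the printf pipe.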
Answer 2
Sed
Method:
sed -n 's/.*"GET \([^ ]* HTTP\/[0-9\.]*\)".*/\1/p' example.log
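It searches for "GET (<non-whitespace> HTTP/<digits-and-dots>)" and replaces the whole line with the part matched inside the parentheses. Because of -n together with the p flag, only lines where the substitution succeeded are printed.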
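The same idea can be sketched with extended regular expressions, which removes most of the backslashes. This is only an illustration under a few assumptions: it uses the -E option (accepted by GNU and BSD sed), uses | as the s-command delimiter so the slash in HTTP/ needs no escaping, and accepts any uppercase method rather than only GET. The second input line is a made-up non-matching record, included to show that it produces no output.

```shell
# Sample record copied from the question.
line='192.168.28.168 user82 [08/May/2010:09:52:52] "GET /NoAuth/js/titlebox-state.js HTTP/1.1" "http://www.example.com/index.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 GTB7.0"'

# -n suppresses automatic printing; the trailing p prints only lines
# where the substitution matched. \1 is the parenthesised group:
# <non-space path> HTTP/<digits and dots>.
result=$(printf '%s\n' "$line" 'a line with no request in it' |
  sed -nE 's|.*"[A-Z]+ ([^ ]+ HTTP/[0-9.]+)".*|\1|p')

printf '%s\n' "$result"
# prints: /NoAuth/js/titlebox-state.js HTTP/1.1
```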
Answer 3
An alternative using GNU grep with Perl-compatible regular expressions (-P), printing only the matched part (-o):
$ echo "$a"
192.168.28.168 user82 [08/May/2010:09:52:52] "GET /NoAuth/js/titlebox-state.js HTTP/1.1" "http://www.example.com/index.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 GTB7.0"
$ echo "$a" |grep -Po '(?<=GET ).*(?=".*"http)'
/NoAuth/js/titlebox-state.js HTTP/1.1
$ # or
$ echo "$a" |grep -Po '(?<=GET).*(?=".*"http)'
 /NoAuth/js/titlebox-state.js HTTP/1.1    # leading space preserved
(?<=GET ) == lookbehind for the word GET followed by a space
.* == matches any character zero or more times after the lookbehind, up to the lookahead
(?=".*"http) == lookahead for ", then any character zero or more times, then "http
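Note that the lookahead above relies on the next quoted field starting with "http. As a hedged sketch, the variant below stops at the first closing quote instead, so it does not depend on what the following field contains. It assumes GNU grep built with PCRE support (the -P option); the variable names are only for illustration.

```shell
# Sample record copied from the question.
line='192.168.28.168 user82 [08/May/2010:09:52:52] "GET /NoAuth/js/titlebox-state.js HTTP/1.1" "http://www.example.com/index.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 GTB7.0"'

# (?<="GET )  lookbehind: the match must be preceded by "GET and a space
# [^"]*       the match itself: everything up to the next double quote
# (?=")       lookahead: the match must be followed by a closing quote
result=$(printf '%s\n' "$line" | grep -Po '(?<="GET )[^"]*(?=")')

printf '%s\n' "$result"
# prints: /NoAuth/js/titlebox-state.js HTTP/1.1
```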