I want to use awk, grep and/or sed to store all the unique usernames present in the file extended.log in a new file.
Below are the field names in my file, which is tab-delimited. I only want the values of the "username" field (the 12th field).
"record_id" "client_id" "request_id" "date_time" "elapsed_time" "status" "size" "upload" "download" "bypassed" "client_ip" "username" "method" "url" "http_referer" "useragent" "mime" "filter_name" "filtering_reason" "interface" "cachecode" "peercode" "peer" "request_host" "request_tld" "referer_host" "referer_tld" "range" "time_profiles" "user_groups" "request_profiles" "application_signatures" "categories" "response_profiles" "upload_content_types" "download_content_types" "profiles"
Here is a sample of the file's contents:
"SVZerDLJhIj6G3PA.6575.1466420105.346.1837.1" "1837" "1" "20/Jun/2016:16:25:05" "4" "200" "0" "-" "0" "-" "192.168.12.13" "[email protected]""GET" "-" "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)" "-" "-" "-" "192.168.14.11:8080" "TCP_MISS" "DIRECT" "safesquid" "192.168.14.11:8080" "-" "-" "-" "0" "" "NO_AUTHENTICATION" "" "" "" "" "" "" ""
"SVZerDLJhIj6G3PA.6575.1466420107.357.1838.1" "1838" "1" "20/Jun/2016:16:25:07" "4" "200" "0" "-" "0" "-" "192.168.12.13" "[email protected]""GET" "-" "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)" "-" "-" "-" "192.168.14.11:8080" "TCP_MISS" "DIRECT" "safesquid" "192.168.14.11:8080" "-" "-" "-" "0" "" "NO_AUTHENTICATION" "" "" "" "" "" "" ""
"SVZerDLJhIj6G3PA.6575.1466420109.367.1840.1" "1840" "1" "20/Jun/2016:16:25:09" "4" "200" "0" "-" "0" "-" "192.168.12.13" "[email protected]""GET" "-" "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)" "-" "-" "-" "192.168.14.11:8080" "TCP_MISS" "DIRECT" "safesquid" "192.168.14.11:8080" "-" "-" "-" "0" "" "NO_AUTHENTICATION" "" "" "" "" "" "" ""
"SVZerDLJhIj6G3PA.6575.1466420111.377.1841.1" "1841" "1" "20/Jun/2016:16:25:11" "4" "200" "0" "-" "0" "-" "192.168.12.13" "[email protected]""GET" "-" "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)" "-" "-" "-" "192.168.14.11:8080" "TCP_MISS" "DIRECT" "safesquid" "192.168.14.11:8080" "-" "-" "-" "0" "" "NO_AUTHENTICATION" "" "" "" "" "" "" ""
"SVZerDLJhIj6G3PA.6575.1466420113.387.1842.1" "1842" "1" "20/Jun/2016:16:25:13" "5" "200" "0" "-" "0" "-" "192.168.12.13" "[email protected]""GET" "-" "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)" "-" "-" "-" "192.168.14.11:8080" "TCP_MISS" "DIRECT" "safesquid" "192.168.14.11:8080" "-" "-" "-" "0" "" "NO_AUTHENTICATION" "" "" "" "" "" "" ""
"SVZerDLJhIj6G3PA.6575.1466420115.399.1843.1" "1843" "1" "20/Jun/2016:16:25:15" "5" "200" "0" "-" "0" "-" "192.168.12.13" "[email protected]""GET" "-" "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)" "-" "-" "-" "192.168.14.11:8080" "TCP_MISS" "DIRECT" "safesquid" "192.168.14.11:8080" "-" "-" "-" "0" "" "NO_AUTHENTICATION" "" "" "" "" "" "" ""
"SVZerDLJhIj6G3PA.6575.1466420117.410.1844.1" "1844" "1" "20/Jun/2016:16:25:17" "4" "200" "0" "-" "0" "-" "192.168.12.13" "[email protected]""GET" "-" "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)" "-" "-" "-" "192.168.14.11:8080" "TCP_MISS" "DIRECT" "safesquid" "192.168.14.11:8080" "-" "-" "-" "0" "" "NO_AUTHENTICATION" "" "" "" "" "" "" ""
"SVZerDLJhIj6G3PA.6575.1466420119.421.1845.1" "1845" "1" "20/Jun/2016:16:25:19" "4" "200" "0" "-" "0" "-" "192.168.12.13" "[email protected]""GET" "-" "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)" "-" "-" "-" "192.168.14.11:8080" "TCP_MISS" "DIRECT" "safesquid" "192.168.14.11:8080" "-" "-" "-" "0" "" "NO_AUTHENTICATION" "" "" "" "" "" "" ""
"SVZerDLJhIj6G3PA.6575.1466420121.431.1846.1" "1846" "1" "20/Jun/2016:16:25:21" "4" "200" "0" "-" "0" "-" "192.168.12.13" "[email protected]""GET" "-" "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)" "-" "-" "-" "192.168.14.11:8080" "TCP_MISS" "DIRECT" "safesquid" "192.168.14.11:8080" "-" "-" "-" "0" "" "NO_AUTHENTICATION" "" "" "" "" "" "" ""
"SVZerDLJhIj6G3PA.6575.1466420123.445.1847.1" "1847" "1" "20/Jun/2016:16:25:23" "4" "200" "0" "-" "0" "-" "192.168.12.13" "[email protected]""GET" "-" "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)" "-" "-" "-" "192.168.14.11:8080" "TCP_MISS" "DIRECT" "safesquid" "192.168.14.11:8080" "-" "-" "-" "0" "" "NO_AUTHENTICATION" "" "" "" "" "" "" ""
"SVZerDLJhIj6G3PA.6575.1466420108.240.1839.1" "1839" "1" "20/Jun/2016:16:25:23" "15623" "200" "2826" "0" "2826" "-" "192.168.0.14" "[email protected]""CONNECT" "connect://livehelp.safesquid.com:443/" "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36" "-" "-" "-" "192.168.14.11:8080" "TCP_MISS" "DIRECT" "livehelp.safesquid.com" "livehelp.safesquid.com" "safesquid.com" "-" "-" "1K-10K" "" "NO_AUTHENTICATION" "uncachable request,BUSINESS SITES REQ" "" "computersandsoftware" "" "" "" "uncachable"
Answer 1
Try
sed -e 's/^.*"\([^" ]*\)"".*/\1/' log | sort | uniq
grep -E -o '[^"]+@[^"]+' log | sort | uniq
where
-o prints only the part of each line that matches the pattern
[^X]+ matches one or more characters other than X
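To make the pattern concrete, here is a minimal sketch of what the grep command extracts from a single line. The sample fragment and the address alice@example.com are invented for illustration and are not taken from the log above.

```shell
# Hypothetical sample: a fragment of one log line with an invented address.
line='"192.168.12.13" "alice@example.com""GET" "-"'

# -E enables extended regular expressions; -o prints only the matched text.
# [^"]+ on each side of the @ stops the match at the surrounding quotes.
match=$(printf '%s\n' "$line" | grep -Eo '[^"]+@[^"]+')
printf '%s\n' "$match"
```

Because neither side of the match may contain a double quote, the result is the bare address without the quotes around it.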
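Note
- the sed solution relies on a typo/feature in the file (the doubled double quotes right after the username)
- the grep solution relies on the usernames having the form of an e-mail address (something@something)
- awk (or perl) is better suited to extracting the nth field.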
Answer 2
Using awk on a tab-delimited file:
awk -F '\t' '{ print $12 }' file
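This extracts the 12th field. If needed, the output can be redirected to a new file.
To remove the double quotes from both ends of the data, you can use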
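As a rough sketch of the redirection, the snippet below builds a one-line tab-delimited sample and writes its 12th field to a new file. The file names sample.log and usernames.txt, the temporary directory, and the sample values are all assumptions made for illustration.

```shell
# Work in a temporary directory so no existing file is overwritten.
tmp=$(mktemp -d)

# Hypothetical sample: 13 tab-separated fields, username in field 12.
printf '1\t2\t3\t4\t5\t6\t7\t8\t9\t10\t11\t"alice@example.com"\t"GET"\n' > "$tmp/sample.log"

# Extract field 12 and redirect the output to a new file.
awk -F '\t' '{ print $12 }' "$tmp/sample.log" > "$tmp/usernames.txt"
cat "$tmp/usernames.txt"
```

At this stage the value still carries its surrounding double quotes, since awk prints the field exactly as it appears in the file.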
awk -F '\t' '{ sub("^\"", "", $12); sub("\"$", "", $12); print $12 }' file
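This performs two substitutions that remove the first and last characters of the 12th field before printing it, provided they are double quotes.
To skip the first line (if it is a header line):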
awk -F '\t' 'FNR > 1 { sub("^\"", "", $12); sub("\"$", "", $12); print $12 }' file
To get only the unique usernames, using only awk:
awk -F '\t' 'FNR > 1 && !( $12 in seen ) { seen[$12]++; sub("^\"", "", $12); sub("\"$", "", $12); print $12 }' file
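This uses an array keyed on the 12th field to keep track of usernames that have already been seen. If the data in the 12th field is not a key in the array, it has not been seen yet.
Another way would be to test only on !seen[$12] rather than on !( $12 in seen ).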
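The two tests differ slightly: $12 in seen only checks whether the key exists, whereas referencing seen[$12] creates the element if it is missing. A common way to combine the test and the bookkeeping is !seen[$12]++, sketched below on an invented four-line sample (one header line plus three records). The file name and all values are assumptions, and gsub is used here in place of the two sub calls purely for brevity.

```shell
tmp=$(mktemp -d)

# Hypothetical sample: a header line followed by three records,
# with the username in field 12 and one username repeated.
{
  printf '1\t2\t3\t4\t5\t6\t7\t8\t9\t10\t11\t"username"\t"method"\n'
  printf '1\t2\t3\t4\t5\t6\t7\t8\t9\t10\t11\t"bob@example.com"\t"GET"\n'
  printf '1\t2\t3\t4\t5\t6\t7\t8\t9\t10\t11\t"alice@example.com"\t"GET"\n'
  printf '1\t2\t3\t4\t5\t6\t7\t8\t9\t10\t11\t"bob@example.com"\t"CONNECT"\n'
} > "$tmp/sample.log"

# !seen[$12]++ is true only the first time a value is met: the count is
# read (0 on first sight), negated, and then incremented.
# gsub strips a leading and a trailing double quote from the field.
unique=$(awk -F '\t' 'FNR > 1 && !seen[$12]++ { gsub(/^"|"$/, "", $12); print $12 }' "$tmp/sample.log")
printf '%s\n' "$unique"
```

The usernames come out in the order in which they first appear (bob, then alice), not sorted.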
Using sort to get the unique (and sorted) usernames:
awk -F '\t' 'FNR > 1 { sub("^\"", "", $12); sub("\"$", "", $12); print $12 }' file | sort -u