Filtering a pattern in a file with sed or grep

I want to use the commands awk, grep and/or sed to store all the unique usernames present in the file extended.log into a new file.

These are the field names in my file, separated by tabs. I only want the values of the "username" field (the 12th field).

"record_id"     "client_id"     "request_id"    "date_time"     "elapsed_time"  "status"        "size"  "upload"        "download"      "bypassed"      "client_ip"     "username"      "method"        "url"   "http_referer"  "useragent"     "mime"  "filter_name"   "filtering_reason"      "interface"     "cachecode"     "peercode"      "peer"  "request_host"  "request_tld"   "referer_host"  "referer_tld"   "range" "time_profiles" "user_groups"   "request_profiles"      "application_signatures"        "categories"    "response_profiles"     "upload_content_types"  "download_content_types"        "profiles"

Here is a sample of the file's contents:

"SVZerDLJhIj6G3PA.6575.1466420105.346.1837.1"   "1837"  "1"     "20/Jun/2016:16:25:05"  "4"     "200"   "0"     "-"     "0"     "-"     "192.168.12.13" "[email protected]""GET"   "-"     "-"     "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)"   "-"     "-"     "-"     "192.168.14.11:8080"    "TCP_MISS"      "DIRECT"        "safesquid"      "192.168.14.11:8080"    "-"     "-"     "-"     "0"     ""      "NO_AUTHENTICATION"     ""      ""      ""      ""      ""      ""      ""
"SVZerDLJhIj6G3PA.6575.1466420107.357.1838.1"   "1838"  "1"     "20/Jun/2016:16:25:07"  "4"     "200"   "0"     "-"     "0"     "-"     "192.168.12.13" "[email protected]""GET"   "-"     "-"     "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)"   "-"     "-"     "-"     "192.168.14.11:8080"    "TCP_MISS"      "DIRECT"        "safesquid"      "192.168.14.11:8080"    "-"     "-"     "-"     "0"     ""      "NO_AUTHENTICATION"     ""      ""      ""      ""      ""      ""      ""
"SVZerDLJhIj6G3PA.6575.1466420109.367.1840.1"   "1840"  "1"     "20/Jun/2016:16:25:09"  "4"     "200"   "0"     "-"     "0"     "-"     "192.168.12.13" "[email protected]""GET"   "-"     "-"     "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)"   "-"     "-"     "-"     "192.168.14.11:8080"    "TCP_MISS"      "DIRECT"        "safesquid"      "192.168.14.11:8080"    "-"     "-"     "-"     "0"     ""      "NO_AUTHENTICATION"     ""      ""      ""      ""      ""      ""      ""
"SVZerDLJhIj6G3PA.6575.1466420111.377.1841.1"   "1841"  "1"     "20/Jun/2016:16:25:11"  "4"     "200"   "0"     "-"     "0"     "-"     "192.168.12.13" "[email protected]""GET"   "-"     "-"     "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)"   "-"     "-"     "-"     "192.168.14.11:8080"    "TCP_MISS"      "DIRECT"        "safesquid"      "192.168.14.11:8080"    "-"     "-"     "-"     "0"     ""      "NO_AUTHENTICATION"     ""      ""      ""      ""      ""      ""      ""
"SVZerDLJhIj6G3PA.6575.1466420113.387.1842.1"   "1842"  "1"     "20/Jun/2016:16:25:13"  "5"     "200"   "0"     "-"     "0"     "-"     "192.168.12.13" "[email protected]""GET"   "-"     "-"     "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)"   "-"     "-"     "-"     "192.168.14.11:8080"    "TCP_MISS"      "DIRECT"        "safesquid"      "192.168.14.11:8080"    "-"     "-"     "-"     "0"     ""      "NO_AUTHENTICATION"     ""      ""      ""      ""      ""      ""      ""
"SVZerDLJhIj6G3PA.6575.1466420115.399.1843.1"   "1843"  "1"     "20/Jun/2016:16:25:15"  "5"     "200"   "0"     "-"     "0"     "-"     "192.168.12.13" "[email protected]""GET"   "-"     "-"     "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)"   "-"     "-"     "-"     "192.168.14.11:8080"    "TCP_MISS"      "DIRECT"        "safesquid"      "192.168.14.11:8080"    "-"     "-"     "-"     "0"     ""      "NO_AUTHENTICATION"     ""      ""      ""      ""      ""      ""      ""
"SVZerDLJhIj6G3PA.6575.1466420117.410.1844.1"   "1844"  "1"     "20/Jun/2016:16:25:17"  "4"     "200"   "0"     "-"     "0"     "-"     "192.168.12.13" "[email protected]""GET"   "-"     "-"     "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)"   "-"     "-"     "-"     "192.168.14.11:8080"    "TCP_MISS"      "DIRECT"        "safesquid"      "192.168.14.11:8080"    "-"     "-"     "-"     "0"     ""      "NO_AUTHENTICATION"     ""      ""      ""      ""      ""      ""      ""
"SVZerDLJhIj6G3PA.6575.1466420119.421.1845.1"   "1845"  "1"     "20/Jun/2016:16:25:19"  "4"     "200"   "0"     "-"     "0"     "-"     "192.168.12.13" "[email protected]""GET"   "-"     "-"     "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)"   "-"     "-"     "-"     "192.168.14.11:8080"    "TCP_MISS"      "DIRECT"        "safesquid"      "192.168.14.11:8080"    "-"     "-"     "-"     "0"     ""      "NO_AUTHENTICATION"     ""      ""      ""      ""      ""      ""      ""
"SVZerDLJhIj6G3PA.6575.1466420121.431.1846.1"   "1846"  "1"     "20/Jun/2016:16:25:21"  "4"     "200"   "0"     "-"     "0"     "-"     "192.168.12.13" "[email protected]""GET"   "-"     "-"     "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)"   "-"     "-"     "-"     "192.168.14.11:8080"    "TCP_MISS"      "DIRECT"        "safesquid"      "192.168.14.11:8080"    "-"     "-"     "-"     "0"     ""      "NO_AUTHENTICATION"     ""      ""      ""      ""      ""      ""      ""
"SVZerDLJhIj6G3PA.6575.1466420123.445.1847.1"   "1847"  "1"     "20/Jun/2016:16:25:23"  "4"     "200"   "0"     "-"     "0"     "-"     "192.168.12.13" "[email protected]""GET"   "-"     "-"     "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)"   "-"     "-"     "-"     "192.168.14.11:8080"    "TCP_MISS"      "DIRECT"        "safesquid"      "192.168.14.11:8080"    "-"     "-"     "-"     "0"     ""      "NO_AUTHENTICATION"     ""      ""      ""      ""      ""      ""      ""
"SVZerDLJhIj6G3PA.6575.1466420108.240.1839.1"   "1839"  "1"     "20/Jun/2016:16:25:23"  "15623" "200"   "2826"  "0"     "2826"  "-"     "192.168.0.14"  "[email protected]""CONNECT"        "connect://livehelp.safesquid.com:443/" "-"     "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"    "-"     "-"     "-"     "192.168.14.11:8080"    "TCP_MISS"      "DIRECT"        "livehelp.safesquid.com"        "livehelp.safesquid.com"        "safesquid.com" "-"     "-"      "1K-10K"        ""      "NO_AUTHENTICATION"     "uncachable request,BUSINESS SITES REQ" ""      "computersandsoftware"  ""      ""      ""      "uncachable"

Answer 1

Try

 sed -n 's/^.*"\([^"[:space:]]*\)"".*/\1/p' log | sort | uniq

 grep -Eo '[^"]+@[^"]+' log | sort | uniq

where

  • -o prints only the matched part of each line
  • [^X]+ matches one or more (> 0) characters other than X
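To see what -o and [^"]+ do together, here is a minimal sketch run on a single made-up line. The IP address comes from the question; alice@example.com is a hypothetical username, since the real ones are not visible in the sample, and the doubled quote after it mimics the one in the question's data.

```shell
# Hypothetical sample line: field 11, a tab, then a made-up username
# glued to the next field by a doubled quote, as in the question's data.
line=$(printf '"192.168.12.13"\t"alice@example.com""GET"')

# -E: extended regex; -o: print only the matched text, not the whole line.
# [^"]+ means one or more characters other than a double quote, so the
# match is bounded on both sides by the quotes around the username.
match=$(printf '%s\n' "$line" | grep -Eo '[^"]+@[^"]+')
echo "$match"   # prints alice@example.com
```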

Note that

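  • the sed solution relies on a typo/feature in the file (the doubled double quote after the username)
  • the grep solution relies on the usernames having the name@domain form
  • awk (or perl) is better suited to extracting the nth field.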

Answer 2

Use awk on the tab-separated file:

awk -F '\t' '{ print $12 }' file

This extracts the 12th field. If needed, you can redirect the output to a new file.
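A minimal sketch of the redirection, assuming a tiny made-up two-line file in a temporary directory. The record helper, the usernames and the file name usernames.txt are hypothetical, and every field is assumed to be separated by a tab.

```shell
tmp=$(mktemp -d)

# record USER: hypothetical helper that prints 11 dummy quoted fields
# and USER as the 12th field, all separated by tabs.
record() { printf '"%s"\t' 1 2 3 4 5 6 7 8 9 10 11; printf '"%s"\n' "$1"; }
{ record username; record bob@example.com; } > "$tmp/extended.log"

# Print field 12 of every line; ">" creates (or truncates) the output file.
awk -F '\t' '{ print $12 }' "$tmp/extended.log" > "$tmp/usernames.txt"
cat "$tmp/usernames.txt"
```

At this point the header line and the surrounding quotes are still in the output; the following steps of this answer deal with both.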

To remove the double quotes around the data, you can use

awk -F '\t' '{ sub("^\"", "", $12); sub("\"$", "", $12); print $12 }' file

This performs two substitutions that remove the first and the last character of the 12th field, if they are double quotes, before printing it.
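A minimal sketch applying the same two sub() calls to one made-up record, to show that only the surrounding quotes are removed. The username carol@example.com is hypothetical and the fields are assumed to be tab-separated.

```shell
# Hypothetical record: 11 dummy quoted fields, then the username as field 12.
line=$(printf '"%s"\t' 1 2 3 4 5 6 7 8 9 10 11; printf '"%s"' 'carol@example.com')

# sub(regex, replacement, target) replaces the first match of regex in target:
#   "^\"" is the regex ^"  (a double quote at the start of $12)
#   "\"$" is the regex "$  (a double quote at the end of $12)
user=$(printf '%s\n' "$line" | awk -F '\t' '{ sub("^\"", "", $12); sub("\"$", "", $12); print $12 }')
echo "$user"   # prints carol@example.com
```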

To skip the first line (if it is a header line):

awk -F '\t' 'FNR > 1 { sub("^\"", "", $12); sub("\"$", "", $12); print $12 }' file
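FNR is the line number within the current input file, whereas NR keeps counting across all input files, so FNR > 1 also skips the header of each file when several logs (for example rotated copies) are passed at once. A minimal sketch with two made-up files; the file names, the record helper and the usernames are hypothetical.

```shell
tmp=$(mktemp -d)

# record USER: hypothetical helper printing 11 dummy fields plus USER as field 12.
record() { printf '"%s"\t' 1 2 3 4 5 6 7 8 9 10 11; printf '"%s"\n' "$1"; }

# Two made-up log files, each starting with a header line.
{ record username; record dave@example.com; } > "$tmp/a.log"
{ record username; record erin@example.com; } > "$tmp/b.log"

# FNR restarts at 1 for every file, so both header lines are skipped;
# NR > 1 would only skip the header of the first file.
awk -F '\t' 'FNR > 1 { sub("^\"", "", $12); sub("\"$", "", $12); print $12 }' \
    "$tmp/a.log" "$tmp/b.log" > "$tmp/out.txt"
cat "$tmp/out.txt"
```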

To get only the unique usernames using awk alone:

awk -F '\t' 'FNR > 1 && !( $12 in seen ) { seen[$12]++; sub("^\"", "", $12); sub("\"$", "", $12); print $12 }' file

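This uses an array keyed on the 12th field to keep track of the usernames already seen. If the data in the 12th field is not yet a key in the array, it has not been seen before.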

An alternative is to test !seen[$12] instead of !( $12 in seen ).
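The test and the increment can also be folded into the common awk idiom !seen[$12]++: the post-increment yields the previous count, which is 0 (false) only the first time a value appears. A minimal sketch on a made-up file with a duplicate user; the record helper, the usernames and the file names are hypothetical.

```shell
tmp=$(mktemp -d)

# record USER: hypothetical helper printing 11 dummy fields plus USER as field 12.
record() { printf '"%s"\t' 1 2 3 4 5 6 7 8 9 10 11; printf '"%s"\n' "$1"; }
{ record username; record zoe@example.com; record adam@example.com; record zoe@example.com; } > "$tmp/extended.log"

# seen[$12]++ evaluates to the old count, so !seen[$12]++ is true only the
# first time a given field value appears; later duplicates are skipped.
# Because of &&, the header line never reaches the seen array.
awk -F '\t' 'FNR > 1 && !seen[$12]++ { sub("^\"", "", $12); sub("\"$", "", $12); print $12 }' \
    "$tmp/extended.log" > "$tmp/usernames.txt"
cat "$tmp/usernames.txt"
```

The names come out in the order in which they first appear in the log (zoe before adam here), not sorted.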

Using sort to get the unique (and sorted) usernames:

awk -F '\t' 'FNR > 1 { sub("^\"", "", $12); sub("\"$", "", $12); print $12 }' file | sort -u
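Putting the steps together for what the question asks (unique usernames stored in a new file), here is a minimal sketch on a made-up extended.log that contains a duplicate user. The temporary directory, the record helper, the usernames and the output name usernames.txt are assumptions for illustration.

```shell
tmp=$(mktemp -d)

# record USER: hypothetical helper printing 11 dummy fields plus USER as field 12.
record() { printf '"%s"\t' 1 2 3 4 5 6 7 8 9 10 11; printf '"%s"\n' "$1"; }
{ record username; record zoe@example.com; record adam@example.com; record zoe@example.com; } > "$tmp/extended.log"

# Extract field 12 (skipping the header) and strip its quotes, then let
# sort -u sort the names and drop duplicates; the result goes to a new file.
awk -F '\t' 'FNR > 1 { sub("^\"", "", $12); sub("\"$", "", $12); print $12 }' "$tmp/extended.log" |
    sort -u > "$tmp/usernames.txt"
cat "$tmp/usernames.txt"
```

With sort -u the names come out sorted (adam before zoe here), instead of in the order they first appear in the log.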
