Search each line for multiple patterns and output them to a new file

Answer 1

How far would this get you?

sed -r '/^$/d; s/^[^[]*[[]([^]]*)[]].*cs-categories="([^"]*)".*cs-host=([^ ]*) .*/\1\t\3\t\2/' file
24/09/2018:22:41:49 GMT shavar.services.mozilla.com Technology/Internet
24/09/2018:17:45:44 GMT cvshipping.ups.com  Business/Economy
24/09/2018:17:44:03 GMT blocklist.addons.mozilla.org    Software Downloads
24/09/2018:17:41:44 GMT cebwa.d2.sc.omtrdc.net  Web Ads/Analytics
20/09/2018:15:48:50 GMT data35.adlooxtracking.com   Web Ads/Analytics;Suspicious
20/09/2018:15:48:35 GMT www.google.com  Search Engines/Portals

The script, broken down:

sed -r '                        use extended regular expressions in the script
/^$/d                           delete empty lines

s/^[^[]*[[]([^]]*)[]].*         match the date/time string between square brackets and capture it
                                as the first back reference
cs-categories="([^"]*)".*       capture the quoted string after cs-categories= as the second back reference
cs-host=([^ ]*)                 capture the string after cs-host= (up to the next space) as the third back reference
.*/\1\t\3\t\2/                  build the output line from the back references, separated by <TAB> characters
'
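
One thing to be aware of: when the substitution does not match, sed prints the line unchanged, so lines without the three fields still end up in the output. A minimal sketch of a variant that prints only the lines that matched, using -n together with the p flag. The wrapper name extract_with_sed and the sample log lines are assumptions for illustration (the thread does not show the real input), and the \t in the replacement relies on GNU sed, as in the command above.

```shell
#!/bin/sh
# extract_with_sed is a hypothetical wrapper; it reads log lines on stdin.
# -n suppresses automatic printing, and the trailing p prints a line only
# when the substitution succeeded, so empty and non-matching lines are dropped.
extract_with_sed() {
    sed -rn 's/^[^[]*[[]([^]]*)[]].*cs-categories="([^"]*)".*cs-host=([^ ]*) .*/\1\t\3\t\2/p'
}

# Assumed sample input: one matching line, one empty line, one unrelated line.
printf '%s\n' \
    '[24/09/2018:22:41:49 GMT] cs-categories="Technology/Internet" cs-host=shavar.services.mozilla.com cs-bytes=0' \
    '' \
    'a line without any of the three fields' |
    extract_with_sed
```

To write the result to a new file, redirect as usual, e.g. extract_with_sed < file > newfile. Note that the pattern still expects cs-categories to appear before cs-host, and a space after the host value.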

Answer 2

Using perl with lookaheads: this way it does not matter whether the host comes before or after the category

perl -lne '
    m(
        ^\[ (.*?) \]                   # match the timestamp
        (?=.* cs-categories= "(.+?)")  # look ahead for the category
        (?=.* cs-host= (\S+) )         # look ahead for the host
    )x
    and print join ",", $1,$2,$3
' log.txt

Answer 3

How about using awk:

awk '{t=""; h=""; c=""; for (i=1; i<=NF; i++) {if ($i ~ /^\[/) {t=$i} if ($i ~/^cs-host=/) {h=$i} if ($i ~ /^cs-categories=/) {c=$i}} if ((t != "") && (h != "") && (c != "")) printf("%s %s %s\n", t, h, c)}' _inputfile_

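This is a rough approximation of what you could do. Basically it just loops over each whitespace-separated field in a given line and checks whether the field starts with a certain string. If it does, the value of that field is stored in a variable. Once all fields have been processed, the three variables are printed if none of them is empty. It then moves on to the next line of the input file.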

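I have not done anything to handle spaces inside the strings. You could add a further check inside one of the existing checks to see whether the string ends with a double quote. If it does not, append the next field to the variable.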

I also have not done anything with substrings to get rid of things like [, ] and ". I leave that to you as an exercise. :)
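
For what it is worth, one possible way to do that exercise is sketched below. It is not the only way, and it makes assumptions: the function name extract_fields is made up, the timestamp is taken to be bracketed and to contain a space (as in "[24/09/2018:22:41:49 GMT]"), and the output is tab-separated rather than space-separated so that categories containing spaces stay unambiguous.

```shell
#!/bin/sh
# extract_fields is a hypothetical helper; it reads log lines on stdin.
extract_fields() {
    awk '
    {
        t = ""; h = ""; c = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^\[/) {
                # Timestamp: append following fields until one ends with "]".
                t = $i
                while ((t !~ /\]$/) && (i < NF)) t = t " " $(++i)
            } else if ($i ~ /^cs-host=/) {
                h = $i
            } else if ($i ~ /^cs-categories=/) {
                # Category: append following fields until the closing quote.
                c = $i
                while ((c !~ /"$/) && (i < NF)) c = c " " $(++i)
            }
        }
        if ((t != "") && (h != "") && (c != "")) {
            sub(/^\[/, "", t); sub(/\]$/, "", t)   # strip the brackets
            sub(/^cs-host=/, "", h)                # strip the key
            sub(/^cs-categories="/, "", c)         # strip the key and opening quote
            sub(/"$/, "", c)                       # strip the closing quote
            printf "%s\t%s\t%s\n", t, h, c
        }
    }'
}

# Assumed sample line with a space inside both the timestamp and the category.
printf '%s\n' \
    '[20/09/2018:15:48:35 GMT] cs-categories="Search Engines/Portals" cs-host=www.google.com cs-bytes=0' |
    extract_fields
```

As with the one-liner, the order of the fields on the line does not matter, and lines missing any of the three fields print nothing. Run it as extract_fields < _inputfile_ > newfile to save the result.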
