Grep match and extract

Answer 1

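With grep -o, you must match exactly what you want to extract. Since you don't want to extract the string proto=, you should not match it.

An extended regular expression that matches either tcp or udp, followed by a slash and a non-empty alphanumeric string, is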

(tcp|udp)/[[:alnum:]]+

Applying this to your data:

$ grep -E -o '(tcp|udp)/[[:alnum:]]+' file
tcp/http
tcp/https
udp/dns

To make sure that we only do this on lines that start with the string proto=:

grep '^proto=' file | grep -E -o '(tcp|udp)/[[:alnum:]]+'
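
The effect of the pre-filter is easier to see on input that contains a stray match. The following is a minimal sketch, assuming a hypothetical input file (the original data is not shown here) with one line that mentions tcp/telnet but does not start with proto=:

```shell
# Hypothetical sample data; the real file layout is an assumption.
file=$(mktemp)
printf '%s\n' \
  'proto=tcp/http  sent=144 rcvd=52' \
  'proto=tcp/https sent=240 rcvd=60' \
  'note: legacy tcp/telnet rule removed' \
  'proto=udp/dns   sent=72  rcvd=84' > "$file"

# Unfiltered: also picks up tcp/telnet from the note line.
unfiltered=$(grep -E -o '(tcp|udp)/[[:alnum:]]+' "$file")

# Filtered: only lines that start with proto= reach the second grep.
filtered=$(grep '^proto=' "$file" | grep -E -o '(tcp|udp)/[[:alnum:]]+')

printf 'unfiltered:\n%s\n\nfiltered:\n%s\n' "$unfiltered" "$filtered"
rm -f "$file"
```

With this sample, the unfiltered command reports four matches and the filtered pipeline reports three.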

Using sed, removing everything before the first = and after the first blank character:

$ sed 's/^[^=]*=//; s/[[:blank:]].*//' file
tcp/http
tcp/https
udp/dns

To make sure that we only do this on lines that start with the string proto=, you could insert the same pre-processing step with grep as above, or you could use

sed -n '/^proto=/{ s/^[^=]*=//; s/[[:blank:]].*//; p; }' file

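Here, we suppress the default output with the -n option, and then trigger the substitutions and an explicit print of the line only if the line matches ^proto=.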
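
As a rough illustration of why the address matters, the sketch below runs both sed commands on a hypothetical file that contains one unrelated key=value line. The file contents are assumptions made for the example:

```shell
# Hypothetical sample data with one line that does not start with proto=.
file=$(mktemp)
printf '%s\n' \
  'proto=tcp/http  sent=144 rcvd=52' \
  'owner=alice shift=night' \
  'proto=udp/dns   sent=72  rcvd=84' > "$file"

# Without an address, every line is edited, so the owner= line leaks through.
plain=$(sed 's/^[^=]*=//; s/[[:blank:]].*//' "$file")

# With -n and the /^proto=/ address, only matching lines are edited and printed.
guarded=$(sed -n '/^proto=/{ s/^[^=]*=//; s/[[:blank:]].*//; p; }' "$file")

printf 'plain:\n%s\n\nguarded:\n%s\n' "$plain" "$guarded"
rm -f "$file"
```

On this input the unguarded command also emits alice, while the guarded one prints only the two protocol values.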

With awk, using the default field separator, then splitting the first field on = and printing its second part:

$ awk '{ split($1, a, "="); print a[2] }' file
tcp/http
tcp/https
udp/dns

To make sure that we only do this on lines that start with the string proto=, you could insert the same pre-processing step with grep as above, or you could use

awk '/^proto=/ { split($1, a, "="); print a[2] }' file
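
For comparison, here is a small sketch that runs the guarded awk command next to an alternative that is not part of the answer above: setting the field separator to runs of =, spaces, and tabs, so that the value becomes the second field directly. The sample file is hypothetical:

```shell
# Hypothetical sample data; the real file layout is an assumption.
file=$(mktemp)
printf '%s\n' \
  'proto=tcp/http  sent=144 rcvd=52' \
  'owner=alice shift=night' \
  'proto=tcp/https sent=240 rcvd=60' \
  'proto=udp/dns   sent=72  rcvd=84' > "$file"

# Approach from the text: split the first whitespace-separated field on "=".
by_split=$(awk '/^proto=/ { split($1, a, "="); print a[2] }' "$file")

# Alternative sketch: treat "=" and whitespace runs as field separators.
by_fs=$(awk -F'[= \t]+' '/^proto=/ { print $2 }' "$file")

printf 'split():\n%s\n\n-F:\n%s\n' "$by_split" "$by_fs"
rm -f "$file"
```

Both commands should agree on input like this. The split() form leaves the other fields intact, while the -F form changes how the whole line is split.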

Answer 2

If you are on GNU grep (needed for the -P option), you could use:

$ grep -oP 'proto=\K[^ ]*' file
tcp/http
tcp/https
udp/dns

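Here we match the proto= string to make sure that we are extracting the correct column, but we then discard it from the output with \K, which resets the start of the reported match.

The above assumes that the columns are space-separated. If tabs are also a valid separator, you would use \S to match the non-whitespace characters, so the command would be: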

grep -oP 'proto=\K\S*' file

If you also want to protect against matching fields where proto= is only a substring, such as thisisnotaproto=tcp/https, you can add a word boundary with \b like so:

grep -oP '\bproto=\K\S*' file
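
The difference between the three patterns shows up on input that mixes separators. The sketch below assumes GNU grep built with PCRE support and uses a hypothetical three-line file: one space-separated line, one tab-separated line, and one line with a look-alike key:

```shell
# Hypothetical sample; line 2 is tab-separated, line 3 has a look-alike key.
file=$(mktemp)
printf 'id=1 proto=tcp/http sent=144\n'    >  "$file"
printf 'id=2\tproto=tcp/https\tsent=240\n' >> "$file"
printf 'id=3 thisisnotaproto=tcp/ssh\n'    >> "$file"

# [^ ]* stops only at a space, so the tab-separated line over-matches.
space_only=$(grep -oP 'proto=\K[^ ]*' "$file")

# \S* stops at any whitespace, but the look-alike key still matches.
any_space=$(grep -oP 'proto=\K\S*' "$file")

# \b requires a word boundary before "proto", which rejects the look-alike.
bounded=$(grep -oP '\bproto=\K\S*' "$file")

printf '[^ ]*:\n%s\n\n\\S*:\n%s\n\n\\b + \\S*:\n%s\n' \
  "$space_only" "$any_space" "$bounded"
rm -f "$file"
```

Only the last pattern limits the output to tcp/http and tcp/https on this sample.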

Answer 3

Using awk:

awk '$1 ~ "proto" { sub(/proto=/, ""); print $1 }' input

$1 ~ "proto" ensures that we only act on lines whose first column contains proto

sub(/proto=/, "") removes proto= from the input

print $1 prints what remains of the first column

$ awk '$1 ~ "proto" { sub(/proto=/, ""); print $1 }' input
tcp/http
tcp/https
udp/dns
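
Note that $1 ~ "proto" is a substring test and that sub() without a third argument edits the whole line. The sketch below, which uses a hypothetical input file, compares the command above with a stricter variant (an assumption of this example, not part of the answer) that anchors both the test and the substitution to the start of the first field:

```shell
# Hypothetical sample; the last line has a first field that merely contains "proto".
file=$(mktemp)
printf '%s\n' \
  'proto=tcp/http sent=144' \
  'proto=tcp/https sent=240' \
  'protocol_version=2 proto=udp/dns' > "$file"

# Command from the text: substring test on $1, substitution on the whole line.
loose=$(awk '$1 ~ "proto" { sub(/proto=/, ""); print $1 }' "$file")

# Stricter sketch: $1 must start with proto=, and only $1 is edited.
strict=$(awk '$1 ~ /^proto=/ { sub(/^proto=/, "", $1); print $1 }' "$file")

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
rm -f "$file"
```

On this sample the loose form prints protocol_version=2 for the last line, while the strict form skips that line.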

Answer 4

Just another grep solution:

grep -o '[^=/]\+/[^ ]\+' file

And a similar one with sed, printing only the matched capture group:

sed -n 's/.*=\([^/]\+\/[^ ]\+\).*/\1/p' file
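
The \+ quantifier used in both commands is a GNU extension to basic regular expressions. As a hedged sketch, the same patterns can be written as extended regular expressions with -E, where + is standard. The sample file below is hypothetical:

```shell
# Hypothetical sample data; the real file layout is an assumption.
file=$(mktemp)
printf '%s\n' \
  'proto=tcp/http sent=144 rcvd=52' \
  'proto=tcp/https sent=240 rcvd=60' \
  'proto=udp/dns sent=72 rcvd=84' > "$file"

# ERE form of the grep command: "+" needs no backslash with -E.
with_grep=$(grep -o -E '[^=/]+/[^ ]+' "$file")

# ERE form of the sed command: unescaped parentheses form the capture group.
with_sed=$(sed -E -n 's/.*=([^/]+\/[^ ]+).*/\1/p' "$file")

printf 'grep:\n%s\n\nsed:\n%s\n' "$with_grep" "$with_sed"
rm -f "$file"
```

Neither pattern checks for the proto= key itself, so both rely on the protocol value being the only field that contains a slash.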
