Bash - Extract all URLs except a specific URL

Question 1

I think this can be done with sed alone:

sed -n '\,http://schemas.openxmlformats.org,!s/.*\(http:.*\).*/\1/p'

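-n disables automatic printing, so only explicitly selected lines are printed.
\,http://schemas.openxmlformats.org,! runs the command that follows only on lines that do not match http://schemas.openxmlformats.org (hence the ! at the end). I used a comma (,) instead of a slash (/) as the regex delimiter here, which is why the address starts with \,. This reduces the need for \ escapes in the pattern.
The s command is the same as yours, but I added the p flag after it to print the line, which by then contains only the URL.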
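
To illustrate the delimiter point, here is a minimal sketch comparing the default slash-delimited address with the comma-delimited one. The sample file and its two lines are made up for illustration and do not come from the question.

```shell
# Hypothetical sample input (assumed, not from the original question).
workdir=$(mktemp -d)
printf '%s\n' \
  'xmlns="http://schemas.openxmlformats.org/package/2006"' \
  'href="http://www.yahoo.com/"' > "$workdir/sample.txt"

# Default delimiter: every / inside the address has to be escaped.
with_slash=$(sed -n '/http:\/\/schemas.openxmlformats.org/!p' "$workdir/sample.txt")

# Custom delimiter: \, opens the address and , closes it, so no escaping is needed.
with_comma=$(sed -n '\,http://schemas.openxmlformats.org,!p' "$workdir/sample.txt")

echo "$with_slash"
echo "$with_comma"
rm -r "$workdir"
```

Both commands select the same lines; only the amount of escaping differs.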

I am assuming there is only one URL per line.

Removing the extra quotes gives me the correct output:

$ sed -n '\,http://schemas.openxmlformats.org,!s/.*\(http:.*\).*/\1/p' input-file
http://www.yahoo.com/
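
One caveat worth sketching: the substitution keeps everything from http: to the end of the line, so it relies on the URL being the last thing on that line. If the URL is followed by a closing quote or more text, one possible variant limits the capture to characters that are neither a double quote nor a space. The sample lines and the [^" ] character class below are assumptions for illustration, not part of the original command.

```shell
# Hypothetical input with text after the URL (assumed, not from the question).
workdir=$(mktemp -d)
printf '%s\n' \
  '<a xmlns="http://schemas.openxmlformats.org/package/2006" />' \
  '<a href="http://www.yahoo.com/" id="1" />' \
  'plain text without a link' > "$workdir/input-file"

# Original command: captures from http: to the end of the line.
original=$(sed -n '\,http://schemas.openxmlformats.org,!s/.*\(http:.*\).*/\1/p' "$workdir/input-file")

# Variant: stop the capture at the first double quote or space.
trimmed=$(sed -n '\,http://schemas.openxmlformats.org,!s/.*\(http:[^" ]*\).*/\1/p' "$workdir/input-file")

printf 'original: %s\n' "$original"
printf 'trimmed:  %s\n' "$trimmed"
rm -r "$workdir"
```

On this input the original form keeps the trailing `" id="1" />`, while the variant yields only the URL.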

Question 2

Using grep with the -v option lets you select the lines that do not match. For example, given a file named file.txt with the following contents:

first line
second line
third line
fourth text

Running this command:

grep "line" file.txt | grep -v "second"

produces:

first line
third line

If you want to exclude several words at once, you can use a regular expression like this:

grep "line" file.txt | grep -vE "(second|first)"

The result will be:

third line
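
For reference, the two grep examples above can be reproduced end to end with the sketch below. It also shows an equivalent form that passes each word with its own -e option instead of using an alternation; that variant is an addition here, and the temporary directory is only a convenience.

```shell
# Recreate file.txt from the example in a temporary directory.
workdir=$(mktemp -d)
printf '%s\n' 'first line' 'second line' 'third line' 'fourth text' > "$workdir/file.txt"

# Keep lines containing "line", then drop those containing "second".
one_excluded=$(grep "line" "$workdir/file.txt" | grep -v "second")

# Drop several words at once with an extended-regex alternation.
two_excluded=$(grep "line" "$workdir/file.txt" | grep -vE "(second|first)")

# Same exclusion written as two separate -e patterns.
two_excluded_e=$(grep "line" "$workdir/file.txt" | grep -v -e "second" -e "first")

echo "$one_excluded"
echo "$two_excluded"
echo "$two_excluded_e"
rm -r "$workdir"
```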

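After the question was updated:

For this case, you can use one of the following approaches:

The first approach gives you only www.yahoo.

The second gives you every URL that contains the word yahoo.

To extract all URLs except certain ones: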

grep 'http://' data.txt | sed 's/.*\(http:.*\)/\1/' | grep -vE "(openxmlformats|<Another URL to exclude>)"
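
The pipeline above keeps only the last URL on each line, because the leading .* in the sed expression is greedy. As a hedged alternative, the sketch below uses grep -o, which prints each match on its own line; -o is supported by GNU and BSD grep but is not required by POSIX. The contents of data.txt, the example.org exclusion (standing in for the placeholder), and the [^" ]+ pattern are all assumptions for illustration.

```shell
# Hypothetical data.txt with two URLs on one line (assumed sample data).
workdir=$(mktemp -d)
printf '%s\n' \
  'xmlns="http://schemas.openxmlformats.org/package/2006"' \
  'links: http://www.yahoo.com/ http://example.org/skip' \
  'see http://www.example.com/page' > "$workdir/data.txt"

# Pipeline from the answer: one URL per line at most (the last one).
per_line=$(grep 'http://' "$workdir/data.txt" \
  | sed 's/.*\(http:.*\)/\1/' \
  | grep -vE "(openxmlformats|example\.org)")

# Alternative: emit every URL separately, then filter the unwanted ones.
per_url=$(grep -oE 'http://[^" ]+' "$workdir/data.txt" \
  | grep -vE "(openxmlformats|example\.org)")

printf 'per line:\n%s\n' "$per_line"
printf 'per URL:\n%s\n' "$per_url"
rm -r "$workdir"
```

On this sample the sed-based pipeline loses http://www.yahoo.com/ because it shares a line with an excluded URL, while the grep -o form keeps it.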
