How to find all patterns between two characters?

Answer 1

First, your `grep -Po '"\K[^"]*' file` idea fails because `grep` sees both `"One"` and `". the second is here"` as being inside quotes. Personally, I'd probably do:

```
$ grep -oP '"[^"]+"' file | tr -d '"'
One
Two 
 Three 
Four
```

But that's two commands. To do it with a single command, you could use one of the following:

Perl
```
$ perl -lne '@F=/"\s*([^"]+)\s*"/g; print for @F' file 
One
Two 
Three 
Four
```
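Here, the `@F` array holds all matches of the regex (a quote, followed by as many non-`"` characters as possible, up to the next `"`). Leading whitespace inside the quotes is skipped by the first `\s*`, but trailing whitespace stays in the capture because `[^"]+` is greedy. The `print for @F` just means "print each element of `@F`".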
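For comparison, the same regex can be tried in Python with `re.findall`. Like Perl's list-context `/g` match, it returns every capture-group match. The sketch below assumes the two sample lines shown in a later answer, and the helper name `quoted_fields` is hypothetical.

```python
import re

# Same pattern as the Perl one-liner: an opening quote, optional whitespace
# (outside the group), then as many non-quote characters as possible, then a quote.
PATTERN = re.compile(r'"\s*([^"]+)\s*"')


def quoted_fields(line):
    """Return every captured quoted substring in `line` (hypothetical helper)."""
    return PATTERN.findall(line)


sample = [
    'first matched is "One". the second is here"Two "',
    'and here are in second line" Three ""Four".',
]
for line in sample:
    for field in quoted_fields(line):
        print(repr(field))  # repr() makes any kept trailing space visible
```

Because the group is greedy, `"Two "` yields `'Two '` and `" Three "` yields `'Three '`, which matches the Perl output above.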

Perl

```
$ perl -F'"' -lne 'for($i=1;$i<=$#F;$i+=2){print $F[$i]}' file 
One
Two 
 Three 
Four
```

To remove leading/trailing whitespace from each match, use:

```
perl -F'"' -lne 'for($i=1;$i<=$#F;$i+=2){$F[$i]=~s/^\s+|\s+$//g; print $F[$i]}' file
```
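The trimming step only reaches both ends if the substitution replaces every match. Perl's `s///` without `/g` replaces only the first match. Because `^\s*` can match at the very start (even as an empty string), a pattern like `^\s*|\s$` never gets to the trailing space. The Python sketch below mimics that behaviour with `re.sub(..., count=1)` and contrasts it with replacing every match. It only illustrates the regex behaviour and is not part of the original answer.

```python
import re

# count=1 mimics Perl's s/// without /g: only the first match is replaced.
# "^\s*" matches the leading space first, so the trailing space survives.
first_only = re.sub(r"^\s*|\s$", "", " Three ", count=1)

# With no leading space, "^\s*" still matches (empty) at position 0 and uses
# up the single replacement, so nothing is removed at all.
no_leading = re.sub(r"^\s*|\s$", "", "Two ", count=1)

# count=0 (the default) replaces every match, like Perl's s///g.
# Using "+" avoids empty matches, so both ends are trimmed.
both_ends = re.sub(r"^\s+|\s+$", "", " Three ")

print(repr(first_only), repr(no_leading), repr(both_ends))
```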

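Here, Perl behaves like `awk`. The `-a` switch makes it automatically split each input line into fields on the character given by `-F`. Since Perl 5.20, `-F` implies `-a` and `-n`, which is why the commands above work without an explicit `-a`. Since I gave it `"`, the fields are: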

```
$ perl -F'"' -lne 'for($i=0;$i<=$#F;$i++){print "Field $i: $F[$i]"}' file 
Field 0: first matched is 
Field 1: One
Field 2: . the second is here
Field 3: Two 
Field 0: and here are in second line
Field 1:  Three 
Field 2: 
Field 3: Four
Field 4: .
```

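Because we're looking for the text between two consecutive field separators, we know we want every second field, i.e. the odd-numbered fields 1, 3, 5, and so on. So `for($i=1;$i<=$#F;$i+=2){print $F[$i]}` prints the ones we care about.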
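The same field logic maps onto Python's `str.split` and slicing. This is a sketch that assumes input lines like the sample used in the field dump; `odd_fields` is a hypothetical helper name. One difference is that Perl's `split` drops trailing empty fields and `str.split('"')` keeps them, so Python also prints `Field 4: ` for the first line. With balanced quotes, that extra field lands on an even index and is skipped anyway.

```python
def odd_fields(line):
    """Split on double quotes and keep fields 1, 3, 5, ... (hypothetical helper)."""
    fields = line.split('"')
    # Even-indexed fields lie outside quotes; odd-indexed fields lie between an
    # opening and a closing quote, provided the quotes on the line are balanced.
    return fields[1::2]


sample = [
    'first matched is "One". the second is here"Two "',
    'and here are in second line" Three ""Four".',
]
for line in sample:
    # Same numbering as the Perl field dump above.
    for i, field in enumerate(line.split('"')):
        print("Field %d: %s" % (i, field))
    print(odd_fields(line))
```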

Same idea, but with `awk` (which numbers fields from 1, so the loop starts at `i=2`):

```
$ awk -F'"' '{for(i=2;i<=NF;i+=2){print $(i)}}' file 
One
Two 
 Three 
Four
```

Answer 2

The trick is to consume both quotes in the expression. That's hard to do with a single grep command. Here's a perl one-liner:

```
perl -0777 -nE 'say for /"(.*?)"/sg' file
```

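This slurps the whole input and prints the captured part of each match. It works even if there are newlines inside the quotes, though that makes it hard to tell apart elements that contain newlines from those that don't. To help with that, use a different character as the output record separator, such as the null character. The `} BEGIN {` part closes the loop that `-n` wraps around the code and sets `$\` in a `BEGIN` block: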

```
perl -0777 -lne 'print for /"(.*?)"/sg} BEGIN {$\="\0"' <<DATA | od -c
blah "first" blah "second
quote with newline" blah "third"
DATA

0000000   f   i   r   s   t  \0   s   e   c   o   n   d  \n   q   u   o
0000020   t   e       w   i   t   h       n   e   w   l   i   n   e  \0
0000040   t   h   i   r   d  \0
0000046
```
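The same slurp-and-match approach can be sketched in Python. It uses a non-greedy `"(.*?)"` with `re.DOTALL`, the counterpart of Perl's `/s`, so that `.` also matches newlines, and it terminates each capture with a NUL character. The helper names `extract_quoted` and `nul_terminated` are hypothetical.

```python
import re


def extract_quoted(text):
    """Return the contents of every "..." pair; matches may span newlines."""
    # Non-greedy .*? stops at the first closing quote; DOTALL lets . match \n.
    return re.findall(r'"(.*?)"', text, flags=re.DOTALL)


def nul_terminated(matches):
    """Terminate each match with NUL so embedded newlines stay unambiguous."""
    return "".join(m + "\0" for m in matches)


sample = 'blah "first" blah "second\nquote with newline" blah "third"\n'
print(repr(nul_terminated(extract_quoted(sample))))
```

To process real input, pass the whole of `sys.stdin.read()` (the analogue of `-0777`) instead of `sample`, and write the result with `sys.stdout.write`.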

Answer 3

You can do this with the grep one-liner below, assuming your quotes are balanced.

```
grep -oP '"\s*\K[^"]+?(?=\s*"(?:[^"]*"[^"]*")*[^"]*$)' file
```

Example:

```
$ cat file
first matched is "One". the second is here"Two "
and here are in second line" Three ""Four".
$ grep -oP '"\s*\K[^"]+?(?=\s*"(?:[^"]*"[^"]*")*[^"]*$)' file
One
Two
Three
Four
```
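Python's `re` module has no `\K`, but the same pattern works if the wanted text goes into a capture group, because `re.findall` then returns only the group. The sketch below handles one line at a time and assumes balanced quotes, as the answer does. The name `balanced_quoted` is hypothetical.

```python
import re

# Same regex as the grep answer, except that \K (unsupported by Python's re)
# is replaced by a capture group around the wanted text.
BALANCED = re.compile(r'"\s*([^"]+?)(?=\s*"(?:[^"]*"[^"]*")*[^"]*$)')


def balanced_quoted(line):
    """Return quoted substrings of one line, assuming balanced quotes."""
    # The lookahead requires a closing quote followed by an even number of
    # quotes up to end of line, so text between two quoted parts is rejected.
    return BALANCED.findall(line)


for line in ['first matched is "One". the second is here"Two "',
             'and here are in second line" Three ""Four".']:
    print(balanced_quoted(line))
```

The lazy `+?` combined with `\s*` in the lookahead is what drops the trailing space from `Two ` and `Three `.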

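Another hair-pulling problem, solved with the PCRE verbs `(*SKIP)(*F)`. The first alternative matches text outside the quotes, and `(*SKIP)(*F)` discards that match, so only the matches of the second alternative are printed: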

```
$ grep -oP '[^"]+(?=(?:"[^"]*"[^"]*)*[^"]*$)(*SKIP)(*F)|\s*\K[^"]+(?=\b\s*)' file
One
Two
Three
Four
```
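Python's standard `re` module does not support the `(*SKIP)(*F)` verbs. A common substitute is to keep the "skip" branch as an ordinary alternative with no capture group, capture only the wanted branch, and discard matches where the group is empty. Because the skip branch consumes the outside text, the engine never retries inside it. The sketch reuses the first branch of the grep pattern unchanged. The helper name `inside_quotes` and the use of `str.strip()` in place of the `\s*`/`\b` trimming are assumptions made for this illustration.

```python
import re

# Alternative 1: the "skip" branch from the grep answer, minus the PCRE verbs.
# It matches a run of non-quotes followed by an even number of quotes up to
# end of line (text outside quotes) and has no capture group.
# Alternative 2: any other run of non-quotes (text inside quotes), captured.
SKIP_OR_KEEP = re.compile(r'[^"]+(?=(?:"[^"]*"[^"]*)*[^"]*$)|([^"]+)')


def inside_quotes(line):
    """Return the stripped text inside quotes (hypothetical helper)."""
    result = []
    for m in SKIP_OR_KEEP.finditer(line):
        if m.group(1) is not None:      # only alternative 2 sets group 1
            text = m.group(1).strip()
            if text:                    # ignore whitespace-only quoted parts
                result.append(text)
    return result


for line in ['first matched is "One". the second is here"Two "',
             'and here are in second line" Three ""Four".']:
    print(inside_quotes(line))
```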

Answer 4

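An alternative in Python that needs no regular expressions (though it's less powerful) is to process each line of the text file character by character.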

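The basic idea is this. If we see a double quote and the flag isn't raised, raise the flag; if we see a double quote again while the flag is raised, lower it. While the flag is raised, we know we're inside double quotes, so we can store the characters that follow. Once the flag is lowered, print what we've read.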

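```
#!/usr/bin/env python
from __future__ import print_function
import sys

flag = False          # True while we are between an opening and a closing quote
quoted_string = []    # characters collected since the opening quote

for line in sys.stdin:
    for char in line.strip():
        if char == '"':
            if flag:
                # Closing quote: lower the flag and print what we collected.
                flag = False
                if quoted_string:
                    print("".join(quoted_string))
                    quoted_string = []
            else:
                # Opening quote: raise the flag, but don't store the quote itself.
                flag = True
                continue
        if flag:
            quoted_string.append(char)
```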

And a test run:

```
$ cat input.txt
first matched is "One". the second is here"Two "
and here are in second line" Three ""Four".

$ ./get_quoted_words.py < input.txt
One
Two 
 Three 
Four
```
