Grep paragraphs that match words from a file

Answer 1

You can turn each paragraph into a single line, run grep -f matchfile on the result, and then restore the newlines:

sed '/^$/s/^/\x02/' myfile | tr \\n$'\002' $'\003'\\n \
| grep -f matchfile |  tr $'\003' \\n | head -n -1

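You can omit head if a trailing empty line in the output does not bother you.
So... sed adds \x02 to the beginning of each empty line, then tr translates every newline to \x03 and every \x02 to a newline. This effectively turns each paragraph into a single line whose original lines are fields separated by a low ASCII character that is unlikely to occur in a text file (in this case \x03). Then grep selects only the matching "lines"; finally, the second tr restores the newlines and head discards the trailing empty line (you could use any other tool for that, e.g. sed \$d).
Really, the easiest way to understand how it works is to run it in steps: run only the first command, then the first and the second, and so on... and watch the output. It should be self-explanatory¹.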

¹ Provided you are familiar with tr after reading its manual...
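The stages described above can be tried on a small sample. The following is only a sketch under stated assumptions: the file contents are made up, and the pipeline is spelled with tr's own octal escapes and sed '$d' (the alternative to head mentioned above) so that it does not depend on bash's $'...' quoting, GNU sed's \x02, or GNU head. The function name paragraphs is hypothetical.

```shell
# Hypothetical sample data; the file names follow the answer, the contents are invented.
tmp=$(mktemp -d)
printf '%s\n' 'apple pie' 'is sweet' '' \
              'banana split' 'is cold' '' \
              'cherry tart' 'is red' > "$tmp/myfile"
printf '%s\n' 'apple' 'cherry' > "$tmp/matchfile"

STX=$(printf '\002')   # the marker byte that sed puts on every empty line

# Stages 1 and 2: mark empty lines, then swap newline -> \003 and \002 -> newline,
# so that every paragraph becomes one line. \003 is shown as '|' to make it visible.
sed "s/^\$/$STX/" "$tmp/myfile" | tr '\n\002' '\003\n' | tr '\003' '|'
echo

# Stages 3 and 4: filter the one-line paragraphs, restore the newlines,
# and drop the trailing empty line.
# Usage: paragraphs TEXTFILE PATTERNFILE
paragraphs() {
    sed "s/^\$/$STX/" "$1" | tr '\n\002' '\003\n' |
        grep -f "$2" | tr '\003' '\n' | sed '$d'
}

paragraphs "$tmp/myfile" "$tmp/matchfile"
```

With this sample, the "banana split" paragraph should be filtered out while the other two are kept.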


Answer 2

Come on, don't give up on awk so quickly!

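awk 'NR == FNR {
          aMatch[NR]=$0
          n=FNR
          next;
    }
    # RS is assigned on the command line, between the two file names,
    # so it is already in effect when the first record of myFile is read.
    {
          for(i=1; i<n+1; i++) {
             if($0 ~ aMatch[i]) {
               print
               printf "\n"
               break
             }
          }
    }' matchFile RS='\n( |\t)*\n' myFile | head -n-1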

You might want to put it in a script, though:

awk -f myscript.awk matchFile RS='\n( |\t)*\n' myFile | head -n-1

The awk solution in script form, with comments on what it does:

# This block is executed only for the first file (the one containing the patterns to match).
# NR = number of records read so far across all files, FNR = number of records read in the current file.
# So the predicate NR == FNR is true only while reading the first file.
NR == FNR {
   aMatch[NR]=$0          # Store the pattern in an array
   n=FNR                  # Store the number of patterns
   next;                  # Skip the remaining rules (they are meant for the other file) and process the next record
}
# We are now processing the second file (containing the paragraphs).
# RS is assigned on the command line just before myFile, so it already describes a blank line
# when the first record is read: a record is then an entire paragraph instead of a single line.
# Setting RS inside this block would be too late for the first record, which has already been read.
# (A regular expression as RS is an extension supported by gawk and mawk.)
{
   for(i=1; i<n+1; i++) {    # Loop over all patterns
      if($0 ~ aMatch[i]) {   # If $0 (the whole record, i.e. the paragraph) matches a pattern read from file 1
         print               # Print the record (i.e. the current paragraph)
         printf "\n"         # Print a newline to separate paragraphs. This leaves a trailing empty line, hence the pipe to head -n-1.
         break               # Leave the loop, otherwise a paragraph matching n patterns would be printed n times
      }                      # End of if
   }                         # End of loop over patterns
}                            # End of second-file processing
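When experimenting with this approach, it helps to remember that awk splits each record using the RS value in effect at the moment the record is read. The sketch below is a test harness, not the author's exact script: the sample data and the function name match_paragraphs are made up, RS is passed as a command-line assignment between the two file names, and it assumes an awk that accepts a regular expression as RS (gawk and mawk do). One separator line deliberately contains only spaces, which is the case the regular expression is meant to handle.

```shell
# Hypothetical sample data; the second separator line holds two spaces.
tmp=$(mktemp -d)
printf '%s\n' 'apple pie' 'is sweet' '' \
              'banana split' 'is cold' '  ' \
              'cherry tart' 'is red' > "$tmp/myFile"
printf '%s\n' 'apple' 'cherry' > "$tmp/matchFile"

# Usage: match_paragraphs PATTERNFILE TEXTFILE
match_paragraphs() {
    awk 'NR == FNR { aMatch[++n] = $0; next }      # first file: collect the patterns
         {
             for (i = 1; i <= n; i++)
                 if ($0 ~ aMatch[i]) {             # paragraph matches one of the patterns
                     print
                     printf "\n"
                     break                         # print each paragraph only once
                 }
         }' "$1" RS='\n( |\t)*\n' "$2"             # RS changes only after the first file is done
}

match_paragraphs "$tmp/matchFile" "$tmp/myFile"
```

Because the assignment sits between the file names, the pattern file is still read line by line while the text file is read paragraph by paragraph.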


Answer 3

There is a very simple way to do this:

awk -v RS="" -v ORS="\n\n" '/match/' myfile
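An empty RS puts awk in paragraph mode, where records are separated by one or more empty lines, and ORS="\n\n" puts an empty line back after each printed paragraph. The command above tests a single regular expression, match. One possible way to feed it the words from a file is sketched below; this is an assumption-laden variation, not part of the answer. It joins the lines of the pattern file with | into one alternation, which only behaves as expected if the words contain no regular-expression metacharacters or backslashes and the file has no empty lines. The sample data and the function name grep_paragraphs are made up.

```shell
# Hypothetical sample data.
tmp=$(mktemp -d)
printf '%s\n' 'apple pie' 'is sweet' '' \
              'banana split' 'is cold' '' \
              'cherry tart' 'is red' > "$tmp/myfile"
printf '%s\n' 'apple' 'cherry' > "$tmp/matchfile"

# Usage: grep_paragraphs TEXTFILE PATTERNFILE
grep_paragraphs() {
    # Join the pattern lines into one alternation, e.g. apple|cherry
    pat=$(paste -s -d '|' "$2")
    # Paragraph mode: print every paragraph that matches the alternation.
    awk -v RS= -v ORS='\n\n' -v pat="$pat" '$0 ~ pat' "$1"
}

grep_paragraphs "$tmp/myfile" "$tmp/matchfile"
```

Unlike the regular-expression separator used in the previous answer, paragraph mode does not treat a line containing only spaces or tabs as a paragraph break.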
