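I have a file organised in paragraphs (myfile), i.e. with empty lines separating the entries. I want to retrieve some of these paragraphs according to match.
Now, all is fun and games when there is only one match: I just do awk -v RS='' '/match/ {print}' myfile, as in here. The problem is that I have hundreds of matches to find in the file, which I collect in another file (matchfile). If I only had to retrieve the matching lines, I would do grep -f matchfile myfile.
Is there a way to do something similar to grep -f that retrieves the entire paragraph? My Unix flavour does not support grep -p.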
Answer 1
You can turn the paragraphs into single lines, run grep -f matchfile on the result, and then restore the newlines:
sed '/^$/s/^/\x02/' myfile | tr \\n$'\002' $'\003'\\n \
| grep -f matchfile | tr $'\003' \\n | head -n -1
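You can do without head if the trailing empty line in the output doesn't bother you.
So... sed adds \x02 at the beginning of each empty line, then tr translates every newline to \x03 and every \x02 to a newline (effectively turning each paragraph into a single line in which the original lines are fields separated by a low ASCII character that is unlikely to occur in a text file, in this case \x03). Then grep selects only the matching "lines"; finally, the second tr restores the newlines and head discards the trailing empty line (you can use any other tool for that, e.g. sed \$d).
Really, the easiest way to understand how this works is to run it in steps: run only the first command, then the first and second, and so on, and observe the output. It should be self-explanatory [1].
[1]: provided you have familiarised yourself with tr by reading the manual...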
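For anyone who wants to run it in steps as suggested, here is a sketch on a small sample input. The file contents and the workdir and stx variables are made up for illustration. It is a slightly more portable spelling of the same idea, not the pipeline above verbatim: \x02 in the sed replacement and head -n -1 are GNU extensions and $'...' is not available in every shell, so this version builds the \002 byte with printf, lets tr interpret the octal escapes itself, and drops the last line with sed '$d'.

```shell
#!/bin/sh
# Sample data (hypothetical): three paragraphs, two patterns.
workdir=$(mktemp -d)
cat > "$workdir/myfile" <<'EOF'
alpha one
alpha two

beta one
beta two

gamma one
gamma two
EOF
cat > "$workdir/matchfile" <<'EOF'
alpha two
gamma one
EOF

stx=$(printf '\002')   # marker byte used to tag the empty lines

result=$(
    sed "s/^\$/$stx/" "$workdir/myfile" |   # 1. put \002 on every empty line
    tr '\n\002' '\003\n' |                   # 2. newline to \003, \002 to newline: one paragraph per line
    grep -f "$workdir/matchfile" |           # 3. keep only the paragraphs that match
    tr '\003' '\n' |                         # 4. turn \003 back into newlines
    sed '$d'                                 # 5. drop the trailing empty line
)
printf '%s\n' "$result"
rm -rf "$workdir"
```

To inspect an intermediate stage, cut the pipeline after that stage and pipe it into od -c, which makes the \002 and \003 bytes visible.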
Answer 2
Come on, don't give up on awk so quickly!
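awk 'NR == FNR {
    aMatch[NR] = $0
    n = FNR
    next
}
# RS is assigned on the command line, right before myFile, so that it
# already applies to the first record of that file
{
    for (i = 1; i <= n; i++) {
        if ($0 ~ aMatch[i]) {
            print
            printf "\n"
            break
        }
    }
}' matchFile RS='\n( |\t)*\n' myFile | head -n -1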
You might want to put that in a script, though:
awk -f myscript.awk matchFile RS='\n( |\t)*\n' myFile | head -n -1
The awk solution in script form, with comments on what it does:
# This block is only executed for the first file (the one containing the lines to match).
# NR = number of records read so far, FNR = number of records read in the current file,
# so the predicate NR == FNR is only true while reading the first file.
NR == FNR {
    aMatch[NR] = $0   # Store the line to match in an array
    n = FNR           # Store the number of lines to match
    next              # Skip the remaining rules (they are meant for the other file) and read the next record
}

# We are now processing the second file (the one containing the paragraphs).
# RS is assigned on the command line, between the two file names, so it is already
# in effect when the first record of myFile is read; assigning it inside this block
# would only take effect from the second record on.
# With RS matching a blank line, a record is an entire paragraph instead of a single line
# (a regular-expression RS is an extension found in gawk and mawk, not POSIX).
{
    for (i = 1; i <= n; i++) {   # Loop over all the lines to match
        if ($0 ~ aMatch[i]) {    # If $0 (the whole record, i.e. the paragraph) matches a line read from file 1
            print                # Print the record (i.e. the current paragraph)
            printf "\n"          # Print an empty line as separator; this leaves a trailing empty line, hence the pipe to head -n -1
            break                # Leave the loop, otherwise a paragraph matching n lines would be printed n times
        }                        # End of if
    }                            # End of loop over the lines to match
}                                # End of second-file processing
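A detail worth checking whenever one awk run reads two files with different record separators: an assignment to RS only affects records read after it. A minimal sketch with a made-up two-paragraph file (demo is a hypothetical temp file, and the sketch uses the POSIX paragraph mode RS="" rather than a regular-expression separator, purely to keep it portable):

```shell
#!/bin/sh
demo=$(mktemp)
printf 'a1\na2\n\nb1\nb2\n' > "$demo"

# RS assigned inside an action: the first record has already been read as a single line.
first_inline=$(awk '{ RS = "" } NR == 1 { print; exit }' "$demo")

# RS assigned as an operand before the file name: paragraph mode from the first record on.
first_operand=$(awk 'NR == 1 { print; exit }' RS= "$demo")

printf 'inside an action: [%s]\n' "$first_inline"
printf 'as an operand:    [%s]\n' "$first_operand"
rm -f "$demo"
```

The first variant prints only a1 as the first record, while the second prints the whole first paragraph (a1 and a2).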
Answer 3
There is a very simple way to do this:
awk -v RS="" -v ORS="\n\n" '/match/' myfile
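That command handles a single pattern. To read the patterns from matchfile, as the question asks, the same paragraph mode can be combined with the two-file idiom from the previous answer. A minimal sketch, assuming the lines of matchfile are meant as regular expressions (as with grep -f) and that matchfile is not empty; the sample file contents and the names workdir, pat and n are made up for illustration:

```shell
#!/bin/sh
# Sample data (hypothetical): three paragraphs, two patterns.
workdir=$(mktemp -d)
cat > "$workdir/myfile" <<'EOF'
alpha one
alpha two

beta one
beta two

gamma one
gamma two
EOF
cat > "$workdir/matchfile" <<'EOF'
alpha two
gamma one
EOF

# matchfile is read line by line with the default RS; the RS= and ORS=
# operands take effect just before myfile is opened, so myfile is read
# in paragraph mode and printed with an empty line after each paragraph.
result=$(awk '
    NR == FNR { pat[++n] = $0; next }        # first file: collect the patterns
    {
        for (i = 1; i <= n; i++)
            if ($0 ~ pat[i]) { print; break } # print each matching paragraph once
    }
' "$workdir/matchfile" RS= ORS='\n\n' "$workdir/myfile")

printf '%s\n' "$result"
rm -rf "$workdir"
```

If the lines of matchfile should be treated as fixed strings rather than regular expressions, the test index($0, pat[i]) > 0 can replace the ~ match.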