I have stdout consisting of
many chunks of text that look like this:
% QUESTION
Who played drums for The Beatles?
% QUESTION
Who played
guitar
for The Beatles?
% QUESTION
Who played
bass for The Beatles
?
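The idea is that the file is divided into "chunks", where each chunk starts with the line % QUESTION.
I want to write a script that prints the nth chunk of this data.
For example, running nthchunk 3
should print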
Who played
bass for The Beatles
?
How can I do this?
Answer 1
With an awk implementation that supports a regular expression as the record separator (RS), such as GNU awk, you can do the following:
awk -v n=3 -v RS='(\n+|^)% QUESTION\n' 'NR == n+1 {print; exit}' < questions.txt
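If the awk at hand does not accept a regular expression in RS, much the same result can be had by counting header lines instead. The following is a minimal sketch and not part of the original answer: the function name nthchunk is borrowed from the question, and it assumes a header line is exactly % QUESTION with no trailing whitespace.

```shell
#!/bin/sh
# nthchunk N: print the N-th chunk read from stdin.
# Assumption: every chunk starts with a line that is exactly "% QUESTION".
nthchunk() {
    awk -v n="$1" '
        # Header line: bump the chunk counter; quit once we are past chunk n.
        $0 == "% QUESTION" { if (++count > n) exit; next }
        # Body line: print it only while inside the requested chunk.
        count == n { print }
    '
}

# Usage with the sample data from the question
nthchunk 3 <<'EOF'
% QUESTION
Who played drums for The Beatles?
% QUESTION
Who played
guitar
for The Beatles?
% QUESTION
Who played
bass for The Beatles
?
EOF
```

With the sample data this prints the three lines of the third chunk. Unlike the RS version, it does not swallow blank lines that precede a header; they are printed as part of the chunk.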
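Answer 2
GNU sed in extended regular expression mode (-E)
can be used to solve this. The basic idea is to accumulate, in the pattern space, consecutive % QUESTION lines and the lines between them, while a counter is kept in the hold space as a string of dots.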
chunk=2
sed -E ':loop
/%/,/%/N
/%.*%/!{
/%/!d;$!bloop
s/$/\nfiller/
}
G;s/$/./
/\n[.]{'"${chunk}"'}$/bend
h;s/.*\n//;x
s/.*(\n.*)\n.*$/\1/;D
:end
s/^[^\n]*\n+(\S.*\S)(\n.*){2}$/\1/
q
' file
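The counting idea is easier to see outside sed. As a rough sketch in plain POSIX shell (an illustration only, not a translation of the sed script; the function name nthchunk_sh is made up here), an ordinary decimal counter takes the place of the string of dots:

```shell
# nthchunk_sh N: print the N-th chunk read from stdin.
# Assumption: a header line is exactly "% QUESTION".
nthchunk_sh() {
    n=$1
    count=0
    # The "|| [ -n ... ]" part keeps a final line that lacks a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        if [ "$line" = '% QUESTION' ]; then
            # Header: bump the counter; stop once we are past chunk N.
            count=$((count + 1))
            if [ "$count" -gt "$n" ]; then
                break
            fi
        elif [ "$count" -eq "$n" ]; then
            # Body line of the requested chunk.
            printf '%s\n' "$line"
        fi
    done
}
```

It could be called as nthchunk_sh 2 < file. A shell read loop is slow on large inputs, so this is meant to show the logic rather than to replace the sed or awk versions.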
Alternatively, use perl in slurp mode with the field separator set to the question line. The elements of the array @F are then the chunks.
perl \
-F'/(?:^|\n+)\%\h+QUESTION\n+/' \
-pals -0777 \
-e '$_=$F[$n]' \
-- -n="${chunk}" ./file;
Output:
Who played
guitar
for The Beatles?
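Answer 3
"I want to write a script that prints the nth chunk of this data."
By setting RS and ORS, you can pick out each question like this, for example for drums.
Note that this requires GNU awk for the multi-character RS: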
awk -v RS='% QUESTION' -v ORS='' '/\<drums\>/ {print $0}' file
Who played drums for The Beatles?
- or, for bass:
awk -v RS='% QUESTION' -v ORS='' '/\<bass\>/ {print $0}' file
Who played
bass for The Beatles
?
- or, for guitar:
awk -v RS='% QUESTION' -v ORS='' '/\<guitar\>/ {print $0}' file
Who played
guitar
for The Beatles?
- or use a number to select the chunk:
nchunk=3
awk -v nchunk="$nchunk" -v RS='% QUESTION' -v ORS='' 'NR==nchunk+1 {print $0}' file
Who played
bass for The Beatles
?
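The keyword examples above rely on GNU awk for both the multi-character RS and the \<...\> word-boundary operators. A portable variation, offered only as a sketch, buffers each chunk and prints it if it contains the keyword. The name chunks_matching is hypothetical, and the match is a plain substring test rather than a whole-word match.

```shell
# chunks_matching WORD: print every chunk from stdin whose body contains WORD.
# Assumptions: a header line is exactly "% QUESTION"; WORD is matched as a
# plain substring (so "bass" would also match "bassoon").
chunks_matching() {
    awk -v word="$1" '
        # Print the buffered chunk if it contains the word, then reset it.
        function emit() {
            if (buf != "" && index(buf, word) > 0) printf "%s", buf
            buf = ""
        }
        $0 == "% QUESTION" { emit(); next }   # header closes the previous chunk
        { buf = buf $0 "\n" }                 # collect body lines
        END { emit() }                        # the last chunk has no closing header
    '
}
```

For instance, chunks_matching bass < file would print the chunk that mentions bass. Because the word is passed with -v, awk interprets backslash escapes in it, which is harmless for simple keywords like the ones used here.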