Imagine a text file containing random text and two unique markers:
01 text text text
02 text text text
03 __DELETE_THIS_LINE_BEGIN__
04 text text text
05 text text text
06 text text text
07 text text text
08 __DELETE_THIS_LINE_END__
09 four
10 interesting
11 lines
12 follow
13 text text text
14 text text text
15 text text text
16 text text text
17 __DELETE_THIS_LINE_BEGIN__
18 text text text
19 text text text
20 text text text
21 text text text
22 __DELETE_THIS_LINE_END__
23 even
24 more
25 interesting
26 lines
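I'd like a sed/awk/perl/etc. expression that moves the four interesting lines following each END marker to the position of the preceding BEGIN marker and deletes both markers. The result should be: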
01 text text text
02 text text text
09 four
10 interesting
11 lines
12 follow
04 text text text
05 text text text
06 text text text
07 text text text
13 text text text
14 text text text
15 text text text
16 text text text
23 even
24 more
25 interesting
26 lines
18 text text text
19 text text text
20 text text text
21 text text text
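The two markers always come in pairs and occur multiple times in the file. The BEGIN marker always comes before the END marker.
It doesn't have to be a one-liner; I'm also fine with a Perl or Python script.
I tried it with sed: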
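Every approach to this problem relies on that pairing guarantee, so it may be worth checking it on a real file before rewriting anything. A minimal Python sketch, assuming markers are detected by plain substring matching; markers_well_formed is a hypothetical name used only for illustration:

```python
BEGIN = "__DELETE_THIS_LINE_BEGIN__"
END = "__DELETE_THIS_LINE_END__"


def markers_well_formed(lines):
    """Return True if the markers strictly alternate BEGIN, END, BEGIN, END, ..."""
    open_block = False  # True between a BEGIN and its matching END
    for line in lines:
        if BEGIN in line:
            if open_block:      # a second BEGIN before the END
                return False
            open_block = True
        elif END in line:
            if not open_block:  # an END with no preceding BEGIN
                return False
            open_block = False
    return not open_block       # a final BEGIN without an END is also invalid


print(markers_well_formed(["01 text", "03 " + BEGIN, "04 text", "08 " + END]))  # True
```

Since a file object iterates over its lines, it can be passed in directly, e.g. markers_well_formed(open('input')).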
sed -e '/__DELETE_THIS_LINE_END__/,+4 {H;d};/__DELETE_THIS_LINE_BEGIN__/ x' <source.txt> > <target.txt>
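...which doesn't work. The first DELETE_THIS_LINE_BEGIN marker gets deleted (there is nothing in the buffer yet to swap in), and the first DELETE_THIS_LINE_END marker ends up at the position of the second DELETE_THIS_LINE_BEGIN marker.
Any ideas?
Answer 1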
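The behavior described in the question follows from how sed's hold space works. The Python sketch below mimics GNU sed running that script (the addr,+N range form is a GNU extension); simulate_sed is a hypothetical helper, not part of any tool. It shows that x at a BEGIN marker swaps in whatever H collected at the previous END marker, so the first BEGIN turns into an empty line, each END block surfaces one BEGIN too late, and the last block stays in the hold space and is never printed.

```python
BEGIN = "__DELETE_THIS_LINE_BEGIN__"
END = "__DELETE_THIS_LINE_END__"


def simulate_sed(lines, n=4):
    """Mimic GNU sed running '/END/,+n {H;d};/BEGIN/ x' on a list of lines."""
    hold = ""         # the hold space starts out empty
    in_range = False  # inside the /END/,+n address range
    remaining = 0     # range lines still to come after the END line
    out = []
    for line in lines:
        if in_range or END in line:
            if in_range:
                remaining -= 1
                in_range = remaining > 0  # the range closes on the n-th line after END
            else:
                in_range, remaining = n > 0, n
            hold += "\n" + line           # H: append a newline and the pattern space
            continue                      # d: delete the pattern space, start next cycle
        if BEGIN in line:
            line, hold = hold, line       # x: exchange pattern space and hold space
        out.extend(line.split("\n"))      # auto-print the (possibly multi-line) pattern space
    return out, hold                      # the final hold space is never printed


# Rebuild the sample file from the question.
words = {3: BEGIN, 8: END, 9: "four", 10: "interesting", 11: "lines", 12: "follow",
         17: BEGIN, 22: END, 23: "even", 24: "more", 25: "interesting", 26: "lines"}
sample = [f"{i:02d} {words.get(i, 'text text text')}" for i in range(1, 27)]

printed, leftover = simulate_sed(sample)
print("\n".join(printed))
```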
awk:
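awk '
# BEGIN marker: start collecting the block, drop the marker itself
/__DELETE_THIS_LINE_BEGIN__/ {keep=1; next}
# END marker: stop collecting, count down the next 4 lines, drop the marker
/__DELETE_THIS_LINE_END__/ {keep=0; move=4; next}
# inside a block: save the line instead of printing it
keep {saved[++s]=$0; next}
# 4 lines after END: print the saved block before the current line
move-- == 0 {for (i=1; i<=s; i++) print saved[i]; delete saved; s=0}
# print every other line unchanged
1
# input ended before the countdown finished: flush what is left
END {for (i=1; i<=s; i++) print saved[i]}
' file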
01 text text text
02 text text text
09 four
10 interesting
11 lines
12 follow
04 text text text
05 text text text
06 text text text
07 text text text
13 text text text
14 text text text
15 text text text
16 text text text
23 even
24 more
25 interesting
26 lines
18 text text text
19 text text text
20 text text text
21 text text text
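The awk program is a small state machine: keep is set between the markers, saved collects the block, and move counts down the lines after END so that the block is printed just before the fifth line; the END rule flushes a block if the file ends first. A rough Python port of the same logic, as a sketch (move_blocks is a hypothetical name, and n=4 mirrors the hard-coded 4):

```python
BEGIN = "__DELETE_THIS_LINE_BEGIN__"
END = "__DELETE_THIS_LINE_END__"


def move_blocks(lines, n=4):
    """Yield lines with each BEGIN..END block moved after the n lines that follow END."""
    keep = False  # between BEGIN and END (awk: keep)
    saved = []    # the current block (awk: saved[1..s])
    move = -1     # lines left before the flush; -1 means no countdown (awk: move)
    for line in lines:
        if BEGIN in line:
            keep = True
            continue
        if END in line:
            keep, move = False, n
            continue
        if keep:
            saved.append(line)
            continue
        if move == 0:      # n lines after END: emit the block before this line
            yield from saved
            saved = []
        if move >= 0:
            move -= 1
        yield line
    yield from saved       # like the awk END rule: flush a block cut short by EOF


words = {3: BEGIN, 8: END, 9: "four", 10: "interesting", 11: "lines", 12: "follow",
         17: BEGIN, 22: END, 23: "even", 24: "more", 25: "interesting", 26: "lines"}
sample = [f"{i:02d} {words.get(i, 'text text text')}" for i in range(1, 27)]

result = list(move_blocks(sample))
print("\n".join(result))
```

Because markers are found by substring tests and lines are passed through untouched, the generator can also be fed a file object directly, newlines included.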
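Alternatively, with awk you can redefine the record separator (this needs an awk that accepts a regular expression as RS, such as GNU awk; the [0-9]+ part matches the line numbers of the sample file):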
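awk -v RS='\n[0-9]+ __DELETE_THIS_LINE_(BEGIN|END)__\n' '
# even records are the text between BEGIN and END: remember it
NR%2 == 0 {saved=$0; next}
# odd records: 4 lines, then the saved block, then the rest
{
# the last record still ends with the final newline of the file; strip it
sub(/\n$/, "")
n=split($0, lines, "\n")
for (i=1; i<=4 && i<=n; i++) print lines[i]
if (saved) print saved
for (i=5; i<=n; i++) print lines[i]
}
' file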
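This produces the same result.
Answer 2
You have to cache the lines between the markers and insert the cache once the 4 lines after the end marker have been written. In Python (tested with 2.7):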
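The record-separator idea carries over to Python's re.split: splitting on the marker lines yields chunks that alternate between text outside and inside the blocks, just like the odd and even awk records. A sketch under the same assumptions as that awk version (marker lines look like "NN __DELETE_THIS_LINE_BEGIN__", so drop the \d+ part for a file without line numbers, and no block is empty); move_blocks_split is a hypothetical name:

```python
import re

BEGIN = "__DELETE_THIS_LINE_BEGIN__"
END = "__DELETE_THIS_LINE_END__"
# Same separator as the awk RS; \d+ matches the line numbers of the sample only.
MARKER = re.compile(r"\n\d+ __DELETE_THIS_LINE_(?:BEGIN|END)__\n")


def move_blocks_split(text, n=4):
    """Even chunks lie outside the blocks, odd chunks are block bodies (like even/odd NR)."""
    chunks = MARKER.split(text.rstrip("\n"))  # drop the final newline: no empty last line
    out = chunks[0].split("\n")               # text before the first BEGIN is copied as-is
    for i in range(2, len(chunks), 2):        # chunk i follows the END that closes chunk i-1
        lines = chunks[i].split("\n")
        out += lines[:n] + chunks[i - 1].split("\n") + lines[n:]
    return "\n".join(out) + "\n"


words = {3: BEGIN, 8: END, 9: "four", 10: "interesting", 11: "lines", 12: "follow",
         17: BEGIN, 22: END, 23: "even", 24: "more", 25: "interesting", 26: "lines"}
sample = [f"{i:02d} {words.get(i, 'text text text')}" for i in range(1, 27)]
text = "\n".join(sample) + "\n"

result = move_blocks_split(text)
print(result, end="")
```

This trades streaming for simplicity: the whole file is read into memory, which is fine for small inputs.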
#! /usr/bin/env python
buffer = []                  # lines collected between BEGIN and END
in_block = False             # True while between the markers
max_interesting_line_nr = 4  # lines after END to copy before the buffer
begin_marker = "__DELETE_THIS_LINE_BEGIN__"
end_marker = "__DELETE_THIS_LINE_END__"
interesting_line = 0         # interesting lines still to copy
with open('input') as inf:
    with open('output', 'w') as outf:
        for line in inf:
            if begin_marker in line:
                in_block = True
                continue
            if end_marker in line:
                assert in_block, "END marker without a preceding BEGIN marker"
                interesting_line = max_interesting_line_nr
                in_block = False
                continue
            if interesting_line:
                outf.write(line)
                interesting_line -= 1
                if interesting_line == 0:  # output gathered lines
                    for lbuf in buffer:
                        outf.write(lbuf)
                    buffer = []  # empty buffer
                continue
            if in_block:
                buffer.append(line)  # gather lines
            else:
                outf.write(line)
        # input ended before all interesting lines were seen: flush the rest
        for lbuf in buffer:
            outf.write(lbuf)