This is easier to explain with an example (yes, it's from an .srt file):
231
00:13:35,230 --> 00:13:37,120
- Oh, my sister got me into it.

232
00:13:37,129 --> 00:13:38,269
- Yeah?

233
00:13:37,129 --> 00:13:38,269
Is that her?

234
00:13:40,049 --> 00:13:41,090
- Yeah.
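The line 00:13:37,129 --> 00:13:38,269 appears twice, and I want to join those two cues. So it has to work like this:
- Check every line that contains " --> "
- If it matches the previous one found, delete that line and the two lines above it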
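The two rules above translate almost literally into awk: keep every line in an array, and when a timestamp equals the previous one, drop it and pop the two lines stored just before it. This is a minimal sketch, assuming every cue is laid out as index line, timestamp line, text lines and one blank separator line; dedupe_srt is a hypothetical helper name, and the whole file is held in memory.

```sh
# Sketch: apply the two rules literally (hypothetical helper dedupe_srt).
# Assumes each cue is: index line, timestamp line, text lines, one blank line.
dedupe_srt() {
    awk '
        /-->/ {
            if ($0 == last) {  # same timestamp as the previous one found
                n -= 2         # forget the blank line and index line above it
                next           # and drop the repeated timestamp itself
            }
            last = $0          # remember this timestamp
        }
        { out[++n] = $0 }      # store all remaining lines
        END { for (i = 1; i <= n; i++) print out[i] }
    ' "$@"
}
```

Usage: dedupe_srt file.srt > merged.srt (with no file argument it reads standard input).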
So the result should be:
231
00:13:35,230 --> 00:13:37,120
- Oh, my sister got me into it.

232
00:13:37,129 --> 00:13:38,269
- Yeah?
Is that her?

234
00:13:40,049 --> 00:13:41,090
- Yeah.
This is way beyond my sed skills. Maybe it could be done with the hold space and the pattern space? Well, I don't even know how to approach this...
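For the sed route, one option that needs no hold space at all: GNU sed's -z option (a GNU extension, not POSIX) splits input on NUL bytes, so a file without NULs becomes a single pattern space in which \n can be matched directly. One substitution with a back-reference then finds a cue whose timestamp equals the previous cue's timestamp and removes the blank line, the index and the repeated timestamp. The :a / ta loop repeats the substitution so three or more identical timestamps in a row also collapse. This is a sketch under the same layout assumption as the question; merge_dup_cues is a hypothetical name, and because it rescans from the start after each merge and relies on back-references, it may be slow on very large files.

```sh
# Sketch (GNU sed only): -z loads the whole file as one pattern space.
# \1 = the timestamp line, \2 = the non-empty text lines that follow it.
# If the next cue (blank line, index, timestamp) repeats \1, drop those three
# lines; loop with :a/ta until no more merges happen.
merge_dup_cues() {
    sed -z -e ':a' \
        -e 's/\n\([^\n]* --> [^\n]*\)\n\(\([^\n][^\n]*\n\)*\)\n[0-9][0-9]*\n\1\n/\n\1\n\2/' \
        -e 'ta' "$@"
}
```

Usage: merge_dup_cues file.srt > merged.srt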
Answer 1
I'd use awk for this:
$ cat tst.awk
(!NF) {                 # blank line
    b = ""; f = 1       # empty buffer, start buffering
}
/-->/ {                 # timestamp
    f = 0               # stop buffering
    if (p == $0) {      # same timestamp
        next            # discard buffer, start over
    }
    p = $0              # save timestamp
    printf "%s", b      # print buffer
}
f {                     # buffering enabled
    b = (b $0 ORS)      # buffer line
    next                # start over
}
1                       # print line
Output:
$ awk -f tst.awk file
231
00:13:35,230 --> 00:13:37,120
- Oh, my sister got me into it.

232
00:13:37,129 --> 00:13:38,269
- Yeah?
Is that her?

234
00:13:40,049 --> 00:13:41,090
- Yeah.
Answer 2
I think the awk version is much better, but here's a bash version just for fun :)
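out=""
# IFS= and -r keep leading whitespace and backslashes as read
while IFS= read -r line; do
    if [ "$prevtime" != "$line" ]; then
        # ordinary line: append it plus a literal \n that echo -e expands later
        out="${out}${line}\n"
    else
        # repeated timestamp: remove the preceding index and blank lines
        # (head -n -2 is GNU-only)
        out="$(echo -e "${out}" | head -n -2)\n"
    fi
    # remember the most recent timestamp line
    [[ $line == *"-->"* ]] && prevtime=$line
done < test.srt
echo -e "$out"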
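A variant of the same idea keeps the lines in a bash array instead of one growing string. That avoids echo -e (which would expand any backslash sequences in the subtitle text) and the GNU-only head -n -2. This is a sketch under the same layout assumption (index, timestamp, text, blank line); merge_srt is a hypothetical name, and it reads the subtitle file on standard input.

```bash
#!/usr/bin/env bash
# Sketch: merge cues whose timestamp repeats the previous cue's timestamp.
# Assumes each cue is: index line, timestamp line, text lines, one blank line.
merge_srt() {
    local -a out=()
    local line prevtime=""
    # IFS= and -r keep lines verbatim; the || test keeps a final unterminated line
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line == *"-->"* ]]; then
            if [[ $line == "$prevtime" ]]; then
                # repeated timestamp: drop it and the index + blank line before it
                out=("${out[@]:0:${#out[@]}-2}")
                continue
            fi
            prevtime=$line
        fi
        out+=("$line")
    done
    printf '%s\n' "${out[@]}"
}
```

Usage: merge_srt < test.srt. Re-slicing the array on each merge is quadratic in the worst case, which is irrelevant for typical subtitle files.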