sed: Delete lines that are not first occurrences, and the lines before them

This is easier to explain with an example (yes, from an .srt file):

231
00:13:35,230 --> 00:13:37,120
- Oh, my sister got me into it.

232
00:13:37,129 --> 00:13:38,269
- Yeah?

233
00:13:37,129 --> 00:13:38,269
Is that her?

234
00:13:40,049 --> 00:13:41,090
- Yeah.

The line 00:13:37,129 --> 00:13:38,269 appears twice, and I want to merge the two cues. So it should work like this:

  • Check every line that contains " --> "
  • If it is identical to the previous such line, delete it together with the two lines above it

So the result should be:

231
00:13:35,230 --> 00:13:37,120
- Oh, my sister got me into it.

232
00:13:37,129 --> 00:13:38,269
- Yeah?
Is that her?

234
00:13:40,049 --> 00:13:41,090
- Yeah.

This is way beyond my sed skills. Maybe it could be done with the hold space and the pattern space? Honestly, I don't even know how to approach it...
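On the hold-space question: a sed-only approach can skip the hold space entirely if the whole file is read into the pattern space first. The following is a minimal sketch, assuming GNU sed (the -z option, back-references inside -E regexes and \n inside bracket expressions are GNU extensions) and LF line endings; merge_dup_cues is a hypothetical helper name. The substitution looks for a timestamp line, the text lines under it, a blank line, a cue number and then the same timestamp again, and keeps only the first timestamp and its text. That is exactly "delete the repeated line and the two lines above it".

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge .srt cues whose timestamp line repeats the
# previous timestamp line. Assumes GNU sed (-z, back-references in -E
# regexes and \n inside [...] are GNU extensions) and LF line endings.
#
# -z      : a text file has no NUL bytes, so the whole file becomes one
#           pattern space and \n can be matched in the regex
# s/.../  : timestamp, its text lines, blank line, cue number, same
#           timestamp again -> keep only the first timestamp and its text
# t again : repeat until no substitution is made (handles 3+ repeats)
merge_dup_cues() {
    sed -Ez \
        -e ':again' \
        -e 's/\n([^\n]* --> [^\n]*)\n(([^\n]+\n)*)\n[0-9]+\n\1\n/\n\1\n\2/' \
        -e 't again' \
        "$@"
}
```

Usage: merge_dup_cues test.srt > merged.srt. Each pass of the loop rescans from the start of the file, so this gets slower as the number of duplicates grows, which should be acceptable for typical subtitle files.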

Answer 1

I'd do it with awk:

$ cat tst.awk
(!NF) {                # blank line
    b = ""; f = 1      # empty buffer, start buffering
}
/-->/ {                # timestamp
    f = 0              # stop buffering
    if (p == $0) {     # same timestamp
        next           # discard buffer, start over
    }
    p = $0             # save timestamp
    printf "%s", b     # print buffer
}
f {                    # buffering enabled
    b = (b $0 ORS)     # buffer line
    next               # start over
}
1                      # print line

Output:

$ awk -f tst.awk file
231
00:13:35,230 --> 00:13:37,120
- Oh, my sister got me into it.

232
00:13:37,129 --> 00:13:38,269
- Yeah?
Is that her?

234
00:13:40,049 --> 00:13:41,090
- Yeah.

Answer 2

I think the awk version is much better, but here's a bash version just for fun :)

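out=""
while IFS= read -r line; do
    if [ "$prevtime" != "$line" ]; then
        out="${out}${line}\n"                      # keep this line
    else
        # duplicate timestamp: drop the cue number and blank line above it
        out="$(echo -e "${out}" | head -n -2)\n"
    fi
    # remember the latest timestamp line
    [[ $line == *' --> '* ]] && prevtime=$line
done < test.srt
echo -e "$out"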
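The question's two-step rule can also be applied literally in a single streaming pass, without re-running head on the whole accumulated output for every duplicate: hold the last two lines back, and when a duplicate timestamp arrives, discard it together with those two held lines. Below is a minimal sketch in awk, wrapped in a hypothetical shell function drop_dup_cues; like the answers above, it compares each timestamp only with the previous one, and it assumes LF line endings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the rule "a line containing ' --> ' that equals
# the previous timestamp line is deleted together with the two lines above
# it" in one pass, keeping at most two lines in memory. Assumes LF endings.
drop_dup_cues() {
    awk '
    / --> / {
        if ($0 == prev) { n = 0; next }   # duplicate: discard it and the 2 held lines
        prev = $0                          # remember the latest timestamp
    }
    {
        held[++n] = $0                     # hold the current line
        if (n > 2) {                       # more than 2 held: release the oldest
            print held[1]
            held[1] = held[2]; held[2] = held[3]; n = 2
        }
    }
    END { for (i = 1; i <= n; i++) print held[i] }   # flush the lines still held
    ' "$@"
}
```

Usage: drop_dup_cues test.srt prints the merged subtitles shown in the question. Because the delay buffer never holds more than two lines, memory use stays constant regardless of file size.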
