How can a line matched by sed be appended to the previous matching line?

Question 1

For the sake of code clarity, we use GNU sed:

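sed -nE '

   # Lines before the first timestamp: print them and start the next cycle.
   /^([0-9][0-9]:){2}[0-9]+[.][0-9]+/!{p;d;}

   # Save the timestamp in the hold space, then append the following lines
   # until the next timestamp (or the last line) has been read.
   h;:a
      $bb;n;H
   /^([0-9][0-9]:){2}[0-9]+[.][0-9]+/!ba

   :b
   x
   # Swap newlines and underscores so that [^_] means "anything but a newline",
   # move the last line of the range up next to the first, then swap back.
   y/\n_/_\n/
   s/^([^_]*)_(.*)_([^_]*)$/\1 ---> \3_\2/
   y/\n_/_\n/

   # Print the range, reload the closing timestamp and restart the script with it.
   p;g;$!s/^/\n/;D

' yourfile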

Result

00:00:10.730 ---> 00:00:13.230
this presentation is delivered by the

00:00:13.230 ---> 00:00:14.610
Stanford center for professional

00:00:14.610 ---> 00:00:25.500
development okay so let's get started

00:00:25.500 ---> 00:00:32.399
with today's material so um welcome back

00:00:32.399
to the second lecture what I want to do

Explanation

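We hold the range of lines from one timestamp up to and including the next one.
At the end of the range, the last line (the next timestamp) is moved forward onto the first line and the range is printed. The pattern space is then cleared and refilled with the line that ended the range. With that value in the pattern space, control is transferred back to the top of the sed code, so the loop restarts from the end of the current range and runs until the next timestamp or until we reach EOF.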
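
The step that moves the end of the range forward relies on a small trick: y/\n_/_\n/ swaps newlines and underscores, so that [^_] can stand for "no line break" inside the s command, and the second y swaps them back. Here is a minimal sketch of just that step on a made-up four-line input. The function name move_last_to_first is only for illustration, and GNU sed is assumed.

```shell
# Hypothetical helper: slurp all input, then move the last line up next to the first.
move_last_to_first() {
  sed -nE '
    # Collect every input line in the pattern space.
    :a
    $!{
      N
      ba
    }
    # Swap newlines and underscores, rearrange, then swap back.
    y/\n_/_\n/
    s/^([^_]*)_(.*)_([^_]*)$/\1 ---> \3_\2/
    y/\n_/_\n/
    p
  '
}

printf 'first\nmiddle one\nmiddle two\nlast\n' | move_last_to_first
# prints:
# first ---> last
# middle one
# middle two
```

Underscores that occur in the text itself are turned into newlines only temporarily, so they come out unchanged after the second swap.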

Question 2

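A single-gawk approach for relatively "small" (by size) files. It uses arrays of arrays, which are a GNU awk extension, and keeps the whole input in memory until END: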

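awk 'BEGIN{ RS=""; FS="[[:space:]]+" }   # paragraph mode: one record per blank-line-separated block
     {   c++;
         a[c]["t"]=$1;                        # timestamp (first field)
         a[c]["s"]=substr($0,length($1)+2)    # caption text after the timestamp line
     }
     END {
         len=length(a);
         for(i=1;i<=len;i++) {
             # every block but the last gets the timestamp of the next block as its end time
             if((i+1)<=len){ printf("%s --> %s\n%s\n\n",a[i]["t"],a[i+1]["t"],a[i]["s"]) }
             else { printf("%s\n%s\n",a[i]["t"],a[i]["s"]) }
         }
     }' file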

Output:

00:00:10.730 --> 00:00:13.230
this presentation is delivered by the

00:00:13.230 --> 00:00:14.610
Stanford center for professional

00:00:14.610 --> 00:00:25.500
development okay so let's get started

00:00:25.500 --> 00:00:32.399
with today's material so um welcome back

00:00:32.399
to the second lecture what I want to do
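
Because the approach above keeps every block in an array until END, a streaming variant may be worth considering for larger files. The following is only a sketch under the same assumptions about the input (blank-line-separated blocks whose first line is the timestamp): it remembers just the previous block and prints it once the next timestamp is known. It uses no gawk extensions, so it should run in any POSIX awk. The function name pair_timestamps is made up.

```shell
# Hypothetical helper: reads the files given as arguments, or stdin if there are none.
pair_timestamps() {
  awk 'BEGIN { RS = ""; FS = "\n" }
       # Once a new block arrives, the end time of the previous block is known.
       NR > 1 { printf "%s --> %s\n%s\n\n", t, $1, s }
       # Remember the timestamp and the caption text of the current block.
       { t = $1; s = substr($0, length($1) + 2) }
       # The last block has no successor, so it keeps a bare timestamp.
       END { if (NR) printf "%s\n%s\n", t, s }' "$@"
}

printf '00:00:10.730\nfirst caption\n\n00:00:13.230\nsecond caption\n' | pair_timestamps
# prints:
# 00:00:10.730 --> 00:00:13.230
# first caption
#
# 00:00:13.230
# second caption
```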

Question 3

Using GNU sed and tac:

tac file | \
sed -E '/^[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}$/ { H; x; s/^\n//; s/^(.*)\n(.*)$/\2 --> \1/; }' | \
tac

The same thing can be written with traditional sed (that is, without -E), but it would be more verbose.

Using GNU awk and tac:

tac file | \
gawk --re-interval '
    /^[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3} --> / { old = $1 }
    /^[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}$/ { if(old != "") $0 = $0 " --> " old; old = $1 }
    1' | \
tac

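Note that the awk version can handle time intervals such as 00:00:14.610 --> 00:00:25.500 that are already present in the input file, whereas the sed version is fooled by them.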
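
To make that difference concrete, here is a small sketch of the awk variant run on a made-up input in which one block already carries an interval. It spells the timestamp pattern out without {n} repetition, so it does not depend on interval support in whichever awk is installed. The function name fill_end_times and the sample captions are only illustrative, and tac is assumed to be available.

```shell
# Hypothetical helper: reads the files given as arguments, or stdin if there are none.
fill_end_times() {
  tac "$@" | awk '
    # Build the timestamp pattern hh:mm:ss.mmm from plain bracket expressions.
    BEGIN { d = "[0-9][0-9]"; ts = "^" d ":" d ":" d "[.]" d "[0-9]" }
    # A line that already holds an interval only updates the remembered start time.
    $0 ~ (ts " --> ") { old = $1 }
    # A bare timestamp gets the start time of the following block as its end time.
    $0 ~ (ts "$") { if (old != "") $0 = $0 " --> " old; old = $1 }
    1' | tac
}

printf '00:00:10.730\nfirst\n\n00:00:13.230 --> 00:00:14.610\nsecond\n\n00:00:14.610\nthird\n' |
  fill_end_times
# prints:
# 00:00:10.730 --> 00:00:13.230
# first
#
# 00:00:13.230 --> 00:00:14.610
# second
#
# 00:00:14.610
# third
```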

Also note that tac can be emulated with sed:

sed -n '1!G; $p; h'

Or like this:

sed '1!G; h; $!d'

However, both forms load the entire input file into memory, so they are not very efficient.

Result:

00:00:10.730 --> 00:00:13.230
this presentation is delivered by the

00:00:13.230 --> 00:00:14.610
Stanford center for professional

00:00:14.610 --> 00:00:25.500
development okay so let's get started

00:00:25.500 --> 00:00:32.399
with today's material so um welcome back

00:00:32.399
to the second lecture what I want to do
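
As a quick check of the two sed emulations of tac mentioned above, this sketch wraps each one in a function (the names sed_tac_p and sed_tac_d are made up) and runs both on three lines. Each of them appends the lines seen so far to the current line and saves the result in the hold space, which is why the whole file ends up in memory.

```shell
# Hypothetical wrappers around the two one-liners.
sed_tac_p() { sed -n '1!G; $p; h' "$@"; }
sed_tac_d() { sed '1!G; h; $!d' "$@"; }

printf '1\n2\n3\n' | sed_tac_p
printf '1\n2\n3\n' | sed_tac_d
# each prints:
# 3
# 2
# 1
```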

Answer