Remove line breaks and paragraph breaks from an srt file


I use this script to remove the timestamps from subtitles.

awk '/-->/{for(i=1;i<d;i++){print a[i]};delete a;d=0;next}{a[++d]=$0}
    END{for(i in a)print a[i]}' xxxxx.srt > xxx.txt

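Then I paste the result into a web page that removes line breaks and paragraph breaks, leaving a single paragraph with spaces in place of the breaks. This is the page I use: https://www.textfixer.com/tools/remove-line-breaks.php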

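I have been looking for a way to combine all of these steps into a single command, but I can't work out how. I know there are options other than awk; anything that does the job easily from the Mac terminal suits me!

Can anyone help?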

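Here is a sample subtitle file that I want to format but that does not work. I have seen other subtitle files work fine, which is odd.

Subtitle file

Expected output: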

Welcome to our program! This month’s theme is “Are You Paying Attention?” Strained relationships, illnesses, careers, entertainment —we’ll learn how to stay focused on Jehovah despite these potential distractions. We’ll see how our ministry is more effective when we focus on reaching the hearts of people. And our new song was written especially for you young adults to help you keep your eyes on the prize of life.

But this is what I get from your script:

    Welcome to our program!
 
 2
 00:00:06,089 --> 00:00:08,624
 This month’s theme is
 
 3
 00:00:08,625 --> 00:00:11,126
 “Are You Paying Attention?”
 
 4
 00:00:11,127 --> 00:00:13,595
 Strained relationships,
 
 5
 00:00:13,596 --> 00:00:16,131
 illnesses,
 

Answer 1

Use awk in "paragraph mode":

awk -v RS= '{
    for (i=5;i<=NF;i++){
      printf "%s%s", (sep ? " " : ""), $i
      sep=1
    }
  }
  END{ print "" }
' file.srt > file.txt

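This sets the record separator to the empty string, which makes awk treat each block of text separated by blank lines as one record. Fields are split on whitespace, newlines included. The first four fields of each record are skipped (field 1 is the subtitle number, fields 2-4 are the display time: start, `-->`, end), and every remaining field except the very first one printed is preceded by a space.

At the end, a single newline is printed.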
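One possible reason the command works on some files and not on others, offered here as an assumption rather than a diagnosis of the sample in the question: if a file uses Windows (CRLF) line endings, its "blank" lines still contain a carriage return, so paragraph mode does not see them as record separators. A minimal sketch that strips carriage returns before running the same awk program; the function name `srt_to_text` is hypothetical:

```shell
# srt_to_text is a hypothetical helper name, not part of the original answer.
srt_to_text() {
    # Remove carriage returns so CRLF files have truly empty separator lines,
    # then run the same paragraph-mode awk program as above.
    tr -d '\r' < "$1" | awk -v RS= '{
        for (i = 5; i <= NF; i++) {
            printf "%s%s", (sep ? " " : ""), $i
            sep = 1
        }
    }
    END { print "" }'
}
```

It would be called as `srt_to_text file.srt > file.txt`. For files that already use Unix line endings, the `tr` step changes nothing.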

Input file:

1
00:00:06,453 --> 00:00:10,579
When one chooses to walk
the Way of the Mandalore,

2
00:00:10,581 --> 00:00:14,095
you are both hunter and prey.

3
00:00:17,935 --> 00:00:20,076
There is one job.

4
00:00:20,078 --> 00:00:21,945
Underworld?

5
00:00:21,947 --> 00:00:26,118
How uncharacteristic of
one of your reputation.

Output:

When one chooses to walk the Way of the Mandalore, you are both hunter and prey. There is one job. Underworld? How uncharacteristic of one of your reputation.
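The program above relies on the number and timing lines making up exactly four whitespace-separated fields. If a timing line ever carries something extra (some files append positioning hints after the end time), counting lines instead of words may be more robust. The following is a sketch of that variant, not the answerer's method; it assumes every cue consists of a number line, a timing line, and then the text lines, and `srt_lines_to_text` is a hypothetical name:

```shell
# srt_lines_to_text is a hypothetical helper name used only for illustration.
srt_lines_to_text() {
    # Strip carriage returns, then read blank-line-separated records
    # with one field per line (FS is a newline).
    tr -d '\r' < "$1" | awk -v RS= -F '\n' '{
        # Line 1 is the cue number, line 2 the timing line; text starts at line 3.
        for (i = 3; i <= NF; i++) {
            printf "%s%s", (sep ? " " : ""), $i
            sep = 1
        }
    }
    END { print "" }'
}
```

As with the original command, the result is one line of text: `srt_lines_to_text file.srt > file.txt`.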
