Removing specific "words" from a text file

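Answer 1

An .srt file is a plain-text file with an .srt extension, so you can open it in the Gedit text editor and easily delete text strings such as <i> or </i> with Search -> Replace (which opens a new Replace window) -> Replace All.

You can also replace text strings in the Gnome Subtitles application, but its Search -> Replace does not recognize embedded HTML tags. Instead, Gnome Subtitles works like a word processor: it lets you format text as italic without displaying the underlying HTML tags.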

Answer 2

Using `sed`

This is only good enough for very simple HTML. For a more robust solution using Perl or html2text, see below.

sed -i".$(date +'%s').bak" 's/<[^>]*>//g' your_input_file

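Explanation

-i".$(date +'%s').bak"

Edits the file in place and, because a suffix is supplied, keeps a backup of the original. $(date +'%s') expands to the current Unix time in seconds, so the backup gets a name such as your_input_file.1700000000.bak.

's/<[^>]*>//g'
- s is the substitute command, s/pattern/replacement/flags. The replacement is empty here, so every match is deleted
- < matches the character < literally
- [^>]* matches any single character that is not in the list
  - Quantifier *: between zero and unlimited times, as many times as possible, giving back as needed [greedy]
  - > the only character in the list is > (case sensitive)
- > matches the character > literally
- g flag: global. Replaces all matches on each line, not only the first one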
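To see what the expression does without touching any file, the same substitution can be run as a filter on standard input. The following is a minimal sketch: the function name `strip_tags` and the second sample string are made up for illustration and are not part of the original answer. It also shows one case where the pattern falls short: sed processes its input one line at a time, so a tag that is split across two lines is not removed.

```shell
#!/bin/sh
# strip_tags is a hypothetical helper name, used here only for illustration.
# It reads text on stdin and writes it to stdout with <...> sequences removed.
strip_tags() {
    sed 's/<[^>]*>//g'
}

# A tag pair on a single line is removed.
printf '%s\n' '<i>When I was a child in India,</i>' | strip_tags

# The timestamp arrow contains ">" but no "<", so it is left alone.
printf '%s\n' '00:00:36,036 --> 00:00:39,096' | strip_tags

# An opening tag broken across two lines is NOT removed, because sed
# matches the pattern against each line separately.
printf '%s\n' '<font' 'color="red">text</font>' | strip_tags
```

Markup like the last case is one reason to prefer a real HTML parser for anything beyond simple tags.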

Example

Input file foo:

% cat foo
2
00:00:22,000 --> 00:00:28,074
Advertise your product or brand here
contact www.OpenSubtitles.org today

3
00:00:36,036 --> 00:00:39,096
<i>When I was a child in India,</i>

4
00:00:39,205 --> 00:00:43,005
<i>growing up in the tiny village
of Harenmahkeester,</i>

5
00:00:45,145 --> 00:00:47,238
<i>I found a voice-over machine,</i>

The command and the new contents of foo:

% sed -i".$(date +'%s').bak" 's/<[^>]*>//g' foo

% cat foo
2
00:00:22,000 --> 00:00:28,074
Advertise your product or brand here
contact www.OpenSubtitles.org today

3
00:00:36,036 --> 00:00:39,096
When I was a child in India,

4
00:00:39,205 --> 00:00:43,005
growing up in the tiny village
of Harenmahkeester,

5
00:00:45,145 --> 00:00:47,238
I found a voice-over machine,
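Because `-i` was given a suffix, the original file stays available next to the edited one. Below is a minimal sketch of checking what changed, run in a throwaway directory so that no real subtitle file is touched. The fixed suffix `.bak`, the variable name `workdir` and the temporary file are assumptions chosen for illustration; the command above uses a timestamped suffix instead.

```shell
#!/bin/sh
# Work in a temporary directory so no real subtitle file is modified.
workdir=$(mktemp -d)
printf '%s\n' '3' '00:00:36,036 --> 00:00:39,096' \
    '<i>When I was a child in India,</i>' > "$workdir/foo"

# Edit in place; the untouched original is kept as foo.bak.
sed -i.bak 's/<[^>]*>//g' "$workdir/foo"

# Show what changed. diff exits non-zero when the files differ,
# hence the "|| true" to keep a strict shell from stopping here.
diff "$workdir/foo.bak" "$workdir/foo" || true
```

If the result is not what you wanted, copying the backup over the edited file (for example with `cp foo.bak foo`) restores the original.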

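Using Perl

Install the libraries. The script below also needs HTML::Parse and HTML::FormatText; on Debian and Ubuntu these are normally provided by the libhtml-tree-perl and libhtml-format-perl packages, which may have to be installed as well.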

sudo apt-get install libfile-slurp-unicode-perl

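Create a Perl file named removeTags

#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parse;       # provides parse_html()
use HTML::FormatText;  # renders a parsed HTML tree as plain text
use File::Slurp;       # provides read_file()

# Read the whole subtitle file named on the command line.
my $text = read_file($ARGV[0]);

# Turn newlines into <br> so the line breaks survive the HTML formatting.
$text =~ s/\n/<br>/g;
my $plain_text = HTML::FormatText->new->format(parse_html($text));
print $plain_text;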

Make the script executable (chmod +x removeTags), then run it with your srt file as the argument

dos2unix foo.srt; ./removeTags foo.srt | unix2dos > foo_out.srt

Using `html2text`

dos2unix foo.srt; perl -pe 's/\n/<br>/g' foo.srt | html2text | unix2dos > foo_out.srt

Answer 3

You can use Vim in Ex mode:

ex -sc '%s/<[^>]*>//g|x' file.srt

% selects all lines
s substitutes
g replaces all instances on each line
x saves the file and exits
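The same Ex command can be wrapped in a small shell function. This is a minimal sketch and not part of the original answer: the function name `strip_tags_ex` and the sample file are made up, and it assumes that `ex` on the system is provided by Vim. It adds the `e` flag to the substitution, which in Vim suppresses the "pattern not found" error, so that for a file without any tags the following `x` is still executed instead of ex silently waiting for more input.

```shell
#!/bin/sh
# strip_tags_ex is a hypothetical wrapper name, used only for illustration.
# Assumption: "ex" is Vim's Ex mode.
strip_tags_ex() {
    # -s  silent (batch) mode
    # -c  run the Ex command that follows after loading the file
    # e   (flag on :s) do not report an error when nothing matches
    ex -sc '%s/<[^>]*>//ge|x' "$1"
}

# Try it on a throwaway file, but only if ex is actually installed.
if command -v ex >/dev/null 2>&1; then
    sample=$(mktemp)
    printf '%s\n' '<i>I found a voice-over machine,</i>' > "$sample"
    strip_tags_ex "$sample"
    cat "$sample"
fi
```

Unlike the sed command above, this form does not keep a backup, so it is worth trying on a copy first.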
