Problem with a Perl script that should remove strings listed in one file from the lines of another file

Answer 1

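Apart from the problem you asked about, your script has a huge flaw: it makes a complete pass through "foo" for every line in "remove.txt". This is extremely inefficient. A better approach is to read in "remove.txt", construct one long regular expression, and then use it once to edit "foo".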

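The simplest way to do this is to push the search strings onto an array and then join() the array with the "|" character (regex "or") to create a string that can be used as a regular expression.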
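A minimal sketch of the join() step on its own, using a hypothetical in-memory list instead of remove.txt. It also runs each string through quotemeta, which the description above does not mention; that is an addition here so that characters such as ( and . are matched literally rather than as regex syntax.

```perl
use strict;
use warnings;

# Hypothetical search strings; in the real script these are read from remove.txt.
my @remove = ('apple', 'banana (ripe)', 'a.b');

# quotemeta escapes characters such as ( ) and . so each string is matched
# literally; join('|', ...) turns the list into one regex alternation ("or").
my $alternation = join '|', map { quotemeta } @remove;
my $re = qr/$alternation/;

# A single pass over the data: each line is tested once against the combined regex.
my @lines   = ('I like apple pie', 'banana (ripe) here', 'axb', 'a.b');
my @cleaned = map { my $copy = $_; $copy =~ s/$re//g; $copy } @lines;

print "$_\n" for @cleaned;
```

Because "a.b" was escaped, the line "axb" is left alone; without quotemeta the "." would match any character and "axb" would be emptied too.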

Here is a script that does this and also fixes your original problem.

#! /usr/bin/perl 

use strict;
use warnings;

# first construct a regular expression containing every
# line that needs to be removed.  This is so we only have
# to run a single pass through $infile rather than one
# pass per line in $removefile.
my @remove = ();

my $removefile='remove.txt';
open(REMFILE,"<",$removefile) || die "couldn't open $removefile: $!\n";
while(<REMFILE>) {
    chomp;
    next if (/^\s*$/);
    push @remove, quotemeta($_);   # escape regex metacharacters such as ( and )
};
close(REMFILE);

# choose one of the following two lines depending on
# whether you want to remove only entire lines or text
# within a line:
my $remove = '^(' . join("|",@remove) . ')$';
#my $remove = join("|",@remove);

# now remove the unwanted text from all lines in $infile
my $infile = 'foo';
system('perl','-p','-i','-e',"s/$remove//g",$infile);

# if you want to delete matching lines, try this instead:
#system('perl','-n','-i','-e',"print unless /$remove/",$infile);
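The two $remove variants in the script differ only in their anchors. The sketch below, with hypothetical entries foo and bar, runs both substitutions in-process (not through system()) on three sample lines so the difference is visible; the sample data is invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical entries from remove.txt, escaped as literal text.
my @remove = map { quotemeta } ('foo', 'bar');

# Variant 1: anchored, so a line matches only if it is exactly one entry.
my $whole_line = '^(' . join('|', @remove) . ')$';
# Variant 2: unanchored, so every occurrence anywhere in a line is removed.
my $within_line = join('|', @remove);

my @input = ('foo', 'food bar', 'baz');
my (@out_whole, @out_within);
for my $line (@input) {
    my ($as_whole, $as_within) = ($line, $line);
    $as_whole  =~ s/$whole_line//g;
    $as_within =~ s/$within_line//g;
    push @out_whole,  $as_whole;
    push @out_within, $as_within;
}

print 'whole-line:  [', join('] [', @out_whole),  "]\n";
print 'within-line: [', join('] [', @out_within), "]\n";
```

With perl -p each line still carries its newline, so the anchored substitution empties a matching line but leaves a blank line behind; the -n / "print unless" variant drops the line entirely.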

Answer 2

You need to use qq() and escape the regex metacharacters ( and ) in $bad_string:

            my $bad_string = "\\($line\\)";
            system( qq( perl -p -i -e 's/$bad_string//g' foo ) );
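Escaping only the surrounding parentheses leaves any metacharacters inside $line itself active. This sketch, with a hypothetical entry a.b and a hypothetical input line, contrasts the manual escaping above with quotemeta, which escapes everything in $line.

```perl
use strict;
use warnings;

my $line = 'a.b';                  # hypothetical entry from remove.txt
my $text = 'x (a.b) (axb) y';      # hypothetical line from foo

# As in the answer: only the surrounding parentheses are escaped -> \(a.b\)
my $bad_string = "\\($line\\)";
# quotemeta also escapes the "." inside $line -> \(a\.b\)
my $safe_string = '\(' . quotemeta($line) . '\)';

(my $manual  = $text) =~ s/$bad_string//g;   # "." still matches any character
(my $literal = $text) =~ s/$safe_string//g;  # only the literal "(a.b)" is removed

print "manual:  $manual\n";
print "literal: $literal\n";
```

Inside a pattern, \Q$line\E does the same job as calling quotemeta on $line.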

Answer 3

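There are 3 elements to your problem:

Build an "exclude list". Be aware that "special" characters in the exclude list can cause problems.
Read your file, and exclude lines that "match".
Write your new file.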

In your question there are a few things I'd call "bad style":

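Using 3-argument open with lexical filehandles is good style.
Calling system to run perl from inside perl is inefficient.
Quote interpolation is a nuisance and is best avoided.
You are repeatedly reprocessing your output file, which is very inefficient. (Remember: disk IO is one of the slowest things your system does.)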

With that in mind, I would do it like this:

#!/usr/bin/env perl
use strict;
use warnings;

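my $infile = "remove.txt";
open( my $pattern_fh, '<', $infile ) or die "cannot open $infile: $!";
my @patterns = <$pattern_fh>;
close($pattern_fh);

#strip trailing newlines so they don't end up inside the pattern
chomp(@patterns);
#quotemeta escapes meta characters that'll break your pattern matching;
#grep skips blank lines, which would otherwise match every empty line in foo.
my $regex = join( '|', map {quotemeta} grep { length } @patterns );
#compile the regex
$regex = qr/^($regex)$/;    #whole lines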

print "Using regular expression: $regex\n"; 

open( my $input_fh,  '<', "foo" )     or die $!;
open( my $output_fh, '>', "foo.new" ) or die $!;

#tell print where to print by default. 
#could instead print {$output_fh} $_; 
select($output_fh);
while (<$input_fh>) {
    print unless m/$regex/;
}
close($input_fh);
close($output_fh);

#rename/copy if it worked

(Note: not exhaustively tested. If you can provide some sample data, I'll test and update as needed.)
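The script leaves the last step as the comment "#rename/copy if it worked". A minimal sketch of that step using Perl's built-in rename follows. The temporary directory, the stand-in file contents and the ".bak" backup name are assumptions made only so the example runs on its own; the files are renamed only after they have been written and closed successfully.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Work in a scratch directory so the sketch does not touch real files.
my $dir  = tempdir(CLEANUP => 1);
my $orig = File::Spec->catfile($dir, 'foo');
my $new  = "$orig.new";

# Stand-ins for the original file and the filtered output.
for ([$orig, "keep\nremove me\n"], [$new, "keep\n"]) {
    my ($path, $content) = @$_;
    open(my $fh, '>', $path) or die "cannot write $path: $!";
    print {$fh} $content;
    close($fh) or die "cannot close $path: $!";
}

# Keep a backup of the original, then move the new file into place.
rename($orig, "$orig.bak") or die "cannot back up $orig: $!";
rename($new,  $orig)       or die "cannot rename $new to $orig: $!";

open(my $check, '<', $orig) or die "cannot read $orig: $!";
my $result = do { local $/; <$check> };
close($check);
print $result;
```

In the answer's script the equivalent guard would be `close($output_fh) or die $!;` before the two rename calls. rename may fail across filesystem boundaries; File::Copy's move falls back to copy-and-delete in that case.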
