Replacing strings using a dictionary

Answer 1

Here is one way to do it with sed:

sed '
s|"\(.*\)"[[:blank:]]*:[[:blank:]]*"\(.*\)"|\1\
\2|
h
s|.*\n||
s|[\&/]|\\&|g
x
s|\n.*||
s|[[\.*^$/]|\\&|g
G
s|\(.*\)\n\(.*\)|s/\1/\2/g|
' dictionary.txt | sed -f - novel.txt

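How it works:
The first sed turns dictionary.txt into a script file (editing commands, one per line). That script is piped to the second sed (note the -f -, which means read the commands from stdin), which executes those commands to edit novel.txt.
This requires translating your format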

"STRING"   :   "REPLACEMENT"

into a sed command, escaping any special characters along the way, for both the LHS and the RHS:

s/ESCAPED_STRING/ESCAPED_REPLACEMENT/g

So the first substitution

s|"\(.*\)"[[:blank:]]*:[[:blank:]]*"\(.*\)"|\1\
\2|

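turns "STRING"   :   "REPLACEMENT" into STRING\nREPLACEMENT (\n is a newline character). h then copies the result to the hold space.
s|.*\n|| deletes the first part, keeping only REPLACEMENT, then s|[\&/]|\\&|g escapes the reserved characters (this is the RHS).
x then exchanges the hold buffer with the pattern space; s|\n.*|| deletes the second part, keeping only STRING, and s|[[\.*^$/]|\\&|g does the escaping (this is the LHS).
The content of the hold buffer is then appended to the pattern space via G, so the pattern space content is now ESCAPED_STRING\nESCAPED_REPLACEMENT.
The final substitution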

s|\(.*\)\n\(.*\)|s/\1/\2/g|

turns it into s/ESCAPED_STRING/ESCAPED_REPLACEMENT/g
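To make the translation step concrete, here is a small sketch that runs the first sed on a two-entry dictionary and prints the script it generates. The dictionary entries, the temporary directory, and the sample input line are made up for illustration; the sed program itself is the one from the answer. It assumes a sed that understands \n in regular expressions, as GNU sed does.

```shell
# Hypothetical sample dictionary; the second entry has characters that
# need escaping on the LHS (.) and on the RHS (/ and &).
tmp=$(mktemp -d)
printf '%s\n' '"yes" : "no"' '"a.b"   :   "x/y&z"' > "$tmp/dictionary.txt"

# Same program as in the answer, with its output captured instead of piped.
script=$(sed '
s|"\(.*\)"[[:blank:]]*:[[:blank:]]*"\(.*\)"|\1\
\2|
h
s|.*\n||
s|[\&/]|\\&|g
x
s|\n.*||
s|[[\.*^$/]|\\&|g
G
s|\(.*\)\n\(.*\)|s/\1/\2/g|
' "$tmp/dictionary.txt")

printf '%s\n' "$script"
# s/yes/no/g
# s/a\.b/x\/y\&z/g

# Apply the generated script to a hypothetical input line.
result=$(printf 'yes a.b axb\n' | sed -e "$script")
printf '%s\n' "$result"
# no x/y&z axb
```

Because the dot was escaped, axb is left alone, and the & in the replacement is inserted literally instead of standing for the matched text.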

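Answer 2

Here is a perl version. It creates a hash of precompiled regexes, then loops over every line of input, applying all of the regexes to each line. perl's -i option is used for "in-place editing" of the input files. You can easily add or change any of the regexes or replacement strings.

Precompiling the regexes with qr// greatly improves the speed of the script, which will be very noticeable if there are a lot of regexes and/or a lot of input lines to process. One caveat: Perl hash keys are always strings, so a qr// object used as a key is stored as its pattern string; the speed-up applies fully only when the compiled objects themselves are kept, for example as array elements or hash values.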

#! /usr/bin/perl -i

use strict;

# the dictionary is embedded in the code itself.
# see 2nd version below for how to read dict in
# from a file.
my %regex = (
    qr/yes/      => 'no',
    qr/stop/     => 'go, go, go!',
    qr/wee-ooo/  => 'ooooh nooo!',
    qr/gooodbye/ => 'hello',
    qr/high/     => 'low',
    qr/why\?/    => 'i don\'t know',
);

while (<>) {
      foreach my $key (keys %regex) {
            s/$key/$regex{$key}/g;
      }
      print;   # with -i, printed lines are written back to the file
}

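Here is another version, which reads the dictionary from the first filename on the command line, while still processing the second (and any optional subsequent) filenames: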

#! /usr/bin/perl -i

use strict;

# the dictionary is read from a file.
#
# file format is "searchpattern replacestring", with any
# number of whitespace characters (space or tab) separating
# the two fields.  You can add comments or comment out dictionary
# entries with a '#' character.
#
# NOTE: if you want to use any regex-special characters as a
# literal in either $searchpattern or $replacestring, you WILL
# need to escape them with `\`.  e.g. for a literal '?', use '\?'.
#
# this is very basic and could be improved.  a lot.

my %regex = ();

my $dictfile = shift ;
open(DICT,'<',$dictfile) || die "couldn't open $dictfile: $!\n";
while(<DICT>) {
    s/#.*// unless (m/\\#/); # remove comments, unless escaped.
                             # easily fooled if there is an escaped 
                             # '#' and a comment on the same line.

    s/^\s*|\s*$//g ;         # remove leading & trailing spaces
    next if (/^$/) ;         # skip empty lines

    # split into two fields only, so the replacement may contain spaces
    my($search, $replace) = split ' ', $_, 2;
    $regex{qr/$search/} = $replace;
};
close(DICT);


# now read in the input file(s) and modify them.
while (<>) {
      foreach my $key (keys %regex) {
            s/$key/$regex{$key}/g;
      }
      print;   # with -i, printed lines are written back to the file
}

Answer 3

I started writing this as a comment, but it got too complicated, hence a second perl answer. Given your source file, you can use a neat perl trick to build a regex:

#!/usr/bin/env perl

use strict;
use warnings; 
use Data::Dumper;

#build key-value pairs
my %replace = map { /"(.+)"\s*:\s*"(.+)"/ } <DATA>;
print Dumper \%replace; 

#take the keys of your hash, then build into capturing regex
my $search = join ( "|", map {quotemeta} keys %replace ); 
$search = qr/($search)/;

print "Using match regex of: $search\n";

#read stdin or files on command line, line by line
while ( <> ) { 
    #match regex repeatedly, replace with contents of hash. 
    s/$search/$replace{$1}/g;
    print;
}

__DATA__
"yes"      : "no"
"stop"     : "go, go, go!"
"wee-ooo"  : "ooooh nooo!"
"gooodbye" : "hello"

"high"     : "low"
"why?"     : "i don't know"

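We build the hash with map: applied to each line of the dictionary in list context, the pattern match returns its two captures, which become a key-value pair. Lines that do not match, such as the blank one, contribute nothing.

We build a single search regex from the hash keys, and use the text it captures to look up the replacement.

The <> used is perl's magic filehandle: STDIN, or the files specified on the command line, much like how sed does it. (You could use a file and read the patterns from it "normally"; the use of DATA is purely illustrative.)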
