Is there a way to do multiple replacements with sed, without chaining the replacements?

Question 1

For this kind of problem you need a loop, so that you can search for both patterns at the same time.

awk '
    BEGIN {
        regex = "A|B"
        map["A"] = "BB"
        map["B"] = "AA"
    }
    {
        str = $0
        result = ""
        while (match(str, regex)) {
            found = substr(str, RSTART, RLENGTH)
            result = result substr(str, 1, RSTART-1) map[found]
            str = substr(str, RSTART+RLENGTH)
        }
        print result str
    }
'
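
To see why the single pass matters, here is a rough sketch that wraps the awk program above in a shell function (the name swap_ab is made up for this example) and compares it with two chained sed substitutions, where the second substitution also rewrites the output of the first.

```shell
# Hypothetical wrapper around the awk loop shown above.
swap_ab() {
    awk '
        BEGIN {
            regex = "A|B"
            map["A"] = "BB"
            map["B"] = "AA"
        }
        {
            str = $0
            result = ""
            # Consume the line left to right, so replaced text is never rescanned.
            while (match(str, regex)) {
                found = substr(str, RSTART, RLENGTH)
                result = result substr(str, 1, RSTART-1) map[found]
                str = substr(str, RSTART+RLENGTH)
            }
            print result str
        }
    '
}

# Chained substitutions: the B characters produced by the first step are replaced again.
printf 'AB\n' | sed 's/A/BB/g; s/B/AA/g'    # prints AAAAAA

# Single pass: each original character is replaced exactly once.
printf 'AB\n' | swap_ab                     # prints BBAA
```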

Of course, if Perl is available, there is an equivalent one-liner:

perl -pe '
    BEGIN { %map = ("A" => "BB", "B" => "AA"); }
    s/(A|B)/$map{$1}/g;
'

If none of the patterns contain special characters, you can also build the regular expression dynamically:

perl -pe '
    BEGIN {
        %map = ("A" => "BB", "B" => "AA");
        $regex = join "|", keys %map;
    }
    s/($regex)/$map{$1}/g;
'

By the way, Tcl has a built-in command for this called string map, but Tcl one-liners are not easy to write.

Demonstrating the effect of sorting the keys by length:

Without sorting

$ echo ABBA | perl -pe '
    BEGIN {
        %map = (A => "X", BB => "Y", AB => "Z");
        $regex = join "|", map {quotemeta} keys %map;
        print $regex, "\n";
    }
    s/($regex)/$map{$1}/g
'

A|AB|BB
XYX

With sorting

$ echo ABBA | perl -pe '
    BEGIN {
        %map = (A => "X", BB => "Y", AB => "Z");
        $regex = join "|", map {quotemeta $_->[1]}
                           reverse sort {$a->[0] <=> $b->[0]}
                           map {[length, $_]}
                           keys %map;
        print $regex, "\n";
    }
    s/($regex)/$map{$1}/g
'

BB|AB|A
ZBX
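
For comparison, the awk loop from the first example should not need this sorting step, as far as I can tell: POSIX regular expressions, which awk uses, take the longest match among the alternatives at the leftmost position, whereas Perl tries the alternatives in the order they are written. A small sketch (the function name map_keys is made up here, and the regex is joined from the array keys in whatever order awk returns them):

```shell
# Hypothetical helper: same map as the Perl demo, regex built from the keys.
map_keys() {
    awk '
        BEGIN {
            map["A"] = "X"
            map["BB"] = "Y"
            map["AB"] = "Z"
            # Join the keys with "|"; the order of a for-in loop is unspecified.
            sep = ""
            for (key in map) {
                regex = regex sep key
                sep = "|"
            }
        }
        {
            str = $0
            result = ""
            # match() reports the leftmost match and, at that position, the longest one.
            while (match(str, regex)) {
                found = substr(str, RSTART, RLENGTH)
                result = result substr(str, 1, RSTART-1) map[found]
                str = substr(str, RSTART+RLENGTH)
            }
            print result str
        }
    '
}

printf 'ABBA\n' | map_keys    # prints ZBX, same as the sorted Perl version
```

As in the Perl version without quotemeta, this assumes the keys contain no regex metacharacters.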

Benchmarking a "plain" sort against a Schwartzian sort in Perl: the code in the subroutines is lifted directly from the sort documentation.

#!perl
use Benchmark   qw/ timethese cmpthese /;

# make up some key=value data
my $key='a';
for $x (1..10000) {
    push @unsorted,   $key++ . "=" . int(rand(32767));
}

# plain sorting: first by value then by key
sub nonSchwartzian {
    my @sorted = 
        sort { ($b =~ /=(\d+)/)[0] <=> ($a =~ /=(\d+)/)[0] || uc($a) cmp uc($b) } 
        @unsorted
}

# using the Schwartzian transform
sub schwartzian {
    my @sorted =
        map  { $_->[0] }
        sort { $b->[1] <=> $a->[1] || $a->[2] cmp $b->[2] }
        map  { [$_, /=(\d+)/, uc($_)] } 
        @unsorted
}

# ensure the subs sort the same way
die "different" unless join(",", nonSchwartzian()) eq join(",", schwartzian());

# benchmark
cmpthese(
    timethese(-10, {
        nonSchwartzian => 'nonSchwartzian()',
        schwartzian    => 'schwartzian()',
    })
);

Running it:

$ perl benchmark.pl
Benchmark: running nonSchwartzian, schwartzian for at least 10 CPU seconds...
nonSchwartzian: 11 wallclock secs (10.43 usr +  0.05 sys = 10.48 CPU) @  9.73/s (n=102)
schwartzian: 11 wallclock secs (10.13 usr +  0.03 sys = 10.16 CPU) @ 49.11/s (n=499)
                 Rate nonSchwartzian    schwartzian
nonSchwartzian 9.73/s             --           -80%
schwartzian    49.1/s           405%             --

The code using the Schwartzian transform is about 4 times faster (405% in the table above, i.e. roughly 5 times the rate).

When the comparison function is nothing more than the length of each element:

Benchmark: running nonSchwartzian, schwartzian for at least 10 CPU seconds...
nonSchwartzian: 11 wallclock secs (10.06 usr +  0.03 sys = 10.09 CPU) @ 542.52/s (n=5474)
schwartzian: 10 wallclock secs (10.21 usr +  0.02 sys = 10.23 CPU) @ 191.50/s (n=1959)
                Rate    schwartzian nonSchwartzian
schwartzian    191/s             --           -65%
nonSchwartzian 543/s           183%             --

With such a cheap sort function, the Schwartzian transform is much slower.

Can we move past the abusive comments now?

Question 2

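You can't do the whole operation with a single substitution in sed, but you can do it correctly in different ways, depending on whether the two substrings A and B are single characters or longer strings.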

Assuming the two substrings A and B are single characters...

You want to transform AYB into BBYAA. To do this,

Change each A to B and each B to A using y/AB/BA/.
Replace each A in the new string with AA using s/A/AA/g.
Replace each B in the new string with BB using s/B/BB/g.

$ echo AYB | sed 'y/AB/BA/; s/B/BB/g; s/A/AA/g'
BBYAA

Combining the last two steps gives

$ echo AYB | sed 'y/AB/BA/; s/[AB]/&&/g'
BBYAA

In fact, the order of the operations does not matter here:

$ echo AYB | sed 's/[AB]/&&/g; y/AB/BA/'
BBYAA

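Edit: The sed command y/// translates the characters in its first argument into the corresponding characters in its second argument, a bit like the tr utility does. This is done in a single operation, so you don't need a temporary placeholder to swap A and B in y/AB/BA/. In general, y/// is a lot faster at translating single characters than e.g. s///g (since no regular expressions are involved), and it is also able to insert newlines into strings using \n, which the standard s/// command can't do (s/// in GNU sed can obviously do this, as a non-portable convenience extension).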

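The & character in the replacement part of the s/// command is replaced by whatever the expression in the first argument matched, so s/[AB]/&&/g doubles up any A or B character in the input data.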
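
A short illustration of the two pieces just described, assuming a POSIX sed and tr: y/// on its own behaves like tr for a one-to-one character swap, and & in the replacement re-inserts the matched text. The function name swap_and_double is invented for this sketch.

```shell
# y/// alone swaps the characters in one pass, like tr does.
printf 'AYB\n' | sed 'y/AB/BA/'      # prints BYA
printf 'AYB\n' | tr 'AB' 'BA'        # prints BYA

# & stands for the matched text, so && writes it twice.
printf 'AYB\n' | sed 's/[AB]/&&/g'   # prints AAYBB

# Hypothetical helper combining both steps.
swap_and_double() {
    sed 'y/AB/BA/; s/[AB]/&&/g'
}

printf 'AYB\n' | swap_and_double     # prints BBYAA
```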

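For multi-character substrings, assuming the substrings are distinct (i.e. one substring is not found inside the other, as is the case with oo and foo), use something like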

$ echo fooxbar | sed 's/foo/@/g; s/bar/foofoo/g; s/@/barbar/g'
barbarxfoofoo

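That is, swap the two strings via an intermediate string that is not found in the data. Note that the intermediate string can be any string not found in the data, not just a single character.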
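
The requirement that the intermediate string must not occur in the data can be checked before substituting. The following is only a sketch under my own assumptions: the function name swap_via_placeholder and the placeholder @@SWAP@@ are invented for illustration, and the whole input is read into a shell variable first, so it is meant for small inputs.

```shell
# Hypothetical helper: swap foo -> barbar and bar -> foofoo through a placeholder,
# refusing to run if the placeholder already appears in the input.
swap_via_placeholder() {
    placeholder='@@SWAP@@'      # assumed not to contain regex metacharacters
    input=$(cat)                # read all of standard input
    case $input in
        *"$placeholder"*)
            echo "placeholder occurs in input, pick another one" >&2
            return 1
            ;;
    esac
    printf '%s\n' "$input" |
        sed "s/foo/$placeholder/g; s/bar/foofoo/g; s/$placeholder/barbar/g"
}

printf 'fooxbar\n' | swap_via_placeholder    # prints barbarxfoofoo
```

A longer placeholder makes an accidental collision less likely, but the check is what actually guards against it.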

Answer