Apply changes only to a substring

Question 1

perl -CSD -ne '
    if (my ($before, $between, $after) = /^(.*START)(.*)(END.*)/) {
        s/_this_//, s/modi/MODI/, tr/as/45/ for $between;
        print "$before$between$after\n";
    } else { print; }' -- file

-CSD decodes the input from UTF-8 and encodes the output as UTF-8.

Instead of populating the three variables $before, $between, and $after, we can use the /p modifier together with ${^PREMATCH} and ${^POSTMATCH}, but I haven't found a nicer solution:

if (my ($s) = /START(.*)END/p) {
    s/_this_//, s/modi/MODI/, tr/as/45/ for $s;
    print "${^PREMATCH}START${s}END${^POSTMATCH}";
} else { print; }

If the START...END part can be repeated on a line, you need to loop over the parts of each line:

for my $part (split /(START.*?END)/) {
    if ($part =~ /^START.*END$/) {
        s/_this_//, s/modi/MODI/, tr/as/45/ for $part;
    }
    print "$part";
}

Question 2

Using standard sed, and assuming that each line contains exactly one START and one END substring (in that order):

# Skip (pass through) lines that do not have START followed by END.
/.*START\(.*\)END.*/ !b

# Save the original line in the hold space.
h

# Remove the start and the end from the line.
# This leaves the bit of the line that we want to modify.
# (This reuses the previous regular expression.)
s//\1/

# Modify what's left.
s/_this_//
s/modi/MODI/
y/as/45/

# Append the original line from the hold space,
# with a newline as delimiter.
G

# Move the modified bit into the correct spot with a substitution,
# while deleting the old substring between START and END.
s/\(.*\)\n\(.*START\).*\(END.*\)/\2\1\3/

Testing:

$ cat file
aomodi3hriq32| ¶³r 0q93aoiSTART_this_is_to_be_modified_ENDaqsdofuha23uru| ²23i ii3uhfia
oawpo3<9"§ A hSTART_this_also_needs_modification_ENDqa 032/a237(°1Q"§ >A_this_
START changeme ENDnot_this_modias

$ sed -f script file
aomodi3hriq32| ¶³r 0q93aoiSTARTi5_to_be_MODIfied_ENDaqsdofuha23uru| ²23i ii3uhfia
oawpo3<9"§ A hSTART4l5o_need5_MODIfic4tion_ENDqa 032/a237(°1Q"§ >A_this_
START ch4ngeme ENDnot_this_modias

Inline, on the command line:

sed -e '/.*START\(.*\)END.*/!b' -e h -e 's//\1/' \
    -e 's/_this_//' -e 's/modi/MODI/' -e 'y/as/45/' \
    -e G -e 's/\(.*\)\n\(.*START\).*\(END.*\)/\2\1\3/' file
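
To see what the final substitution works on, one option is to stop right after G and dump the pattern space with sed's l command, which shows the embedded newline as \n and marks the end of the pattern space with $. This is only a debugging sketch: the sample line and the variable dump are made up, and -n is added so that nothing else is printed.

```shell
dump=$(printf '%s\n' 'xxSTART_this_modiasENDyy' |
    sed -n -e '/.*START\(.*\)END.*/!b' -e h -e 's//\1/' \
        -e 's/_this_//' -e 's/modi/MODI/' -e 'y/as/45/' \
        -e G -e l)

printf '%s\n' "$dump"
# prints: MODI45\nxxSTART_this_modiasENDyy$
```

The modified bit comes first, then the newline added by G, then the untouched original line; the last s command in the script rearranges exactly these three pieces.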

Question 3

You can always build your own set of multiple OFS values:

awk -v FS='START|END' -v OFS= -v map='_this_\r\rmodi\rMODI\ra\r4\rs\r5' '
  BEGIN {
    split(FS, mOFS, "|")       # mOFS[1]="START", mOFS[2]="END"
    n = split(map, tr, "\r")   # odd elements: regex, even elements: replacement
  }
  {
    for (i = 1; i < n; i += 2) gsub(tr[i], tr[i+1], $2)
    print $1, mOFS[1], $2, mOFS[2], $3
  }' infile

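Note that the first argument of gsub() is a regular expression, so be careful when defining map=...; the right-hand side of a mapping should also not contain special characters such as & or back-references like \1. However, since you write the mapping by hand, you can escape any special characters so that gsub() does not interpret them specially.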
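
A short sketch of that escaping, assuming the values are passed with -v as in the command above (so awk processes backslash escapes once before gsub() sees them). The variable names pat and rep and the sample input are made up for this example.

```shell
# Unescaped: "." matches any character and "&" expands to the matched text.
unescaped=$(printf 'a.b axb\n' | awk -v pat='a.b' -v rep='[&]' '{ gsub(pat, rep); print }')

# Escaped: '\\.' reaches gsub() as \. (a literal dot), '\\&' as \& (a literal ampersand).
escaped=$(printf 'a.b axb\n' | awk -v pat='a\\.b' -v rep='[\\&]' '{ gsub(pat, rep); print }')

printf '%s\n%s\n' "$unescaped" "$escaped"
# prints:
# [a.b] [axb]
# [&] axb
```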

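I used CR (\r) to delimit the mappings because, as you mentioned, it is the only character that does not occur in your input file, apart from \0. \0 cannot be used with split() and other awk functions (and possibly in other programming languages too), since awk considers a string only up to the first \0 it contains. So each left-hand regular expression tr[i] (plain strings here) is replaced with the next element of the array tr, the right-hand side tr[i+1].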
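
To make the layout of the array concrete, this sketch splits the same map value and prints each pair in brackets. The output format and the variable pairs are just for illustration.

```shell
pairs=$(awk -v map='_this_\r\rmodi\rMODI\ra\r4\rs\r5' 'BEGIN {
    n = split(map, tr, "\r")
    # Odd indexes hold the patterns, even indexes the replacements.
    for (i = 1; i < n; i += 2) printf "[%s]->[%s]\n", tr[i], tr[i+1]
}')

printf '%s\n' "$pairs"
# prints:
# [_this_]->[]
# [modi]->[MODI]
# [a]->[4]
# [s]->[5]
```

The empty replacement for _this_ comes from the two consecutive \r characters in the map.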

This approach saves you from writing a separate gsub() call for each pair.
