Replacing leading tabs and spaces with sed

Answer 1

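I know you said you wanted to use sed, and it is usually a great tool. But when the job involves selection and looping, I find that awk handles it better.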

#!/usr/bin/gawk -f
{ while (/^\s/) {
    if (sub(/^ /,"")) printf "<space>";
    if (sub(/^\t/,"")) printf "<tab>";
    }
  print;
}

If we create a file named input.txt containing the sample input, and name the script replace, it runs as follows and produces the desired output.

replace input.txt

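Update: Oops. That code contains an infinite loop. The \s sequence matches [ \t\n\r\f\v], so if a line starts with a stray form feed, the loop condition stays true while neither sub() removes anything, and it spins forever. [:blank:], on the other hand, matches only spaces and tabs, so the second line of the script should read as follows.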

{ while (/^[[:blank:]]/) {
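
For reference, here is a minimal sketch of the whole script with that fix applied, wrapped in a shell function so it can be tried without creating a file. The function name mark_leading is made up for this example, and the sketch assumes an awk that supports POSIX character classes such as [[:blank:]] (gawk does).

```shell
#!/bin/sh
# mark_leading is a hypothetical helper name used only for this sketch.
mark_leading() {
  awk '{
    # [[:blank:]] matches only space and tab, so every pass through
    # the loop strips at least one character and the loop terminates.
    while (/^[[:blank:]]/) {
      if (sub(/^ /, ""))  printf "<space>"
      if (sub(/^\t/, "")) printf "<tab>"
    }
    print
  }'
}

# Sample input: leading tabs, leading spaces, and a mix of both.
printf '\t\tline with\ttabs\n  line with spaces\n\t \tintermixed\n' | mark_leading
```

A line that starts with a form feed is now passed through unchanged instead of hanging the script.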

Answer 2

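Here is a solution that uses sed. It splits each line into its leading tabs and spaces and the rest of the line, so that tabs and spaces inside the text itself are not replaced.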

echo -e '\t\tline with\ttabs
  line with spaces
\t \tintermixed' | sed -r '

    # On the lines that start with tab or space.
    /^[\t ]/ {

        # Put the whole line in the hold space.
        h

        # Delete all tabs and spaces at the start of line.
        s/^[\t ]+//

        # Exchange pattern and hold spaces.
        # This saves the text part to the hold space and
        # brings the original line back into the pattern space.
        x

        # Now keep only the leading tabs and spaces in the
        # pattern space (the rest is in the hold space).
        s/^([\t ]+).*/\1/

        # Finally, make the substitutions.
        s/\t/<TAB>/g
        s/ /<SPACE>/g

        # Append a \n (newline) to the pattern space,
        # followed by the contents of the hold space.
        G

        # Delete the extra \n added above.
        s/\n//
    }'
<TAB><TAB>line with     tabs
<SPACE><SPACE>line with spaces
<TAB><SPACE><TAB>intermixed
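
The script above relies on two GNU conveniences: echo -e and \t inside a sed bracket expression. As a sketch of a more portable variant of the same hold-space steps, the version below builds a literal tab with printf and uses sed -E. The function name mark_leading_sed and the variable tab are made up for this example, and it assumes a sed that accepts -E for extended regular expressions (GNU and BSD sed both do).

```shell
#!/bin/sh
# mark_leading_sed is a hypothetical helper name used only for this sketch.
mark_leading_sed() {
  # A literal tab character, so the script does not depend on \t in sed.
  tab=$(printf '\t')
  sed -E "/^[$tab ]/{
    h
    s/^[$tab ]+//
    x
    s/^([$tab ]+).*/\1/
    s/$tab/<TAB>/g
    s/ /<SPACE>/g
    G
    s/\n//
  }"
}

# Same sample input as above, produced with printf instead of echo -e.
printf '\t\tline with\ttabs\n  line with spaces\n\t \tintermixed\n' | mark_leading_sed
```

The commands are the same as in the commented script: h saves the line, the first s strips the leading blanks, x swaps the two buffers, the remaining s commands rewrite only the leading blanks, and G followed by s/\n// joins the two parts again.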
