将行首的空格替换为“-”

Question 1

对于sed，您需要一个类似以下的循环：

sed -e :1 -e 's/^\( *\) /\1-/; t1' < file

或者做类似的事情：

sed '
s/ */&\
/; # add a newline after the leading spaces
h; # save a copy on the hold space
y/ /-/; # replace *every* space with -
G; # append our saved copy
s/\n.*\n//; # remove the superflous part' < file

使用perl，您可以执行以下操作：

perl -pe 's{^ *}{$& =~ y/ /-/r}e' < file

或者

perl -pe 's/(^|\G) /-/g' < file

\G在 PCRE 匹配（零宽度）中，在上一个匹配的末尾（在//g上下文中）。因此，在这里，我们将替换行首^或上一个匹配项末尾后面的空格（即先前替换的空格）。

（该方法也可以与sed支持 PCRE 的实现一起使用，例如ssed -R）。

使用awk，您可以执行以下操作：

awk '
  match($0, /^ +/) {
    space = substr($0, 1, RLENGTH)
    gsub(" ", "-", space)
    $0 = space substr($0, RLENGTH+1)
  }
  {print}' < file

如果您还想转换制表符（例如<space><tab>foo将转换为--------foo），您可以使用预处理输入expand。使用 GNU expand，您可以expand -i只转换行中前导空格中的制表符。您可以使用该选项指定制表位的间隔距离（默认为每 8 列）-t。

要将其推广到所有水平间距字符，或者至少是那些属于[:blank:]您所在区域设置类别的字符，情况会变得更加复杂。

如果没有 TAB 字符，这只是一个问题：

perl -Mopen=locale -MText::CharWidth=mbswidth -pe 's/^\h+/"-" x mbswidth($&)/e'

但 TAB 字符是控制字符的宽度为-1，mbswidth()而实际上它有宽度从 1 到 8 列不等，具体取决于它在线上的位置。

该expand命令负责将其扩展到正确的空格的数量，但是expand当存在多字节字符（例如 UTF-8 语言环境中除制表符和空格之外的所有空白字符）时，包括 GNU 在内的多种实现都无法正确处理，甚至某些支持多字节的实现字符可能会被零宽度或双宽度字符所欺骗（例如 U+3000，[:blank:]至少在典型的 GNU 语言环境中属于此类）。因此，必须手动进行 TAB 扩展，如下所示：

perl -Mopen=locale -MText::CharWidth=mbswidth -pe 's{^\h+}{
  $s = $&;
  while ($s =~ /(.*?)\t(.*)/) {
    $s = $1 . (" " x ((7-mbswidth($1)) % 8 + 1)) . $2;
  }
  "-" x mbswidth($s)}e'

Answer

对于sed，您需要一个类似以下的循环：

sed -e :1 -e 's/^\( *\) /\1-/; t1' < file

或者做类似的事情：

sed '
s/ */&\
/; # add a newline after the leading spaces
h; # save a copy on the hold space
y/ /-/; # replace *every* space with -
G; # append our saved copy
s/\n.*\n//; # remove the superflous part' < file

使用perl，您可以执行以下操作：

perl -pe 's{^ *}{$& =~ y/ /-/r}e' < file

或者

perl -pe 's/(^|\G) /-/g' < file

\G在 PCRE 匹配（零宽度）中，在上一个匹配的末尾（在//g上下文中）。因此，在这里，我们将替换行首^或上一个匹配项末尾后面的空格（即先前替换的空格）。

（该方法也可以与sed支持 PCRE 的实现一起使用，例如ssed -R）。

使用awk，您可以执行以下操作：

awk '
  match($0, /^ +/) {
    space = substr($0, 1, RLENGTH)
    gsub(" ", "-", space)
    $0 = space substr($0, RLENGTH+1)
  }
  {print}' < file

如果您还想转换制表符（例如<space><tab>foo将转换为--------foo），您可以使用预处理输入expand。使用 GNU expand，您可以expand -i只转换行中前导空格中的制表符。您可以使用该选项指定制表位的间隔距离（默认为每 8 列）-t。

要将其推广到所有水平间距字符，或者至少是那些属于[:blank:]您所在区域设置类别的字符，情况会变得更加复杂。

如果没有 TAB 字符，这只是一个问题：

perl -Mopen=locale -MText::CharWidth=mbswidth -pe 's/^\h+/"-" x mbswidth($&)/e'

但 TAB 字符是控制字符的宽度为-1，mbswidth()而实际上它有宽度从 1 到 8 列不等，具体取决于它在线上的位置。

该expand命令负责将其扩展到正确的空格的数量，但是expand当存在多字节字符（例如 UTF-8 语言环境中除制表符和空格之外的所有空白字符）时，包括 GNU 在内的多种实现都无法正确处理，甚至某些支持多字节的实现字符可能会被零宽度或双宽度字符所欺骗（例如 U+3000，[:blank:]至少在典型的 GNU 语言环境中属于此类）。因此，必须手动进行 TAB 扩展，如下所示：

perl -Mopen=locale -MText::CharWidth=mbswidth -pe 's{^\h+}{
  $s = $&;
  while ($s =~ /(.*?)\t(.*)/) {
    $s = $1 . (" " x ((7-mbswidth($1)) % 8 + 1)) . $2;
  }
  "-" x mbswidth($s)}e'

Question 2

Stephane 已经提供了正确的sed解决方案。这是一个小且更明确的 Python 3 替代方案：

#!/usr/bin/env python3
import sys

with open(sys.argv[1]) as f:
    for line in f:
        beginning = True
        for char in line:
            if beginning and char == " ":
                print("-",end="")
            else:
               beginning = False
               print(char,end="")

测试运行：

# This is the input text
$ cat -A input.txt
 wqdq$
 wqdqgrhehr$
 cnkzjncicoajc$
 hello space$
    oejwfoiwejfow$
    wqodojw$
    more spaces$
    more$
    $
 $
  $

# And this is the output with the given python script
$ ./add_dashes.py ./input.txt                                                                                            
-wqdq
-wqdqgrhehr
-cnkzjncicoajc
-hello space
----oejwfoiwejfow
----wqodojw
----more spaces
----more
----
-
--

Answer

Stephane 已经提供了正确的sed解决方案。这是一个小且更明确的 Python 3 替代方案：

#!/usr/bin/env python3
import sys

with open(sys.argv[1]) as f:
    for line in f:
        beginning = True
        for char in line:
            if beginning and char == " ":
                print("-",end="")
            else:
               beginning = False
               print(char,end="")

测试运行：

# This is the input text
$ cat -A input.txt
 wqdq$
 wqdqgrhehr$
 cnkzjncicoajc$
 hello space$
    oejwfoiwejfow$
    wqodojw$
    more spaces$
    more$
    $
 $
  $

# And this is the output with the given python script
$ ./add_dashes.py ./input.txt                                                                                            
-wqdq
-wqdqgrhehr
-cnkzjncicoajc
-hello space
----oejwfoiwejfow
----wqodojw
----more spaces
----more
----
-
--

Question 3

另一种awk方法：

awk 'match($0, /^[[:space:]]+/){ p=""; l=RLENGTH; while(l--) p=p"-";
     sub(/^[[:space:]]+/,p); print}' yourfile

输出：

-wqdq
-wqdqgrhehr
-cnkzjncicoajc
-hello space
----oejwfoiwejfow
----wqodojw
----more spaces
----more
----
-
--

match($0, /^[[:space:]]+/)- 匹配前导空格的序列

l=RLENGTH- 每行匹配序列的大小

while(l--) p=p"-"- 构造替换子字符串

选择Python3.x方法：

空格到连字符.py脚本：

import sys, re
with open(sys.argv[1], 'r') as f:  # reading input file
    for l in f.read().splitlines():
        m = re.match(r'^ +', l)    # capture sequence of leading spaces 
        print(l if not m else l.replace(' ', '-', m.end()))

用法：

python3 space_to_hyphen.py yourfile

Answer

另一种awk方法：

awk 'match($0, /^[[:space:]]+/){ p=""; l=RLENGTH; while(l--) p=p"-";
     sub(/^[[:space:]]+/,p); print}' yourfile

输出：

-wqdq
-wqdqgrhehr
-cnkzjncicoajc
-hello space
----oejwfoiwejfow
----wqodojw
----more spaces
----more
----
-
--

match($0, /^[[:space:]]+/)- 匹配前导空格的序列

l=RLENGTH- 每行匹配序列的大小

while(l--) p=p"-"- 构造替换子字符串

选择Python3.x方法：

空格到连字符.py脚本：

import sys, re
with open(sys.argv[1], 'r') as f:  # reading input file
    for l in f.read().splitlines():
        m = re.match(r'^ +', l)    # capture sequence of leading spaces 
        print(l if not m else l.replace(' ', '-', m.end()))

用法：

python3 space_to_hyphen.py yourfile

Question 4

在职的

我们设置一个do-while循环，并继续转换与第一个非空格相邻的最后一个空格，同时该行仍然有一个前导空格。

sed -e '
   :loop
      /^ /s/ \([^ ]\|$\)/-\1/
   tloop
' filename.ext


while IFS= read -r l; do
   read -r ll <<<"$(printf '%ss\n' "$l")"
   printf '%s%s\n' \
      "$(seq -s= 0 "$(expr "$l" : '[   ]*')" | tr = - | tr -cd -)" \
      "${ll%?}"
done < filename.ext

结果

-wqdq
-wqdqgrhehr
-cnkzjncicoajc
-hello space
----oejwfoiwejfow
----wqodojw
----more spaces
----more
----
-
--

工作原理

设置一个while循环来逐行读取文件，并将其IFS设置为NULL.这样做的目的是保留行中的所有空白。
接下来使用默认值对同一行进行虚拟读取IFS。这将剪掉任何前导空格。我们在末尾添加一个虚拟非换行符，以防止在命令扩展阶段尾随换行符崩溃。我们在印刷时将其剥离。
该expr命令的目的是查找匹配的数量，在我们的例子中，是该行前缘的空格。
使用这个数字，我们通过适当的设置seq和tr命令生成一系列破折号。
最后，我们将破折号与trimmed行一起打印，即通过默认 IFS 读入的行。

Answer

在职的

我们设置一个do-while循环，并继续转换与第一个非空格相邻的最后一个空格，同时该行仍然有一个前导空格。

sed -e '
   :loop
      /^ /s/ \([^ ]\|$\)/-\1/
   tloop
' filename.ext


while IFS= read -r l; do
   read -r ll <<<"$(printf '%ss\n' "$l")"
   printf '%s%s\n' \
      "$(seq -s= 0 "$(expr "$l" : '[   ]*')" | tr = - | tr -cd -)" \
      "${ll%?}"
done < filename.ext

结果

-wqdq
-wqdqgrhehr
-cnkzjncicoajc
-hello space
----oejwfoiwejfow
----wqodojw
----more spaces
----more
----
-
--

工作原理

设置一个while循环来逐行读取文件，并将其IFS设置为NULL.这样做的目的是保留行中的所有空白。
接下来使用默认值对同一行进行虚拟读取IFS。这将剪掉任何前导空格。我们在末尾添加一个虚拟非换行符，以防止在命令扩展阶段尾随换行符崩溃。我们在印刷时将其剥离。
该expr命令的目的是查找匹配的数量，在我们的例子中，是该行前缘的空格。
使用这个数字，我们通过适当的设置seq和tr命令生成一系列破折号。
最后，我们将破折号与trimmed行一起打印，即通过默认 IFS 读入的行。

将行首的空格替换为“-”

答案1

答案2

答案3

答案4

在职的

结果

工作原理

相关内容