awk

Question 1

awk

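These methods repeat for each pair of lines (1 and 2; 3 and 4; etc.), handle as many # characters as the first line of each pair contains, and assume that both lines of each pair are the same length.

Compatible with GNU awk (Linux) and BSD awk (Mac).

Using substrings: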

awk '{ a=$0 ; gsub(/#/,"",$0) ; print $0 ; getline ; for (n=1;n<=length(a);n++) if ( substr(a,n,1) != "#" ) printf "%s",substr($0,n,1) ; printf "%s",RS }' file.txt

The same code, reformatted for narrower screens:

awk '{
  a=$0 ;
  gsub(/#/,"",$0) ;
  print $0 ;
  getline ;
  for (n=1;n<=length(a);n++)
    if ( substr(a,n,1) != "#" )
      printf "%s",substr($0,n,1) ;
  printf "%s",RS
  }' file.txt

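a=$0
Save a copy of the first line.
gsub(/#/,"",$0) ; print $0
Delete every # from the first line (not from the copy), then print the modified first line.
getline
Move to the next line.
for (n=1;n<=length(a);n++)
Step through each character of the copy of the first line.
- if ( substr(a,n,1) != "#" )
  If this one-character substring is not #, ...
  - printf "%s",substr($0,n,1)
    ... then print the character at the corresponding position of the second line.
printf "%s",RS
End the second line with a newline.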
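
One case the walkthrough above leaves open is an input with an odd number of lines, where getline has no partner line to read. The following is a minimal sketch, not part of the original answer, that checks the return value of getline and stops after printing an unpaired last line. The function name strip_pairs and the variable names mask and out are made up for illustration.

```shell
# Sketch only: same substring technique, plus a guard for an unpaired last line.
strip_pairs() {
  awk '{
    mask = $0                      # copy of the first line, # characters intact
    gsub(/#/, "")                  # strip # from the first line itself
    print
    if ((getline) <= 0) exit       # no partner line: stop after the first line
    out = ""
    for (n = 1; n <= length(mask); n++)
      if (substr(mask, n, 1) != "#")
        out = out substr($0, n, 1) # keep characters not sitting under a #
    print out
  }' "$@"
}

printf '%s\n' 'ab#cd' '12345' 'x#y' | strip_pairs
```

Called with no arguments it reads standard input; strip_pairs file.txt reads the file instead.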

Using arrays:

awk '{ c=d="" ; elements=split($0,a,"") ; getline ; split($0,b,"") ; for (n=1;n<=elements;n++) if (a[n]!="#") { c = c a[n] ; d = d b[n] } ; print c ; print d }' file.txt

Reformatted for narrower screens:

awk '{
  c=d="" ;
  elements=split($0,a,"") ;
  getline ;
  split($0,b,"") ;
  for (n=1;n<=elements;n++)
    if (a[n]!="#")
      { c = c a[n] ; d = d b[n] } ;
  print c ;
  print d
  }' file.txt

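c=d=""
Initialize two empty strings. These will become the modified versions of the two input lines. This step is needed when the input has more than two lines, since c and d would otherwise still hold the previous pair's result.
elements=split($0,a,"")
Turn the first input line into an array, one character per array element. Store the number of array elements in the variable elements.
getline
Move to the next line.
split($0,b,"")
Turn the second input line into an array, one character per array element.
for (n=1;n<=elements;n++)
Step through each element of the first line's array.
- if (a[n]!="#")
  If this one-character array element is not #, ...
  - { c = c a[n] ; d = d b[n] }
    ... then, for each of the two lines, keep the character at position n.
print c ; print d
Print the new versions of the two lines.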

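Warning: the Mac (BSD) version of awk does not automatically process array elements in numeric order. This gave me surprising results at first.

"The order in which a 'for (indx in array)' loop traverses an array is undefined in POSIX awk and varies among implementations. gawk lets you control the order by assigning special predefined values to PROCINFO["sorted_in"]."

– GNU Awk User's Guide

The elements are still numbered 1,2,3,... when created with split, just as in GNU awk, but BSD awk does not necessarily visit them in that order when you use for (n in array). As a result, you get garbled output.

To work around this, you can store the array's length (the number of elements) when you create the array, for example elements=split($0,a,""), and then iterate over the elements with for (n=1;n<=elements;n++), as I have done here.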
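
As a small illustration of that workaround, here is a sketch that rebuilds a string from its split pieces with a counted loop, so the result does not depend on how a given awk orders for (n in array). The function name ordered_join and the variable names are hypothetical, and the text is split on spaces rather than on the empty string purely to keep the example portable.

```shell
# Sketch only: iterate by stored element count instead of "for (n in parts)".
ordered_join() {
  awk -v text="$1" 'BEGIN {
    count = split(text, parts, " ")   # split returns the number of elements
    out = ""
    for (n = 1; n <= count; n++)      # indices 1..count, always in this order
      out = out parts[n]
    print out
  }'
}

ordered_join "d e l t a"
```

The call above prints delta on any awk, because the loop walks the indices explicitly.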

Sample input (file.txt):

abcdb#lae#blabl#a
abc~bola~xblabl~a
#alpha#beta#gamma#delta#epsilon#
abcdefghijklmnopqrstuvwxyzabcdef

Sample output:

abcdblaeblabla
abc~bla~blabla
alphabetagammadeltaepsilon
bcdefhijkmnopqstuvwyzabcde

Question 2

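You can do this with sed as follows. After bringing both lines into the pattern space, place a marker at the beginning of each of the two lines.

Then start moving the markers to the right, one character at a time. As they move, look at what lies to the right of each marker and act accordingly.

Stop when the marker reaches the end of the pattern space. Once the markers have done their job, remove them, and what remains is what you want. Note that the marker is \n.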

 sed -Ee '
   /#/N;/\n/!b
   s/\n/&&/;s/^/\n/
   :a
       /\n#(.*\n.*\n)./{
          s//\n\1/;ba
       }
      s/\n(.)(.*\n.*)\n(.)/\1\n\2\3\n/
   /\n$/!ba
   s/\n//;s///2
'    input
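
To make the marker movement concrete, the sketch below wraps the same script in a shell function and traces the pattern space for one tiny pair of lines. It assumes GNU sed (for -E and for \n in the replacement text); the function name strip_with_markers and the sample input are made up for illustration.

```shell
# Assumes GNU sed. Pattern space for the pair "a#b" / "xyz", \n shown literally:
#   after N and the two setup substitutions:   \na#b\n\nxyz
#   markers step past "a" and "x":             a\n#b\nx\nyz
#   first marker sits before "#", so the #
#   and the character after the second
#   marker ("y") are dropped:                  a\nb\nx\nz
#   markers step past "b" and "z":             ab\n\nxz\n
#   both markers removed:                      ab\nxz
strip_with_markers() {
  sed -Ee '
   /#/N;/\n/!b
   s/\n/&&/;s/^/\n/
   :a
       /\n#(.*\n.*\n)./{
          s//\n\1/;ba
       }
      s/\n(.)(.*\n.*)\n(.)/\1\n\2\3\n/
   /\n$/!ba
   s/\n//;s///2
'
}

printf '%s\n' 'a#b' 'xyz' | strip_with_markers
```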

With Perl, this can be solved along the following lines:

 perl -pe  ' 
     next unless /#/;

     my($n,$p) = (scalar <>);

     while ( /#/g ) {
        pos($n) = pos() - 1 - $p++;
        $n =~ s/\G.//;
     }

     y/#//d;s/\z/$n/;
'      input_file

How it works:

1. Skip lines that do not contain a hash character.
2. Save the next line in $n and initialize the $p counter, which keeps track of the number of hash characters erased so far.
3. Track the position of each hash character in a while loop and use it to compute the position of the character to delete in the next line.
4. Erase it using the \G anchor in s///.
5. In the final step, remove the hash characters from the current line and append the next line to it.
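
The offset arithmetic in step 3 (the position of the #, minus the number of characters already erased) can be sketched in awk as well. This is a transliteration for illustration, not the Perl code above; the function name shifted_delete and the variable names are hypothetical, and awk positions are 1-based where Perl's are 0-based.

```shell
# Sketch only: awk transliteration of the "position minus erased count" idea.
shifted_delete() {
  awk '{
    first = $0
    if ((getline second) <= 0) { print first; exit }   # unpaired last line
    erased = 0
    for (i = 1; i <= length(first); i++)
      if (substr(first, i, 1) == "#") {
        target = i - erased        # where that column now sits in the shortened line
        second = substr(second, 1, target - 1) substr(second, target + 1)
        erased++
      }
    gsub(/#/, "", first)           # remove the hashes from the first line last
    print first
    print second
  }'
}

printf '%s\n' 'a#b#c' '12345' | shifted_delete
```

For this input the # characters sit at positions 2 and 4; the second deletion targets position 3 because one character has already been removed, leaving abc and 135.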

Here is another approach, this time using arrays:

perl -aF'' -ne '
    print,next unless /#/;
    print,last if eof;

    my @I = grep { $F[$_] ne "#" } 0 .. $#F;
    my @N = split //, <>;

    print @F[@I], @N[@I];
'    input_file

How it works:

1. Invoke Perl so that it splits each line into individual characters and stores them in the array @F, anew for every line read.
2. Record the array indices at which the element is a non-hash character.
3. Read in the next line, split it into individual characters, and store them in the array @N.
4. Now it is a matter of taking the indices stored in @I and fetching those elements from the arrays @F and @N.

A regex approach:

perl -pe '
   $_ .= <> unless eof;

    s/\G.(.*\n.{@{[+pos]}})./$1/ while /(?=#.*\n.)/g;
'        input_file

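Description:

° Append the next line to the current line, as long as the current line is not the last one.

° The while loop records the position of each hash character in the first line.

° Then remove the hash character from the original line, along with the character at the corresponding position in the next line.

° After the while loop exits, the -p option automatically prints $_ to standard output.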

A pure string-manipulation approach:

perl -pe '
   last if eof;
   my $n = <>;
   while ( (my $p = index($_,"#")) > -1 ) {
      substr($_, $p, 1) = "" for $_, $n;
   }
   $_ .= $n;
'       input_file

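This uses the builtin index to find the position of the hash, then uses that position in the builtin substr twice: once on the current line and once on the next line.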

Question 3

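Here is one in awk. When you see a #, determine its position in the line. Then, for that line and all subsequent lines, remove the character at that position from the line.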

awk '
    /#/ { pound=index($0, "#") }
        {
                if (pound)
                        print substr($0, 1, pound-1) substr($0, pound+1)
                else
                        print
        }
    '
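
Since the script above names no input file, it reads standard input. The sketch below wraps it in a shell function (the name drop_column is made up for illustration) and feeds it two small inputs to show what it does with one # per line and with two.

```shell
# Sketch only: the same awk script, wrapped so it can be fed from a pipe.
drop_column() {
  awk '
    /#/ { pound=index($0, "#") }
        {
                if (pound)
                        print substr($0, 1, pound-1) substr($0, pound+1)
                else
                        print
        }
    '
}

printf '%s\n' 'ab#cd' '12345' | drop_column   # one # on the first line
printf '%s\n' 'a#b#c' '12345' | drop_column   # two # on the first line
```

The first call prints abcd and 1245. The second prints ab#c and 1345, because index returns only the position of the first #. Note also that pound keeps its value until another line containing # is read, so every later line loses that column too.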

Question 4

Using gensub with GNU awk:

awk '
/#/{
  a=$0
  b=length()
  getline
  $0=a RS$0
  while($0!=a){
    a=$0
    $0=gensub("([^#]*)#(.{"b--"}).","\\1\\2",1)}
}1' infile

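Explanation:

/#/ : for each line containing #

a=$0 : save the line in a

b=length() : store its length in b

getline : fetch the next line

$0=a RS$0 : put the previous line stored in a, followed by the record separator RS, at the front of the buffer $0

$0 now contains both lines

while($0!=a) : loop while the line stored in a differs from the buffer $0

a=$0 : copy the buffer $0 into a

$0=gensub("([^#]*)#(.{"b--"}).","\\1\\2",1) : delete the first # in $0 along with the corresponding character in the second line

and decrement the length of the first line by 1 (b--), since one # has been removed

1 : print $0 once the first line no longer contains any #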
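
The regex passed to gensub is assembled by string concatenation on every pass, which is easy to miss. The sketch below prints the string that this expression produces for a given b, and the value of b afterwards. It uses no gensub, so it should run on any awk; the function name show_pattern is hypothetical.

```shell
# Sketch only: show the regex string built from b, and b after the post-decrement.
show_pattern() {
  awk -v b="$1" 'BEGIN {
    pattern = "([^#]*)#(.{" b-- "})."   # same string expression as in the gensub call
    print pattern
    print b                              # b has been decremented by now
  }'
}

show_pattern 17
```

For a first line of 17 characters this prints ([^#]*)#(.{17}). followed by 16. The interval skips exactly as many characters as the first line is long, which lands on the same column in the second line, and the next pass uses one fewer because the first line has shrunk by one.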
