Process text only on lines X through Y that satisfy a condition

Question 1

With GNU awk (4.1.0 or later, for the inplace extension [1]):

gawk -i /usr/share/awk/inplace.awk '
  NR >= 10 && NR <= 20 {
    if ($0 in seen) next
    seen[$0]
  }
  {print}' ./file

Or with perl:

perl -ni -e 'print if $. < 10 or $. > 20 or !$seen{$_}++' ./file
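
Both commands above rewrite the file in place, so it may be worth previewing the effect first. The following is a minimal sketch, not part of the original answer: preview_dedup is a hypothetical helper name, and it uses plain awk with the same 10-20 range test instead of gawk -i or perl -i, piping the result to diff so the file itself is never modified.

```shell
# preview_dedup FILE
# Hypothetical helper: show which lines the range-limited dedup would drop.
# Nothing is written to FILE; diff compares it against the filtered stream.
preview_dedup() {
  awk 'NR < 10 || NR > 20 || !seen[$0]++' "$1" | diff "$1" -
}
```

Lines that diff prefixes with "<" are the ones the in-place commands would delete. diff exits with status 1 when it finds differences, which is expected here.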

To process multiple files:

gawk -i /usr/share/awk/inplace.awk '
  BEGINFILE{delete seen}
  FNR >= 10 && FNR <= 20 {
    if ($0 in seen) next
    seen[$0]
  }
  {print}' ./*.txt

Or with perl:

perl -ni -e '
  print if $. < 10 or $. > 20 or !$seen{$_}++;
  if (eof) {close ARGV; undef %seen}' ./*.txt

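[1] Do not use -i inplace: gawk first tries to load the inplace extension (as inplace or inplace.awk) from the current working directory, where someone could have planted malware. The path of the inplace.awk extension supplied with the system may vary; see the output of gawk 'BEGIN{print ENVIRON["AWKPATH"]}'.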
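
Since the location of inplace.awk varies between systems, one way to avoid hard-coding /usr/share/awk is to search the directories gawk reports. This is only a sketch under assumptions: find_inplace is a hypothetical helper, it assumes the extension file is named inplace.awk, and it deliberately skips relative entries (such as "." or an empty entry) for the reason given in the note above.

```shell
# find_inplace [SEARCH_PATH]
# Hypothetical helper: print the first inplace.awk found in an absolute
# directory of the colon-separated SEARCH_PATH (default: gawk's AWKPATH).
# The body runs in a subshell so the IFS and globbing changes stay local.
find_inplace() (
  if [ "$#" -gt 0 ]; then
    search_path=$1
  else
    search_path=$(gawk 'BEGIN { print ENVIRON["AWKPATH"] }')
  fi
  IFS=:
  set -f                       # no filename expansion while splitting
  for dir in $search_path; do
    case $dir in
      /*) ;;                   # absolute directory: consider it
      *) continue ;;           # relative or empty entry: skip it
    esac
    if [ -f "$dir/inplace.awk" ]; then
      printf '%s\n' "$dir/inplace.awk"
      exit 0
    fi
  done
  exit 1
)
```

The printed path could then be passed to gawk's -i option in place of /usr/share/awk/inplace.awk.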

Question 2

awk is your friend:

awk '{
      if(NR>=10 && NR<=20)
      {
        if($0 in record){
         next
        }else{
         print;
         record[$0];
        }
     }
     else{
        print
     }
     }' file > temp && mv temp file
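
The fixed name temp can collide with an existing file, and a failed awk run leaves it behind. A possible variant, offered as a sketch rather than as the answer's own method, creates the temporary file with mktemp next to the target and only replaces the original when awk succeeds. dedup_range_inplace is a hypothetical name, and the awk program is a condensed form of the one above.

```shell
# dedup_range_inplace FILE
# Hypothetical helper: drop duplicate lines within lines 10-20 of FILE,
# writing to a mktemp-created file first and replacing FILE only on success.
dedup_range_inplace() {
  file=$1
  tmp=$(mktemp "$file.XXXXXX") || return 1
  if awk 'NR >= 10 && NR <= 20 { if ($0 in record) next; record[$0] }
          { print }' "$file" > "$tmp"
  then
    mv "$tmp" "$file"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

As with the original command, the replacement is a new file, so it takes the temporary file's ownership and permissions rather than those of the original.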

Question 3

If the OP needs to remove duplicate lines within lines 10-20:

sed -i '
    :a; 10,19!b; N; s/\(^\|\n\)\([^\n]*\)\n\(\(.\+\n\|\)\2$\)/\1\3/; ba
       ' file1 file2 ...

Question 4

The same trick used in the Perl-based answer can also shorten the Awk code, and the result is smaller and cleaner:

awk 'NR < 10 || NR > 20 || !seen[$0]++'
   ^ ^          ^           ^
   | |          |           |
   | \__________\___________\______ no sigil noise
   |
   \_ no options here to remember
      (unless we want that Gawk inplace semantics)

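The counter cannot overflow: the range is limited to eleven lines (10 through 20), and GNU Awk offers arbitrary-precision integers anyway (via its -M/--bignum option).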
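
The question is phrased in terms of lines X to Y, while the one-liner hard-codes 10 and 20. A minimal sketch of a parameterized version follows; dedup_range and the awk variable names x and y are illustrative choices, not something from the answers.

```shell
# dedup_range X Y [FILE...]
# Hypothetical helper: print the input, dropping duplicate lines that fall
# within lines X..Y (inclusive). Reads standard input when no FILE is given.
dedup_range() {
  x=$1
  y=$2
  shift 2
  awk -v x="$x" -v y="$y" 'NR < x || NR > y || !seen[$0]++' "$@"
}
```

For example, dedup_range 10 20 ./file reproduces the one-liner above and writes the result to standard output. Because it tests NR rather than FNR, line numbers run across all files given.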
