I want to find duplicates in a file and append a character to the end of the line of the first match

Answer 1

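You need to process the file twice. In the first pass, write the duplicated keys (the values of the second field that occur more than once) to a file: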

# Emit a key the second time it is seen, so only repeated keys are written, once each
awk '{ if (++dup[$2] == 2) print $2 }' test.html > dupes.txt

The second pass compares every line against the contents of that file:

awk 'BEGIN { while ((getline var < "dupes.txt") > 0) { dup2[var] = 1 } }
  { num = ++dup[$2]
    # Print only the first line for each key; mark it if the key is a known duplicate
    if (num == 1) { if (1 == dup2[$2]) print $0 " acked"; else print $0 } }' \
test.html
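
To make the two passes concrete, here is a minimal sketch that wraps them in a shell function and runs them on a tiny sample. The function name `mark_two_pass`, the temporary files, and the sample lines are assumptions for illustration only; the original question's data is not shown here. The sketch records a key the second time it is seen, so that only repeated keys reach the dupes file.

```shell
#!/bin/sh
# Two-pass sketch: pass 1 lists repeated keys, pass 2 marks first occurrences.
# Column 2 is assumed to be the key; the function name is hypothetical.
mark_two_pass() {
    input=$1
    dupes=$(mktemp)

    # Pass 1: write a key the second time it appears (once per repeated key).
    awk '++seen[$2] == 2 { print $2 }' "$input" > "$dupes"

    # Pass 2: load the repeated keys, then print only the first line for
    # each key, appending " acked" when that key is a known duplicate.
    awk -v dupes="$dupes" '
        BEGIN { while ((getline key < dupes) > 0) is_dup[key] = 1 }
        ++seen[$2] == 1 {
            if ($2 in is_dup) print $0 " acked"
            else              print $0
        }
    ' "$input"

    rm -f "$dupes"
}

# Hypothetical sample data, not taken from the original question.
sample=$(mktemp)
printf '%s\n' 'host= alpha' 'host= beta' 'down alpha' 'host= gamma' > "$sample"
mark_two_pass "$sample"
rm -f "$sample"
# Prints:
#   host= alpha acked
#   host= beta
#   host= gamma
```

Note that the later repeat (`down alpha`) is not printed at all, because only the first line per key is emitted.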

Answer 2

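This would be much easier if we had the entire file. Are you only interested in lines beginning with host=, or in any second field? For a general solution, try the following: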

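perl -e '@file=<>;
         # Count each key: everything after the first run of whitespace
         foreach(@file){ $dup{$1}++ if /.+?\s+(.+)/ }
         foreach(@file){
              chomp;
              # Lines without a key are printed unchanged
              unless(/.+?\s+(.+)/){ print "$_\n"; next }
              if($dup{$1}>1 && !defined($p{$1})){
                 print "$_ acked\n";
                 $p{$1}++;}
              else{print "$_\n"}
          }' test.html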

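The script above first reads the entire file and counts how often each key occurs. It then prints every line, appending "acked" to the first occurrence of each duplicated key.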
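
The same read-everything-then-print idea can also be sketched in awk. This is not the Perl script itself but an assumed equivalent with one difference: the key here is the whitespace-separated second field, not everything after the first run of whitespace. The function name `mark_keep_all` and the sample lines are hypothetical.

```shell
#!/bin/sh
# Buffer the whole file, count keys, then print every line and mark the
# first occurrence of each key that appears more than once.
mark_keep_all() {
    awk '
        { line[NR] = $0; key[NR] = $2; count[$2]++ }
        END {
            for (i = 1; i <= NR; i++) {
                k = key[i]
                if (count[k] > 1 && !(k in marked)) {
                    print line[i] " acked"
                    marked[k] = 1      # later repeats stay unmarked
                } else {
                    print line[i]
                }
            }
        }
    ' "$1"
}

# Hypothetical sample data, not taken from the original question.
sample=$(mktemp)
printf '%s\n' 'host= alpha' 'host= beta' 'down alpha' 'host= gamma' > "$sample"
mark_keep_all "$sample"
rm -f "$sample"
# Prints:
#   host= alpha acked
#   host= beta
#   down alpha
#   host= gamma
```

Because every line is held in memory until the end, this sketch suits files that fit comfortably in RAM.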

The whole thing becomes much simpler if we can assume that you are only interested in lines starting with down X:

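# Collect field 2 of the lines containing "down", then mark the host= lines that use one of those values
grep down test.html | awk '{print $2}' |
 perl -e 'while(<>){chomp; $dup{$_}++} open(A,"test.html");
   while(<A>){
    if(/host=\s+(.+)/ && defined($dup{$1})){
      chomp; print "$_ acked\n"}
    else{print}}'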
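
For comparison, the same idea can be sketched in awk alone by reading the file twice. The input layout is an assumption (lines of the form `down X` and `host= X`, with the name in the second field), as are the function name `mark_down_hosts` and the sample lines; adjust the field tests if the real file differs.

```shell
#!/bin/sh
# Pass 1 (NR == FNR): remember every name that appears on a "down" line.
# Pass 2: append " acked" to host= lines whose name was remembered.
mark_down_hosts() {
    awk '
        NR == FNR { if ($1 == "down") is_down[$2] = 1; next }
        $1 == "host=" && ($2 in is_down) { print $0 " acked"; next }
        { print }
    ' "$1" "$1"
}

# Hypothetical sample data, not taken from the original question.
sample=$(mktemp)
printf '%s\n' 'host= alpha' 'host= beta' 'down alpha' > "$sample"
mark_down_hosts "$sample"
rm -f "$sample"
# Prints:
#   host= alpha acked
#   host= beta
#   down alpha
```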

Answer 3

This may help:

One-liner:

awk 'NR==FNR{b[$2]++; next} $2 in b { if (b[$2]>1) { print $0" acked" ; delete b[$2]} else print $0}' inputFile inputFile

Explanation:

awk '
NR==FNR { 

        ## Loop through the file and check which line is repeated based on column 2

        b[$2]++

        ## Skip the rest of the actions until complete file is scanned

        next
} 

## Once the scan is complete, look for second column in the array

$2 in b { 

        ## If the count of the column is greater than 1 it means there is duplicate.

        if (b[$2]>1) { 

            ## So print that line with "acked" marker

            print $0" acked"

            ## and delete the array entry, so later lines with this key no longer match and are skipped

            delete b[$2]
        } 

        ## If count is 1 it means there was no duplicate so print the line

        else 
            print $0
}' inputFile inputFile
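
The `NR==FNR` test works because awk is given the same file twice: `NR` counts lines across all inputs, while `FNR` restarts at 1 for each file, so the two are equal only during the first read. A minimal sketch that prints both counters makes this visible; the function name `show_counters` and the two-line sample are hypothetical.

```shell
#!/bin/sh
# Print NR and FNR for every line while reading the same file twice.
show_counters() {
    awk '{ print NR, FNR, (NR == FNR ? "first pass" : "second pass") }' "$1" "$1"
}

# Hypothetical two-line sample file.
sample=$(mktemp)
printf '%s\n' 'host= alpha' 'down alpha' > "$sample"
show_counters "$sample"
rm -f "$sample"
# Prints:
#   1 1 first pass
#   2 2 first pass
#   3 1 second pass
#   4 2 second pass
```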
