Finding duplicate values with AWK

I need to do a vlookup-style lookup across two different files that contain duplicate entries:

File 1

abc     10
xyz     20
bhy     30
hgf     40

File 2

a   abc     
b   xyz     
c   bhy     
d   abc     
e   abc     
f   xyz     

Desired output:

abc     10  a,d,e
xyz     20  b,f
bhy     30  c
hgf     40  Not_Available

Answer 1

Awk solution:

awk 'NR == FNR { a[$1] = $2 OFS; next }
     $2 in a { a[$2] = a[$2] (a[$2] ~ /\t$/? "" : ",") $1 }
     END { for (i in a) print i, a[i] }' OFS='\t' file1 file2

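Output (the order of the lines may differ, because awk's for (i in a) iterates in an unspecified order):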

bhy 30  c
abc 10  a,d,e
xyz 20  b,f

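To handle the last requirement (printing Not_Available when a key from File 1 has no match in File 2), use this modification: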

awk 'NR == FNR { a[$1] = $2 OFS; next }
     $2 in a { a[$2] = a[$2] (a[$2] ~ /\t$/? "" : ",") $1 }
     END {
         for (i in a) print i, a[i] (a[i] ~ /\t$/? "Not_Available" : "")
     }' OFS='\t' file1 file2

Answer 2

Using awk:

awk -v OFS='\t' '
    NR == FNR {val[$1]=$2; next} 
    {items[$2] = items[$2] " " $1}
    END {
        for (a in val) {
            sub(/^ /, "", items[a])
            gsub(/ /, ",", items[a]) 
            print a, val[a], items[a]
        }
    }
' file1 file2

Output:

bhy     30      c
abc     10      a,d,e
xyz     20      b,f

If you want the output sorted, pipe it through | sort -k2,2n
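As a small illustration of that sort step, the sketch below feeds the three output lines shown above into sort. It assumes the columns are tab-separated, so the separator is set explicitly with -t; the variable names unsorted and sorted are illustrative only:

```shell
#!/bin/sh
# The three lines printed by the awk command above, in the order shown.
unsorted=$(printf 'bhy\t30\tc\nabc\t10\ta,d,e\nxyz\t20\tb,f\n')

# -t sets the field separator to a tab; -k2,2n sorts numerically on field 2 only.
sorted=$(printf '%s\n' "$unsorted" | sort -t "$(printf '\t')" -k2,2n)

printf '%s\n' "$sorted"
```

The rows come out ordered by the numeric value in the second column: abc (10), xyz (20), then bhy (30).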

Just for fun, a little Perl trickery:

perl -lae '
    if ($. == ++$nr) { $val{$F[0]} = $F[1] }
    else             { push @{$items{$F[1]}}, $F[0] }
  } continue {
    close ARGV if eof
  } END { 
    printf "%s\t%s\t%s\n", $_, $val{$_}, join(",", @{$items{$_}}) for keys %items 
' file1 file2
