awk - processing two files

I have two files that I need to compare on certain fields:

Reference file:

42:B:0
43:A:1
44:A:1
45:A:1

Target file:

42:!:1
43:B:0
44:A:1
45:B:2

I can already get what I need by combining a "while" loop with awk:

$ cat reference|while IFS=: read a b c;do awk -F: -va=$a -vb=$b -vc=$c '$1==a{if($2!=b){if($3>c)if($2!="!"){ print a":target has bigger $3("$3">"c") and $2 different ("$2")" } else { print a":target has bigger $3("$3">"c") but $2 disabled ("$2")" }}}' target;done
42:target has bigger $3(1>0) but $2 disabled (!)
45:target has bigger $3(2>1) and $2 different (B)

How can I get rid of the "while" loop and process both files directly in awk?

Answer 1

awk -F: '
    FNR == NR { c2[$1] = $2; c3[$1] = $3; next }
    !($1 in c2) {
        printf("%d: $1 not found in reference\n", $1)
        next
    }
    $3 > c3[$1] && $2 == "!"    {
        printf("%d: target has bigger $3 (%d>%d) but disabled $2 (%s)\n", $1,$3,c3[$1],$2)
        next
    }
    $3 > c3[$1] && $2 != c2[$1] {
        printf("%d: target has bigger $3 (%d>%d) but different $2 (%s)\n", $1,$3,c3[$1],$2)
    }' reference target

This reads the reference file first and then the target file.

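While reading the reference file (FNR == NR), it collects the values of the second and third columns in two arrays, c2 and c3. The index used for both is the value in the first column.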

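While reading the target file (FNR != NR), it compares the value in the third column with the value saved in the array c3. If the target value is larger, it then checks the second column: first against "!", and otherwise against the reference file's second column saved in c2.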

The code also prints an additional message if the first column of a target line is not found in the reference file.

Output for the data in the question, with one extra line (56:C:9) added to the target file:

42: target has bigger $3 (1>0) but disabled $2 (!)
45: target has bigger $3 (2>1) but different $2 (B)
56: $1 not found in reference
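
The FNR == NR test that Answer 1 relies on can be seen in isolation with a small demonstration. This is only an illustrative sketch, not part of the original answer: the function name demo_nr_fnr and the shortened two-line sample files are assumptions made for the example.

```shell
# Hypothetical helper: shows how NR and FNR behave across two input files.
demo_nr_fnr() {
    dir=$(mktemp -d) || return 1
    # Shortened copies of the question's sample data.
    printf '%s\n' '42:B:0' '43:A:1' > "$dir/reference"
    printf '%s\n' '42:!:1' '43:B:0' > "$dir/target"
    # NR counts records across all files; FNR restarts at 1 for each file,
    # so FNR == NR holds only while the first file is being read.
    (cd "$dir" && awk '{
        print FILENAME, NR, FNR, (FNR == NR ? "first-file" : "later-file")
    }' reference target)
    rm -rf "$dir"
}

demo_nr_fnr
# reference 1 1 first-file
# reference 2 2 first-file
# target 3 1 later-file
# target 4 2 later-file
```

Because the lookup in Answer 1 is keyed on the first column rather than on the line number, the two files do not need to list their records in the same order.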

Answer 2

paste + awk magic:

paste -d':' reference target | awk -F':' \
'$1 == $4 && $2 != $5 && $6 > $3{ 
     if ($5 == "!"){ p = "but"; state = "disabled" }
     else { p = "and"; state = "different" }
     printf "%s:target has bigger $3(%d > %d) %s $2 %s (%s)\n", $1, $6, $3, p, state, $5 
}'

Output:

42:target has bigger $3(1 > 0) but $2 disabled (!)
45:target has bigger $3(2 > 1) and $2 different (B)
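
To see why the awk program above refers to fields $4 to $6, it may help to look at what paste produces before awk sees it. The sketch below is illustrative only; the function name demo_paste is an assumption, and the sample files are recreated from the question's data.

```shell
# Hypothetical helper: prints the joined lines that the awk program receives.
demo_paste() {
    dir=$(mktemp -d) || return 1
    printf '%s\n' '42:B:0' '43:A:1' '44:A:1' '45:A:1' > "$dir/reference"
    printf '%s\n' '42:!:1' '43:B:0' '44:A:1' '45:B:2' > "$dir/target"
    # Fields 1-3 come from reference, fields 4-6 from target.
    paste -d':' "$dir/reference" "$dir/target"
    rm -rf "$dir"
}

demo_paste
# 42:B:0:42:!:1
# 43:A:1:43:B:0
# 44:A:1:44:A:1
# 45:A:1:45:B:2
```

Note that paste joins lines purely by position, so this approach assumes both files list the same keys in the same order. The $1 == $4 guard skips rows whose keys differ; it does not realign them.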

Bonus gawk solution (assuming the first-field values are ordered and unique; it relies on gawk's arrays of arrays):

gawk -F':' \
'NR == FNR{ 
    a[NR][1] = $1; a[NR][2] = $2; a[NR][3] = $3; next 
}
$1 == a[FNR][1] && $2 != a[FNR][2] && $3 > a[FNR][3]{ 
   if ($2 == "!"){ p = "but"; state = "disabled" }
   else { p = "and"; state = "different" }
   printf "%s:target has bigger $3(%d > %d) %s $2 %s (%s)\n", $1, $3, a[FNR][3], p, state, $2 
}' reference target
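
If gawk is not available, the same line-by-line comparison could probably be written with the classic a[i, j] subscripts, which awk joins into a single string key using SUBSEP. The following is a sketch under that assumption and is not part of the original answer; the function name compare_by_line is hypothetical, and the + 0 on the third field is added only to force a numeric comparison.

```shell
# Hypothetical helper: $1 = reference file, $2 = target file (colon-separated).
compare_by_line() {
    awk -F':' '
    NR == FNR {
        # Store each reference line under (line number, column) keys.
        a[NR, 1] = $1; a[NR, 2] = $2; a[NR, 3] = $3 + 0; next
    }
    # Same key on the same line, different $2, and a larger $3 in target.
    $1 == a[FNR, 1] && $2 != a[FNR, 2] && $3 + 0 > a[FNR, 3] {
        if ($2 == "!") { p = "but"; state = "disabled" }
        else { p = "and"; state = "different" }
        printf "%s:target has bigger $3(%d > %d) %s $2 %s (%s)\n",
               $1, $3, a[FNR, 3], p, state, $2
    }' "$1" "$2"
}

# Demonstration with the sample data from the question.
dir=$(mktemp -d)
printf '%s\n' '42:B:0' '43:A:1' '44:A:1' '45:A:1' > "$dir/reference"
printf '%s\n' '42:!:1' '43:B:0' '44:A:1' '45:B:2' > "$dir/target"
compare_by_line "$dir/reference" "$dir/target"
rm -rf "$dir"
# 42:target has bigger $3(1 > 0) but $2 disabled (!)
# 45:target has bigger $3(2 > 1) and $2 different (B)
```

Like the gawk version, this matches records by line number, so it carries the same assumption that both files are ordered identically.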
