I have two files that I need to compare on certain fields:
Reference file:
42:B:0
43:A:1
44:A:1
45:A:1
Target file:
42:!:1
43:B:0
44:A:1
45:B:2
I can already get what I need by combining a "while" loop with awk:
$ cat reference|while IFS=: read a b c;do awk -F: -va=$a -vb=$b -vc=$c '$1==a{if($2!=b){if($3>c)if($2!="!"){ print a":target has bigger $3("$3">"c") and $2 different ("$2")" } else { print a":target has bigger $3("$3">"c") but $2 disabled ("$2")" }}}' target;done
42:target has bigger $3(1>0) but $2 disabled (!)
45:target has bigger $3(2>1) and $2 different (B)
How can I get rid of the "while" loop and process both files directly in awk?
Answer 1
awk -F: '
    FNR == NR { c2[$1] = $2; c3[$1] = $3; next }
    !($1 in c2) {
        printf("%d: $1 not found in reference\n", $1)
        next
    }
    $3 > c3[$1] && $2 == "!" {
        printf("%d: target has bigger $3 (%d>%d) but disabled $2 (%s)\n", $1,$3,c3[$1],$2)
        next
    }
    $3 > c3[$1] && $2 != c2[$1] {
        printf("%d: target has bigger $3 (%d>%d) but different $2 (%s)\n", $1,$3,c3[$1],$2)
    }' reference target
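This reads the reference file first and then the target file.
While reading the reference file (FNR == NR), it collects the values of the second and third columns in two arrays, c2 and c3. The index used is the value in the first column.
While reading the target file (FNR != NR), it compares the value in the third column with the value saved in c3. It then also tests the second column against ! and against the reference file's second column saved in c2.
If the first column of a target line is not found in the reference file, the code also emits an additional message.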
Output for the data in the question, with one extra target line (56:C:9) added:
42: target has bigger $3 (1>0) but disabled $2 (!)
45: target has bigger $3 (2>1) but different $2 (B)
56: $1 not found in reference
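To make the FNR == NR test used above more concrete, here is a minimal sketch that only prints the two counters for the sample data from the question. The scratch directory and the names tmpdir and nr_fnr_demo are assumptions made for this illustration, not part of the answer.

```shell
#!/bin/sh
# Recreate the sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' '42:B:0' '43:A:1' '44:A:1' '45:A:1' > "$tmpdir/reference"
printf '%s\n' '42:!:1' '43:B:0' '44:A:1' '45:B:2' > "$tmpdir/target"

# For every record, print the file name, NR, FNR, and which branch
# the FNR == NR test would select. NR keeps counting across files,
# while FNR restarts at 1 for each new file.
nr_fnr_demo=$(cd "$tmpdir" && awk '{
    print FILENAME, NR, FNR, (FNR == NR ? "first-file" : "later-file")
}' reference target)

printf '%s\n' "$nr_fnr_demo"
# reference 1 1 first-file
# reference 2 2 first-file
# reference 3 3 first-file
# reference 4 4 first-file
# target 5 1 later-file
# target 6 2 later-file
# target 7 3 later-file
# target 8 4 later-file

rm -rf "$tmpdir"
```

One caveat of this idiom: if the first file is empty, NR is still 0 when the second file starts, so FNR == NR also holds for the second file's records.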
Answer 2
paste + awk magic:
paste -d':' reference target | awk -F':' \
'$1 == $4 && $2 != $5 && $6 > $3 {
    if ($5 == "!") { p = "but"; state = "disabled" }
    else           { p = "and"; state = "different" }
    printf "%s:target has bigger $3(%d > %d) %s $2 %s (%s)\n", $1, $6, $3, p, state, $5
}'
Output:
42:target has bigger $3(1 > 0) but $2 disabled (!)
45:target has bigger $3(2 > 1) and $2 different (B)
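It may help to look at what paste hands to awk in this pipeline. The sketch below, which assumes the sample files from the question and uses the illustrative names tmpdir and joined, prints the intermediate lines: fields $1 to $3 come from the reference file and $4 to $6 from the target file.

```shell
#!/bin/sh
# Recreate the sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' '42:B:0' '43:A:1' '44:A:1' '45:A:1' > "$tmpdir/reference"
printf '%s\n' '42:!:1' '43:B:0' '44:A:1' '45:B:2' > "$tmpdir/target"

# paste joins line N of reference with line N of target, using ":" as
# the delimiter, so each joined line has six colon-separated fields.
joined=$(cd "$tmpdir" && paste -d':' reference target)

printf '%s\n' "$joined"
# 42:B:0:42:!:1
# 43:A:1:43:B:0
# 44:A:1:44:A:1
# 45:A:1:45:B:2

rm -rf "$tmpdir"
```

Because the pairing is purely by line position, this approach depends on both files listing the same keys in the same order. The $1 == $4 guard in the awk program silently skips any pair whose keys do not line up.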
Bonus gawk solution (assuming the first-field values are ordered and unique):
awk -F':' \
'NR == FNR {
    a[NR][1] = $1; a[NR][2] = $2; a[NR][3] = $3; next
}
$1 == a[FNR][1] && $2 != a[FNR][2] && $3 > a[FNR][3] {
    if ($2 == "!") { p = "but"; state = "disabled" }
    else           { p = "and"; state = "different" }
    printf "%s:target has bigger $3(%d > %d) %s $2 %s (%s)\n", $1, $3, a[FNR][3], p, state, $2
}' reference target
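The a[NR][1] syntax is a gawk extension (arrays of arrays). If gawk is not available, the same idea could be sketched with the comma-subscript form a[NR,1], which standard awk stores in an ordinary array under a key joined with SUBSEP. This variant is an adaptation rather than part of the original answer, and it keeps the same assumption that keys are ordered and unique. The names tmpdir and portable_out are made up for the example.

```shell
#!/bin/sh
# Recreate the sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' '42:B:0' '43:A:1' '44:A:1' '45:A:1' > "$tmpdir/reference"
printf '%s\n' '42:!:1' '43:B:0' '44:A:1' '45:B:2' > "$tmpdir/target"

portable_out=$(cd "$tmpdir" && awk -F':' '
NR == FNR {
    # a[NR,1] is a single-level array entry keyed by NR SUBSEP 1.
    a[NR,1] = $1; a[NR,2] = $2; a[NR,3] = $3; next
}
# Compare line FNR of target with the saved line FNR of reference.
$1 == a[FNR,1] && $2 != a[FNR,2] && $3 > a[FNR,3] {
    if ($2 == "!") { p = "but"; state = "disabled" }
    else           { p = "and"; state = "different" }
    printf "%s:target has bigger $3(%d > %d) %s $2 %s (%s)\n", $1, $3, a[FNR,3], p, state, $2
}' reference target)

printf '%s\n' "$portable_out"
# 42:target has bigger $3(1 > 0) but $2 disabled (!)
# 45:target has bigger $3(2 > 1) and $2 different (B)

rm -rf "$tmpdir"
```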