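I am trying to compare two tab-delimited files and output the differences together with the column headers, where the files have no primary key.
I am very close, but the problem is that the snippet I have so far works if and only if the files have a primary key: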
awk '
NR==1 {
for (i=1; i<=NF; i++)
header[i] = $i
}
NR==FNR {
for (i=1; i<=NF; i++) {
A[i,NR] = $i
}
next
}
{
for (i=1; i<=NF; i++)
if (A[i,FNR] != $i)
print "ID#-" $1 ": " header[i] "- " ARGV[1] " value= ", A[i,FNR]" / " ARGV[2] " value= "$i
}' t1.csv t2.csv
Can anyone help me achieve this
- when I do not have any primary key
- when the number of rows differs and one file is missing records
t1.csv
Month ClientSegment ClientType IssuerClientSegment NetworkID VD
2020-12 COMMUNITY EXEMPT COMMUNITY 0 OTHER
2020-12 COMMUNITY EXEMPT COMMUNITY 2 OTHER
2020-12 COMMUNITY EXEMPT COMMUNITY 5 OTHER
t2.csv
Month ClientSegment ClientType IssuerClientSegment NetworkID VD
2020-12 COMMUNITY EXEMPT COMMUNITY 0 OTHER
2020-12 COMMUNITY EXEMPT COMMUNITY 2 OTHER1
2020-13 COMMUNITY EXEMPT COMMUNITY 2 PUSH
2020-13 COMMUNITY EXEMPT COMMUNITY 3 OTHER
The expected output is as follows:
Row 2, Column: VD- t1.csv value= OTHER / t2.csv value= OTHER1
Missing in t2.csv
Month Client Segment Client Type Issuer Client Segment Network ID VD
2020-12 COMMUNITY EXEMPT COMMUNITY 5 OTHER
Missing in t1.csv
Month Client Segment Client Type Issuer Client Segment Network ID VD
2020-13 COMMUNITY EXEMPT COMMUNITY 2 PUSH
2020-13 COMMUNITY EXEMPT COMMUNITY 3 OTHER
Answer 1
Use daff:
daff --input-format tsv t1.csv t2.csv
@@ Month ClientSegment ClientType IssuerClientSegment NetworkID VD
2020-12 COMMUNITY EXEMPT COMMUNITY 0 OTHER
→ 2020-12 COMMUNITY EXEMPT COMMUNITY 2 OTHER→OTHER1
+++ 2020-13 COMMUNITY EXEMPT COMMUNITY 2 PUSH
+++ 2020-13 COMMUNITY EXEMPT COMMUNITY 3 OTHER
--- 2020-12 COMMUNITY EXEMPT COMMUNITY 5 OTHER
Install it with pip install daff (you may also need sudo apt install python-pip).
Answer 2
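awk '
# Treat the first five columns as a composite key that identifies a row.
{ key = $1 OFS $2 OFS $3 OFS $4 OFS $5 }
# First file: remember the sixth column and the line number for each key.
! secondInput {
    file1[key] = $6
    NRfile1[key] = NR
    next
}
# Second file: the key also exists in the first file.
(key in file1) {
    # Report the row only if the last column differs.
    if (file1[key] != $NF) { print "diff-line#:", NRfile1[key] "|" FNR, $0 }
    delete file1[key]    # mark this key as matched
    next
}
# Second file: the key was never seen in the first file.
{ print "missing in file1: ", $0 }
# Keys still stored here were never matched by the second file.
END {
    for (key in file1) {
        print "missing in file2: ", key, file1[key]
    }
}' file1 secondInput=1 file2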
Output:
diff-line#: 3|3 2020-12 COMMUNITY EXEMPT COMMUNITY 2 OTHER1
missing in file1: 2020-13 COMMUNITY EXEMPT COMMUNITY 2 PUSH
missing in file1: 2020-13 COMMUNITY EXEMPT COMMUNITY 3 OTHER
missing in file2: 2020-12 COMMUNITY EXEMPT COMMUNITY 5 OTHER
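Note that Answer 2 assumes the first five columns together identify a row. If no subset of columns can be relied on as a key, one fallback is to compare complete lines as a multiset. The following is a minimal sketch of that idea and is not part of either answer. It recreates the sample files from the question in a temporary directory (the variable names tmp and out are only illustrative), counts each line of t1.csv, and reports lines that have no counterpart in the other file. A row whose value changed, such as the OTHER/OTHER1 row, then shows up once under each file rather than as a cell-level difference.

```shell
#!/bin/sh
# Assumption: the sample data from the question, rebuilt with real tabs.
tmp=$(mktemp -d)
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
  Month ClientSegment ClientType IssuerClientSegment NetworkID VD \
  2020-12 COMMUNITY EXEMPT COMMUNITY 0 OTHER \
  2020-12 COMMUNITY EXEMPT COMMUNITY 2 OTHER \
  2020-12 COMMUNITY EXEMPT COMMUNITY 5 OTHER > "$tmp/t1.csv"
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
  Month ClientSegment ClientType IssuerClientSegment NetworkID VD \
  2020-12 COMMUNITY EXEMPT COMMUNITY 0 OTHER \
  2020-12 COMMUNITY EXEMPT COMMUNITY 2 OTHER1 \
  2020-13 COMMUNITY EXEMPT COMMUNITY 2 PUSH \
  2020-13 COMMUNITY EXEMPT COMMUNITY 3 OTHER > "$tmp/t2.csv"

out=$(cd "$tmp" && awk '
# First file: count how many times each complete line occurs.
NR == FNR { count[$0]++; next }
# Second file: a line with a remaining count is matched and consumed.
($0 in count) && count[$0] > 0 { count[$0]--; next }
# Second file: no unmatched copy of this line exists in the first file.
{ print "only in " FILENAME ": " $0 }
# Any count left over belongs to lines that the second file never matched.
END {
    for (line in count)
        for (n = count[line]; n > 0; n--)
            print "only in " ARGV[1] ": " line
}' t1.csv t2.csv)

printf '%s\n' "$out"
rm -rf "$tmp"
```

The order of the "only in t1.csv" lines depends on awk's array traversal order, so pipe the output through sort if a stable order matters.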