Compare 2 tab-delimited files and output the differences with column headers when the files have no primary key

I am trying to compare 2 tab-delimited files and output the differences together with the column headers. The files have no primary key.

I am very close, but the problem is that the snippet I have so far works if and only if the files have a primary key:

awk '
NR==1 { 
  for (i=1; i<=NF; i++)
    header[i] = $i
}
NR==FNR {
  for (i=1; i<=NF; i++) {
    A[i,NR] = $i
  }
  next
}
{
  for (i=1; i<=NF; i++)
    if (A[i,FNR] != $i)
      print "ID#-" $1 ": " header[i] "- " ARGV[1] " value= ", A[i,FNR]" / " ARGV[2] " value= "$i
}' t1.csv t2.csv

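Can anyone help me achieve this in the following cases?

  1. When I do not have any primary key
  2. When the row counts differ and one file is missing records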

t1.csv

Month   ClientSegment   ClientType  IssuerClientSegment NetworkID   VD
2020-12 COMMUNITY   EXEMPT  COMMUNITY   0   OTHER   
2020-12 COMMUNITY   EXEMPT  COMMUNITY   2   OTHER   
2020-12 COMMUNITY   EXEMPT  COMMUNITY   5   OTHER

t2.csv

Month   ClientSegment   ClientType  IssuerClientSegment NetworkID   VD  
2020-12 COMMUNITY   EXEMPT  COMMUNITY   0   OTHER
2020-12 COMMUNITY   EXEMPT  COMMUNITY   2   OTHER1
2020-13 COMMUNITY   EXEMPT  COMMUNITY   2   PUSH
2020-13 COMMUNITY   EXEMPT  COMMUNITY   3   OTHER

The expected output is as follows:

Row 2, Column: VD- t1.csv value=  OTHER / t2.csv value= OTHER1

Missing in t2.csv
Month   Client Segment  Client Type Issuer Client Segment   Network ID  VD
2020-12 COMMUNITY   EXEMPT  COMMUNITY   5   OTHER

Missing in t1.csv
Month   Client Segment  Client Type Issuer Client Segment   Network ID  VD 
2020-13 COMMUNITY   EXEMPT  COMMUNITY   2   PUSH
2020-13 COMMUNITY   EXEMPT  COMMUNITY   3   OTHER

Answer 1

Using daff:

daff --input-format tsv t1.csv t2.csv
@@  Month   ClientSegment   ClientType  IssuerClientSegment NetworkID   VD
    2020-12 COMMUNITY       EXEMPT      COMMUNITY           0           OTHER
→   2020-12 COMMUNITY       EXEMPT      COMMUNITY           2           OTHER→OTHER1
+++ 2020-13 COMMUNITY       EXEMPT      COMMUNITY           2           PUSH
+++ 2020-13 COMMUNITY       EXEMPT      COMMUNITY           3           OTHER
--- 2020-12 COMMUNITY       EXEMPT      COMMUNITY           5           OTHER

Install it with pip install daff (you may also need sudo apt install python-pip).

Answer 2

awk '
{ key = $1 OFS $2 OFS $3 OFS $4 OFS $5 }
! secondInput {
        file1[key] = $6
        NRfile1[key] = NR
        next
}
(key in file1) {
        if (file1[key] != $NF) { print "diff-line#:", NRfile1[key] "|" FNR, $0 }
        delete file1[key]
        next
}
{ print "missing in file1: ", $0 }
END {
        for (key in file1) {
                print "missing in file2: ", key, file1[key]
        }
}' file1 secondInput=1 file2

Output:

diff-line#: 3|3 2020-12 COMMUNITY   EXEMPT  COMMUNITY   2   OTHER1
missing in file1:  2020-13 COMMUNITY   EXEMPT  COMMUNITY   2   PUSH
missing in file1:  2020-13 COMMUNITY   EXEMPT  COMMUNITY   3   OTHER
missing in file2:  2020-12 COMMUNITY EXEMPT COMMUNITY 5 OTHER
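
As a possible variation on the same idea, the sketch below prints output closer to the format requested in the question, including the name of the differing column taken from the header line. It is only a sketch under stated assumptions: the fields are separated by real tab characters, both files start with the same header line, every column except the last acts as the key, the last column is the value being compared, and lines carry no trailing tabs. The function name tsv_diff and the array names are made up for illustration and do not come from either answer.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answers): compare two TSV files
# that share a header line. Assumption: all columns except the last form the
# key, and the last column holds the value to compare.
tsv_diff() {
    awk -F'\t' '
    FNR == 1 {                          # header line of each file
        fname[++nfiles] = FILENAME
        hdr = $0
        valname = $NF                   # name of the compared column
        next
    }
    {                                   # key = every column except the last
        key = $1
        for (i = 2; i < NF; i++) key = key FS $i
    }
    nfiles == 1 {                       # first file: remember value and row number
        if (!(key in val)) order[++n] = key
        val[key] = $NF
        row[key] = FNR - 1              # data row, not counting the header
        next
    }
    (key in val) {                      # second file: key also exists in first file
        if (val[key] != $NF)
            printf "Row %d, Column: %s- %s value= %s / %s value= %s\n",
                   row[key], valname, fname[1], val[key], fname[2], $NF
        delete val[key]
        next
    }
    { extra[++m] = $0 }                 # rows that exist only in the second file
    END {
        print "Missing in " fname[2]
        print hdr
        for (i = 1; i <= n; i++)        # leftovers from the first file, in input order
            if (order[i] in val) print order[i] FS val[order[i]]
        print "Missing in " fname[1]
        print hdr
        for (i = 1; i <= m; i++) print extra[i]
    }' "$1" "$2"
}
```

It could be called as tsv_diff t1.csv t2.csv. Keys are assumed to be unique within each file. If the same key appears twice in the first file, only its last value is kept, so a file with repeated rows would need a different scheme, for example counting occurrences of whole lines.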
