How to compare two CSV files and display the unique records?

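I have two CSV files, shown below. I want to compare file1 against file2: if a record's ID exists in file2, the complete row should be removed from file1. The field to compare is ID; in the original file it is in the 11th position.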

File1.CSV

"NAME","CITY","MARKS","ID","C","NAME1","TYPE"
"A","XY","100","12","","31420","TYPE1"
"A","XY","100","13","","31420","TYPE1"

File2.CSV

"ID"
"11"
"12"
"25"

Expected output

"NAME","CITY","MARKS","ID","C","NAME1","TYPE"
"A","XY","100","13","","31420","TYPE1"

Answer 1

Using the csvkit utilities (package python3-csvkit on my Ubuntu system):

$ csvsql --query '
  SELECT * FROM FILE1 WHERE ID NOT IN (SELECT ID FROM FILE2)
' FILE1.CSV FILE2.CSV | csvformat -U1
"NAME","CITY","MARKS","ID","C","NAME1","TYPE"
"A","XY","100","13","","31420","TYPE1"

Answer 2

With Miller (https://github.com/johnkerl/miller/releases/tag/5.4.0), the command is:

mlr --csv join --np --ul -j ID -f input_01.csv input_02.csv

Some notes:

  • --np does not emit paired records
  • --ul emits the unpaired records from the left file

The left file is input_01.csv, the one given with -f:

"NAME","CITY","MARKS","ID","C","NAME1","TYPE"
"A","XY","100","12","","31420","TYPE1"
"A","XY","100","13","","31420","TYPE1"

Answer 3

grep -v -wf file2.csv file1.csv prints every line of file1.csv that does not contain any of the words listed in file2.csv.

Of course, the header line is now lost, because the "ID" line of file2.csv matches it. If you need it, do this:

$ head -n1 file1.csv;grep -v -wf file2.csv file1.csv
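
As a variation on the command above, here is a minimal sketch that assumes GNU grep (which reads the pattern list from standard input when given -f -). Skipping the header line of file2.csv keeps the "ID" pattern out of the list, so the header of file1.csv survives without a separate head call, and -F makes grep treat each value as a literal string rather than a regular expression. The function name filter_rows is hypothetical, not part of any tool.

```shell
# Hypothetical helper: print the lines of a data file that contain none of the listed values.
# Assumption: the first line of the list file is a header and must not be used as a pattern.
# Usage: filter_rows LIST_FILE DATA_FILE
filter_rows() {
    # tail -n +2 drops the header of the list file;
    # -v inverts the match, -w requires whole-word matches,
    # -F treats patterns as fixed strings, -f - reads them from stdin.
    tail -n +2 "$1" | grep -v -wFf - "$2"
}

# Example call with the files from the question:
# filter_rows file2.csv file1.csv
```

Like the original command, this still looks for the values in every column, not only in ID, and an empty line in file2.csv would make every line match.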

If the first line of file2.csv names the column in which the words must not match, an awk solution looks like this:

$ awk -v FS="," '
    NR==FNR && NR==1 {column=$1; next}  # file2 header: save the name of the column to compare against
    NR==FNR {data[$1]++; next}          # rest of file2: save the list of values to match against
    NR!=FNR && FNR==1 {print; for(i=1;i<=NF;i++) {if($i==column) c=i}; next}  # file1 header: print it and find the number of the column to compare
    {if (!($c in data)) print}          # any other line of file1: print it only if its value is not in the list
' file2.csv file1.csv
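
If the position of the ID column is already known (the question mentions the 11th field in the original file), the header lookup can be skipped. The following is a minimal sketch under that assumption; the function name drop_listed_ids and the variable col are hypothetical. It splits on plain commas, so it is not suitable for data with commas inside quoted fields, and it assumes file2.csv has at least a header line.

```shell
# Hypothetical helper: print the header of a data file plus every row
# whose value in column COL does not appear in the list file.
# Usage: drop_listed_ids LIST_FILE DATA_FILE COL
drop_listed_ids() {
    awk -F, -v col="$3" '
        NR == FNR { if (FNR > 1) ids[$1]; next }  # list file: remember every value after its header
        FNR == 1 || !($col in ids)                # data file: keep the header and the rows not listed
    ' "$1" "$2"
}

# Example call with the sample files, where ID is the 4th field:
# drop_listed_ids file2.csv file1.csv 4
```

The values are compared as they appear in the files, including the surrounding double quotes, so both files need to quote the ID the same way.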
