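I have two CSV files, shown below. I want to compare file1 against file2: if a record exists in file2, the complete row should be removed from file1. The field to compare is ID, which is at position 11 in the original file.
File1.CSV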
"NAME","CITY","MARKS","ID","C","NAME1","TYPE"
"A","XY","100","12","","31420","TYPE1"
"A","XY","100","13","","31420","TYPE1"
File2.CSV
"ID"
"11"
"12"
"25"
Expected output
"NAME","CITY","MARKS","ID","C","NAME1","TYPE"
"A","XY","100","13","","31420","TYPE1"
Answer 1
Using the csvkit utilities (package python3-csvkit on my Ubuntu system):
$ csvsql --query '
SELECT * FROM FILE1 WHERE ID NOT IN (SELECT ID FROM FILE2)
' FILE1.CSV FILE2.CSV | csvformat -U1
"NAME","CITY","MARKS","ID","C","NAME1","TYPE"
"A","XY","100","13","","31420","TYPE1"
Answer 2
With Miller (https://github.com/johnkerl/miller/releases/tag/5.4.0), the command is
mlr --csv join --np --ul -j ID -f input_01.csv input_02.csv
Some notes:
--np: do not emit paired records
--ul: emit unpaired records from the left file
The left file (the one given with -f) is input_01.csv:
"NAME","CITY","MARKS","ID","C","NAME1","TYPE"
"A","XY","100","12","","31420","TYPE1"
"A","XY","100","13","","31420","TYPE1"
Answer 3
grep -v -wf file2.csv file1.csv
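This prints every line of file1.csv that does not contain any of the words listed in file2.csv.
Of course, the header line is now lost, because the "ID" header of file2.csv matches it as well. If you need it, do the following: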
$ head -n1 file1.csv;grep -v -wf file2.csv file1.csv
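One caveat with the grep approach: the patterns are matched anywhere on the line, not only in the ID column, so a row whose value in another column happens to equal a listed ID is dropped too. A minimal sketch with made-up sample data (the rows and the temporary directory are assumptions for illustration, not taken from the question):

```shell
# Hypothetical sample data: row A has MARKS "12" but ID "99",
# row B has ID "12", row C has ID "13".
tmp=$(mktemp -d)
printf '%s\n' \
    '"NAME","CITY","MARKS","ID"' \
    '"A","XY","12","99"' \
    '"B","XY","100","12"' \
    '"C","XY","100","13"' > "$tmp/file1.csv"
printf '%s\n' '"ID"' '"12"' > "$tmp/file2.csv"

# Same command as above: print the header, then every line without a match.
result=$( head -n1 "$tmp/file1.csv"; grep -v -wf "$tmp/file2.csv" "$tmp/file1.csv" )
printf '%s\n' "$result"

rm -r "$tmp"
```

Row A is removed even though its ID is "99", because "12" appears in its MARKS column. If that can happen in the real data, a column-aware approach is the safer choice.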
If the first line of file2.csv names the column in which the words must not match, an awk solution is as follows:
$ awk -v FS="," '
NR==FNR && NR==1 {column=$1; next;} # save the name of the column to compare against
NR==FNR {data[$1]++; next;} # save the list of words to match against
NR!=FNR && FNR==1 {print; for(i=1;i<=NF;i++) {if($i==column) c=i}; next;} # print the header line of file1 and find the number of the column to compare
!($c in data) {print} # print any following line whose value in that column is not in our list
' file2.csv file1.csv
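If the column position is already known (the question says ID is at position 11 in the real file), the header lookup can be skipped. A minimal sketch, assuming that no field contains an embedded comma, that both files quote their IDs the same way, and that the IDs file is not empty; filter_by_column is a hypothetical helper name, not part of any tool:

```shell
# filter_by_column is a hypothetical helper, not part of any tool.
# Usage: filter_by_column COLUMN_NUMBER IDS_FILE DATA_FILE
filter_by_column() {
    awk -F, -v col="$1" '
        # First file (IDs): remember every value after its header line.
        NR == FNR { if (FNR > 1) skip[$1]; next }
        # Second file (data): keep the header and rows whose column value is not listed.
        FNR == 1 || !($col in skip)
    ' "$2" "$3"
}
```

For the sample files in the question this would be called as filter_by_column 4 file2.csv file1.csv, and with 11 instead of 4 for the real file. Fields are compared with their quotes included, which is why both files need to quote IDs consistently.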