I have two files, file1.csv and file2.csv.
Here is the content of file1.csv:
AL.jar;d8c06ebedd7954681f34ab5c94fdc4fb
AR.jar;9a553dd203d0979aa60004e19cc98c12
BI.jar;8022f6c5f83ba040394ff0b0a0323e8e
BV.jar;f53c4a8c988aa8806b54063ebc682803
CaseUtilities.jar;e5f653d899298f5e5d56f357b6f781c5
CO.jar;b2f7a0ab6e646d6793631e5c97e05096
And here is the content of file2.csv:
AL.jar;d8c06ebedd7954681f34ab5c94fdc4fb
AR.jar;4e6e584dd852684ba21ae63990e2a1a6
BV.jar;213d9df82095764702ef4929424a1a0c
CaseUtilities.jar;5b787f1f3d57922bd980ebbfe9a5343e
CO.jar;cfb994078ff4373c7e0f15de19830a3d
Common.jar;a09b520288870aa3888194ce59179dbd
We need to compare the two files by content.
I want to produce a diff keyed only on the values in the first column, so the result should be:
AL.jar;d8c06ebedd7954681f34ab5c94fdc4fb AL.jar;d8c06ebedd7954681f34ab5c94fdc4fb
AR.jar;9a553dd203d0979aa60004e19cc98c12 AR.jar;4e6e584dd852684ba21ae63990e2a1a6
BI.jar;8022f6c5f83ba040394ff0b0a0323e8e <NULL>
BV.jar;f53c4a8c988aa8806b54063ebc682803 BV.jar;213d9df82095764702ef4929424a1a0c
CaseUtilities.jar;e5f653d899298f5e5d56f357b6f781c5 CaseUtilities.jar;5b787f1f3d57922bd980ebbfe9a5343e
CO.jar;b2f7a0ab6e646d6793631e5c97e05096 CO.jar;cfb994078ff4373c7e0f15de19830a3d
<NULL> Common.jar;a09b520288870aa3888194ce59179dbd
I have already tried the following command:
diff -y file1.csv file2.csv
But its output, shown below, is not what I expected:
AL.jar;d8c06ebedd7954681f34ab5c94fdc4fb AL.jar;d8c06ebedd7954681f34ab5c94fdc4fb
AR.jar;9a553dd203d0979aa60004e19cc98c12 | AR.jar;4e6e584dd852684ba21ae63990e2a1a6
BI.jar;8022f6c5f83ba040394ff0b0a0323e8e | BV.jar;213d9df82095764702ef4929424a1a0c
BV.jar;f53c4a8c988aa8806b54063ebc682803 | CaseUtilities.jar;5b787f1f3d57922bd980ebbfe9a5343e
CaseUtilities.jar;e5f653d899298f5e5d56f357b6f781c5 | CO.jar;cfb994078ff4373c7e0f15de19830a3d
CO.jar;b2f7a0ab6e646d6793631e5c97e05096 | Common.jar;a09b520288870aa3888194ce59179dbd
Any idea how to achieve my expected output?
Answer 1
awk -F "\"*;\"*" '{print $1}' file1.csv > file1 # extract the first column of file1.csv into a file named file1
awk -F "\"*;\"*" '{print $1}' file2.csv > file2 # extract the first column of file2.csv into a file named file2
diff -y file1 file2 # side-by-side diff of the two extracted files
Or the same thing as a single command (using bash process substitution):
diff -y <(awk -F "\"*;\"*" '{print $1}' file1.csv) <(awk -F "\"*;\"*" '{print $1}' file2.csv)
Result:
AL.jar AL.jar
AR.jar AR.jar
BI.jar <
BV.jar BV.jar
CaseUtilities.jar CaseUtilities.jar
CO.jar CO.jar
| Common.jar
Answer 2
Another approach is to use join and column.
If your files are sorted as in the example...
join -t\; -e "<NULL>" -a 1 -a 2 -o 1.1,1.2,2.1,2.2 file1.csv file2.csv | column -t -s\;
Output:
AL.jar d8c06ebedd7954681f34ab5c94fdc4fb AL.jar d8c06ebedd7954681f34ab5c94fdc4fb
AR.jar 9a553dd203d0979aa60004e19cc98c12 AR.jar 4e6e584dd852684ba21ae63990e2a1a6
BI.jar 8022f6c5f83ba040394ff0b0a0323e8e <NULL> <NULL>
BV.jar f53c4a8c988aa8806b54063ebc682803 BV.jar 213d9df82095764702ef4929424a1a0c
CaseUtilities.jar e5f653d899298f5e5d56f357b6f781c5 CaseUtilities.jar 5b787f1f3d57922bd980ebbfe9a5343e
CO.jar b2f7a0ab6e646d6793631e5c97e05096 CO.jar cfb994078ff4373c7e0f15de19830a3d
<NULL> <NULL> Common.jar a09b520288870aa3888194ce59179dbd
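join's default behavior is to join on the first field, so you only need to set the delimiter with -t\;, include all unpaired lines from either file with -a 1 -a 2, fill any empty fields with -e "<NULL>", and then specify the output fields with -o ...
join writes its output with the same delimiter, so pipe it through column -t, using that same delimiter (-s\;), to pretty-print it.
The output is not exactly what was asked for, but it takes less typing...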
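If the inputs are not guaranteed to be sorted, or you want each side glued back into the name;hash form from the question, something along these lines might work. It is only a sketch: join_by_first_column is a made-up helper name, and forcing LC_ALL=C is an assumption so that sort and join agree on the ordering. Under that byte ordering CO.jar sorts before CaseUtilities.jar, so the row order differs slightly from the question's listing.

```shell
# Hypothetical helper: full outer join of two ';'-separated files on field 1.
# Assumption: LC_ALL=C is forced so sort and join use the same byte ordering.
# The body runs in a subshell, so tmp does not leak into the caller.
join_by_first_column() (
  tmp=$(mktemp -d) || exit 1
  LC_ALL=C sort -t ';' -k 1,1 "$1" > "$tmp/left"
  LC_ALL=C sort -t ';' -k 1,1 "$2" > "$tmp/right"
  LC_ALL=C join -t ';' -e '<NULL>' -a 1 -a 2 -o 1.1,1.2,2.1,2.2 \
      "$tmp/left" "$tmp/right" |
    awk -F ';' '{
      # Glue each side back into "name;hash", or a single <NULL> if absent.
      left  = ($1 == "<NULL>") ? "<NULL>" : ($1 ";" $2)
      right = ($3 == "<NULL>") ? "<NULL>" : ($3 ";" $4)
      print left, right
    }'
  rm -rf "$tmp"
)

# Demo: recreate the sample files from the question in a scratch directory.
demo=$(mktemp -d)
cat > "$demo/file1.csv" <<'EOF'
AL.jar;d8c06ebedd7954681f34ab5c94fdc4fb
AR.jar;9a553dd203d0979aa60004e19cc98c12
BI.jar;8022f6c5f83ba040394ff0b0a0323e8e
BV.jar;f53c4a8c988aa8806b54063ebc682803
CaseUtilities.jar;e5f653d899298f5e5d56f357b6f781c5
CO.jar;b2f7a0ab6e646d6793631e5c97e05096
EOF
cat > "$demo/file2.csv" <<'EOF'
AL.jar;d8c06ebedd7954681f34ab5c94fdc4fb
AR.jar;4e6e584dd852684ba21ae63990e2a1a6
BV.jar;213d9df82095764702ef4929424a1a0c
CaseUtilities.jar;5b787f1f3d57922bd980ebbfe9a5343e
CO.jar;cfb994078ff4373c7e0f15de19830a3d
Common.jar;a09b520288870aa3888194ce59179dbd
EOF
join_by_first_column "$demo/file1.csv" "$demo/file2.csv"
rm -rf "$demo"
```

Piping the result through column -t lines the two columns up, as in the answers here.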
Answer 3
You can always do this in awk:
$ awk -F';' '{
if(NR==FNR){a[$1]=$0}
else{b[$1]=$0}
ids[$1]++;
}
END{
for(id in ids){
printf "%s\t%s\n",a[id],b[id];
}
}' file1.csv file2.csv | column -t
CO.jar;b2f7a0ab6e646d6793631e5c97e05096 CO.jar;cfb994078ff4373c7e0f15de19830a3d
BV.jar;f53c4a8c988aa8806b54063ebc682803 BV.jar;213d9df82095764702ef4929424a1a0c
Common.jar;a09b520288870aa3888194ce59179dbd
AL.jar;d8c06ebedd7954681f34ab5c94fdc4fb AL.jar;d8c06ebedd7954681f34ab5c94fdc4fb
AR.jar;9a553dd203d0979aa60004e19cc98c12 AR.jar;4e6e584dd852684ba21ae63990e2a1a6
BI.jar;8022f6c5f83ba040394ff0b0a0323e8e
CaseUtilities.jar;e5f653d899298f5e5d56f357b6f781c5 CaseUtilities.jar;5b787f1f3d57922bd980ebbfe9a5343e
Or, to include the NULL from your example output:
$ awk -F';' '{
if(NR==FNR){
a[$1]=$0;
b[$1]="<NULL>"
}
else{
b[$1]=$0;
a[$1] = a[$1] ? a[$1] : "<NULL>";
}
ids[$1]++;
}
END{
for(id in ids){
printf "%s\t%s\n",a[id],b[id];
}
}' file1.csv file2.csv | column -t
CO.jar;b2f7a0ab6e646d6793631e5c97e05096 CO.jar;cfb994078ff4373c7e0f15de19830a3d
BV.jar;f53c4a8c988aa8806b54063ebc682803 BV.jar;213d9df82095764702ef4929424a1a0c
<NULL> Common.jar;a09b520288870aa3888194ce59179dbd
AL.jar;d8c06ebedd7954681f34ab5c94fdc4fb AL.jar;d8c06ebedd7954681f34ab5c94fdc4fb
AR.jar;9a553dd203d0979aa60004e19cc98c12 AR.jar;4e6e584dd852684ba21ae63990e2a1a6
BI.jar;8022f6c5f83ba040394ff0b0a0323e8e <NULL>
CaseUtilities.jar;e5f653d899298f5e5d56f357b6f781c5 CaseUtilities.jar;5b787f1f3d57922bd980ebbfe9a5343e
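The rows above come out in no particular order because awk does not define the iteration order of for (id in ids). If a stable order matters, one option is to emit the key as an extra leading field, sort on it, and cut it off again. A minimal sketch, assuming a hypothetical helper name diff_by_key, a non-empty first file (the NR == FNR test misfires otherwise), and LC_ALL=C byte ordering for the sort:

```shell
# Hypothetical helper: pair lines of two ';'-separated files by field 1 and
# print them tab-separated, sorted by that key (byte order, LC_ALL=C).
diff_by_key() {
  awk -F ';' '
    NR == FNR { left[$1]  = $0; keys[$1]; next }  # lines of the first file
              { right[$1] = $0; keys[$1] }        # lines of the second file
    END {
      for (k in keys) {
        l = (k in left)  ? left[k]  : "<NULL>"
        r = (k in right) ? right[k] : "<NULL>"
        printf "%s\t%s\t%s\n", k, l, r            # key first, for sorting
      }
    }' "$1" "$2" |
    LC_ALL=C sort -t "$(printf '\t')" -k 1,1 |
    cut -f 2-                                     # drop the helper key column
}

# Demo: recreate the sample files from the question in a scratch directory.
demo=$(mktemp -d)
cat > "$demo/file1.csv" <<'EOF'
AL.jar;d8c06ebedd7954681f34ab5c94fdc4fb
AR.jar;9a553dd203d0979aa60004e19cc98c12
BI.jar;8022f6c5f83ba040394ff0b0a0323e8e
BV.jar;f53c4a8c988aa8806b54063ebc682803
CaseUtilities.jar;e5f653d899298f5e5d56f357b6f781c5
CO.jar;b2f7a0ab6e646d6793631e5c97e05096
EOF
cat > "$demo/file2.csv" <<'EOF'
AL.jar;d8c06ebedd7954681f34ab5c94fdc4fb
AR.jar;4e6e584dd852684ba21ae63990e2a1a6
BV.jar;213d9df82095764702ef4929424a1a0c
CaseUtilities.jar;5b787f1f3d57922bd980ebbfe9a5343e
CO.jar;cfb994078ff4373c7e0f15de19830a3d
Common.jar;a09b520288870aa3888194ce59179dbd
EOF
diff_by_key "$demo/file1.csv" "$demo/file2.csv" | column -t
rm -rf "$demo"
```

Testing membership with (k in left) rather than the truthiness of a[$1] also avoids treating an empty or zero-valued line as missing.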
Answer 4
# print the lines of sourcefile.csv that do not appear verbatim in checkagainst.csv
awk 'FNR==NR{arr[$0]=1;next} !arr[$0]{print}' checkagainst.csv sourcefile.csv > destinationfile.csv