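I have two BibTeX files in which some entries are duplicated. Each duplicate entry forms its own paragraph, or can be identified by the same pattern. For example,
a.bib
looks like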
@InProceedings{Arranged,
author = {Transcribed by hofei Arranged and by hofei},
title = {ALL OF ME},
file = {:All of Me.pdf:PDF},
groups = {Solo Tab},
}
@InProceedings{P,
author = {P and = and V V and V V},
title = {ANGELS WE HAVE HEARD ON HIGH Transcribed by hofei},
file = {:Angels We Have Heard on High.pdf:PDF},
groups = {Solo Tab},
}
and b.bib
@InProceedings{Arranged,
author = {Transcribed by hofei Arranged and by hofei},
title = {ALL OF ME},
file = {:All of Me.pdf:PDF},
groups = {Solo Tab},
}
@InProceedings{,
title = {This Is My Father's World Standard Tuning Traditional Fast Tempo - “Thumbpicking” Style Arrangement by Mark Hanson},
year = {2005},
file = {:MyFathersWorld_p2.pdf:PDF},
groups = {Solo Tab},
}
I know that the duplicate paragraphs across the two files can be shown with
$ awk -v RS="" '{gsub(/\n/," "); print}' a.bib b.bib | sort | uniq -c | grep -vE '^\s*1 '
2 @InProceedings{Arranged, author = {Transcribed by hofei Arranged and by hofei}, title = {ALL OF ME}, file = {:All of Me.pdf:PDF}, groups = {Solo Tab}, }
But how can I automatically remove the duplicates from b.bib?
Answer 1
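This is the same approach you would use for lines, only with paragraphs. Setting RS="" puts awk in paragraph mode, where records are separated by blank lines. Parse both files, store the paragraphs of the first file as keys of a hash, and print a paragraph of the second file only if it is not already in that hash.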
awk -v RS="" -v ORS="\n\n" 'FNR==NR{a[$0]; next} !($0 in a)' a.bib b.bib
Output:
@InProceedings{,
title = {This Is My Father's World Standard Tuning Traditional Fast Tempo - “Thumbpicking” Style Arrangement by Mark Hanson},
year = {2005},
file = {:MyFathersWorld_p2.pdf:PDF},
groups = {Solo Tab},
}
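Be careful: the comparison is exact, so a whitespace difference anywhere in an entry will cause a duplicate to be missed. You may want to run a few diffs as well to confirm the result.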
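If whitespace differences are a concern, one option is to compare a normalized copy of each paragraph while still printing the original text. The following is only a sketch under that assumption: the names dedup_bib, norm and seen are made up for illustration and are not part of the answer above, and it treats entries as equal only if they match after runs of whitespace are collapsed to single spaces.

```shell
# dedup_bib A B: print the paragraphs of B that do not occur in A.
# Paragraphs are compared after whitespace normalization, but printed unchanged.
# (dedup_bib, norm and seen are hypothetical names used for illustration.)
dedup_bib() {
    awk -v RS="" -v ORS="\n\n" '
        # Collapse every run of whitespace to one space and trim both ends.
        # s is passed by value, so $0 itself is not modified.
        function norm(s) {
            gsub(/[ \t\r\n]+/, " ", s)
            sub(/^ /, "", s)
            sub(/ $/, "", s)
            return s
        }
        FNR == NR { seen[norm($0)]; next }   # first file: remember normalized keys
        !(norm($0) in seen)                  # second file: print only unseen paragraphs
    ' "$1" "$2"
}
```

To use it, redirect the output to a new file (for example, dedup_bib a.bib b.bib > b.new.bib), compare b.new.bib with b.bib using diff, and replace b.bib only once the result looks right. Like the original command, this relies on FNR==NR, so it assumes a.bib is not empty.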