Find duplicate paragraphs in two files and delete one copy

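I have two .bib files in which some entries are duplicated. Each duplicated entry is a whole paragraph (a block separated by blank lines), so duplicates can be identified by matching the same block of text. For example:

a.bib looks like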

@InProceedings{Arranged,
  author = {Transcribed by hofei Arranged and by hofei},
  title  = {ALL OF ME},
  file   = {:All of Me.pdf:PDF},
  groups = {Solo Tab},
}

@InProceedings{P,
  author = {P and = and V V and V V},
  title  = {ANGELS WE HAVE HEARD ON HIGH Transcribed by hofei},
  file   = {:Angels We Have Heard on High.pdf:PDF},
  groups = {Solo Tab},
}

b.bib

@InProceedings{Arranged,
  author = {Transcribed by hofei Arranged and by hofei},
  title  = {ALL OF ME},
  file   = {:All of Me.pdf:PDF},
  groups = {Solo Tab},
}

@InProceedings{,
  title  = {This Is My Father's World Standard Tuning Traditional Fast Tempo - “Thumbpicking” Style Arrangement by Mark Hanson},
  year   = {2005},
  file   = {:MyFathersWorld_p2.pdf:PDF},
  groups = {Solo Tab},
}

I know how to show the paragraphs that are duplicated across the two files:

$ awk -v RS=""  '{gsub(/\n/," "); print}' a.bib b.bib | sort | uniq -c | grep -vE '^\s*1 '
      2 @InProceedings{Arranged,   author = {Transcribed by hofei Arranged and by hofei},   title  = {ALL OF ME},   file   = {:All of Me.pdf:PDF},   groups = {Solo Tab}, }

But how can I automatically delete the duplicates from b.bib?

Answer 1

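This is the same as the usual line-based approach, except that the records are now paragraphs. Read both files, store the paragraphs of the first file as keys of a hash (an awk array), and print a paragraph of the second file only if it does not exist in the first. Setting RS="" makes awk treat each blank-line-separated paragraph as one record, and ORS="\n\n" puts a blank line back after each printed paragraph.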

awk -v RS="" -v ORS="\n\n" 'FNR==NR{a[$0]; next} !($0 in a)' a.bib b.bib

Output:

@InProceedings{,
  title  = {This Is My Father's World Standard Tuning Traditional Fast Tempo - “Thumbpicking” Style Arrangement by Mark Hanson},
  year   = {2005},
  file   = {:MyFathersWorld_p2.pdf:PDF},
  groups = {Solo Tab},
}
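
The command above only prints the de-duplicated paragraphs; b.bib itself is not changed. A minimal sketch of writing the result back follows. The function name `remove_dupes_in_place` and the `.bak` backup suffix are hypothetical choices for illustration, and it assumes `mktemp` is available on your system.

```shell
# Hypothetical helper: remove from "$2" every paragraph that also appears in "$1".
remove_dupes_in_place() {
  ref=$1
  target=$2
  # With a completely empty first file, FNR==NR would also be true for the
  # second file and nothing would be printed, so do nothing in that case.
  [ -s "$ref" ] || return 0
  tmp=$(mktemp) || return 1
  # Same awk program as in the answer. The output goes to a temporary file,
  # because redirecting straight to "$target" would truncate it before awk reads it.
  if awk -v RS="" -v ORS="\n\n" 'FNR==NR{a[$0]; next} !($0 in a)' "$ref" "$target" > "$tmp"
  then
    # Keep a backup of the original, then overwrite the target's contents.
    cp "$target" "$target.bak" && cat "$tmp" > "$target"
    status=$?
  else
    status=1
  fi
  rm -f "$tmp"
  return "$status"
}
```

Usage would be `remove_dupes_in_place a.bib b.bib`, after which the untouched original is still available as b.bib.bak.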

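Be careful: the comparison is exact, so a whitespace difference anywhere in a paragraph will cause a duplicate to be missed. You may want to run diff on the files afterwards to confirm the result.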
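
If whitespace differences are a concern, one possible variation is to compare a whitespace-normalized copy of each paragraph while still printing the original text. This is only a sketch under that assumption, not part of the answer's command; the names `dedup_loose`, `norm` and `seen` are made up for illustration.

```shell
# Hypothetical helper: print the paragraphs of "$2" whose whitespace-normalized
# text does not occur in "$1".
dedup_loose() {
  awk -v RS="" -v ORS="\n\n" '
    # Collapse every run of whitespace to one space and trim both ends.
    # Scalars are passed by value, so the record itself is left untouched.
    function norm(s) {
      gsub(/[ \t\r\n]+/, " ", s)
      sub(/^ /, "", s)
      sub(/ $/, "", s)
      return s
    }
    FNR == NR { seen[norm($0)]; next }   # first file: remember normalized paragraphs
    !(norm($0) in seen)                  # second file: print unseen paragraphs unchanged
  ' "$1" "$2"
}
```

For example, `dedup_loose a.bib b.bib > b.dedup.bib` followed by `diff b.bib b.dedup.bib` shows exactly which paragraphs were dropped, which is one way to do the confirmation suggested above.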
