Remove duplicates and add missing fields with bibtool

If you don't mind, I have a couple of questions. I searched on Google but couldn't find any coherent examples.

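My use case involves creating .bib files for my work and occasionally merging them into a larger .bib file. The citation keys I use follow the pattern author, year, and a unique consecutive letter (added when the same author published several works in the same year), written without underscores.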

Examples:

doe2018
doe2018b
doe2018c
doe2018d
etc...
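Since the deduplication below relies entirely on these keys, it can help to pin the scheme down precisely. The following is a minimal core-Perl sketch, assuming the author part is lowercase ASCII letters, the year is four digits, and the first work of a year carries no letter (so suffixes run b, c, d, ... as in the examples). `parse_key` and `next_key` are hypothetical helper names, not part of bibtool or any library.

```perl
use strict;
use warnings;

# Split a key of the form <author><year>[<letter>], e.g. doe2018 or doe2018b.
# Assumption: lowercase ASCII author, four-digit year, optional one-letter suffix.
sub parse_key {
    my ($key) = @_;
    return unless $key =~ /^([a-z]+)(\d{4})([a-z]?)$/;
    return ($1, $2, $3);    # suffix is '' for the first work of a year
}

# Return the first unused key for an author/year pair:
# doe2018, then doe2018b, doe2018c, ... (assumed scheme: 'a' is never used).
sub next_key {
    my ($author, $year, $existing) = @_;
    my %taken = map { $_ => 1 } @$existing;
    my $base  = "$author$year";
    return $base unless $taken{$base};
    for my $letter ('b' .. 'z') {
        return "$base$letter" unless $taken{"$base$letter"};
    }
    die "no free suffix left for $base\n";
}

my @parts = parse_key('doe2018c');
print "@parts\n";                                              # doe 2018 c
print next_key('doe', 2018, [qw(doe2018 doe2018b)]), "\n";     # doe2018c
```

When two files collide on a key, something like `next_key` could propose a fresh key for one of the entries instead of renaming it by hand.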

When merging the .bib files into the larger master file, I would ultimately like to achieve the following.

1) Search all entries in the individual .bib files and, if the journal field is missing, add a user-defined text.

I tried the following command and went through the bibtool manual, but it doesn't seem to produce the desired output.

bibtool 'add.field={journal="(journal){%N(journal)}{--no-journal--}"}' ./biblio.bib

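2) Before merging everything into the larger .bib file, check for duplicate entries: find entries that have the same key and the same title, and resolve them by deleting the redundant copies. If only the keys are duplicated but the titles differ between those entries, extract them and dump them into an additional file called duplicates.bib for manual inspection, keeping them out of the larger .bib file entirely.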
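The rules in point 2 boil down to a small classification over (key, title) pairs. Below is a core-Perl sketch, assuming each entry has already been parsed into a hashref with `key` and `title` fields (this layout and the name `classify` are hypothetical; with Text::BibTeX the values would come from `$entry->key` and `$entry->get('title')`). Titles are compared as exact strings.

```perl
use strict;
use warnings;

# Classify entries according to the rules in point 2.
# Each entry is a hashref with at least 'key' and 'title' (hypothetical layout).
# Returns (\@merged, \@duplicates), both in input order.
sub classify {
    my @entries = @_;
    my (%first, %conflict, @order, @dups);
    for my $e (@entries) {
        my ($key, $title) = ($e->{key}, $e->{title} // '');
        if (!exists $first{$key}) {
            $first{$key} = $e;    # first occurrence of this key is kept for now
            push @order, $key;
        } elsif (($first{$key}{title} // '') ne $title) {
            # same key, different title: both entries go to manual review,
            # the first one only once even if the key shows up again
            push @dups, $first{$key} unless $conflict{$key}++;
            push @dups, $e;
        }
        # same key, same title: redundant copy, silently dropped
    }
    my @merged = map { $first{$_} } grep { !$conflict{$_} } @order;
    return (\@merged, \@dups);
}

my ($merged, $dups) = classify(
    { key => 'hasjournal', title => 'First Things' },
    { key => 'nojournala', title => 'On Things' },
    { key => 'hasjournal', title => 'First Things' },
    { key => 'nojournala', title => 'On Different Things' },
);
print "merged:     ", join(' ', map { $_->{key} } @$merged), "\n";    # hasjournal
print "duplicates: ", join(' | ', map { $_->{title} } @$dups), "\n";  # On Things | On Different Things
```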

Thanks!

Answer 1

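It seems that bibtool has no feature that really fits this (add.field first removes the field if it exists and then adds it, which is of course not what you want). However, there is a Perl library called Text::BibTeX that you can use to write your own BibTeX manipulation scripts; it is reasonably well documented and easy to use. Example addjournal.pl: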

use strict;
use warnings;
use Text::BibTeX;

# usage: perl addjournal.pl input.bib output.bib
my $bibfile = Text::BibTeX::File->new($ARGV[0]) or die "error: $ARGV[0] not found\n";
my $newfile = Text::BibTeX::File->new(">$ARGV[1]") or die "error: cannot write $ARGV[1]\n";

while (my $entry = Text::BibTeX::Entry->new($bibfile)){
    # only parsed @article entries get a placeholder journal; everything else is copied unchanged
    if($entry->parse_ok && lc($entry->type) eq "article"){
        my $journal = $entry->get('journal');
        if(!defined($journal)){
            $entry->set('journal','--no-journal--');
        }
    }
    $entry->write($newfile);
}
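The same idea extends to other entry types. Here is a plain-Perl sketch, assuming entries are represented as hashrefs with `type` and `fields` keys; the `%required` table, its placeholder strings and the name `fill_missing` are hypothetical. With Text::BibTeX the check and the update would be `$entry->get` and `$entry->set`, as in the script above.

```perl
use strict;
use warnings;

# Hypothetical table: the field each entry type must have, and the placeholder
# to insert when it is missing. Types not listed are left alone.
my %required = (
    article       => [ journal   => '--no-journal--' ],
    inproceedings => [ booktitle => '--no-booktitle--' ],
);

# Fill in the missing required field of one entry (hashref with 'type' and 'fields').
sub fill_missing {
    my ($entry) = @_;
    my $rule = $required{ lc $entry->{type} } or return $entry;
    my ($field, $placeholder) = @$rule;
    # existing values are never overwritten
    $entry->{fields}{$field} = $placeholder unless defined $entry->{fields}{$field};
    return $entry;
}

my $art = fill_missing({ type => 'article', fields => { title => 'On Things' } });
print "$art->{fields}{journal}\n";    # --no-journal--
```

Keeping the rules in a table means adding a new type is a one-line change rather than another `if` branch.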

Example modifyref.bib file:

@article{hasjournal,
    author = {Mary Jones},
    title = {First Things},
    journal = {Journal of Things}
}

@article{nojournala,
    author = {John Doe},
    title = {On Things},
}

@inproceedings{nojournalb,
    author = {Joe Peterson},
    title = {Briefly Explained},
    booktitle = {Conference of Briefness}
}

Result after running perl addjournal.pl modifyref.bib newref.bib (note that only the article entry was modified, not the inproceedings entry):

@article{hasjournal,
  author = {Mary Jones},
  title = {First Things},
  journal = {Journal of Things},
}

@article{nojournala,
  author = {John Doe},
  title = {On Things},
  journal = {--no-journal--},
}

@inproceedings{nojournalb,
  author = {Joe Peterson},
  title = {Briefly Explained},
  booktitle = {Conference of Briefness},
}

The second problem is a bit more involved but still fairly simple (bibmerge.pl):

use strict;
use warnings;
use Text::BibTeX;

# usage: perl bibmerge.pl first.bib second.bib merged.bib
my $bibfile1 = Text::BibTeX::File->new($ARGV[0]) or die "error: $ARGV[0] not found\n";
my $bibfile2 = Text::BibTeX::File->new($ARGV[1]) or die "error: $ARGV[1] not found\n";
my $merged = Text::BibTeX::File->new(">$ARGV[2]") or die "error: cannot write $ARGV[2]\n";
my $duplicatefile = Text::BibTeX::File->new(">duplicates.bib") or die "error: cannot write duplicates.bib\n";

my %all_keys;   # key => { title => ..., entry => ..., duplicate => 1 if conflicting }
my @key_order;  # keys in order of first appearance (iterating over a hash would scramble the order)

foreach my $bibfile ($bibfile1, $bibfile2){
    while (my $entry = Text::BibTeX::Entry->new($bibfile)){
        # skip unparseable entries and keyless ones (@string, @preamble, @comment)
        next unless $entry->parse_ok && defined $entry->key;
        my $bibkey = $entry->key;
        my $title = $entry->get('title') // '';
        # if a key is found previously
        if(exists($all_keys{$bibkey})){
            # if the title is not equal to the previous occurrence of this key
            # (no else case: if the title is equal then do nothing with the current entry and do not mark the previous entry)
            if($all_keys{$bibkey}{"title"} ne $title){
                # write previous to duplicate file (only once, even if the key occurs a third time)
                $all_keys{$bibkey}{"entry"}->write($duplicatefile) unless $all_keys{$bibkey}{"duplicate"};
                # write current to duplicate file
                $entry->write($duplicatefile);
                # mark previous not to print in merged file
                $all_keys{$bibkey}{"duplicate"} = 1;
            }
        }else{
            # new key, store title and entry object for comparison with following entries
            $all_keys{$bibkey}{"title"} = $title;
            $all_keys{$bibkey}{"entry"} = $entry;
            push @key_order, $bibkey;
        }
    }
}
# print all entries to merged file in input order, except entries marked as duplicate
foreach my $bibkey (@key_order){
    if(!$all_keys{$bibkey}{"duplicate"}){
        $all_keys{$bibkey}{"entry"}->write($merged);
    }
}

Sample modifyref2.bib:

@article{hasjournal,
    author = {Sue Jones},
    title = {First Things},
    journal = {Journal of Duplicate Titles}
}

@article{nojournalb,
    author = {Jane Smith},
    title = {On Other Things},
}

@article{nojournala,
    author = {John Doe},
    title = {On Different Things},
}

Result after running perl bibmerge.pl modifyref.bib modifyref2.bib merged.bib:

merged.bib

@article{hasjournal,
  author = {Mary Jones},
  title = {First Things},
  journal = {Journal of Things},
}

duplicates.bib

@inproceedings{nojournalb,
  author = {Joe Peterson},
  title = {Briefly Explained},
  booktitle = {Conference of Briefness},
}

@article{nojournalb,
  author = {Jane Smith},
  title = {On Other Things},
}

@article{nojournala,
  author = {John Doe},
  title = {On Things},
}

@article{nojournala,
  author = {John Doe},
  title = {On Different Things},
}

Note that there are still several improvements to be made, but this may be enough as a proof of concept.
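One likely improvement: the title comparison is an exact string match, so `{F}irst Things` and `First things` count as different titles and both end up in duplicates.bib. A core-Perl sketch of a normalization helper, assuming that letter case, case-protecting braces and whitespace are irrelevant when deciding whether two entries describe the same work (`normalize_title` is a hypothetical name):

```perl
use strict;
use warnings;

# Normalize a BibTeX title for comparison only (the entry itself is not changed).
sub normalize_title {
    my ($title) = @_;
    return '' unless defined $title;
    $title =~ tr/{}//d;       # drop case-protecting braces
    $title =~ s/\s+/ /g;      # collapse runs of whitespace, including newlines
    $title =~ s/^ | $//g;     # trim leading/trailing space
    return lc $title;
}

print normalize_title("{F}irst  Things "), "\n";    # first things
```

Comparing `normalize_title($entry->get('title'))` instead of the raw title in bibmerge.pl would let such near-identical entries be dropped as redundant rather than sent for manual review.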
