Best script for automatically copying references from a central .bib file, based on the .aux file

Question 1

(This is an answer.)

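I used to use this script before I learned how to manage my .bib files properly! The only thing it doesn't do is extract the references from the .aux file, but that would not be hard to add. Most of the script is configuration setup. The real core is the last loop, which goes through the .bib files and looks for the matching entries. That loop is so short because someone else has already written the Text::BibTeX Perl module; the rest is just decoration. If this is the sort of thing you want, it would be very easy to adapt it to (a) search the .aux file for references and (b) drop all the other things I used it for (such as automatically downloading references from the arXiv).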
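For step (a), here is a minimal shell sketch that is not part of the original script. It assumes the usual LaTeX behaviour that every `\cite{...}` leaves a `\citation{...}` line in the .aux file, and it joins the distinct keys with commas, which is the form the script's `-e`/`--refs` option splits on. The function name `aux_citation_keys` is made up for this illustration.

```shell
# aux_citation_keys (hypothetical helper): print the distinct keys from the
# \citation{...} lines of one or more .aux files, joined by commas.
aux_citation_keys () {
    sed -n 's/^\\citation{\(.*\)}$/\1/p' "$@" |  # keep only the braced key lists
        tr ', ' '\n\n' |                        # one key per line
        sed '/^$/d' |                           # drop empty lines left by ", "
        LC_ALL=C sort -u |                      # de-duplicate, in byte order
        paste -s -d ',' -                       # rejoin with commas
}
```

Assuming the script is saved as `refs` (also a hypothetical name), `refs -e "$(aux_citation_keys paper.aux)" > local.bib` would then print the full entry for every cited key. Bear in mind that the script matches each key as a regular expression anchored with `^...$`, so keys containing characters such as `+` or `.` are not matched literally, and that a document split with `\include` records its citations in the per-chapter .aux files, which need to be passed as well.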

#! /usr/bin/perl -w

use strict;
#use Getopt::Long qw(:config auto_help bundling);
use Getopt::Long qw(:config bundling);
use Pod::Usage;
use Text::BibTeX;
use LWP::MediaTypes qw(guess_media_type);

# This is the only variable that should need customising

my $conffile = "$ENV{HOME}/.refsrc";

###
# Nothing below here should need customisation
###

my (
    $bibbase,
    $hostname,
    $docbase,
    $reffile,
    $lynx,
    $arxivdir,
    $ext,
    $authors,
    $refs,
    $reduced,
    $silent,
    $titles,
    $append,
    $view,
    $show,
    $help,
    $man,
    $gpl,
    $bibfile,
    $entry,
    $output
    );

my @reffiles;
my %mime;
my %values;
my @shrt;
my %tests;

@shrt = ("title","author");

%tests = (
    "authors" => [0, sub {
    my ($e,$b) = @_;
    my @a;
    if ($$e->exists('author')) {
        @a = $$e->split('author');
    } elsif ($$e->exists('editor')) {
        @a = $$e->split('editor');
    } else {
        return 0;
    }
    for (my $j = 0; $j <= $#$b; $j++) {
        for (my $i = 0; $i <= $#a; $i++) {
        return 1 if $a[$i] =~ /$$b[$j]/i;
        }
    }
    return 0;
    }],
    "refs" => [0, sub {
    my ($e,$r) = @_;
    for (my $i = 0; $i <= $#$r; $i++) {
        return 1 if ($$e->key =~ /^$$r[$i]$/);
    }
    return 0;
    }],
    "titles" => [0, sub {
    my ($e,$t) = @_;
    for (my $i = 0; $i <= $#$t; $i++) {
        return 1 if ($$e->get('title') =~ /$$t[$i]/i);
    }
    return 0;
    }]
    );

# Default display routine
$output = sub {
    my ($e) = @_;
    if ($append) {
    $$e->set_key($append . $$e->key);
    }
    $$e->print();
};

GetOptions (
# List of things to look for in the author/editor entries
    "a|authors=s@" => sub {
    my ($a, $b) = @_;
    push @{$values{"authors"}}, split(",",$b);
    $tests{"authors"}[0] = 1;
    },
# List of citation keys to look for (each is matched against the whole key)
    "e|refs=s@" =>  sub {
    my ($a, $b) = @_;
    push @{$values{"refs"}}, split(",",$b);
    $tests{"refs"}[0] = 1;
    },
# Whether to print the full entry or a basic summary
    "r|reduced" => sub {
    $output = sub {
        my ($e) = @_;
        print $append if ($append);
        print $$e->key . "\n";
        for (my $i = 0; $i <= $#shrt; $i++) {
        $$e->exists("$shrt[$i]") and print $$e->get("$shrt[$i]") and print "\n";
        }
    }
    },
# Whether to produce any output, or just an exit code
    "s|silent" => sub {
    $output = sub {
        exit 1;
    }
    },
# List of things to look for in the title entries
    "t|title=s@" => sub {
    my ($a, $b) = @_;
    push @{$values{"titles"}}, split(",",$b);
    $tests{"titles"}[0] = 1;
    },
# Modify reference keys by prepending this string to them
    "u|append=s" => \$append,
# Try to display an appropriate file
    "v|view" => sub {
    $output = sub {
        my ($e) = @_;
        my ($mi,$ex);
# if the key has three digits, then it's highly likely to be an arXiv ref
        my $u = ($entry->key =~ /\d\d\d/ ? "" : chr(95));
        my @p = glob("$docbase/*/" . $entry->key . "$u*");
        if (!@p) {
        print "Couldn't find a file for: " . $entry->key . "\n";
        }
        for (my $i = 0; $i <= $#p; $i++) {
        $mi = guess_media_type($p[$i]);
        if (exists($mime{$mi})) {
            ($ex = $mime{$mi}) =~ s/%s/$p[$i]/;
            if (!fork) {
            exec $ex
                or die "Couldn't execute $ex";
            }

        } else {
            print "Couldn't find a program to view: " . $entry->key . " file: $p[$i] mimetype: $mi\n";
        }
        }
    }
    },
# Try to locate an appropriate file
    "x|locate" => sub {
    $output = sub {
        my ($e) = @_;
# if the key has three digits, then it's highly likely to be an arXiv ref
        my $u = ($entry->key =~ /\d\d\d/ ? "" : chr(95));
        my @p = glob("$docbase/*/" . $entry->key . "$u*");
        if (!@p) {
        print "Couldn't find a file for: " . $entry->key . "\n";
        }
        for (my $i = 0; $i <= $#p; $i++) {
        print $p[$i] . "\n";
        }
    }
    },
    "h|?|help" => \$help,
    "m|man" => \$man,
    "gpl" => \$gpl
    ) or pod2usage(2);

pod2usage(-exitval => 1, -verbose => 0) if $help;
pod2usage(-verbose => 2 ) if $man;
exec 'perldoc perlgpl' if $gpl;

if (!-e $conffile) {
    print STDERR "$conffile does not exist.\n";
    exit 1;
}

open (CONF, $conffile)
    or die "Couldn't open $conffile for reading.\n";

while (<CONF>) {
# Set configuration variables; currently only bibbase and docbase
    if (/=/) {
    /(\w*)\s*=\s*(.*?)\s*$/;
    my $s = "\$$1 = \"$2\"";
    eval($s);
    $@ and die $@;
    next;
    }

# Stuff of the form 'name.bib' is a bibtex file to parse
    if (/(\w*\.bib)/) {
    push @reffiles, $1;
    next;
    }

# Stuff of the form 'mimetype; application' tells us how to view
# certain files

    if (/^\s*([a-z]*\/[a-z-]*);\s*(.*)/) {
    $mime{$1} = $2;
    }
}

die "$bibbase is not a directory.\n" unless -d $bibbase;
die "$docbase is not a directory.\n" unless -d $docbase;


# Loop over the bibtex reference files looking for the entries

for (my $i = 0; $i <= $#reffiles; $i++) {
    if (! -f "$bibbase/$reffiles[$i]" ) {
    print STDERR "$reffiles[$i] is not a regular file.\n";
    next;
    }

    $bibfile = new Text::BibTeX::File "$bibbase/$reffiles[$i]";

  ENTRY: while ($entry = new Text::BibTeX::Entry $bibfile) {
      if (!$entry->parse_ok) {
      print STDERR "Error in input, skipping entry\n";
      next;
      }

      next unless ($entry->metatype eq BTE_REGULAR);

      foreach my $t (keys %tests) {
      next unless $tests{$t}[0];
      if (&{$tests{$t}[1]}(\$entry,$values{$t})) {
          &$output(\$entry);
          next ENTRY;
      }
      }

  }

    $bibfile->close;
}

Sample configuration file:

bibbase = /home/astacey/texmf/bibtex/bib
docbase = /home/astacey/docs/MathsPapers
hostname = front.math.ucdavis.edu
reffile = /home/astacey/texmf/bibtex/bib/arxiv.bib
lynx = /bin/lynx
arxivdir = /home/astacey/docs/MathsPapers/arXiv/
ext = .pdf


arxiv arxiv.bib
article articles.bib
book books.bib
other misc.bib

application/pdf; /usr/bin/xpdf %s
application/postscript; /usr/bin/gv %s
application/x-dvi; /usr/bin/xdvi %s

book          -> book
article       -> article
booklet       -> book
conference    -> book
inbook        -> article
incollection  -> article
inproceedings -> article
manual        -> book
mastersthesis -> other
misc          -> other
phdthesis     -> other
proceedings   -> book
techreport    -> article
unpublished   -> other
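Going by the configuration loop in the script, only three kinds of line in this file have any effect: `name = value` settings, which are turned into Perl variables with `eval`; lines naming a `*.bib` file, which is then looked up under `bibbase`; and `type/subtype; command` lines, which tell `--view` how to open files. The `hostname`, `reffile`, `lynx`, `arxivdir` and `ext` settings and the entry-type mapping at the end are not used by the code shown, and presumably belong to the arXiv-downloading side the author mentions. As a quick check of which .bib files will be searched, this awk sketch (the name `refsrc_bibfiles` is hypothetical) mirrors the script's rules, approximating Perl's `\w` with `[A-Za-z0-9_]`.

```shell
# refsrc_bibfiles (hypothetical helper): list the .bib file names that the
# Perl script's configuration loop would collect from a .refsrc-style file.
refsrc_bibfiles () {
    awk '
        /=/ { next }                              # "name = value" settings are handled first
        match($0, /[A-Za-z0-9_]*\.bib/) {         # approximates Perl /(\w*\.bib)/
            print substr($0, RSTART, RLENGTH)
        }
    ' "$@"
}
```

On the sample above this prints `arxiv.bib`, `articles.bib`, `books.bib` and `misc.bib`, one per line; the `reffile` line is skipped because it contains `=`.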

Question 2

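You can do this in a UNIX shell, using mostly awk, if you make some usually-correct assumptions about the structure of .aux and .bib files. This is not as robust as a maintained program, but if you have shell-programming skills it is more flexible, and for a few hundred BibTeX items it is probably less work than evaluating complex software. First, you can get the cited keys from .aux files with a shell function that takes the names of one or more .aux files as arguments: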

getbibkeys () { awk -F{ '$1=="\\bibcite" { print substr($2,1,length($2)-1)}' "$@"; }

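Then you can fetch an entry from the bib files with a function whose first argument is the BibTeX key and whose remaining arguments are the .bib files to search: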

fetchbibitem () { 
    key="$1"; shift; 
    awk -v key="$key" 'BEGIN {RS="@"} $1~".*{" key "," {print "@" $0}' "$@";
}

A simple read loop around these then provides the required functionality:

getbibkeys $AUXFILE | while read -r key; do fetchbibitem "$key" $BIBFILE; done

(I have tested the two shell functions, but not the read-while loop.)

To be clear: these will break on some unconventional .bib files, even ones that BibTeX accepts.
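The read loop above runs awk over every .bib file once per key. A variation, sketched here under the same assumptions (the name `extract_bibitems` is hypothetical), reads the whole key list once and makes a single pass over the .bib files, comparing each entry's key exactly rather than as a regular expression. It also accepts whitespace around the key and entries delimited by parentheses, such as `@book (key, ...)`, but it still splits records on `@`, so the caveat above still applies.

```shell
# extract_bibitems (hypothetical helper): read BibTeX keys, one per line, on
# standard input and print every entry in the .bib files named as arguments
# whose key is in that list, making a single pass over the files.
extract_bibitems () {
    awk -v keys="$(tr '\n' ' ')" '
        BEGIN {
            RS = "@"                      # one record per entry, split on "@"
            n = split(keys, k, " ")       # whitespace-separated key list
            for (i = 1; i <= n; i++) want[k[i]] = 1
        }
        # Entry header: type, optional space, "{" or "(", optional space, key.
        match($0, /^[A-Za-z]+[ \t\n]*[{(][ \t\n]*[^ \t\n,]+/) {
            key = substr($0, RSTART, RLENGTH)
            sub(/^[A-Za-z]+[ \t\n]*[{(][ \t\n]*/, "", key)   # keep only the key
            if (key in want) printf "@%s", $0               # records keep their own newlines
        }
    ' "$@"
}
```

Typical use would be `getbibkeys paper.aux | extract_bibitems central.bib > local.bib`.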

Question 3

(This is not an answer: it is an extended comment, but the comment box is too restrictive.)

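I may be being a bit dense, but I don't understand what you are trying to do. Let me describe what I see, so that you can tell me what is wrong with this picture.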

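There is a central .bib file containing just about every reference that you (or your coauthors) have ever thought of.
You (and your various coauthors) have various documents that cite things in that central .bib file.
You want a program that looks in the .aux file for the list of references, goes to that .bib file, extracts the relevant entries, and puts them in a local .bib file ready to be included in the document.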

Is that right?

If so, let me change two letters in the above: the final .bib becomes .bbl. So it now reads:

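There is a central .bib file containing just about every reference that you (or your coauthors) have ever thought of.
You (and your various coauthors) have various documents that cite things in that central .bib file.
You want a program that looks in the .aux file for the list of references, goes to that .bib file, extracts the relevant entries, and puts them in a local .bbl file ready to be included in the document.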

Correct me if I'm wrong, but isn't that exactly what BibTeX does?

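What is the advantage of having a local .bib file? The only one I can see is that it lets you change the bibliography style without going back to the central repository. But how often do you do that? Often enough that recreating the .bbl file is a real burden? Surely you do it once, when you submit the paper to a journal (or, if you're like me, once each time you submit it to a journal).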

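If you want to be able to make changes to the local file, then I don't think that is any harder with a .bbl file than with a .bib file; and in both cases you have the problem of making your changes survive when the file is recreated, so I don't think that's the issue.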
