转换参考书目以与 Latex 一起使用

转换参考书目以与 Latex 一起使用

我收到了一份很长的 Word 文档,我必须将其移植到 Latex。在文档中,所有引文均以经典形式显示,并附有作者和年份。就像是

Lorem ipsum dolor (Sit, 1998) amet, consectetur adipiscing (Slit 2000, Sed and So 2002, Eiusmod et al. 1976).
Tempor incididunt ut labore et dolore magna aliqua (Ut et al. 1312)

该参考文献需要获取正确的关键参考文献,因为它出现在参考书目参考文献列表中。换句话说,文本应该翻译为

Lorem ipsum dolor \cite{sit1998} amet, consectetur adipiscing \cite{slit2000,sed2002,eiusmod1976}.
Tempor incididunt ut labore et dolore magna aliqua \cite{ut1312}

这意味着:

  • 提取由括号内的名称和年份组成的所有字符串
  • 去掉该字符串中的空格、第二个名字(名字后面的所有内容)和大写字母
  • 使用结果字符串形成新的 \cite{string}

我知道这可能是一项相当复杂的任务。我想知道也许有人已经为这个特定任务编写了脚本。或者,也欢迎任何部分建议。我目前在 MacOS 上工作。

答案1

下面的awk程序应该可以工作。它查找( ... )每一行中的元素,并检查它们是否符合“作者,年份”或“作者 1 年 1,作者 2 年 2,...”模式。如果是,它会创建一个引用命令并替换该( ... )组;否则该组将保持原样。

#!/usr/bin/awk -f


# This small function creates an 'authorYYYY'-style string from
# separate author and year fields. We split the "author" field
# additionally at each space in order to strip leading/trailing
# whitespace and further authors.
function contract(author, year)
{
    split(author,auth_fields," ");
    auth=tolower(auth_fields[1]);
    return sprintf("%s%4d",auth,year);
}



# This function checks if two strings correspond to "author name(s)" and
# "year", respectively.
function check_entry(string1, string2)
{
    if (string1 ~ /^ *([[:alpha:].-]+ *)+$/ && string2 ~ /^ *[[:digit:]]{4} *$/) return 1;
    return 0;
}




# This function creates a 'citation' command from a raw element. If the
# raw element does not conform to the reference syntax of 'author, year' or
# 'author1 year1,author2 year2, ...', we should leave it "as is", and return
# a "0" as indicator.
function create_cite(raw_elem)
{
    cite_argument=""

    # Split at ','. The single elements are either name(list) and year,
    # or space-separated name(list)-year statements.
    n_fields=split(raw_elem,sgl_elem,",");

    if (n_fields == 2 && check_entry(sgl_elem[1],sgl_elem[2]))
    {
        cite_argument=contract(sgl_elem[1],sgl_elem[2]);
    }
    else
    {
        for (k=1; k<=n_fields; k++)
        {
            n_subfield=split(sgl_elem[k],subfield," ");

            if (check_entry(subfield[1],subfield[n_subfield]))
            {
                new_elem=contract(subfield[1],subfield[n_subfield]);
                if (cite_argument == "")
                {
                    cite_argument=new_elem;
                }
                else
                {
                    cite_argument=sprintf("%s,%s",cite_argument,new_elem);
                }
            }
            else
            {
                return 0;
            }
        }
    }


    cite=sprintf("\\{%s}",cite_argument);
    return cite;
}




# Actual program
# For each line, create a "working copy" so we can replace '(...)' pairs
# already processed with different text (here: 'X ... Y'); otherwise 'sub'
# would always stumble across the same opening parentheses.
# For each '( ... )' found, check if it fits the pattern. If so, we replace
# it with a 'cite' command; otherwise we leave it as it is.

{
    working_copy=$0;

    # Allow for unmatched ')' at the beginning of the line:
    # if a ')' was found before the first '(', mark is as processed
    i=index(working_copy,"(");
    j=index(working_copy,")");
    if (i>0 && j>0 && j<i) sub(/\)/,"Y",working_copy);

    while (i=index(working_copy,"("))
    {
        sub(/\(/,"X",working_copy); # mark this '(' as "already processed

        j=index(working_copy,")");
        if (!j)
        {
            continue;
        }
        sub(/\)/,"Y",working_copy); # mark this ')', too


        elem=substr(working_copy,i+1,j-i-1);

        replacement=create_cite(elem);
        if (replacement != "0")
        {
            elem="\\(" elem "\\)"
            sub(elem,replacement);
        }

    }
    print $0;
}

调用该程序

~$ awk -f transform_citation.awk input.tex

笔记程序期望输入的格式“合理”,即一行上的所有括号都应该成对匹配(尽管允许在行开头使用一个右括号,并且将忽略不匹配的左括号)。

另请注意上面的一些语法需要 GNU awk。要移植到其他实现,请替换

if (string1 ~ /^ *([[:alpha:].-]+ *)+$/ && string2 ~ /^ *[[:digit:]]{4} *$/) return 1;

if (string1 ~ /^ *([a-zA-Z.-]+ *)+$/ && string2 ~ /^ *[0123456789][0123456789][0123456789][0123456789] *$/) return 1;

并确保您已将排序规则区域设置设置为C

相关内容