限制

限制

我需要制作一份列表,列出我的 LaTeX 文档中强调的所有单词(= 用 括起来的单词\emph{...})。

我如何使用awk来处理.tex文件并提取所有以 开头\emph{和结尾的模式},并且其中不包含右花括号(否则几乎整个文档都会与该模式匹配)。最后,搜索结果应该是这样的列表:

  1. 第一个强调的词
  2. 另一个强调的词
  3. 最后强调的一句话

(如果列表已经被格式化为编号的 LaTeX 列表就好了)当我想用于awk这项工作时,正则表达式匹配的正确输入是什么?

答案1

如果您不介意使用外部脚本来执行此操作,您可以通过重新定义\emph{}宏来执行 LaTeX 操作:

在此处输入图片描述

笔记:

  • 注意嵌套使用是如何\emph{}处理的(列表中的第 4 项和第 5 项)。这可能是也可能不是所需的行为,但根据所需的结果,应该很容易更改。

参考:

代码:

\documentclass{article}
\usepackage{letltxmacro}

% https://tex.stackexchange.com/questions/14393/how-keep-a-running-list-of-strings-and-then-process-them-one-at-a-time
\def\ListOfEmphText{}
\makeatletter
\newcommand{\AddToListOfEmphText}[1]{%
    \g@addto@macro\ListOfEmphText{\item {#1}}%
}
\makeatother

\LetLtxMacro\OldEmph\emph
\DeclareRobustCommand{\emph}[1]{\AddToListOfEmphText{#1}\OldEmph{#1}}

\begin{document}
\emph{First emphasized Text,  with a comma} followed by
some normal text.
\emph{Second emphasized Text} followed by
some more normal text.

Some normal text before
\emph{Third emphasized Text} also followed by
some normal text

Here is some nested usage: \emph{abc \emph{def} ghi}, just to test that it works.

\bigskip\noindent
The macro \verb|\emph{}| was used on:
\begin{enumerate}
    \ListOfEmphText
\end{enumerate}

\end{document}

答案2

用这个做awk会比较棘手,因为(据我所知)它很难处理非贪婪正则表达式。请注意,我们需要它们是非贪婪的,因为有以下几行

\emph{multiple}  occurrences on the \emph{same line}

Perl可以很容易地处理这个问题;你可以运行以下脚本

perl emphasize.pl myfile.tex

它将输出到终端。当然,你可以将其输出到另一个文件中,例如

perl emphasize.pl myfile.tex> emphasizedtext.tex

这是一个测试用例:

我的文件.tex

\documentclass{article}
\begin{document}

\emph{first} word is \emph{emphasized} and here's another \emph{one}.

here's another \emph{word} that is \emph{also} emphacized
\begin{enumerate}
  \item \emph{should} also work
  \item anywhere \emph{else too with multiple words}
\end{enumerate}
will struggle with nested: \emph{here is a \emph{nest}}
\end{document}

强调.pl

#!/usr/bin/perl

use strict;
use warnings;

# for emphasized content
my @emphcontent=();
my $emphphrase='';
my $counter=0;

# loop through the lines in the INPUT file
while(<>)
{
    # check for 
    #   \emph{}...
    # and make sure not to match
    #   %\emph{}...
    # which is commented
    if($_ =~ m/\\emph{(.*?)}/ and $_ !~ m/^%/)
    {
      # store the emphasized content
      while($_ =~ m/\\emph{(.*?)}/g){
        push(@emphcontent,$1);
      }
    }
}

# output the emphasized content
foreach $emphphrase (@emphcontent)
{
    $counter++;
    print $counter,". ",$emphphrase,"\n";
}

exit

输出

1. first
2. emphasized
3. one
4. word
5. also
6. should
7. else too with multiple words
8. here is a \emph{nest

或者你可能更喜欢

# start the enumerate environment
print "\\begin{enumerate}\n";

# output the emphasized content
foreach $emphphrase (@emphcontent)
{
    print "\\item ",$emphphrase,"\n";
}
# end the enumerate environment
print "\\end{enumerate}\n";

这使

\begin{enumerate}
\item first
\item emphasized
\item one
\item word
\item also
\item should
\item else too with multiple words
\item here is a \emph{nest
\end{enumerate}

限制

正如您所看到的,该脚本因嵌套而失败{},我不知道如何在不使用一些非常复杂的代码的情况下修复它。

答案3

这是一个使用模式匹配的 LuaLaTeX 解决方案。它可以轻松处理嵌套(如果需要,可以通过删除一行来忽略嵌套)。此外,它可以根据需要打开和关闭。缺点是,由于它使用回调process_input_buffer\emph因此也包括注释。我不确定这对你来说会有多大的问题。

  • \startgetemph打开emph收集。
  • \stopgetemph关闭emph收集。
  • \printemphs返回 的编号列表emphs

输出


在此处输入图片描述


\documentclass{article}
\usepackage{luacode,luatexbase}
\begin{luacode*}

emphs = {}
getemph = function(head)
    for hit in string.gmatch(head,"\emph(%b{})") do
        emphs[#emphs + 1] = hit
        -- if you don't want nesting, then delete the following line
        getemph(hit)
    end
    return true
end

printemphs = function()
    tex.print("\\begin{enumerate}")
    for i = 1, #emphs do
        tex.print("\\item "..emphs[i])
    end
    tex.print("\\end{enumerate}")
end

\end{luacode*}

\def\startgetemph{\directlua{luatexbase.add_to_callback("process_input_buffer", getemph, "getemph")}}
\def\stopgetemph{\directlua{luatexbase.remove_from_callback("process_input_buffer","getemph")}}
\def\printemphs{\directlua{printemphs()}}

\begin{document}

\startgetemph

\noindent
Here is some text where some of it is \emph{emphasized}.  Other parts will be \emph{emphasized \emph{twice}}.
%\emph{Oops, this is commented emphasized text!}
\stopgetemph
After using \verb=\stopgetemph=, \emph{this final emphasized text} will not be grabbed.

\printemphs
\end{document}

答案4

在 Textwrangler (mac) 中,搜索/替换 单击\\emph\{[a-z ]+\} “查找全部”,然后可以将结果复制/粘贴到新文件中。对所有换行符进行新的搜索/替换,插入\\item。非常低端。

相关内容