Outputting LaTeX as individual words and figures

Question 1

Here is a way to do this using the optional package and a helper bash script.

Your document will look like this:

\documentclass{article}

\usepackage{optional}

\begin{document}

Content in the document scope will be included in every version. 

\opt{opta}%
{
This is some content selected by opta.
}

\opt{optb}%
{%
This is some content selected by optb.
}

Some more content in the global scope. 

\opt{opta}%
{
More content selected by opta.
}

\opt{optb}%
{
More content selected by optb.
}

More content in the global scope. 

\end{document}

Let's assume you save this as myfile.tex.

Now, save the following as helper.sh:

#!/bin/bash
# Usage: ./helper.sh myfile.tex
inputfile=${1%.tex}

# find all \opt declarations, remove duplicates
optional=$(grep -P '\\opt\{\w+\}+' "$inputfile.tex" | awk -F '[{}]' '{print $(NF-1)}' | xargs -n1 | sort -u | xargs)

# compile and rename output separately for each optional declaration
for opt in $optional; do
    echo "processing part $opt"
    pdflatex -interaction=nonstopmode "\def\UseOption{$opt}\input{$inputfile}" > /dev/null
    mv "$inputfile.pdf" "$opt.pdf"
done

Make the script executable with

chmod +x helper.sh

and run it with

./helper.sh myfile.tex

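You will then get separate PDF files, opta.pdf and optb.pdf, each with the corresponding content.

Note that, because of the way I wrote the bash script, you must use the form shown above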

\opt{opta}%
{
This is the content selected by opta.
}

rather than

\opt{opta}{
This is the content selected by opta.
}

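If you do put the second opening brace on the same line, the pattern matching will not work. Maybe someone who knows scripting better than I do can fix this.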
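One possible way around the brace-placement limitation is to capture the option name directly instead of splitting the line on braces. The following is only a sketch in Python, not part of the original script; the function name find_options is made up for illustration, and it assumes option names consist of word characters only (so comma-separated lists such as \opt{opta,optb} and \opt calls inside TeX comments are not handled specially).

```python
import re

# Matches \opt{name} no matter what follows the closing brace,
# so both "\opt{opta}%" and "\opt{opta}{" are recognised.
OPT_PATTERN = re.compile(r"\\opt\{(\w+)\}")


def find_options(tex_source):
    """Return the sorted, de-duplicated option names used in \\opt{...} calls."""
    return sorted(set(OPT_PATTERN.findall(tex_source)))


if __name__ == "__main__":
    sample = r"""
\usepackage{optional}
\opt{opta}%
{
Content selected by opta.
}
\opt{optb}{
Content selected by optb.
}
\opt{opta}{More content selected by opta.}
"""
    # Prints the option names separated by spaces, ready for a shell loop.
    print(" ".join(find_options(sample)))
```

The printed list could then replace the grep/awk pipeline that fills the optional variable in helper.sh, with the rest of the loop left unchanged.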

Answer

Question 2

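This is a response to your comment on my first answer.

This is a horrible hack that you may find useful, or not. It only handles extracting individual words; if you find it important to automatically extract figures from the same document, you will have to extend it somehow.

We use a boolean switch to either produce the regular document (\setboolean{mangle}{false}) or mangle it so that each word appears on its own small page (\setboolean{mangle}{true}).

We also abuse \makeindex to write an index entry for each word to a file.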

\documentclass{article}
\usepackage{geometry, ifthen, xparse}
\makeindex

\newboolean{mangle}
\setboolean{mangle}{true}% false for regular output

\ifthenelse{\boolean{mangle}}{\geometry{papersize={3in,1in},margin=0.2in}}{}% empty else branch

\newcommand{\doword}[1]{%
\ifthenelse{\boolean{mangle}}%
    {\clearpage\fbox{#1\strut}\index{#1}\clearpage}%
    { #1}%
}

\ExplSyntaxOn
\NewDocumentCommand {\dowords}
  { > { \SplitList { ~ } } m }
  { \tl_map_inline:nn {#1} { \doword{##1} }}
\ExplSyntaxOff

\begin{document}

\dowords{A stretch of words typed into the document}. 

\dowords{And some more stuff}.

\dowords{Note that we keep the punctuation separate},\dowords{because otherwise we will end up in trouble}.

\dowords{We can also put some math here $\frac{1}{2}$ and see what happens}.

\end{document}

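If you run this file in mangle mode, you get a PDF with one word per page, set inside an fbox with a strut, so that the baseline and image height stay consistent between words such as "sauce", "egg", and "leg". You also get an .idx file that looks like this: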

\indexentry{put}{35}
\indexentry{some}{36}
\indexentry{math}{37}
\indexentry{here}{38}
\indexentry{${\begingroup 1\endgroup \over 2}$}{39}
\indexentry{and}{40}

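The idea for post-processing is as follows:

Batch-convert the PDF file to bitmaps. ImageMagick can generate numbered .png files from the pages of a multi-page PDF. It can also trim the whitespace and the fbox around each word.
Use a script (Python/Bash/other) to extract the typeset words and their corresponding page numbers from the .idx file. Retrieve the .png file for that page number and save it under an appropriate word name. Obviously, you will have to get a bit creative with file naming where the entry is something like ${\begingroup 1\endgroup \over 2}$.
You may also want to deal with duplicate words somehow. The best approach is probably to use LuaTeX, store each new word, and output it only once.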
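The .idx step could be sketched in Python roughly as follows. This is an illustration under assumptions, not a tested pipeline: the page images are assumed to be named word-000.png, word-001.png, and so on (a hypothetical pattern, with numbering starting at 0; check what your conversion command actually produces and adjust first_index), and the function names are made up. The code only builds a list of planned renames and touches no files; entries that are not plain alphanumeric words, such as the fraction, fall back to a name based on the page number, and repeated words are skipped.

```python
import re

# One entry per line: \indexentry{<word>}{<page>}. The greedy (.*) lets the
# word itself contain braces, as in the fraction entry.
ENTRY_PATTERN = re.compile(r"^\\indexentry\{(.*)\}\{(\d+)\}\s*$")


def parse_idx(idx_text):
    """Return (word, page) pairs in the order they appear in the .idx text."""
    entries = []
    for line in idx_text.splitlines():
        match = ENTRY_PATTERN.match(line.strip())
        if match:
            entries.append((match.group(1), int(match.group(2))))
    return entries


def safe_name(word, page):
    """Use the word itself if it is plain alphanumeric, else the page number."""
    if re.fullmatch(r"[A-Za-z0-9]+", word):
        return word
    return "page{:03d}".format(page)


def plan_renames(entries, pattern="word-{:03d}.png", first_index=0):
    """Map the first occurrence of each word to a (source, target) file pair."""
    seen = set()
    plan = []
    for word, page in entries:
        target = safe_name(word, page) + ".png"
        if target in seen:
            continue  # the word was already extracted from an earlier page
        seen.add(target)
        # Pages count from 1; image numbering is assumed to start at first_index.
        source = pattern.format(page - 1 + first_index)
        plan.append((source, target))
    return plan


if __name__ == "__main__":
    sample = "\n".join([
        r"\indexentry{math}{37}",
        r"\indexentry{here}{38}",
        r"\indexentry{${\begingroup 1\endgroup \over 2}$}{39}",
        r"\indexentry{and}{40}",
    ])
    for source, target in plan_renames(parse_idx(sample)):
        print(source, "->", target)
```

Each pair in the resulting plan could then be passed to os.rename or shutil.copy once the image names have been checked against the real output.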

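This is the best I could come up with, but overall I think it still sucks :-/ I will try to think of something more direct that does not use LaTeX to get my individual words onto a web page.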

Answer