Is it possible to automatically enumerate sentences in LaTeX?

Question 1

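As others have pointed out, doing this fully automatically is likely to be very hard. If you want to use the \label-\ref mechanism, you have to insert labels anyway. So let's pick a character that is not normally used in the input, such as the vertical bar |. Ten minutes of hacking later, we end up with: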

\documentclass{article}

\newcounter{sentence}
\newcounter{para}

\makeatletter
\@addtoreset{sentence}{para}
\@addtoreset{para}{section}

\catcode`\|=\active
\def|{\@ifnextchar[%] to keep my editor happy
  \start@label\start@nolabel}

\def\start@label[#1]{\ifvmode \start@para@label[#1]\else \start@sent@label[#1]\fi}
\def\start@nolabel{\ifvmode \start@para@nolabel\else \start@sent@nolabel\fi}

\def\start@para@label[#1]{%
  \refstepcounter{para}%
  \label{#1}\leavevmode}

\def\start@sent@label[#1]{%
  \refstepcounter{sentence}%
  \label{#1}%
  \thesentence~}

\def\start@para@nolabel{%
  \stepcounter{para}\leavevmode}

\def\start@sent@nolabel{%
  \stepcounter{sentence}%
  \thesentence~}

\makeatother

\renewcommand{\thepara}{\thesection.\arabic{para}}
\renewcommand{\thesentence}{\thepara.\arabic{sentence}}

\begin{document}

\parindent=0pt
\parskip=1em

||These rules must be followed. |The end of a paragraph is indicated
as usual with a blank line.

||[parstart]A new paragraph must start with a vertical
bar. |[sentstart]Each sentence must also start with a vertical
bar. |It follows from~\ref{parstart} and~\ref{sentstart} that a new
paragraph actually starts with two vertical bars.

Without vertical bars, nothing special happens. This might be useful
to comment on the formal rules above or below.

|[parref]|Each vertical bar takes an optional argument. |If
given, it is used as a label. |[sentref]For example, this is
sentence~\ref{sentref} of paragraph~\ref{parref}.

\end{document}

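I use the fact that we are in vertical mode after a \par to distinguish the two uses of |. But then we need an explicit \leavevmode, because the paragraph-start bar by itself inserts no material that would make TeX switch to horizontal mode. If you want to be able to reference both whole paragraphs and individual sentences, you need two bars || at the start of each paragraph (either of them may take an optional argument). If you never need to refer to whole paragraphs, it is easy to change the syntax so that only a single | is needed at the start of a paragraph; in fact, that would be much simpler, because | could then be implemented as a single macro with an optional argument (when \ifvmode is true, it steps the paragraph counter before doing anything else).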
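The single-bar variant mentioned above could look roughly like the following. This is only a sketch under the assumption that whole paragraphs never need to be referenced; it reuses the macro names \start@sent@label and \start@sent@nolabel from the example, and it simplifies \thesentence to paragraph.sentence without the section number.

```latex
\documentclass{article}

\newcounter{para}
\newcounter{sentence}[para]% sentence is reset whenever para is stepped
\renewcommand{\thesentence}{\arabic{para}.\arabic{sentence}}

\makeatletter
\catcode`\|=\active
% One macro for both roles: in vertical mode (that is, at the start of a
% paragraph) step the paragraph counter first, then always start a sentence.
\def|{%
  \ifvmode \stepcounter{para}\leavevmode \fi
  \@ifnextchar[{\start@sent@label}{\start@sent@nolabel}}
% With an optional argument: number the sentence and make it referable.
\def\start@sent@label[#1]{\refstepcounter{sentence}\label{#1}\thesentence~}
% Without an optional argument: just number the sentence.
\def\start@sent@nolabel{\stepcounter{sentence}\thesentence~}
\makeatother

\begin{document}
\parindent=0pt
\parskip=1em

|First sentence of the first paragraph. |[second]Second sentence.

|A new paragraph that refers back to sentence~\ref{second}.
\end{document}
```

As with any \label-\ref setup, two LaTeX runs are needed before the reference resolves. Paragraphs that contain no bar at all are left unnumbered and do not step the paragraph counter.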

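Added: I avoided \everypar because it is not very reliable when a lot of other things are going on. However, wrapping the content in an environment might make it possible to use \everypar and offer a simpler syntax. The biggest problem is really allowing labels; we have to tell LaTeX when and how to look for a label.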


Question 2

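This is not easy¹ with TeX or LaTeX or anything else.

The problem is what is usually called sentence boundary disambiguation in natural language processing:

Sentence boundary identification is hard because punctuation marks are often ambiguous. A period may denote an abbreviation, a decimal point, an ellipsis, or part of an email address rather than the end of a sentence. In addition, sentences can end with an exclamation mark or a question mark.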

A better approach is to preprocess the file outside TeX. A neuroscience knowledge base written in Python might be a starting point.

¹ By "not easy" I mean that if you invest a lot of time, you might be able to define a parser in TeX that catches 95% of the cases.


Question 3

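Obviously Yiannis is right.

However, if you can accept a trade-off, you may be able to redefine the macros \\ and \par (the latter is inserted implicitly whenever you leave a blank line) and write your sentences like this: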

First sentence.\\
Second sentence.\\

Third sentence.

and end up with:

1.1 First sentence. 1.2 Second sentence.
2.1 Third sentence.

This needs two counters, one counting sentences and one counting paragraphs.
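One way this idea might be realized is sketched below. Note the assumptions: the environment name numbered and the helper \sentencenumber are made up for illustration, and instead of redefining \par the sketch sets \everypar locally inside the environment to number the first sentence of each paragraph. The redefined \\ peeks at the next token so that a \\ right before a blank line does not produce a stray number.

```latex
\documentclass{article}

\newcounter{para}
\newcounter{sentence}[para]% reset the sentence counter with each paragraph
\renewcommand{\thesentence}{\arabic{para}.\arabic{sentence}}
% Hypothetical helper: step the sentence counter and print its number.
\newcommand{\sentencenumber}{\stepcounter{sentence}\thesentence~}

\makeatletter
\newenvironment{numbered}{%
  \par
  % Every paragraph started inside the environment gets a new number.
  \everypar{\stepcounter{para}\sentencenumber}%
  % \\ ends a sentence; number the next one unless the paragraph ends here.
  \def\\{\@ifnextchar\par{}{\space\sentencenumber}}%
}{\par}
\makeatother

\begin{document}
\begin{numbered}
First sentence.\\
Second sentence.\\

Third sentence.
\end{numbered}
\end{document}
```

Keeping both redefinitions local to an environment means \\ keeps its usual meaning elsewhere in the document, for example in tables. Material that resets \everypar itself, such as section headings or lists, would still need extra care inside the environment.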


Question 4

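I use automatically generated labels to build a nomenclature and to refer to the places in the body text where variables are introduced and explained.

You may be able to build meaningful labels automatically from snippets for the chapter, section, subsection and so on, and end up with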

\label{sentence:chapSnippet:sectionSnippet:subSectionSnippet:parSnippet}

At the start of each division, redefine the corresponding snippet. For example:

\chapter{Insects}
\renewcommand{\chapSnippet}{insects}

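You then end up with labels such as Insects:humblebee:I for the first sentence (roman I) of the humblebee section in Insects. This is an ad hoc solution, but custom labels inserted anywhere can still be used alongside it. A custom reference command can give you almost any reference format you like.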
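A minimal sketch of how such labels could be generated is shown below. The macro \marksentence and the snippet macro \secSnippet are hypothetical names introduced here for illustration; only \chapSnippet appears in the text above. The sketch assumes a sentence counter that is reset per section and printed in uppercase roman numerals.

```latex
\documentclass{report}

% Snippet macros; redefine them after each heading.
\newcommand{\chapSnippet}{}
\newcommand{\secSnippet}{}% hypothetical section-level counterpart

\newcounter{sentence}[section]% restart sentence numbering in each section
\renewcommand{\thesentence}{\Roman{sentence}}

% Number a sentence and label it <chapter>:<section>:<Roman number>.
% The label text is expanded when \label is called, so it records the
% snippet values and counter value current at that point.
\newcommand{\marksentence}{%
  \refstepcounter{sentence}%
  \label{\chapSnippet:\secSnippet:\thesentence}%
  (\thesentence)~}

\begin{document}
\chapter{Insects}
\renewcommand{\chapSnippet}{insects}

\section{Humblebee}
\renewcommand{\secSnippet}{humblebee}

\marksentence First sentence of this section.
\marksentence Second sentence of this section.

See sentence~\ref{insects:humblebee:II} for details.
\end{document}
```

The snippets must expand to plain characters for this to work, and the document needs two LaTeX runs before the reference resolves.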

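Added: Does anyone know about the solutions to this question: Making labels with counters? I have not tested the answers given there (no LaTeX on this machine).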
