How can I check whether a string is a valid LaTeX rule?

Question 1

At least on Linux (I don't know about Windows), there is the latexdef script by Martin Scharrer, which looks up LaTeX definitions from the command line:

latexdef section

prints

\section
\long macro:->\@startsection {section}{1}{\z@ }{-3.5ex \@plus -1ex \@minus -.2ex}{2.3ex \@plus .2ex}{\normalfont \Large \bfseries }

whereas

latexdef sausage

prints

\sausage
undefined

We can call latexdef from Python like this:

import re
import subprocess

def latexdef(command_list, *args):
    '''
    call latexdef on a list of commands to be looked up
    *args can be used to pass options to latexdef
    '''
    # stderr is merged into stdout; text=True decodes the output to str
    p = subprocess.run(['latexdef'] + list(args) + list(command_list),
                       stdout=subprocess.PIPE,
                       stderr=subprocess.STDOUT,
                       text=True)
    return p.stdout.strip()

def are_commands(command_list, *args):
    '''
    look up multiple commands and return results in a dict
    '''
    result = latexdef(command_list, *args)
    # one block per command, separated by blank lines
    # (the third positional argument of re.split is maxsplit, not flags)
    frags = [f.splitlines() for f in re.split(r'\n{2,}', result) if f.strip()]
    # first line: command name (drop the backslash and any trailing colon);
    # remaining lines: its meaning, or just "undefined"
    return {lines[0][1:].rstrip(':'): lines[1:] != ['undefined'] for lines in frags}

def is_command(command, *args):
    '''
    look up a single command
    '''
    return next(iter(are_commands([command], *args).values()))

if __name__ == '__main__':
    commands = "chapter section sausage".split()

    for command in commands:
        print(command, is_command(command))

    print("\nwith book class loaded")

    for command in commands:
        print(command, is_command(command, '-c', 'book'))

    print("\nall at once, with class book")
    print(are_commands(commands, '-c', 'book'))

This prints

chapter False
section True
sausage False

with book class loaded
chapter True
section True
sausage False

all at once, with class book
{'chapter': True, 'section': True, 'sausage': False}

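Each call to latexdef is fairly slow, but you can save time by looking up several commands in a single call. That is the purpose of are_commands, which returns the lookup result for each command in a dict.

Note also that latexdef is a Perl script, so depending on how much this matters to you, it might make more sense to translate the whole script to Python and cut out the middleman. But it is a fairly long script, and Perl is a bit hard on the eyes...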

Question 2

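This is not really an answer but rather a longer comment. The answer given by Michael Palmer works in most cases, as long as the macros are defined by core packages/classes.

However, there are some cases you might want to consider. The phrase "LaTeX rule" presumably means a command sequence. A typical LaTeX command sequence (which I will call "cmd" in the examples below) can be described by the following ABNF: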

cmd = "\" 1*ALPHA

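But this is not enough. Note that you may want to include or exclude some internal macros separately. This means you would have to check something like

cmd = "\" 1*(ALPHA / "@")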

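for internal macros. Whether such a command sequence is valid where it is used depends on the context. Although this rule checks the validity of the command itself, such a command mostly has to be used inside a \makeatletter ... \makeatother block to be valid (if your check should take context into account).

And your check should take context into account, as is easily shown by a command like \frac, which is only a "valid LaTeX rule" when used in math mode, or by a command like \meter, which is only valid inside siunitx commands.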
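As a rough illustration of a context check, the following Python sketch decides whether a position lies inside inline math by counting unescaped dollar signs before it. This is only a heuristic under stated assumptions: it ignores \( ... \), $$ ... $$ and math environments such as equation, and it knows nothing about verbatim text or comments. The function name in_inline_math is hypothetical.

```python
import re


def in_inline_math(source: str, pos: int) -> bool:
    r"""Rough check: does source[pos] lie inside $...$ inline math?

    Assumptions (hypothetical helper): only single '$' delimiters are used;
    \(...\), $$...$$ and math environments are not handled. A '$' directly
    preceded by a backslash counts as an escaped dollar sign.
    """
    # unescaped dollar signs before the position
    dollars = re.findall(r'(?<!\\)\$', source[:pos])
    # an odd number means a math shift is still open at pos
    return len(dollars) % 2 == 1


text = r"Price \$5, half is $\frac{1}{2}$, not \frac{1}{2}."
first = text.find(r"\frac")
second = text.find(r"\frac", first + 1)
print(in_inline_math(text, first), in_inline_math(text, second))  # True False
```

Only the first \frac in the example would pass such a check; the second one sits in text mode.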

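Another case is expl3. l3 commands are also valid in LaTeX if they are enclosed between \ExplSyntaxOn and \ExplSyntaxOff. They would be built with something like this:

cmd = "\" 1*(ALPHA / "_") ":" 0*ALPHA

This is actually not entirely correct, because the characters allowed after the colon are restricted, but it should be good enough.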
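A small Python sketch of the expl3 rule above: it splits a candidate into its name and its argument signature. The name part follows the ABNF (letters and underscores); for the part after the colon it assumes the restricted set of specifier letters DNncVvoOxfTFpw listed in the regular expressions in Update 2. Since expl3 describes each argument of a function with one letter of the signature, splitting it off is also a first step towards checking a call. The names EXPL3_NAME and split_expl3 are hypothetical.

```python
import re

# name: letters and underscores (as in the ABNF rule);
# signature: only the assumed specifier letters after the colon
EXPL3_NAME = re.compile(r'\\([A-Za-z_]+):([DNncVvoOxfTFpw]*)')


def split_expl3(cs: str):
    """Return (name, signature) for an expl3-style control sequence, else None."""
    m = EXPL3_NAME.fullmatch(cs)
    if m is None:
        return None
    return m.group(1), m.group(2)


print(split_expl3(r'\tl_set:Nn'))   # ('tl_set', 'Nn')
print(split_expl3(r'\cs_new:Npn'))  # ('cs_new', 'Npn')
print(split_expl3(r'\tl_set:Nq'))   # None: 'q' is not an allowed specifier
print(split_expl3(r'\section'))     # None: no colon
```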

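Things get even worse if you want to check the validity of user-defined macros built with \csname ...\endcsname, because the user has many more options there.

Update: After all, the most interesting part is checking whether the call is valid as well. That means you also have to check the signature of the function and then the call of the command. So \frac would only be valid if it is called in math mode and has two mandatory arguments, e.g. $\frac{1}{2}$. At that point you probably want to compile a sample document, because a real parser would be very complex here.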
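A minimal Python sketch of the compile-a-sample-document approach, assuming pdflatex is installed and on the PATH. The helpers make_test_document and compiles, and the use of the article class, are assumptions made for this illustration. With -interaction=nonstopmode and -halt-on-error, pdflatex does not wait for terminal input and stops at the first error with a non-zero exit status, so the return code is enough to tell success from failure.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def make_test_document(snippet, preamble='', math=False):
    r"""Wrap a snippet in a minimal article document (hypothetical helper).

    math=True places the snippet in inline math, as \frac requires.
    \makeatletter ... \makeatother lets internal @-macros be tested too.
    """
    body = f'${snippet}$' if math else snippet
    return (
        '\\documentclass{article}\n'
        f'{preamble}\n'
        '\\begin{document}\n'
        '\\makeatletter\n'
        f'{body}\n'
        '\\makeatother\n'
        '\\end{document}\n'
    )


def compiles(snippet, preamble='', math=False):
    """Return True if pdflatex processes the test document without an error."""
    with tempfile.TemporaryDirectory() as tmp:
        tex = Path(tmp) / 'test.tex'
        tex.write_text(make_test_document(snippet, preamble, math))
        # nonstopmode: never wait for input; halt-on-error: stop at first error
        result = subprocess.run(
            ['pdflatex', '-interaction=nonstopmode', '-halt-on-error', tex.name],
            cwd=tmp, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return result.returncode == 0


if __name__ == '__main__' and shutil.which('pdflatex'):
    print(compiles(r'\frac{1}{2}', math=True))
    # outside math mode \frac triggers a "Missing $ inserted" error
    print(compiles(r'\frac{1}{2}'))
    # an undefined command triggers an "Undefined control sequence" error
    print(compiles(r'\sausage'))
    # package commands need the package in the preamble
    print(compiles(r'\SI{3}{\meter}', preamble=r'\usepackage{siunitx}'))
```

This only tests what a default article document accepts; commands from packages such as siunitx need the corresponding \usepackage passed as preamble.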

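There is one caveat with all of these approaches: you will get not only LaTeX command sequences but also TeX command sequences. If you specifically want LaTeX command sequences and want to exclude TeX command sequences, you will run into problems.

Update 2: Since you are interested in an implementation of the test, here are some regular expressions you can use for matching. You only have a valid sequence if the whole string matches. For the context-sensitive parts you may need lookaheads and lookbehinds.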

Standard LaTeX: \\[A-Za-z]+
Internal LaTeX: \\[A-Za-z@]+
expl syntax: \\[A-Za-z@_]+:[DNncVvoOxfTFpw]*
\csname commands: something like \\.*$
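To make the full-match idea concrete, here is a small Python sketch that applies these expressions with re.fullmatch and reports the first category that matches. It uses + so that a lone backslash is rejected, and A-Za-z in the expl3 class (the range A-z would also admit [ \ ] ^ _ and the backtick). The \csname pattern is left out because it accepts anything after a backslash. The names PATTERNS and classify are made up for this illustration.

```python
import re

# checked in order; the first full match decides the category
PATTERNS = [
    ('standard', re.compile(r'\\[A-Za-z]+')),
    ('internal', re.compile(r'\\[A-Za-z@]+')),                  # needs \makeatletter
    ('expl3', re.compile(r'\\[A-Za-z@_]+:[DNncVvoOxfTFpw]*')),  # needs \ExplSyntaxOn
]


def classify(candidate: str):
    """Return the category of a control-sequence name, or None if nothing matches fully."""
    for category, pattern in PATTERNS:
        if pattern.fullmatch(candidate):
            return category
    return None


for s in [r'\section', r'\@startsection', r'\tl_set:Nn', r'\frac{1}{2}', '\\']:
    print(s, classify(s))
```

This only checks the shape of the name; whether it is usable still depends on the context (math mode, \makeatletter, \ExplSyntaxOn) and on its arguments.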

Answer