Removing a specific LaTeX command and its closing brace from text

Answer 1

Here is a command that works in the simple case where there is at most one level of commands inside \edit{...}:

perl -00 -lpe 's,\\edit\{( (?: [^}\\]* | \\[a-z]+\{[^}]*\} )+ )\},$1,xg'

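The middle part, (?: [^}\\]* | \\[a-z]+\{[^}]*\} )+, has two alternatives: [^}\\]* matches any string that contains no closing brace or backslash (regular text), and \\[a-z]+\{[^}]*\} matches a backslash, lowercase letters, and a matching pair of braces (e.g. \url{whatever...}). The grouping (?:...)+ repeats those alternatives, and the outer parentheses capture the result, so we can replace the matched \edit{...} with just the part inside.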
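As a rough illustration of how the pieces fit together, here is the same pattern transcribed into Python's re module with re.VERBOSE, which ignores whitespace in the pattern much like Perl's /x. The names ONE_LEVEL_EDIT and strip_edit_one_level are made up for this sketch and are not part of the original answer.

```python
import re

# Same structure as the Perl pattern; whitespace and "#" comments are
# ignored under re.VERBOSE.
ONE_LEVEL_EDIT = re.compile(
    r"""
    \\edit\{                    # the literal opening of the command
    (                           # group 1: the part we keep
      (?: [^}\\]*               #   plain text: no closing brace, no backslash
        | \\[a-z]+\{[^}]*\}     #   or one command whose argument has no closing brace
      )+
    )
    \}                          # the brace that closes the edit command
    """,
    re.VERBOSE,
)

def strip_edit_one_level(text):
    # Replace each match with the captured inner part only.
    return ONE_LEVEL_EDIT.sub(lambda m: m.group(1), text)

sample = r"paired or \edit{\url{http://www/} longitudinal}; done"
print(strip_edit_one_level(sample))
```

The replacement is given as a function rather than a template string so that backslashes in the captured LaTeX are copied through untouched.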

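-00 tells Perl to process the input one paragraph at a time, with paragraphs separated by blank lines. If you need to handle tags that span paragraphs, change it to -0777 to process the whole input in one go (-0, meant for NUL-separated input, would also work, since a text file won't contain any NUL bytes).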
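To make the difference between paragraph mode and whole-input mode concrete, here is a small Python sketch. It only approximates what -00 and -0777 do (it does not reproduce Perl's exact handling of trailing newlines), and the helper names strip_flat_edit, by_paragraph, and whole_input are invented for the illustration.

```python
import re

def strip_flat_edit(chunk):
    # Hypothetical helper: unwrap \edit{...} whose argument contains no braces.
    return re.sub(r"\\edit\{([^{}]*)\}", lambda m: m.group(1), chunk)

def by_paragraph(text, transform):
    # Roughly like -00: split on runs of blank lines, transform each
    # paragraph separately, then rejoin with a single blank line.
    paragraphs = re.split(r"\n{2,}", text)
    return "\n\n".join(transform(p) for p in paragraphs)

def whole_input(text, transform):
    # Roughly like -0777: the transform sees the entire input at once.
    return transform(text)

# An \edit{...} whose argument contains a blank line.
sample = "kept \\edit{first\n\nsecond} kept"

print(repr(by_paragraph(sample, strip_flat_edit)))   # tag is split, so no match
print(repr(whole_input(sample, strip_flat_edit)))    # tag is seen whole
```

In paragraph mode the opening and closing halves of the tag land in different chunks, so nothing matches; in whole-input mode the tag is removed.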

For your example, this seems to work, giving:

We Introduce a model for analyzing \emph{data} from various
experimental designs, such as paired or \url{http://www/}
longitudinal; as was done 1984 by NN \cite{mycitation} and by NNN
\cite{mycitation2}.

However, it (predictably) fails for input with two levels of commands inside \edit{...}:

Some \edit{\somecmd{\emph{nested} commands} here}.

turns into:

Some \somecmd{\emph{nested} commands here}.

(with the wrong closing brace removed)
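The failure can be reproduced outside Perl. The sketch below uses the same pattern in Python's re module (without the /x whitespace) and prints what the regex actually matched, which shows that the brace closing \somecmd is taken as the one closing \edit. The variable names are illustrative only.

```python
import re

# The one-level pattern from above, written without extended whitespace.
PATTERN = re.compile(r"\\edit\{((?:[^}\\]*|\\[a-z]+\{[^}]*\})+)\}")

text = r"Some \edit{\somecmd{\emph{nested} commands} here}."

match = PATTERN.search(text)
matched = match.group(0)   # everything the regex consumed
result = PATTERN.sub(lambda m: m.group(1), text)

print(matched)
print(result)
```

The match stops at the first closing brace that follows " commands", so the real closing brace of \edit (after " here") is left behind.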

Actually handling balanced braces is a bit trickier; it is discussed, for example, in this question: Perl regex: matching nested brackets.

Answer 2

I have a Python-based solution. It is not as concise, but it works well with nested commands.

import re

def command_remove(tex_in, keywords):
    # Remove commands while keeping the text inside their curly braces.
    # keywords: "hl textbf" means removing \hl{} and \textbf{}
    names = '|'.join(re.escape(k) for k in keywords.split())
    pattern = r'\\(?:' + names + r')\{'
    idxs_to_del = []  # indices of the closing braces to delete
    for command in re.finditer(pattern, tex_in):
        stack = 0  # depth of braces opened inside this command's argument
        current_loc = command.end()
        # Walk forward to the '}' that closes this command.
        while current_loc < len(tex_in):
            ch = tex_in[current_loc]
            if ch == '}' and stack == 0:
                break
            if ch == '}':
                stack -= 1
            elif ch == '{':
                stack += 1
            current_loc += 1
        else:
            # Reached the end of the text without finding the closing brace.
            raise ValueError('unbalanced braces after ' + command.group())
        idxs_to_del.append(current_loc)

    tex_list = list(tex_in)
    # Pop from the end so that earlier indices stay valid.
    for idx in sorted(idxs_to_del, reverse=True):
        tex_list.pop(idx)  # remove }

    tex_out = ''.join(tex_list)
    tex_out = re.sub(pattern, '', tex_out)  # remove \xxx{
    return tex_out

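It locates the target commands with a regular expression, then finds each command's closing brace with a stack (here a simple depth counter). For example, calling tex_out = command_remove(tex_in, "revise textbf") with tex_in set to: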

\hl{Can you} \revise{can a \textbf{can} as a \emph{canner} can} can a can?

we get tex_out:

\hl{Can you} can a can as a \emph{canner} can can a can?
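For comparison, the same idea can be done in a single left-to-right pass with an explicit stack that remembers, for every open brace, whether it belongs to a command being removed. This is an alternative sketch, not the author's code; strip_commands and its list-of-names parameter are assumptions made for the example, and escaped braces such as \{ are not handled.

```python
import re

def strip_commands(text, names):
    # Unwrap \name{...} for each name in names, keeping the contents.
    opener = re.compile(r"\\(?:" + "|".join(map(re.escape, names)) + r")\{")
    out = []
    stack = []  # one flag per open brace: True if it opened a removed command
    i = 0
    while i < len(text):
        m = opener.match(text, i)
        if m:
            stack.append(True)      # drop the command name and its "{"
            i = m.end()
        elif text[i] == "{":
            stack.append(False)     # ordinary brace: keep it
            out.append("{")
            i += 1
        elif text[i] == "}":
            # An unmatched "}" (empty stack) is kept as-is.
            removed = stack.pop() if stack else False
            if not removed:
                out.append("}")
            i += 1
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

tex_in = r"\hl{Can you} \revise{can a \textbf{can} as a \emph{canner} can} can a can?"
print(strip_commands(tex_in, ["revise", "textbf"]))
```

Because every closing brace is paired with the flag pushed by its opening brace, no index bookkeeping or second regex pass is needed.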

More details, including how to run it from the command line, are at Latex_命令_删除.

Answer 3

To handle \edit{...}s that contain LaTeX commands (meaning other {...} pairs), you can use perl's ability to handle recursion in its regular expressions:

perl -pe 's{\\edit(\{((?:[^{}]++|(?1))*)\})}{$2}g' file

where (?1) recalls the regexp in the first pair of (...), which here is the one that matches a {...} pair.
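Python's standard re module has no (?1)-style recursion, so as an illustration of what the recursive pattern does, here is a hand-written equivalent in which a function calls itself wherever the regex would recurse. The names match_braced and strip_edit are made up for this sketch; unlike perl -p, it works on a whole string rather than line by line.

```python
def match_braced(text, start):
    # Mirrors group 1 of the regex: "{", then any mix of non-brace
    # characters and nested {...} groups, then "}".
    # Returns the index just past the closing brace, or -1 if no match.
    if start >= len(text) or text[start] != "{":
        return -1
    i = start + 1
    while i < len(text):
        if text[i] == "}":
            return i + 1
        if text[i] == "{":
            i = match_braced(text, i)   # plays the role of (?1)
            if i == -1:
                return -1
        else:
            i += 1
    return -1

def strip_edit(text):
    marker = "\\edit"
    out = []
    i = 0
    while True:
        j = text.find(marker, i)
        if j == -1:
            out.append(text[i:])
            return "".join(out)
        body_start = j + len(marker)
        end = match_braced(text, body_start)
        if end == -1:
            # Not followed by a balanced group: keep the text as-is.
            out.append(text[i:body_start])
            i = body_start
        else:
            out.append(text[i:j])
            out.append(text[body_start + 1:end - 1])  # contents without braces
            i = end

print(strip_edit(r"Some \edit{\somecmd{\emph{nested} commands} here}."))
```

As with the s{...}{...}g substitution, replaced text is not rescanned, so an \edit nested inside another \edit would be left in place.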

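(This does not handle escaped braces, \verb, or comments, and it assumes that \edit{...}s do not span multiple lines; all of that could be added fairly easily if needed.)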
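As one example of such an extension, a brace scanner can skip backslash-escaped characters so that \{ and \} do not affect the nesting depth. This is only a sketch of that single case (it still ignores \verb and % comments), and find_closing_brace is a hypothetical helper name.

```python
def find_closing_brace(text, open_idx):
    # text[open_idx] is expected to be "{". Returns the index of the
    # matching "}", or -1 if the braces never balance.
    depth = 0
    i = open_idx
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2          # skip the backslash and the character it escapes
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    return -1

print(find_closing_brace(r"{a \} b}", 0))
```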
