可扩展，标准 TeX 唯一用于比较平衡标记列表的方法

Question

我不确定将其作为我自己问题的答案发布是否合适，因为它并没有真正回答这个问题，但我不会将其标记为这样（即使假设我可以），所以如果有人灵光一现解决了原始问题，我会很乐意将其标记为真正的答案。

现在谈谈宏。我提前为它们的形状道歉。它们是从我多年来编写的各种代码中提取出来的（并重新重命名），所以风格有点……折衷主义，我们可以这么说。下面可以优化很多，但多次传递的问题仍然存在，我稍后会解释，所以如果有人有巧妙的技巧来解决这个问题，请告诉我。

有几点需要注意：

1) 缺少实际的比较宏，仅存在分析部分，它以“前缀可扩展”的方式（例如，使用技巧\romannumeral-1）提供第 11 和 12 类标记的字符串，该字符串包含足够的信息来识别序列中的每个标记，包括其类别、字符代码（如果有）、是否为括号、其字符代码等。如果需要，可以直接比较此类字符串。

2）嗯，1）在两个方面都是善意的谎言：

a) 任何可以作为参数抓取的标记（即非空格、非括号）都将被（抓取并）替换为括\meaning 在 t ... e 中的字符串（t 和 e 都是第 11 类）；请注意，字符代码不是 32 的第 10 类标记属于此类别（双关语）。\yygrabtokenraw可以进行调整以提供更好的分析（如果目标是比较任意平衡的标记列表，则必须这样做，但只需归结为几个精心编写的条件）。请注意，仅仅\string这样做还不够，因为\escapechar可以是 -1。

b) 缺少“顶层”递归步骤；这里的主要问题是字符代码为 32 的括号；它们在最后一个阶段处理，此时已知序列的长度，并且可以找出\string它们的每一个\meaning。好吧，不要这么快，因为如果它们的类别代码为 32，则\meaning和\string都会将它们变成普通空格（将以\meaning两个空格结尾，这也无济于事），这是一个\detokenize 被发明来纠正的问题。因此，我们需要决定如何抓住它们。代码做出的一个保证是，每个左括号都将被正确识别为字符代码 32（o1e或c1e）或 32 以外的字符代码（o2e，c2e）。执行此操作的代码会弄乱后面的一些右括号（它们的字符代码），以便安全地使用括号，因此c2e第一个括号后面的“标记”不可靠（但是，如果找到另一个o1e，o1e或，则它是字符代码为 32 的括号）。o2e下一次迭代可以抓取“解密”的括号，而不会弄乱下一个括号。经过多次迭代（不幸的是，最多有右括号），一切都可以解决。如果有人感兴趣，我可以完成宏来做到这一点。只有当 Knuth\meaning以点结尾时......

3) 代码花费大量时间“传播扩展”。一种典型的情况是 \somemacro{<long list of benign tokens>}{\string}；\string这里的需要先扩展，然后才能发生其他事情，因此\somemacro花费大量时间\expandafter在中插入 s <long list ...>。请注意，\romannumeral如果<long list ...>很长，将会失败，因此将所有内容编码为数字不会有帮助。使用\csname <long ...>\endcsname是可能的（有\expandafter后续内容），但在这种情况下我担心会污染 TeX 的哈希表。

宏尝试在第一遍中识别“有趣的空间”，这是下面\meaning和的唯一用途\yymatchblankspace 。只能用\string。

最后附上了一个宏的测试用例。如果我忽略了一些愚蠢的事情，我深表歉意（当 Joseph Wright 和其他人怀疑时，我往往也会怀疑）。

编辑：除了其他可能与之有关的内容之外，\long为了清楚起见，我在每个定义前面都省略了，因此\par会破坏它。

扩展提供更好的分析以上：为了解决病态情况（例如\escapechar=-1 \let\#=#），可以准备一组宏（每个字符一个（甚至两个），例如\expandafter\def\csname match#\endcsname #1\##{...}% last '#' is \catcode 13）或几个宏，其中一个\defed 负责\def\maintest #1<a list of all active characters and single letter cs's>{...}所有繁重的工作（通过递归插入“抓取”标记在潜在的“分隔符”中）。中间选项（用时间换取空间）也是可能的。至于“那是很多宏”，当然这是一个问题。我（不完美）对此的看法是：“如果一个人能负担得起那么多\catcode寄存器，那么他也能负担得起那些特殊的‘条件’。”

我担心扩张传播上面提到的问题只是在 TeX 中进行递归的代价。通过在第一遍中用\yysx ?where对标记进行编码，可以在一定程度上缓解此问题\def\yysx#1#2{\expandafter\space\expandafter\yysx\expandafter#1\romannumeral-1#2}。这样，条目\romannumeral-1列表前面的a\yysx ?会将扩展“传递”到列表末尾，同时保持完整。

“支架后处理”感觉就像应该是可以避免的。

最后，我被问过很多次“为什么没有 e-TeX？”。我不确定这里是否是讨论这个问题的合适地方，但我有（可能是主观的）理由避免它。如果有人能建议一个更好的地方来讨论这些偏好，我将不胜感激。

% helper macros (to build test cases, etc); @ is a letter

\def\yyreplacestring#1\in#2\with#3{%
      \expandafter\def\expandafter\r@placestring\expandafter##\expandafter1\the#1##2\end{%
          \def\r@placestring{##2}% is this the string at the very end?
          \ifx\r@placestring\empty % then it is the one we inserted, report
              \errmessage{string <\the#1> not present in \the#2}% do not change the register if the string is not there
          \else % remove the extra copy of #1\end at the end
              \expandafter#2\expandafter\expandafter\expandafter
                  {\expandafter\r@plac@string\expandafter{\the#3}{##1}##2\end}%
      \fi}% end of \r@placestring definition
      \expandafter\def\expandafter\r@plac@string
          \expandafter##\expandafter1%
          \expandafter##\expandafter2%
          \expandafter##\expandafter3%
          \the#1\end{##2##1##3}%
      \expandafter\expandafter\expandafter\r@placestring\expandafter\the\expandafter#2\the#1\end
}

\newtoks\toksa
\newtoks\toksb
\newtoks\toksc
\newtoks\toksd

\def\yybreak#1#2\yycontinue{\fi#1}

\def\eatone#1{}
\def\eatonespace#1 {}
\def\identity#1{#1}
\def\yyfirstoftwo#1#2{#1}
\def\yysecondoftwo#1#2{#2}
\def\yysecondofthree#1#2#3{#2}
\def\yythirdofthree#1#2#3{#3}

% #1 -- `call stack'
% #2 -- remaining sequence
% #3 -- `parsed' sequence

\def\yypreparsetokensequenc@#1#2#3{%
    \yystringempty{#2}{#1{#3}}{\yypreparsetokensequen@@{#1}{#2}{#3}}%
}

\def\yypreparsetokensequen@@#1#2#3{% remaining sequence is nonempty
    \yystartsinbrace{#2}{\yydealwithbracedgroup{#1}{#2}{#3}}{\yypreparsetokensequ@n@@{#1}{#2}{#3}}%
}

\def\yydealwithbracedgroup#1#2#3{% the first token of the remaining sequence is a brace
    \iffalse{\fi\yydealwithbracedgro@p#2}{#1}{#3}%
}

\def\yydealwithbracedgro@p#1{%
    \yypreparsetokensequenc@{\yyrepackagesequence}{#1}{}%
}

% #1 -- parsed sequence
% this is a sequence to `propagate expansion' into the next parameter.
% the same can be achieved by packaging the whole sequence with a 
% \csname ... \endcsname pair and using a simple \expandafter
% maybe that would be a better idea ...

\def\yyrepackagesequence#1{%
    \yyrepackagesequenc@{}#1\end
}

% #1 -- `packaged' sequence (\expandafter\expandafter\expandafter ? ...)
% #2 -- the next category 12 character or \end

\def\yyrepackagesequenc@#1#2{%
    \ifx#2\end
        \yybreak{\yyrepackagesequ@nc@{#1\expandafter\expandafter\expandafter}}%
    \else
        \yybreak{\yyrepackagesequenc@{#1\expandafter\expandafter\expandafter#2}}%
    \yycontinue
}

% #1 -- `packaged' sequence (\expandafter\expandafter\expandafter ? ...)
% this macro is followed by the remainder of the original sequence with a so far
% unmatched right brace, the `call stack' and the parsed sequence.

\def\yyrepackagesequ@nc@#1{%
    \expandafter\expandafter\expandafter\yyrepackagesequ@nc@swap#1{\expandafter\eatone\string}%
}

% #1 -- parsed sequence without packaging

\def\yyrepackagesequ@nc@swap#1#{%
    \yyrepackagesequ@nc@sw@p{#1}%
}

% #1 -- parsed `inner' sequence
% #2 -- remainder of the original sequence
% #3 -- `call stack'
% #4 -- parsed sequence so far

\def\yyrepackagesequ@nc@sw@p#1#2#3#4{%
    \yypreparsetokensequenc@{#3}{#2}{#4[#1]}%
}

% `braced group' thread ends here

% #1 -- `call stack'
% #2 -- remaining sequence
% #3 -- `parsed' sequence

\def\yypreparsetokensequ@n@@#1#2#3{% the remaining group in #2 is nonempty and does not start with a brace
    \yystartsinspace{#2}{\yyconsumetruespace{#1}{#2}{#3}}{\yypreparsetokenseq@@n@@{#1}{#2}{#3}}%
}

\def\yyconsumetruespace#1#2#3{%
    \expandafter\yyconsumetruespac@swap\expandafter{\eatonespace#2}{#1}{#3.}%
}

\def\yyconsumetruespac@swap#1#2#3{%
    \yypreparsetokensequenc@{#2}{#1}{#3}%
}

% `group starting with a true (character code 32, category code 10) space' thread ends here

% #1 -- `call stack'
% #2 -- remaining sequence
% #3 -- `parsed' sequence

\def\yypreparsetokenseq@@n@@#1#2#3{% a nonempty group, that does not start with a brace or a true space
    \yymatchblankspace{#2}{\yyrescanblankspace{#2}{#1}{#3}}{\yypreparsetokens@q@@n@@{#1}{#2}{#3}}%
}

% #1 -- remaining sequence
% #2 -- `call stack'
% #3 -- `parsed' sequence

\def\yyrescanblankspace#1#2#3{%
    \expandafter\expandafter\expandafter
        \yyrescanblankspac@swap
    \expandafter\expandafter\expandafter{\expandafter\yynormalizeblankspac@\meaning#1}{#2}{#3*}%
}

\def\yyrescanblankspac@swap#1#2#3{%
    \yystartsinspace{#1}{%
        \expandafter\yyrescanblankspac@sw@p\expandafter{\eatonespace#1}{#2}{#3}%
    }{%
        \expandafter\yyrescanblankspac@sw@p\expandafter{\eatone#1}{#2}{#3}%
    }%
}

\def\yyrescanblankspac@sw@p#1#2#3{%
    \yypreparsetokensequenc@{#2}{#1}{#3}%
}

% `group starting with a blank space' ends here

% #1 -- `call stack'
% #2 -- remaining sequence
% #3 -- `parsed' sequence

\def\yypreparsetokens@q@@n@@#1#2#3{% nonempty group starting with a non blank, non brace token
    \expandafter\yypreparsetokens@q@@n@@swap\expandafter{\eatone#2}{#1}{#30}%
}

\def\yypreparsetokens@q@@n@@swap#1#2#3{%
    \yypreparsetokensequenc@{#2}{#1}{#3}%
}

% #1 -- string of category code 12 or 10 characters
% #2 -- string of category code 12 or 10 characters

\def\yycomparesimplestrings#1#2{%
    \yystringempty{#1}{%
        \yystringempty{#2}{\yyfirstoftwo}{\yysecondoftwo}%
    }{\yycomparesimplestrings@{#1}{#2}}%
}

\def\yycomparesimplestrings@#1#2{% the first string is nonempty
    \yystringempty{#2}{\yysecondoftwo}{\yycomparesimplestrings@@{#1}{#2}}%
}

\def\yycomparesimplestrings@@#1#2{% both strings are nonempty
    \yystartsinspace{#1}{%
        \yystartsinspace{#2}{\yyabsorbfirstspace{#1}{#2}}{\yysecondoftwo}%
    }{%
        \yystartsinspace{#2}{\yysecondoftwo}{\yyabsorbfirstnonspace{#1}{#2}}%
    }    
}

\def\yyabsorbfirstspace#1#2{%
    \expandafter\yyabsorbfirstspac@swap\expandafter{\eatonespace#1}{#2}%
}

\def\yyabsorbfirstspac@swap#1#2{%
     \expandafter\yyabsorbfirst@swap\expandafter{\eatonespace#2}{#1}%
}

\def\yyabsorbfirstnonspace#1#2{%
    \expandafter\yyabsorbfirstnonspac@swap\expandafter{\eatone#1}{#2}%
}

\def\yyabsorbfirstnonspac@swap#1#2{%
     \expandafter\yyabsorbfirst@swap\expandafter{\eatone#2}{#1}%
}

\def\yyabsorbfirst@swap#1#2{%
     \yycomparesimplestrings{#2}{#1}%
}

% `compare strings of category code 12' thread ends here

% #1 -- remaining parsed sequence
% #2 -- analysed sequence

\def\yyanalysetokens@#1#2{%
    \yystringempty{#1}{{#2}}%
        {\yyanalysetok@ns@#1\end{#2}}%
}

\def\yyanalysetok@ns@#1#2\end{%
    \ifx#1.%
        \expandafter\yyfirstoftwo
    \else
        \expandafter\yysecondoftwo
    \fi
    {\yygrabablank{#2}}%
    {%
        \ifx#1[% not a space, an opening brace
            \expandafter\yyfirstoftwo
        \else
            \expandafter\yysecondoftwo
        \fi
        {%
            \yydisableobrace{#2}%
        }{% 
            \ifx#1]% not a space, a closing brace
                \expandafter\yyfirstoftwo
            \else
                \expandafter\yysecondoftwo
            \fi
            {%
                \yydisablecbrace{#2}%
            }{% neither space nor brace
                \yygrabtokenraw{#2}%
            }%
        }%
    }%
}

% #1 -- remaining parsed sequence
% #2 -- analysed sequence
% #3 -- next token

\def\yygrabtokenraw#1#2#3{%
    \expandafter\yyanalysetokens@swap\expandafter{\meaning#3}{#1}{#2}%
}

\def\yyanalysetokens@swap#1#2#3{%
    \yyanalysetokens@{#2}{#3t#1e}%
}

\def\yygrabablank#1#2 {%
    \yyanalysetokens@{#1}{#2s0e}%
}

% #1 -- remaining parsed sequence
% #2 -- analysed sequence

\def\yydisablecbrace#1#2{%
    \yydisablecbrac@{}#1\relax#2\end
}


\def\yydisablecbrac@#1#2{%
    \ifx#2\end
        \yybreak{\yydisablecbrac@@{#1\expandafter\expandafter\expandafter}}%
    \else
        \yybreak{\yydisablecbrac@{#1\expandafter\expandafter\expandafter#2}}%
    \yycontinue
}

\def\yydisablecbrac@@#1{%
    \expandafter\expandafter\expandafter
        \yydisablecbrace@@@#1\end
    \expandafter\expandafter\expandafter
        {\iffalse}\fi\string
}

\def\yydisablecbrace@@@#1\relax#2\end#3{%
    \yystartsinspace{#3}%
        {\expandafter\yyanalysetok@nsswap\expandafter{\eatonespace#3}{#1}{#2c1e}}%
        {\expandafter\yyanalysetok@nsswap\expandafter{\eatone#3}{#1}{#2c2e}}%
}

\def\yyanalysetok@nsswap#1#2#3{%
    \iffalse{\fi\yyanalysetokens@{#2}{#3}#1}%
}

% #1 -- remaining parsed sequence
% #2 -- analysed sequence

\def\yydisableobrace#1#2{%
    \yydisableobrac@{}#1\relax#2\end
}


\def\yydisableobrac@#1#2{%
    \ifx#2\end
        \yybreak{\yydisableobrac@@{#1\expandafter\expandafter\expandafter}}%
    \else
        \yybreak{\yydisableobrac@{#1\expandafter\expandafter\expandafter#2}}%
    \yycontinue
}

\def\yydisableobrac@@#1{%
    \expandafter\expandafter\expandafter
        \yydisableobrace@@@#1\end
    \expandafter\expandafter\expandafter
        {\iffalse}\fi\string
}

\def\yydisableobrace@@@#1\relax#2\end#3{%
    \yystartsinspace{#3}%
        {\expandafter\yyanalysetok@nsswap\expandafter{\eatonespace#3}{#1}{#2o1e}}%
        {\expandafter\yyanalysetok@nsswap\expandafter{\eatone#3}{#1}{#2o2e}}%
}

\uccode`\ =`\-

% \dotspace expands into a character code `\-, category code 10 token (funny space)

\uppercase{\def\dotspace{ }}

\toksa\expandafter\expandafter\expandafter{\expandafter\meaning\dotspace}

\toksb{-}

\toksc{#2}

\toksd\toksa

\yyreplacestring\toksb\in\toksa\with\toksc

\toksc{}
\yyreplacestring\toksb\in\toksd\with\toksc

\expandafter\def\expandafter\yymatchblankspac@\expandafter#\expandafter1\the\toksd{%
    \yystringempty{#1}{\expandafter\yysecondofthree\expandafter{\string}}%
        {\expandafter\yythirdofthree\expandafter{\string}}%
}

\edef\yymatchblankspace#1{% is it \catcode 10 token?
    \noexpand\iffalse{\noexpand\fi
    \noexpand\expandafter
    \noexpand\yymatchblankspac@
    \noexpand\meaning#1\the\toksd}%
}

% the idea behind the sequence below is that a leading character of category code 10
% is replaced either by a character of category code 10 and charachter code 32 or a character
% of category code 12 and character code other than 32
% note that while it is tempting to replace the definition below by something that ends in
% ... blank space #2{ ... with the hope of absorbing the result of \meaning in one step,
% this will not give the desired result in case of an active character,
% say, `~' that had been \let to the normal blank space

\expandafter\def\expandafter\yynormalizeblankspac@\expandafter#\expandafter1\the\toksd{}

\def\yystartsinspace#1{% is it \charcode 32, \catcode 10 token?
    \iffalse{\fi\yystartsinspac@#1 }%
}

\def\yystartsinspac@#1 {%
    \yystringempty{#1}{\expandafter\yysecondofthree\expandafter{\string}}{\expandafter\yythirdofthree\expandafter{\string}}%
}

\def\yystartsinbrace#1{%
  \iffalse{{\fi
  \if!\yytoks@mpty#1}}!%
    \expandafter\yysecondoftwo
  \else
    \expandafter\yyfirstoftwo
  \fi
}

\def\yystringempty#1{%
  \iffalse{{{\fi
  \ifcase\yytoks@mpty#1}}\@ne}\z@
    \expandafter\yyfirstoftwo
  \else
    \expandafter\yysecondoftwo
  \fi
}

\def\yytoks@mpty{%
    \expandafter\eatone\expandafter{\expandafter{%
        \ifcase\expandafter1\expandafter}\expandafter}\expandafter\fi\string
}

%% test code begins here

%\tracingmacros=3
%\tracingonline=3

\catcode`\ =13\relax%
\def\actspace{ }%
\catcode`\ =10\relax%

\catcode`\.=13\relax%
\def\actdotspace{.}%
\catcode`\.=12\relax%

\edef\makefunkydotspace{\let\expandafter\noexpand\actdotspace= \dotspace}
\edef\makefunkyspace{\let\expandafter\noexpand\actspace= \space}

\makefunkyspace
\makefunkydotspace

\catcode`\<=1
\catcode`\>=2
\uccode`\<=32
\uccode`\>=32

% inside the following sequence, < and > will become braces with character code 32 (space),
% \actspace will expand into an active character with character code 32, that has been \let to a
% character code 32, category code 10 token (space)

\uppercase{\edef\temptest{{ } \space\space\dotspace\expandafter\noexpand\actspace\expandafter\noexpand\actdotspace{<> {{}{{ u o l k kk
    \end\noexpand\fi\noexpand\else\noexpand\iffalse{}} }}}}}

%\uppercase{\edef\temptest{\dotspace E <>}}

\show\temptest

\def\displaypreparse#1{%
    \expandafter\errmessage\expandafter{\romannumeral-1\yypreparsetokensequenc@{\yyanalysetokens@}{#1}{}{}#1}%
}

\expandafter\displaypreparse\expandafter{\temptest}

\end

Answer 1