Expandable, standard-TeX-only method for comparing balanced token lists


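Is there a (reasonably) efficient macro that does something like \long\def\comparets#1#2{\def\aa{#1}\def\bb{#2}\ifx\aa\bb true\else false\fi} except that it is expandable (i.e. \newcomparets{<tokens1>}{<tokens2>} expands to 'true' or 'false', including inside \edef)? I am looking for a "pure" TeX solution (i.e. no extensions such as e-TeX). I have looked at the l3tl macros, but they appear to use e-TeX. The solution should work for arbitrary token sequences (including ones that contain all kinds of "funny spaces", braces, and arbitrary control sequences). I cannot seem to find a way to do this without making several passes.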
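For reference, here is a minimal plain TeX sketch of the non-expandable macro from the question in use. The test token lists are made up for illustration; the comments describe why the macro cannot be used in expansion-only contexts.

```tex
% The non-expandable comparison from the question (plain TeX).
% Caution: it overwrites \aa (plain TeX's a-ring) and \bb as scratch macros.
\long\def\comparets#1#2{\def\aa{#1}\def\bb{#2}\ifx\aa\bb true\else false\fi}

% In ordinary typesetting the \def's are executed, so the test works:
\comparets{a {b}}{a {b}} % identical token lists: typesets "true"

\comparets{a {b}}{a{b}}  % the lists differ by one space token: typesets "false"

% Inside \edef (or \message, \write, ...) only expansion happens; \def is not
% expandable, so the assignments are not carried out and \ifx compares
% whatever \aa and \bb happened to mean beforehand.
\bye
```

An expandable replacement has to reach the same verdict using macro expansion alone, which is what makes spaces and braces with unusual character codes so awkward.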

Answer 1

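I am not sure whether posting this as an answer to my own question is appropriate, since it does not really answer it. I will not mark it as accepted (even assuming I could), so if someone has a flash of insight and solves the original problem, I will gladly mark that as the real answer.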

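Now for the macros. I apologize in advance for the shape they are in. They were pulled (and renamed) from various pieces of code I have written over the years, so the style is a bit ... eclectic, shall we say. The code below can be optimized a lot, but the multiple-pass problem remains, as I explain later, so if anyone has a clever trick to get around it, please let me know.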

A few things to note:

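1) The actual comparison macro is missing; only the analysis part is present. It produces, in a "prefix expandable" way (e.g. using the \romannumeral-1 trick), a string of category 11 and 12 tokens that carries enough information to identify every token in the sequence: its category, its character code (if any), whether it is a brace and, if so, its character code, and so on. Such strings can be compared directly if needed.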

2) Well, 1) is a white lie in two respects:

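a) Any token that can be grabbed as an argument (i.e. non-space, non-brace) is (grabbed and) replaced by its \meaning string enclosed in t ... e (both t and e are category 11); note that category 10 tokens with a character code other than 32 fall into this category (pun intended). \yygrabtokenraw can be adjusted to provide a better analysis (this has to be done if the goal is to compare arbitrary balanced token lists, but it just boils down to a few carefully written conditionals). Note that merely applying \string is not enough, since \escapechar can be -1.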

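b) The "top level" recursion step is missing. The main problem here is braces with character code 32; they are handled in the last stage, when the length of the sequence is known and one can work out the \string and \meaning of each of them. Well, not so fast: if their character code is 32, both \meaning and \string turn them into ordinary spaces (\meaning will end in two spaces, which does not help either), a problem that \detokenize was invented to correct. So we need to decide how to grab them. One guarantee the code makes is that every opening brace is correctly identified as having character code 32 (o1e, c1e) or a character code other than 32 (o2e, c2e). The code that does this messes up some of the closing braces that follow (their character codes) in order to consume the brace safely, so the c2e "tokens" after the first brace are unreliable (however, if another o1e or c1e is found, it is a brace with character code 32). The next iteration can grab the "deciphered" brace without messing up the next one. After a number of iterations (unfortunately, up to the number of closing braces) everything can be resolved. If anyone is interested, I can finish the macros to do this. If only Knuth had made \meaning end with a dot ...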

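3) The code spends a lot of time "propagating expansion". A typical situation is \somemacro{<long list of benign tokens>}{\string}, where the \string has to be expanded before anything else can happen, so \somemacro spends a lot of time inserting \expandafter's into <long list ...>. Note that \romannumeral will fail if <long list ...> is long, so encoding everything as a number will not help. Using \csname <long ...>\endcsname is possible (combined with a single \expandafter), but in that case I am worried about polluting TeX's hash table.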
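The two expansion idioms mentioned in points 1) and 3) can be tried in isolation. The following is a minimal plain TeX sketch, not part of the original macros: \inner, \pick, \tail, \resulta and \resultb are hypothetical names chosen for the illustration.

```tex
% Part 1: \romannumeral-1 as an expansion driver.
% While looking for more digits after "-1", TeX keeps expanding macros until it
% meets an unexpandable token that is not a digit; the number -1 itself
% produces no output, so only the expanded material is left behind.
\def\yyfirstoftwo#1#2{#1}
\def\inner{xy}
\def\pick{\yyfirstoftwo{\inner}{unused}}
\expandafter\def\expandafter\resulta\expandafter{\romannumeral-1\pick}
\message{[\meaning\resulta]}% prints [macro:->xy]

% Part 2: propagating expansion past a list of tokens.
% Each \expandafter skips one token and expands what follows, so a chain with
% one \expandafter per token reaches \tail at the far end of the list.
\def\tail{XYZ}
\expandafter\def\expandafter\resultb\expandafter
    {\expandafter a\expandafter b\expandafter c\tail}
\message{[\meaning\resultb]}% prints [macro:->abcXYZ]
\bye
```

The \romannumeral-1 trick misbehaves if the expanded material starts with a digit or a space, since either would be absorbed as part of the number. The listing below uses runs of three \expandafter's per token because it has to push the expansion through two levels rather than one.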

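The macros try to identify "funny spaces" in the first pass; this is the only use of \meaning and \yymatchblankspace below. It is possible to get by with \string alone.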
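The tokens that cause the trouble can be produced and inspected with a few lines of plain TeX, using the same \uccode and \uppercase construction as the listing below. This is only an illustrative sketch; \bracepair and \twospaces are hypothetical names.

```tex
\begingroup
\uccode`\ =`\-
\uppercase{\gdef\dotspace{ }}% a "funny space": category 10, character code `\-
\catcode`\<=1 \catcode`\>=2
\uccode`\<=32 \uccode`\>=32
\uppercase{\gdef\bracepair{<>}}% a brace pair whose character codes are both 32
\endgroup
\edef\twospaces{\space\space}% two ordinary space tokens

% \meaning reveals the character code of a funny space ...
\message{[\expandafter\meaning\dotspace]}% prints [blank space -]

% ... but braces with character code 32 print exactly like ordinary spaces:
\message{[\meaning\bracepair]}% prints [macro:->  ]
\message{[\meaning\twospaces]}% prints [macro:->  ]

% The two macros are nevertheless different as token lists:
\message{\ifx\bracepair\twospaces same\else different\fi}% prints different
\bye
```

This is why the strings produced by \meaning or \string cannot simply be compared: two different token lists can yield the same characters.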

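A test case for the macros is attached at the end. I apologize if I have overlooked something silly (when Joseph Wright and others have doubts, I tend to have doubts as well).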

EDIT: among other things that may be wrong with it, I omitted \long in front of every definition for clarity, so \par will break it.

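Expanding on "provide a better analysis" above: to handle pathological cases (e.g. \escapechar=-1 \let\#=#), one can prepare a set of macros, one (or even two) per character, e.g. \expandafter\def\csname match#\endcsname #1\##{...}% last '#' is \catcode 13, or just a few macros, one of which, \def'ed as \def\maintest #1<a list of all active characters and single letter cs's>{...}, does all the heavy lifting (by recursively inserting the "grabbed" token among the potential "delimiters"). Intermediate options (trading time for space) are also possible. As for "that is a lot of macros": sure, that is a concern. My (imperfect) take on it is: "if one can afford that many \catcode registers, one can afford those special 'conditionals' as well."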
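To see why the pathological case defeats a naive analysis, one can print what \string and \meaning report under these settings. This is a small plain TeX sketch, with the changes kept inside a group so that plain TeX's own \# is restored afterwards.

```tex
\begingroup
\escapechar=-1
% With no escape character, a control sequence and a character stringify alike:
\message{[\string\a] [\string a]}% prints [a] [a]

% After \let\#=# the control symbol and the character also share a \meaning:
\let\#=#
\message{[\string\#] [\string#]}% prints [#] [#]
\message{[\meaning\#] [\meaning#]}% both report: macro parameter character #
\endgroup
\bye
```

Since neither primitive tells the two tokens apart, some other test is needed, such as the delimited-argument macros described above.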

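I am afraid the "expansion propagation" problem mentioned above is simply the price of doing recursion in TeX. It can be alleviated somewhat by encoding the tokens in the first pass as \yysx ? where \def\yysx#1#2{\expandafter\space\expandafter\yysx\expandafter#1\romannumeral-1#2}. This way, a \romannumeral-1 in front of a list of \yysx ? entries will "pass" the expansion to the end of the list while keeping the list intact.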
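The behaviour of \yysx can be checked on a short list. The sketch below uses the definition exactly as given above; \tail and \result are hypothetical names, with \tail standing in for whatever has to be expanded after the list.

```tex
% \yysx as given in the text above.
\def\yysx#1#2{\expandafter\space\expandafter\yysx\expandafter#1\romannumeral-1#2}

\def\tail{XYZ}% hypothetical: the material that must be expanded after the list

% One expansion of the leading \romannumeral walks through the three \yysx
% entries, expands \tail, and rebuilds each entry on the way back: every nested
% \romannumeral-1 ends at the \space placed in front of its own entry.
\expandafter\def\expandafter\result\expandafter
    {\romannumeral-1\yysx a\yysx b\yysx c\tail}
\message{[\meaning\result]}% prints [macro:->\yysx a\yysx b\yysx cXYZ]
\bye
```

Note that #2 is an undelimited argument, so a braced group directly after the last entry would lose its braces, and the expansion of \tail must not begin with a digit or a space. The nesting depth still grows with the length of the list.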

The "brace post-processing" feels like something that should be avoidable.

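Finally, I have been asked many times "why no e-TeX?". I am not sure this is the right place to discuss it, but I have (possibly subjective) reasons for avoiding it. If someone can suggest a better place to discuss such preferences, I would appreciate it.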

% helper macros (to build test cases, etc); @ is a letter
\catcode`\@=11 % make @ a letter so that names such as \r@placestring and \z@ tokenize as intended

\def\yyreplacestring#1\in#2\with#3{%
      \expandafter\def\expandafter\r@placestring\expandafter##\expandafter1\the#1##2\end{%
          \def\r@placestring{##2}% is this the string at the very end?
          \ifx\r@placestring\empty % then it is the one we inserted, report
              \errmessage{string <\the#1> not present in \the#2}% do not change the register if the string is not there
          \else % remove the extra copy of #1\end at the end
              \expandafter#2\expandafter\expandafter\expandafter
                  {\expandafter\r@plac@string\expandafter{\the#3}{##1}##2\end}%
      \fi}% end of \r@placestring definition
      \expandafter\def\expandafter\r@plac@string
          \expandafter##\expandafter1%
          \expandafter##\expandafter2%
          \expandafter##\expandafter3%
          \the#1\end{##2##1##3}%
      \expandafter\expandafter\expandafter\r@placestring\expandafter\the\expandafter#2\the#1\end
}

\newtoks\toksa
\newtoks\toksb
\newtoks\toksc
\newtoks\toksd

\def\yybreak#1#2\yycontinue{\fi#1}

\def\eatone#1{}
\def\eatonespace#1 {}
\def\identity#1{#1}
\def\yyfirstoftwo#1#2{#1}
\def\yysecondoftwo#1#2{#2}
\def\yysecondofthree#1#2#3{#2}
\def\yythirdofthree#1#2#3{#3}

% #1 -- `call stack'
% #2 -- remaining sequence
% #3 -- `parsed' sequence

\def\yypreparsetokensequenc@#1#2#3{%
    \yystringempty{#2}{#1{#3}}{\yypreparsetokensequen@@{#1}{#2}{#3}}%
}

\def\yypreparsetokensequen@@#1#2#3{% remaining sequence is nonempty
    \yystartsinbrace{#2}{\yydealwithbracedgroup{#1}{#2}{#3}}{\yypreparsetokensequ@n@@{#1}{#2}{#3}}%
}

\def\yydealwithbracedgroup#1#2#3{% the first token of the remaining sequence is a brace
    \iffalse{\fi\yydealwithbracedgro@p#2}{#1}{#3}%
}

\def\yydealwithbracedgro@p#1{%
    \yypreparsetokensequenc@{\yyrepackagesequence}{#1}{}%
}

% #1 -- parsed sequence
% this is a sequence to `propagate expansion' into the next parameter.
% the same can be achieved by packaging the whole sequence with a 
% \csname ... \endcsname pair and using a simple \expandafter
% maybe that would be a better idea ...

\def\yyrepackagesequence#1{%
    \yyrepackagesequenc@{}#1\end
}

% #1 -- `packaged' sequence (\expandafter\expandafter\expandafter ? ...)
% #2 -- the next category 12 character or \end

\def\yyrepackagesequenc@#1#2{%
    \ifx#2\end
        \yybreak{\yyrepackagesequ@nc@{#1\expandafter\expandafter\expandafter}}%
    \else
        \yybreak{\yyrepackagesequenc@{#1\expandafter\expandafter\expandafter#2}}%
    \yycontinue
}

% #1 -- `packaged' sequence (\expandafter\expandafter\expandafter ? ...)
% this macro is followed by the remainder of the original sequence with a so far
% unmatched right brace, the `call stack' and the parsed sequence.

\def\yyrepackagesequ@nc@#1{%
    \expandafter\expandafter\expandafter\yyrepackagesequ@nc@swap#1{\expandafter\eatone\string}%
}

% #1 -- parsed sequence without packaging

\def\yyrepackagesequ@nc@swap#1#{%
    \yyrepackagesequ@nc@sw@p{#1}%
}

% #1 -- parsed `inner' sequence
% #2 -- remainder of the original sequence
% #3 -- `call stack'
% #4 -- parsed sequence so far

\def\yyrepackagesequ@nc@sw@p#1#2#3#4{%
    \yypreparsetokensequenc@{#3}{#2}{#4[#1]}%
}

% `braced group' thread ends here

% #1 -- `call stack'
% #2 -- remaining sequence
% #3 -- `parsed' sequence

\def\yypreparsetokensequ@n@@#1#2#3{% the remaining group in #2 is nonempty and does not start with a brace
    \yystartsinspace{#2}{\yyconsumetruespace{#1}{#2}{#3}}{\yypreparsetokenseq@@n@@{#1}{#2}{#3}}%
}

\def\yyconsumetruespace#1#2#3{%
    \expandafter\yyconsumetruespac@swap\expandafter{\eatonespace#2}{#1}{#3.}%
}

\def\yyconsumetruespac@swap#1#2#3{%
    \yypreparsetokensequenc@{#2}{#1}{#3}%
}

% `group starting with a true (character code 32, category code 10) space' thread ends here

% #1 -- `call stack'
% #2 -- remaining sequence
% #3 -- `parsed' sequence

\def\yypreparsetokenseq@@n@@#1#2#3{% a nonempty group, that does not start with a brace or a true space
    \yymatchblankspace{#2}{\yyrescanblankspace{#2}{#1}{#3}}{\yypreparsetokens@q@@n@@{#1}{#2}{#3}}%
}

% #1 -- remaining sequence
% #2 -- `call stack'
% #3 -- `parsed' sequence

\def\yyrescanblankspace#1#2#3{%
    \expandafter\expandafter\expandafter
        \yyrescanblankspac@swap
    \expandafter\expandafter\expandafter{\expandafter\yynormalizeblankspac@\meaning#1}{#2}{#3*}%
}

\def\yyrescanblankspac@swap#1#2#3{%
    \yystartsinspace{#1}{%
        \expandafter\yyrescanblankspac@sw@p\expandafter{\eatonespace#1}{#2}{#3}%
    }{%
        \expandafter\yyrescanblankspac@sw@p\expandafter{\eatone#1}{#2}{#3}%
    }%
}

\def\yyrescanblankspac@sw@p#1#2#3{%
    \yypreparsetokensequenc@{#2}{#1}{#3}%
}

% `group starting with a blank space' ends here

% #1 -- `call stack'
% #2 -- remaining sequence
% #3 -- `parsed' sequence

\def\yypreparsetokens@q@@n@@#1#2#3{% nonempty group starting with a non blank, non brace token
    \expandafter\yypreparsetokens@q@@n@@swap\expandafter{\eatone#2}{#1}{#30}%
}

\def\yypreparsetokens@q@@n@@swap#1#2#3{%
    \yypreparsetokensequenc@{#2}{#1}{#3}%
}

% #1 -- string of category code 12 or 10 characters
% #2 -- string of category code 12 or 10 characters

\def\yycomparesimplestrings#1#2{%
    \yystringempty{#1}{%
        \yystringempty{#2}{\yyfirstoftwo}{\yysecondoftwo}%
    }{\yycomparesimplestrings@{#1}{#2}}%
}

\def\yycomparesimplestrings@#1#2{% the first string is nonempty
    \yystringempty{#2}{\yysecondoftwo}{\yycomparesimplestrings@@{#1}{#2}}%
}

\def\yycomparesimplestrings@@#1#2{% both strings are nonempty
    \yystartsinspace{#1}{%
        \yystartsinspace{#2}{\yyabsorbfirstspace{#1}{#2}}{\yysecondoftwo}%
    }{%
        \yystartsinspace{#2}{\yysecondoftwo}{\yyabsorbfirstnonspace{#1}{#2}}%
    }% the trailing % keeps a stray space token out of the expansion
}

\def\yyabsorbfirstspace#1#2{%
    \expandafter\yyabsorbfirstspac@swap\expandafter{\eatonespace#1}{#2}%
}

\def\yyabsorbfirstspac@swap#1#2{%
     \expandafter\yyabsorbfirst@swap\expandafter{\eatonespace#2}{#1}%
}

\def\yyabsorbfirstnonspace#1#2{%
    \expandafter\yyabsorbfirstnonspac@swap\expandafter{\eatone#1}{#2}%
}

\def\yyabsorbfirstnonspac@swap#1#2{%
     \expandafter\yyabsorbfirst@swap\expandafter{\eatone#2}{#1}%
}

\def\yyabsorbfirst@swap#1#2{%
     \yycomparesimplestrings{#2}{#1}%
}

% `compare strings of category code 12' thread ends here

% #1 -- remaining parsed sequence
% #2 -- analysed sequence

\def\yyanalysetokens@#1#2{%
    \yystringempty{#1}{{#2}}%
        {\yyanalysetok@ns@#1\end{#2}}%
}

\def\yyanalysetok@ns@#1#2\end{%
    \ifx#1.%
        \expandafter\yyfirstoftwo
    \else
        \expandafter\yysecondoftwo
    \fi
    {\yygrabablank{#2}}%
    {%
        \ifx#1[% not a space, an opening brace
            \expandafter\yyfirstoftwo
        \else
            \expandafter\yysecondoftwo
        \fi
        {%
            \yydisableobrace{#2}%
        }{% 
            \ifx#1]% not a space, a closing brace
                \expandafter\yyfirstoftwo
            \else
                \expandafter\yysecondoftwo
            \fi
            {%
                \yydisablecbrace{#2}%
            }{% neither space nor brace
                \yygrabtokenraw{#2}%
            }%
        }%
    }%
}

% #1 -- remaining parsed sequence
% #2 -- analysed sequence
% #3 -- next token

\def\yygrabtokenraw#1#2#3{%
    \expandafter\yyanalysetokens@swap\expandafter{\meaning#3}{#1}{#2}%
}

\def\yyanalysetokens@swap#1#2#3{%
    \yyanalysetokens@{#2}{#3t#1e}%
}

\def\yygrabablank#1#2 {%
    \yyanalysetokens@{#1}{#2s0e}%
}

% #1 -- remaining parsed sequence
% #2 -- analysed sequence

\def\yydisablecbrace#1#2{%
    \yydisablecbrac@{}#1\relax#2\end
}


\def\yydisablecbrac@#1#2{%
    \ifx#2\end
        \yybreak{\yydisablecbrac@@{#1\expandafter\expandafter\expandafter}}%
    \else
        \yybreak{\yydisablecbrac@{#1\expandafter\expandafter\expandafter#2}}%
    \yycontinue
}

\def\yydisablecbrac@@#1{%
    \expandafter\expandafter\expandafter
        \yydisablecbrace@@@#1\end
    \expandafter\expandafter\expandafter
        {\iffalse}\fi\string
}

\def\yydisablecbrace@@@#1\relax#2\end#3{%
    \yystartsinspace{#3}%
        {\expandafter\yyanalysetok@nsswap\expandafter{\eatonespace#3}{#1}{#2c1e}}%
        {\expandafter\yyanalysetok@nsswap\expandafter{\eatone#3}{#1}{#2c2e}}%
}

\def\yyanalysetok@nsswap#1#2#3{%
    \iffalse{\fi\yyanalysetokens@{#2}{#3}#1}%
}

% #1 -- remaining parsed sequence
% #2 -- analysed sequence

\def\yydisableobrace#1#2{%
    \yydisableobrac@{}#1\relax#2\end
}


\def\yydisableobrac@#1#2{%
    \ifx#2\end
        \yybreak{\yydisableobrac@@{#1\expandafter\expandafter\expandafter}}%
    \else
        \yybreak{\yydisableobrac@{#1\expandafter\expandafter\expandafter#2}}%
    \yycontinue
}

\def\yydisableobrac@@#1{%
    \expandafter\expandafter\expandafter
        \yydisableobrace@@@#1\end
    \expandafter\expandafter\expandafter
        {\iffalse}\fi\string
}

\def\yydisableobrace@@@#1\relax#2\end#3{%
    \yystartsinspace{#3}%
        {\expandafter\yyanalysetok@nsswap\expandafter{\eatonespace#3}{#1}{#2o1e}}%
        {\expandafter\yyanalysetok@nsswap\expandafter{\eatone#3}{#1}{#2o2e}}%
}

\uccode`\ =`\-

% \dotspace expands into a character code `\-, category code 10 token (funny space)

\uppercase{\def\dotspace{ }}

\toksa\expandafter\expandafter\expandafter{\expandafter\meaning\dotspace}

\toksb{-}

\toksc{#2}

\toksd\toksa

\yyreplacestring\toksb\in\toksa\with\toksc

\toksc{}
\yyreplacestring\toksb\in\toksd\with\toksc

\expandafter\def\expandafter\yymatchblankspac@\expandafter#\expandafter1\the\toksd{%
    \yystringempty{#1}{\expandafter\yysecondofthree\expandafter{\string}}%
        {\expandafter\yythirdofthree\expandafter{\string}}%
}

\edef\yymatchblankspace#1{% is it \catcode 10 token?
    \noexpand\iffalse{\noexpand\fi
    \noexpand\expandafter
    \noexpand\yymatchblankspac@
    \noexpand\meaning#1\the\toksd}%
}

% the idea behind the sequence below is that a leading character of category code 10
% is replaced either by a character of category code 10 and character code 32 or a character
% of category code 12 and character code other than 32
% note that while it is tempting to replace the definition below by something that ends in
% ... blank space #2{ ... with the hope of absorbing the result of \meaning in one step,
% this will not give the desired result in case of an active character,
% say, `~' that had been \let to the normal blank space

\expandafter\def\expandafter\yynormalizeblankspac@\expandafter#\expandafter1\the\toksd{}

\def\yystartsinspace#1{% is it \charcode 32, \catcode 10 token?
    \iffalse{\fi\yystartsinspac@#1 }%
}

\def\yystartsinspac@#1 {%
    \yystringempty{#1}{\expandafter\yysecondofthree\expandafter{\string}}{\expandafter\yythirdofthree\expandafter{\string}}%
}

\def\yystartsinbrace#1{%
  \iffalse{{\fi
  \if!\yytoks@mpty#1}}!%
    \expandafter\yysecondoftwo
  \else
    \expandafter\yyfirstoftwo
  \fi
}

\def\yystringempty#1{%
  \iffalse{{{\fi
  \ifcase\yytoks@mpty#1}}\@ne}\z@
    \expandafter\yyfirstoftwo
  \else
    \expandafter\yysecondoftwo
  \fi
}

\def\yytoks@mpty{%
    \expandafter\eatone\expandafter{\expandafter{%
        \ifcase\expandafter1\expandafter}\expandafter}\expandafter\fi\string
}

%% test code begins here

%\tracingmacros=3
%\tracingonline=3

\catcode`\ =13\relax%
\def\actspace{ }%
\catcode`\ =10\relax%

\catcode`\.=13\relax%
\def\actdotspace{.}%
\catcode`\.=12\relax%

\edef\makefunkydotspace{\let\expandafter\noexpand\actdotspace= \dotspace}
\edef\makefunkyspace{\let\expandafter\noexpand\actspace= \space}

\makefunkyspace
\makefunkydotspace

\catcode`\<=1
\catcode`\>=2
\uccode`\<=32
\uccode`\>=32

% inside the following sequence, < and > will become braces with character code 32 (space),
% \actspace will expand into an active character with character code 32, that has been \let to a
% character code 32, category code 10 token (space)

\uppercase{\edef\temptest{{ } \space\space\dotspace\expandafter\noexpand\actspace\expandafter\noexpand\actdotspace{<> {{}{{ u o l k kk
    \end\noexpand\fi\noexpand\else\noexpand\iffalse{}} }}}}}

%\uppercase{\edef\temptest{\dotspace E <>}}

\show\temptest

\def\displaypreparse#1{%
    \expandafter\errmessage\expandafter{\romannumeral-1\yypreparsetokensequenc@{\yyanalysetokens@}{#1}{}{}#1}%
}

\expandafter\displaypreparse\expandafter{\temptest}

\end
