
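Can you write a LaTeX macro whose output is the result of applying \string to the argument's first character token that has category code 1 and is neither a control-sequence token nor an active character?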

For example,

\catcode`\Y=1
\macro{ text text Y z{ww} z} n n nb}

should yield the result of \string Y, i.e., a catcode-12 Y.

As another example,

\catcode`\~=10
\catcode`\ =1~%
\macro{~text~text~ ~z{ww}~z}~n~n~nb}%

should yield the result of \string<catcode-1-space>, i.e., a catcode-10 space.

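Can you write a LaTeX macro whose output is the result of applying \string to the character token of category code 2 (neither a control-sequence token nor an active character) that matches the argument's first character token of category code 1 (neither a control-sequence token nor an active character)?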

For example,

\catcode`\Y=2
\macro{ text text { z{ww} zY n n nb}

should yield the result of \string Y, i.e., a catcode-12 Y.

As another example,

\catcode`\~=10
\catcode`\ =2~%
\macro{~text~text~{~z{ww}~z ~n~n~nb}

should yield the result of \string<catcode-2-space>, i.e., a catcode-10 space.
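For reference, here is a minimal sketch of what "the result of \string Y" means when Y has category code 1 or 2. It is not part of the question, and \test is just a scratch name. Inside \edef, \string takes the next token without expanding it, so the Y never acts as a brace. The stored token is a catcode-12 Y:

```latex
\documentclass{article}
\begin{document}
\begingroup
  \catcode`\Y=1 % from now on the letter is read as begin-group
  \edef\test{\string Y}% \string grabs the explicit catcode-1 token
  \message{^^J\meaning\test}% prints: macro:->Y
\endgroup
\begingroup
  \catcode`\Y=2 % from now on the letter is read as end-group
  \edef\test{\string Y}% same for an explicit catcode-2 token
  \message{^^J\meaning\test}% prints: macro:->Y
\endgroup
\end{document}
```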

(I admit that I can't.)

Answer 1

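To be honest, I don't see any practical use for stringifying explicit category-2 character tokens while preserving their character code. Below I do answer the question, but I consider it a rather "academic" matter. The code I provide is not really performant either, since it needs to iterate over many tokens of the argument.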

Regarding TeX terminology:

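  • TeX's eyes convert characters to TeX's internal character representation scheme (ASCII (8 bit) for traditional TeX, Unicode for XeTeX/LuaTeX) and then pass them to TeX's mouth, where tokenization takes place. The internal character representation scheme can be regarded as a function whose domain is the set of characters and whose range/codomain is the set of code point numbers. Category codes refer to characters before tokenization, not to character tokens. The category code mechanism can be regarded as a function whose domain is the set of characters (identified by their code point numbers in the TeX engine's internal character representation scheme) and whose codomain is the set of category codes. The category code in turn determines which action is triggered when a character with that category code is encountered during tokenization.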
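  • Therefore, strictly speaking, character tokens do not have category codes. They have a category and a character code. "Category" and "character code" are properties of a particular character token that has been inserted into the token stream. The values of these properties are determined during tokenization in TeX's mouth. During tokenization, the category is determined by applying the catcode-régime function to the character and "looking" at which action the character triggers. In many cases the action to be triggered is to create a character token of the corresponding category and append it to the token stream; the character code is determined by applying the internal-character-representation-scheme function to the character. The token stream in turn goes into TeX's gullet, where expandable tokens get expanded. After that, unexpandable tokens (and expandable tokens whose expansion was suppressed) reach the stomach for further processing, e.g., performing assignments, building boxes, etc. The category of a character token determines how that token is processed and which actions it triggers during expansion and in the later stages.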
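  • TeX has this "expansion thing": expanding a macro token "returns" the macro token's top-level expansion, not the final result of the expansion cascade.
    Therefore, when describing the desired output, please be specific about what "return" means, i.e., say
    • whether the result to be "returned" is a sequence of tokens, or a page of a .dvi/.pdf output file, or a message on the terminal, or ...
    • if a sequence of tokens is desired: whether expandability is required, or whether "side effects" such as defining temporary/scratch macros are allowed. If expandability is required, say how many expansion steps may be triggered to obtain the tokens forming the result; with sophisticated macro mechanisms based on \romannumeral-expansion, you need to trigger at least two expansion steps to obtain the result.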
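Here is a minimal sketch illustrating the second point; the macro name is an arbitrary scratch name. Once a character has been tokenized, changing its category code does not change the category of the token already stored in a macro:

```latex
\documentclass{article}
\begin{document}
\begingroup
  \def\stored{Y}% the character is tokenized right here, as a letter
  \catcode`\Y=1 % only affects characters that are read from now on
  \ifcat\stored A%
    \message{^^Jstored token: still a letter (category 11)}%
  \else
    \message{^^Jstored token: category changed}%
  \fi
\endgroup
\end{document}
```

The category code table is consulted only when the character is read. The token stored in \stored keeps category 11, so the first message is printed.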

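A macro argument always contains as many explicit character tokens of category 1 (begin group) as explicit character tokens of category 2 (end group). So if the argument contains an explicit category-1 character token, a matching explicit category-2 character token exists as well.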


"Off-the-cuff" outline:

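Both routines iterate in the same way: the one for finding/stringifying the first explicit category-1 character token and the one for finding/stringifying the matching explicit category-2 character token. Each removes the argument's first component until the argument is either empty or has a leading explicit character token of category 1. That component is either an explicit space token (category 10, character code 32) or an undelimited argument.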

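If the argument becomes empty, the routine finishes without returning any tokens, since the argument contains no explicit category-1/2 character tokens.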

If the argument is reduced to something that has a leading explicit character token of category 1:

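  • The routine for finding/stringifying the first explicit category-1 character token hits that token with \string and then prepends another opening brace. It then extracts the first component of the argument's first component, using the extraction routine that takes explicit space tokens into account.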

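  • The routine for finding/stringifying the matching explicit category-2 character token runs a loop. The loop uses both the routine for finding/stringifying the first explicit category-1 character token and the routine for extracting the argument's first component (which takes explicit space tokens into account):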

    Take the argument's first component, which is known not to be a space but something nested inside a brace group, and check whether it is empty.

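    If the argument's first component is empty, you have something like {}etc etc. In this case:
    – stringify the opening brace and remove it, taking into account that stringification may yield a space token;
    – hit the closing brace with \string;
    – extract the first component of the result, using the extraction routine that takes explicit space tokens into account.
    Terminate, returning the result of the extraction.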

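    If the argument's first component is not empty, you have something like {<stuff>}etc etc. In this case two checks give you four cases:
    Check 1: Using the routine for finding/stringifying the first explicit category-1 character token, check whether stringifying the argument's first token yields an explicit space token.
    Check 2: Check whether the argument's first component, i.e., the content of the brace group {<stuff>}, has a leading space.
    Depending on the results of these checks:
    – stringify the opening brace and remove it, either as a space token or as an undelimited argument;
    – remove the following component, either as a space token or as an undelimited argument;
    – prepend an opening brace.
    Loop again with the result.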

Here comes the code, with \expandafter galore. I did this off the cuff, quickly fiddling things together. The code can certainly be shortened.

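The names of the macros \UD@ExtractFirstOpeningBraceStringified and \UD@ExtractFirstOpeningBracesMatchingClosingBraceStringified are self-explanatory.
Due to \romannumeral-expansion, the result is delivered after triggering two expansion steps on \UD@ExtractFirstOpeningBraceStringified/\UD@ExtractFirstOpeningBracesMatchingClosingBraceStringified.
Running the code requires neither ε-TeX extensions nor Lua extensions or the like.
The implementation does not use any \if..\else..\fi thingies for branching (apart from the "is it already defined?" checks performed before defining macros).
No "sentinel tokens" or the like are used whose occurrence in the argument would be forbidden.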
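To make the "two expansion steps" concrete, here is a small sketch of the \romannumeral-expansion idiom on its own. \mystop, \mypayload and \mywrapper are hypothetical names and not part of the answer's code. \mystop plays the role that \UD@stopromannumeral plays in the code that follows.

```latex
\documentclass{article}
\begin{document}
% Hypothetical scratch names, not part of the answer's code.
\chardef\mystop=0 % a chardef token: \romannumeral reads it as the number 0
\def\mypayload{result}
% \romannumeral keeps expanding while looking for a number; \expandafter
% expands \mypayload first, then \mystop ends the number. The roman
% numeral of 0 is empty, so only the expansion of \mypayload remains.
\def\mywrapper{\romannumeral\expandafter\mystop\mypayload}
% Step 1 expands \mywrapper, step 2 expands \romannumeral:
\expandafter\expandafter\expandafter\def
\expandafter\expandafter\expandafter\test
\expandafter\expandafter\expandafter{\mywrapper}
\message{^^J\meaning\test}% prints: macro:->result
\end{document}
```

In the answer's code, \UD@stopromannumeral is likewise a \chardef token with value 0. The tests at the end use the same triple-\expandafter pattern to hit the macros twice.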

\makeatletter
\errorcontextlines=10000
%%=============================================================================
%% PARAPHERNALIA:
%% \UD@firstoftwo, \UD@secondoftwo, \UD@PassFirstToSecond, \UD@Exchange,
%% \UD@removespace, \UD@stopromannumeral, \UD@CheckWhetherNull,
%% \UD@CheckWhetherLeadingExplicitSpace, \UD@CheckWhetherSpace
%%=============================================================================
\newcommand\UD@firstoftwo[2]{#1}%
\newcommand\UD@secondoftwo[2]{#2}%
\newcommand\UD@PassFirstToSecond[2]{#2{#1}}%
\newcommand\UD@Exchange[2]{#2#1}%
\@ifdefinable\UD@removespace{\UD@Exchange{ }{\def\UD@removespace}{}}%
\@ifdefinable\UD@stopromannumeral{\chardef\UD@stopromannumeral=`\^^00}%
%%-----------------------------------------------------------------------------
%% Check whether argument is empty:
%%.............................................................................
%% \UD@CheckWhetherNull{<Argument which is to be checked>}%
%%                     {<Tokens to be delivered in case that argument
%%                       which is to be checked is empty>}%
%%                     {<Tokens to be delivered in case that argument
%%                       which is to be checked is not empty>}%
%%
%% The gist of this macro comes from Robert R. Schneck's \ifempty-macro:
%% <https://groups.google.com/forum/#!original/comp.text.tex/kuOEIQIrElc/lUg37FmhA74J>
\newcommand\UD@CheckWhetherNull[1]{%
  \romannumeral\expandafter\UD@secondoftwo\string{\expandafter
  \UD@secondoftwo\expandafter{\expandafter{\string#1}\expandafter
  \UD@secondoftwo\string}\expandafter\UD@firstoftwo\expandafter{\expandafter
  \UD@secondoftwo\string}\expandafter\UD@stopromannumeral\UD@secondoftwo}{%
  \expandafter\UD@stopromannumeral\UD@firstoftwo}%
}%
%%-----------------------------------------------------------------------------
%% Check whether argument's first token is an explicit character of
%% category 1:
%%.............................................................................
%% \UD@CheckWhetherBrace{<Argument which is to be checked>}%
%%                      {<Tokens to be delivered in case that argument
%%                        which is to be checked has a leading
%%                        explicit catcode-1-character-token>}%
%%                      {<Tokens to be delivered in case that argument
%%                        which is to be checked does not have a
%%                        leading explicit catcode-1-character-token>}%
\newcommand\UD@CheckWhetherBrace[1]{%
  \romannumeral\expandafter\UD@secondoftwo\expandafter{\expandafter{%
  \string#1.}\expandafter\UD@firstoftwo\expandafter{\expandafter
  \UD@secondoftwo\string}\expandafter\UD@stopromannumeral\UD@firstoftwo}{%
  \expandafter\UD@stopromannumeral\UD@secondoftwo}%
}%
%%-----------------------------------------------------------------------------
%% Check whether brace-balanced argument starts with an explicit space-token:
%%.............................................................................
%% \UD@CheckWhetherLeadingExplicitSpace{<Argument which is to be checked>}%
%%                                     {<Tokens to be delivered in case <argument
%%                                       which is to be checked> does have a
%%                                       leading explicit space-token>}%
%%                                     {<Tokens to be delivered in case <argument
%%                                       which is to be checked> does not have a
%%                                       leading explicit space-token>}%
\newcommand\UD@CheckWhetherLeadingExplicitSpace[1]{%
  \romannumeral\UD@CheckWhetherNull{#1}%
  {\expandafter\UD@stopromannumeral\UD@secondoftwo}%
  {%
    % Let's nest things into \UD@firstoftwo{...}{} to make sure they are nested in braces
    % and thus do not disturb when the test is carried out within \halign/\valign:
    \expandafter\UD@firstoftwo\expandafter{%
      \expandafter\expandafter\expandafter\UD@stopromannumeral
      \romannumeral\expandafter\UD@secondoftwo
      \string{\UD@CheckWhetherLeadingExplicitSpaceB.#1 }{}%
    }{}%
  }%
}%
\@ifdefinable\UD@CheckWhetherLeadingExplicitSpaceB{%
  \long\def\UD@CheckWhetherLeadingExplicitSpaceB#1 {%
    \expandafter\UD@CheckWhetherNull\expandafter{\UD@firstoftwo{}#1}%
    {\UD@Exchange{\UD@firstoftwo}}{\UD@Exchange{\UD@secondoftwo}}%
    {\expandafter\expandafter\expandafter\UD@stopromannumeral
     \expandafter\expandafter\expandafter}%
     \expandafter\UD@secondoftwo\expandafter{\string}%
  }%
}%
%%-----------------------------------------------------------------------------
%% Check whether brace-balanced argument is an explicit space-token:
%%.............................................................................
\newcommand\UD@CheckWhetherSpace[1]{%
  \romannumeral\expandafter\UD@CheckWhetherNull
               \expandafter{\UD@GobbleToExclam#1!}{%
    \expandafter\UD@firstoftwo\expandafter{%
      \UD@SpaceFork!#1!{\UD@firstoftwo}! !{\UD@secondoftwo}!!!!%
    }{}%
  }{\expandafter\UD@stopromannumeral\UD@secondoftwo}%
}%
\@ifdefinable\UD@SpaceFork{%
  \long\def\UD@SpaceFork#1! !#2#3!!!!{\expandafter\UD@stopromannumeral#2}%
}%
\@ifdefinable\UD@GobbleToExclam{\long\def\UD@GobbleToExclam#1!{}}%
%%=============================================================================
%% Extract first inner undelimited argument:
%%
%%   \UD@ExtractFirstArg{ABCDE} yields  A
%%
%%   \UD@ExtractFirstArg{{AB}CDE} yields  AB
%%
%% Due to \romannumeral-expansion the result is delivered after two 
%% expansion-steps/after "hitting" \UD@ExtractFirstArg with \expandafter
%% twice.
%%
%% \UD@ExtractFirstArg's argument must not be blank.
%% This case must be cranked out (e.g. via a blankness test, which is not
%% defined in this code) before calling \UD@ExtractFirstArg.
%%
%% Use frozen-\relax as delimiter for speeding things up.
%% I chose frozen-\relax because David Carlisle pointed out in
%% <https://tex.stackexchange.com/a/578877>
%% that frozen-\relax cannot be (re)defined in terms of \outer and cannot be
%% affected by \uppercase/\lowercase.
%%
%% \UD@ExtractFirstArg's argument may contain frozen-\relax:
%% The only effect is that internally more iterations are needed for
%% obtaining the result.
%%
%%.............................................................................
\@ifdefinable\UD@RemoveTillFrozenrelax{%
  \expandafter\expandafter\expandafter\UD@Exchange
  \expandafter\expandafter\expandafter{%
  \expandafter\expandafter\ifnum0=0\fi}%
  {\long\def\UD@RemoveTillFrozenrelax#1#2}{{#1}}%
}%
\expandafter\UD@PassFirstToSecond\expandafter{%
  \expandafter\romannumeral\expandafter\UD@ExtractFirstArgLoop
  \expandafter{\expandafter#\expandafter1%
  \romannumeral\expandafter\expandafter\expandafter\UD@stopromannumeral
  \expandafter\expandafter\ifnum0=0\fi}%
}{\newcommand\UD@ExtractFirstArg[1]}%
\newcommand\UD@ExtractFirstArgLoop[1]{%
  \expandafter\UD@CheckWhetherNull\expandafter{\UD@firstoftwo{}#1}%
  {\expandafter\UD@stopromannumeral\UD@secondoftwo{}#1}%
  {\expandafter\UD@ExtractFirstArgLoop\expandafter{\UD@RemoveTillFrozenrelax#1}}%
}%
%%=============================================================================
%% Extract first inner component, either being a space or being an undelimited
%% argument:
%%
%%   \romannumeral\UD@Romannumeral@ExtractFirstComponent{ABCDE} yields  A
%%
%%   \romannumeral\UD@Romannumeral@ExtractFirstComponent{{AB}CDE} yields  AB
%%
%%   \romannumeral\UD@Romannumeral@ExtractFirstComponent{ ABCDE} yields  <explicit space token>
%%
%%   \romannumeral\UD@Romannumeral@ExtractFirstComponent{ {AB}CDE} yields  <explicit space token>
%%
%% This macro is meant to be used within \romannumeral-expansion: as shown
%% above, \romannumeral\UD@Romannumeral@ExtractFirstComponent{...} delivers
%% the result after one expansion step/after "hitting" \romannumeral with
%% \expandafter once.
%%
%% \UD@Romannumeral@ExtractFirstComponent's argument must not be empty.
%% This case can be cranked out via \UD@CheckWhetherNull before calling
%% \UD@Romannumeral@ExtractFirstComponent.
%%.............................................................................
\newcommand\UD@Romannumeral@ExtractFirstComponent[1]{%
  \UD@CheckWhetherLeadingExplicitSpace{#1}{%
    \UD@firstoftwo{\UD@stopromannumeral}{} %
  }{%
    \expandafter\expandafter\expandafter\UD@stopromannumeral
    \UD@ExtractFirstArg{#1}%
  }%
}%
%%=============================================================================
%% \UD@ExtractFirstOpeningBraceStringified{<tokens>}
%%
%% Obtain \string-representation of argument's first explicit category-1-
%% character token.
%%
%% Due to \romannumeral-expansion the result is delivered after two 
%% expansion-steps/after "hitting" \UD@ExtractFirstOpeningBraceStringified
%% with \expandafter twice.
%% If the argument does not have an opening brace you get emptiness.
%%.............................................................................
\newcommand\UD@ExtractFirstOpeningBraceStringified[1]{%
  \romannumeral\UD@ExtractFirstOpeningbraceStringifiedloop{#1}%
}%
\newcommand\UD@ExtractFirstOpeningbraceStringifiedloop[1]{%
  \UD@CheckWhetherNull{#1}{\UD@stopromannumeral}{%
    \UD@CheckWhetherBrace{#1}{%
      \expandafter\UD@Romannumeral@ExtractFirstComponent
      \expandafter{%
        \romannumeral\expandafter\UD@Romannumeral@ExtractFirstComponent
        \expandafter{%
        \romannumeral\expandafter\expandafter\expandafter\UD@stopromannumeral
        \expandafter\UD@firstoftwo\expandafter{\expandafter}%
        \romannumeral\expandafter\expandafter\expandafter\UD@stopromannumeral
        \expandafter\string\expandafter}%
        \string#1%
      }%
    }{%
      \UD@CheckWhetherLeadingExplicitSpace{#1}{%
        \expandafter\UD@ExtractFirstOpeningbraceStringifiedloop
        \expandafter{\UD@removespace#1}%
      }{%
        \expandafter\UD@ExtractFirstOpeningbraceStringifiedloop
        \expandafter{\UD@firstoftwo{}#1}%
      }%
    }%
  }%
}%
%%=============================================================================
%% \UD@ExtractFirstOpeningBracesMatchingClosingBraceStringified{<tokens>}
%%
%% Obtain \string-representation of argument's explicit category-2-
%% character token that matches argument's first explicit category-1-
%% character token.
%%
%% Due to \romannumeral-expansion the result is delivered after two
%% expansion-steps/after "hitting"
%% \UD@ExtractFirstOpeningBracesMatchingClosingBraceStringified
%% with \expandafter twice.
%% If the argument does not have a closing brace you get emptiness.
%%.............................................................................
\newcommand\UD@ExtractFirstOpeningBracesMatchingClosingBraceStringified[1]{%
  \romannumeral
  \UD@ExtractFirstClosingbraceStringifiedloop{#1}%
}%
\newcommand\UD@ExtractFirstClosingbraceStringifiedloop[1]{%
  \UD@CheckWhetherNull{#1}{%
    \UD@stopromannumeral
  }{%
    \UD@CheckWhetherBrace{#1}{%
       \UD@ExtractFirstClosingbraceStringifiedloopB{#1}%
    }{%
      \UD@CheckWhetherLeadingExplicitSpace{#1}{%
        \expandafter\UD@ExtractFirstClosingbraceStringifiedloop
        \expandafter{\UD@removespace#1}%
      }{%
        \expandafter\UD@ExtractFirstClosingbraceStringifiedloop
        \expandafter{\UD@firstoftwo{}#1}%
      }%
    }%
  }%
}%
\newcommand\UD@mergeargs[3]{%
  \expandafter#1\expandafter{%
    \romannumeral\expandafter\UD@CheckWhetherSpace
    \expandafter{\romannumeral\UD@ExtractFirstOpeningbraceStringifiedloop{#3}}{%
      \UD@Exchange{\expandafter\UD@removespace}%
    }{%
      \UD@Exchange{\expandafter\UD@firstoftwo\expandafter{\expandafter}}%
    }%
    {\expandafter#2\romannumeral\expandafter\expandafter\expandafter\UD@stopromannumeral}%
    \string#3%
  }%
}%
\newcommand\UD@ExtractFirstClosingbraceStringifiedloopB[1]{%
  \expandafter\UD@CheckWhetherNull
  \expandafter{\romannumeral\UD@Romannumeral@ExtractFirstComponent{#1}}{%
    \UD@mergeargs{\UD@Romannumeral@ExtractFirstComponent}{%
      \expandafter\expandafter\UD@stopromannumeral\expandafter\string
    }%
  }{%
    \UD@mergeargs{\UD@ExtractFirstClosingbraceStringifiedloopB}{%
      \UD@CheckWhetherLeadingExplicitSpace
      \expandafter{\romannumeral\UD@Romannumeral@ExtractFirstComponent{#1}}{%
        \UD@Exchange{\expandafter\UD@removespace}%
      }{%
        \UD@Exchange{\expandafter\UD@firstoftwo\expandafter{\expandafter}}%
      }%
      {%
        \expandafter\UD@stopromannumeral\expandafter{\romannumeral
        \expandafter\UD@firstoftwo\expandafter{\expandafter}\string}%
        \expandafter\expandafter\expandafter\UD@stopromannumeral
      }%
    }%
  }{#1}%
}%

\message{%
  ^^JResult:  |\UD@ExtractFirstOpeningBraceStringified{ n A { X{m } j}{}jh}|%
}%

\begingroup
\catcode`\Y=1
\message{%
  ^^JResult:  |\UD@ExtractFirstOpeningBraceStringified{ n A Y X{m } j}{}jh}|%
}%
\endgroup

\begingroup
\catcode`\~=10
\catcode`\ =1~%
\message{%
^^JResult:~~|\UD@ExtractFirstOpeningBraceStringified{~n~A~ ~X{m~}~j}{}jh}|%
}%
\endgroup

\message{%
  ^^JResult:  |\UD@ExtractFirstOpeningBraceStringified{ n A  Xm  jjh}|%
}%

\begingroup
\catcode`\Y=1
\expandafter\expandafter\expandafter\def
\expandafter\expandafter\expandafter\test
\expandafter\expandafter\expandafter{%
  \UD@ExtractFirstOpeningBraceStringified{ n A Y X{m } j}{}jh}%
}%
\message{%
  ^^JResult:  |\meaning\test|%
}%
\endgroup

\message{%
  ^^J------------------------------------------------------%
}%

\message{%
  ^^JResult:  |\UD@ExtractFirstOpeningBracesMatchingClosingBraceStringified{ n A { X{m } j}{}jh}|%
}%

\begingroup
\catcode`\Y=2
\message{%
  ^^JResult:  |\UD@ExtractFirstOpeningBracesMatchingClosingBraceStringified{ n A { X{m } jY{}jh}|%
}%
\endgroup

\begingroup
\catcode`\~=10
\catcode`\ =2~%
\message{%
~~^^JResult:~~|\UD@ExtractFirstOpeningBracesMatchingClosingBraceStringified{~n~A~{~X{m~}~j {}jh}|%
}%
\endgroup

\message{%
  ^^JResult:  |\UD@ExtractFirstOpeningBracesMatchingClosingBraceStringified{ n A  Xm  jjh}|%
}%

\begingroup
\catcode`\Y=2
\expandafter\expandafter\expandafter\def
\expandafter\expandafter\expandafter\test
\expandafter\expandafter\expandafter{%
  \UD@ExtractFirstOpeningBracesMatchingClosingBraceStringified{ n A { X{m } jY{}jh}%
}%
\message{%
  ^^JResult:  |\meaning\test|%
}%
\endgroup

\stop

Terminal output:

$ pdflatex test.tex
This is pdfTeX, Version 3.14159265-2.6-1.40.21 (TeX Live 2020) (preloaded format=pdflatex)
 restricted \write18 enabled.
entering extended mode
(./test.tex
LaTeX2e <2020-10-01> patch level 4
L3 programming layer <2021-02-18> 
Result: |{| 
Result: |Y| 
Result: | | 
Result: || 
Result: |macro:->Y| 
------------------------------------------------------ 
Result: |}| 
Result: |Y| 
Result: | | 
Result: || 
Result: |macro:->Y| )
No pages of output.
Transcript written on test.log.

Answer 2

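Use \tl_analysis_map_inline:nn. The inline code can use three arguments:
  • the representation of the token;
  • its character code (if the item is a character);
  • its category code, as a hexadecimal digit.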

You can use the second argument to generate the corresponding character with category code 12.

Note: this doesn't work for \bgroup, but you didn't ask about that case.

\documentclass{article}

\ExplSyntaxOn

\NewDocumentCommand{\macro}{m}
 {
  \bool_set_true:N \l__egreg_leftbrace_notfound_bool
  \tl_analysis_map_inline:nn { #1 } { \__egreg_leftbrace:nnn { ##1 } { ##2 } { ##3 } }
  \str_show:N \l_tmpa_str
 }

\bool_new:N \l__egreg_leftbrace_notfound_bool
\cs_new_protected:Nn \__egreg_leftbrace:nnn
 {
  \bool_lazy_and:nnT { \l__egreg_leftbrace_notfound_bool } { \int_compare_p:n { "#3 = 1 } }
   {
    \str_set:Nx \l_tmpa_str { \char_generate:nn { #2 } { 12 } }
    \bool_set_false:N \l__egreg_leftbrace_notfound_bool
   }
 }

\ExplSyntaxOff

\begin{document}

\begingroup
\catcode`Y=1
\macro{ text text Y z{ww} z} n n nb}
\endgroup

\macro{abc}

\macro{a{b}c}

\end{document}

Console output:

> \l_tmpa_str=Y.
<recently read> }

l.28 \macro{ text text Y z{ww} z} n n nb}

?
> \l_tmpa_str=.
<recently read> }

l.31 \macro{abc}

?
> \l_tmpa_str={.
<recently read> }

l.33 \macro{a{b}c}

?
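The answer above covers only the first part of the question. Below is a sketch of how the same \tl_analysis_map_inline:nn approach might be extended to the second part. It tracks the brace depth and stops at the explicit category-2 token that brings the depth back to zero. This is not from the original answer. The names \matchingclose, \__my_track:nnn and the \l__my_... variables are hypothetical. Like the code above, it ignores implicit tokens such as \bgroup/\egroup.

```latex
\documentclass{article}
\ExplSyntaxOn
% Hypothetical names; a sketch extending the approach above.
\int_new:N  \l__my_depth_int   % current brace depth
\bool_new:N \l__my_done_bool   % true once the matching token is found
\str_new:N  \l__my_result_str  % stringified matching end-group token
\NewDocumentCommand{\matchingclose}{m}
 {
  \int_zero:N \l__my_depth_int
  \bool_set_false:N \l__my_done_bool
  \str_clear:N \l__my_result_str
  \tl_analysis_map_inline:nn { #1 }
   { \__my_track:nnn { ##1 } { ##2 } { ##3 } }
  \str_show:N \l__my_result_str
 }
\cs_new_protected:Npn \__my_track:nnn #1#2#3
 {
  \bool_if:NF \l__my_done_bool
   {
    \str_case:nn { #3 }
     {
      { 1 } { \int_incr:N \l__my_depth_int }   % explicit begin-group
      { 2 }                                    % explicit end-group
       {
        \int_decr:N \l__my_depth_int
        % depth back to 0: this token closes the first begin-group token
        \int_compare:nNnT { \l__my_depth_int } = { 0 }
         {
          \str_set:Nx \l__my_result_str { \char_generate:nn { #2 } { 12 } }
          \bool_set_true:N \l__my_done_bool
         }
       }
     }
   }
 }
\ExplSyntaxOff
\begin{document}
\begingroup
\catcode`Y=2
\matchingclose{ text text { z{ww} zY n n nb}
\endgroup
\matchingclose{abc}
\end{document}
```

Because the argument is brace-balanced, no end-group token can occur before the first begin-group token, so the depth never becomes negative.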

Answer 3

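Ulrich Diez's answer already covers the "how to" part. In the spirit of "being 100% correct", here is an algorithm with a time bound. It always takes at most O(N log N) time, and O(N) time if the delimiter token is not present in the token list.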

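There is no implementation yet, but the idea should work, and we are fairly confident that the (asymptotic) time complexity has been computed correctly.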

Algorithm

The algorithm works as follows:

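  1. First, use #{ to gobble everything up to the first {. Needless to say, append a trailing {} to cover the case where the token list contains no brace group.
    • Then handle the empty-token-list case accordingly.
  2. Then count the number of items in the head. This can be done without explicitly taking the head: put the items in the token list and, after counting, skip over the count itself to add the brace and remove the tail.
  3. Use a function that removes that many items.
  4. Handle any remaining space characters. This part is not hard.
  5. Stringify and fetch the result. Recall that stringifying } yields a single token, so this can be done in linear time.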

Step 2 is the bottleneck here, with its O(N log N) time complexity. The rest takes linear time, at the cost of linear \romannumeral recursion depth.
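Step 1 relies on TeX's #{ parameter delimiter. Here is a minimal sketch of that building block alone; the rest of the algorithm is not implemented here. \GobbleToBrace is a hypothetical name, and the sketch is shown with an ordinary { only:

```latex
\documentclass{article}
\begin{document}
% \GobbleToBrace is a hypothetical name used only for this sketch.
% A parameter text ending in #{ makes #1 delimited by the following {;
% TeX consumes that { but re-inserts a { at the end of the replacement
% text (which is otherwise empty here).
\long\def\GobbleToBrace#1#{}
\edef\test{\GobbleToBrace n A {X{m}j}{}jh}
\message{^^J\meaning\test}% prints: macro:->{X{m}j}{}jh
\end{document}
```

Everything before the first brace group is dropped, while the brace group itself stays intact because the delimiting { is put back.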


Old answer

(Uses the same approach as Ulrich Diez's answer)

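Writing complex code like the above by hand is too complicated for me. So I decided to write a "compiler" that compiles easy-to-understand imperative code. For example, here is the definition of the macro that extracts the stringified } matching the first {: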

\rdeflinenumbered \firstegroup #x {}!
    \while {}{ \ifnotempty {#x} } {
        \conditional {\ifbrace {#x}} {
            % found the first group.

            % first make sure the open brace has charcode `{` (anything not a space will do.).
            \assignoperate #x {\string #x} {
                \expandonce
                \rcall{\putnextbgroup}
            }

            % then empty out that group.
            \while {
                \assignr #\firstcomponent {\firstarg{#x}}
            } {
                \ifnotempty {#\firstcomponent}
            }{
                % firstcomponent is still nonempty. Pick one item
                \conditional{\ifspace {#\firstcomponent}} {
                    \assignoperate #x {#x} {
                        % following in the input stream:  { <space> ... }  ...
                        \putnext{\string} \expandonce
                        % following in the input stream:  '{' <space> ... }  ...  where the initial { is stringified and is definitely not a space
                        \matchrm{#1 ~}
                        % following in the input stream:    ... }  ...
                        \rcall{\putnextbgroup}
                    }
                } {
                    \assignoperate #x {#x} {
                        % following in the input stream:  { <item> ... }  ...
                        \putnext{\string} \expandonce
                        % following in the input stream:  '{' <item> ... }  ...  where the initial { is stringified and is definitely not a space
                        \matchrm{#1 #2}
                        % following in the input stream:    ... }  ...
                        \rcall{\putnextbgroup}
                    }
                }
                % now firstcomponent is shorter.
            }

            % finally firstcomponent is empty now. (and the opening brace is guaranteed to be non-space)
            \assignoperate #x {#x} {
                % following in the input stream: { } ...
                \putnext{\string} \expandonce
                % following in the input stream: '{' } ...
                \matchrm{#1}
                % following in the input stream:     } ...
                \putnext{\string} \expandonce
                % following in the input stream:     '}' ...
            }

            % finally done
            \assignr #x{\firstcomponent{#x}}
            \return{#x}

        } {
            \conditional {\ifspace {#x}} {
                \assignr #x{\removespace {#x}}
            } {
                \assignr #x{\dropfirst {#x}}
            }
        }
    }
    \return {}
!

The compiler turns this into a set of TeX macros.

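There is almost no documentation at the moment, but hopefully you get a rough idea of what the code does, for example:

  • \while performs a while loop
  • \conditional executes a conditional (an "if" statement)
  • \assign "assigns" a value to a token list
  • \assignr "computes" the result of a function and then assigns it to a token list
  • \matchrm matches a pattern ahead in the input stream and then removes it, assigning the matched values to token lists
  • \putnext puts tokens into the input stream
  • \rcall calls a "subroutine"
  • etc.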

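At the moment the compiler is slow and the generated code is bloated; this will be improved later. It is also not on CTAN. If you are interested, you can try to find a way to run it from source.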

The compiler is implemented in LuaTeX, but the generated code can be used with any engine.

For a demo, try running the following code:

%! TEX program = pdflatex
\documentclass{article}
\usepackage{filecontentsdef}
\begin{document}
\ExplSyntaxOn

% ======== some auxiliary macros ========
\def\__process_char #1 #2 {
    %\prettye:n{\expandafter \expandafter \expandafter \noexpand \char_generate:nn {`#2} {"#1}}
    \expandafter \expandafter \expandafter \noexpand \char_generate:nn {`#2} {"#1}
    \__process_s
}

\def\__process_space_other_cat #1 {
    \expandafter \expandafter \expandafter \noexpand \char_generate:nn {32} {"#1}
    \__process_s
}

\def\__process_cs #1 / {
    \expandafter \noexpand \csname #1 \endcsname
    \__process_s
}

\def\__process_s#1{
    \token_if_eq_charcode:NNTF #1 0 { % 0 <name> / → the control sequence
        \__process_cs
    } {
        \token_if_eq_charcode:NNTF #1 s { ~   \__process_s
        } {
            \token_if_eq_charcode:NNTF #1 S { % S <cat> → a space
                \__process_space_other_cat
            } {
                \token_if_eq_charcode:NNF #1 . { % . → end
                    \__process_char #1
                }
            }
        }
    }
}

% main handler function, will exec the resulting token list.
\def\__process_all#1{
    \begingroup \exp_last_unbraced:Nx \endgroup {\__process_s #1}
}
\ExplSyntaxOff
\begin{filecontentsdefmacro}{\data}
0def/0stzz241/s1{0exp_end:/2}0def/0removespace/6#6#C11{0expandafter/0stzz584/0expandafter/1{0exp:w/0stzz241/6#6#C12}2}0def/0ifempty/6#6#C11{0iffalse/1{0fi/0expandafter/0use_none:n/0expandafter/1{0expandafter/1{0string/6#6#C12}0@ifempty@casei/2}0@ifempty@caseii/2}0use_i:nn/2}0def/0@ifempty@casei/1{0exp:w/0removenextegroup/2}0def/0@ifempty@caseii/1{0expandafter/0stzz309/0exp:w/0removenextegroup/2}0def/0stzz309/6#6#C11{0use_ii:nn/2}0def/0ifbrace/6#6#C11{0iffalse/1{0fi/0expandafter/0use_none:n/0expandafter/1{0expandafter/1{0string/6#6#C12}0@ifbrace@casei/2}0@ifbrace@caseii/2}0use_ii:nn/2}0def/0@ifbrace@casei/1{0expandafter/0stzz353/0exp:w/0removenextegroup/2}0def/0stzz353/1{0expandafter/0stzz354/0exp:w/0removeuntilegroup/2}0def/0stzz354/6#6#C11{0use_i:nn/2}0def/0@ifbrace@caseii/1{0exp:w/0removenextegroup/2}0def/0removeuntilegroup/1{0expandafter/0stzz400/0expandafter/1{0iffalse/2}0fi/2}0def/0putnextbgroup/1{0expandafter/0exp_end:/0expandafter/1{0iffalse/2}0fi/2}0def/0putnextegroup/1{0expandafter/0exp_end:/0iffalse/1{0fi/2}2}0def/0stzz400/6#6#C11{0exp_end:/2}0def/0dropfirst/6#6#C11{0expandafter/0stzz584/0expandafter/1{0exp:w/0stzz400/6#6#C12}2}0def/0stzz411/6#6#C1s1{0expandafter/0stzz425/0expandafter/1{0exp:w/0dropfirst/1{6#6#C12}2}2}0def/0stzz425/6#6#C11{0ifempty/1{6#6#C12}0stzz426/0stzz429/2}0def/0stzz429/1{0expandafter/0stzz434/0expandafter/0use_ii:nn/0exp:w/0removeuntilegroup/2}0def/0stzz426/1{0expandafter/0stzz434/0expandafter/0use_i:nn/0exp:w/0removeuntilegroup/2}0def/0stzz434/6#6#C11{0expandafter/0stzz584/0expandafter/6#6#C10exp:w/0putnextegroup/2}0def/0ifspace/6#6#C11{0expandafter/0stzz439/0expandafter/1{0exp:w/0stzz411/C.6#6#C1s2}2}0def/0stzz439/6#6#C11{6#6#C12}0def/0ifnotempty/6#6#C11{0ifempty/1{6#6#C12}0use_ii:nn/0use_i:nn/2}0def/0stzz464/6#6#C16#6#C20relax/1{0exp_end:/1{6#6#C12}2}0def/0firstarg/6#6#C11{0expandafter/0stzz463c/0expandafter/1{0exp:w/0dropfirst/1{6#6#C10relax/2}2}1{6#6#C10relax/2}2}0def/0stzz463a/6#6#C11{0expandafter/0stzz463b/0expandafter/1{0exp:w/0stzz464/6#6#C12}2}0def/0stzz463b/6#6#C11{0expandafter/0stzz463c/0expandafter/1{0exp:w/0dropfirst/1{6#6#C12}2}1{6#6#C12}2}0def/0stzz463c/6#6#C16#6#C21{0ifnotempty/1{6#6#C12}0stzz463a/0stzz472/1{6#6#C22}2}0def/0stzz472/6#6#C11{0expandafter/0stzz584/0expandafter/1{0use:n/6#6#C12}2}0def/0firstargsingletoken/6#6#C11{0expandafter/0stzz484/0removebgroup/1{6#6#C12}2}0def/0stzz484/6#6#C11{0stzz486/6#6#C12}0def/0stzz486/6#6#C11{0expandafter/0stzz487/0expandafter/6#6#C10exp:w/0putnextbgroup/2}0def/0stzz487/6#6#C16#6#C21{0exp_end:/6#6#C12}0def/0firstcomponent/6#6#C11{0ifspace/1{6#6#C12}0stzz495/1{0stzz497/1{6#6#C12}2}2}0def/0stzz497/6#6#C11{0expandafter/0stzz584/0expandafter/1{0exp:w/0firstarg/1{6#6#C12}2}2}0def/0stzz495/1{0exp_end:/s2}0def/0stzz509/1{0expandafter/0putnextbgroup/2}0def/0firstbgroup/6#6#C11{0ifnotempty/1{6#6#C12}1{0stzz505a/1{6#6#C12}2}0exp_end:/2}0def/0stzz505a/6#6#C11{0ifbrace/1{6#6#C12}0stzz520c/0stzz519/1{6#6#C12}2}0def/0stzz519/6#6#C11{0ifspace/1{6#6#C12}0stzz522b/0stzz522/1{6#6#C12}2}0def/0stzz522/6#6#C11{0expandafter/0stzz520a/0expandafter/1{0exp:w/0dropfirst/1{6#6#C12}2}2}0def/0stzz522b/6#6#C11{0expandafter/0stzz520a/0expandafter/1{0exp:w/0removespace/1{6#6#C12}2}2}0def/0stzz520a/6#6#C11{0ifnotempty/1{6#6#C12}1{0stzz505a/1{6#6#C12}2}0exp_end:/2}0def/0stzz520c/6#6#C11{0expandafter/0stzz515/0expandafter/1{0exp:w/0stzz509/0string/6#6#C12}2}0def/0stzz515/6#6#C11{0expandafter/0stzz583/0expandafter/1{0exp:w/0firstarg/1{6#6#C12}2}2}0def/0stzz537/1{0expandafter/0putnextbgroup/2}0def/0stzz559/1{0expandafter/0stzz563/0string/2}0def/0stzz563/6#6#C16#6#C21{0putnextbgroup/2}0def/0stzz550/1{0expandafter/0stzz554/0string/2}0def/0stzz554/6#6#C1s1{0putnextbgroup/2}0def/0stzz572/1{0expandafter/0stzz576/0string/2}0def/0stzz576/6#6#C11{0expandafter/0exp_end:/0string/2}0def/0firstegroup/6#6#C11{0ifnotempty/1{6#6#C12}1{0stzz532a/1{6#6#C12}2}0exp_end:/2}0def/0stzz532a/6#6#C11{0ifbrace/1{6#6#C12}0stzz588c/0stzz587/1{6#6#C12}2}0def/0stzz587/6#6#C11{0ifspace/1{6#6#C12}0stzz590b/0stzz590/1{6#6#C12}2}0def/0stzz590/6#6#C11{0expandafter/0stzz588a/0expandafter/1{0exp:w/0dropfirst/1{6#6#C12}2}2}0def/0stzz590b/6#6#C11{0expandafter/0stzz588a/0expandafter/1{0exp:w/0removespace/1{6#6#C12}2}2}0def/0stzz588a/6#6#C11{0ifnotempty/1{6#6#C12}1{0stzz532a/1{6#6#C12}2}0exp_end:/2}0def/0stzz588c/6#6#C11{0expandafter/0stzz550b/0expandafter/1{0exp:w/0stzz537/0string/6#6#C12}2}0def/0stzz543a/6#6#C16#6#C21{0ifspace/1{6#6#C22}0stzz559c/0stzz559a/1{6#6#C12}2}0def/0stzz559a/6#6#C11{0expandafter/0stzz550b/0expandafter/1{0exp:w/0stzz559/6#6#C12}2}0def/0stzz559c/6#6#C11{0expandafter/0stzz550b/0expandafter/1{0exp:w/0stzz550/6#6#C12}2}0def/0stzz550b/6#6#C11{0expandafter/0stzz544a/0expandafter/1{0exp:w/0firstarg/1{6#6#C12}2}1{6#6#C12}2}0def/0stzz544a/6#6#C16#6#C21{0ifnotempty/1{6#6#C12}1{0stzz543a/1{6#6#C22}1{6#6#C12}2}1{0stzz572a/1{6#6#C22}2}2}0def/0stzz572a/6#6#C11{0expandafter/0stzz583/0expandafter/1{0exp:w/0stzz572/6#6#C12}2}0def/0stzz583/6#6#C11{0expandafter/0stzz584/0expandafter/1{0exp:w/0firstcomponent/1{6#6#C12}2}2}0def/0stzz584/6#6#C11{0exp_end:/6#6#C12}.
\end{filecontentsdefmacro}
\ExplSyntaxOn
\exp_args:NV \__process_all \data
\ExplSyntaxOff
\let\removenextegroup\removeuntilegroup




\message{%
  ^^JResult:  |\romannumeral\firstbgroup{ n A { X{m } j}{}jh}|%
}%

\begingroup
\catcode`\Y=1
\message{%
  ^^JResult:  |\romannumeral\firstbgroup{ n A Y X{m } j}{}jh}|%
}%
\endgroup

\begingroup
\catcode`\~=10
\catcode`\ =1~%
\message{%
^^JResult:~~|\romannumeral\firstbgroup{~n~A~ ~X{m~}~j}{}jh}|%
}%
\endgroup

\message{%
  ^^JResult:  |\romannumeral\firstbgroup{ n A  Xm  jjh}|%
}%

\begingroup
\catcode`\Y=1
\expandafter\expandafter\expandafter\def
\expandafter\expandafter\expandafter\test
\expandafter\expandafter\expandafter{%
  \romannumeral\firstbgroup{ n A Y X{m } j}{}jh}%
}%
\message{%
  ^^JResult:  |\meaning\test|%
}%
\endgroup

\message{%
  ^^J------------------------------------------------------%
}%

\message{%
  ^^JResult:  |\romannumeral\firstegroup{ n A { X{m } j}{}jh}|%
}%

\begingroup
\catcode`\Y=2
\message{%
  ^^JResult:  |\romannumeral\firstegroup{ n A { X{m } jY{}jh}|%
}%
\endgroup

\begingroup
\catcode`\~=10
\catcode`\ =2~%
\message{%
~~^^JResult:~~|\romannumeral\firstegroup{~n~A~{~X{m~}~j {}jh}|%
}%
\endgroup

\message{%
  ^^JResult:  |\romannumeral\firstegroup{ n A  Xm  jjh}|%
}%

\begingroup
\catcode`\Y=2
\expandafter\expandafter\expandafter\def
\expandafter\expandafter\expandafter\test
\expandafter\expandafter\expandafter{%
  \romannumeral\firstegroup{ n A { X{m } jY{}jh}%
}%
\message{%
  ^^JResult:  |\meaning\test|%
}%
\endgroup

\end{document}

The demo part is copied from Ulrich Diez's answer.

The algorithm used is also roughly the same as in that answer.

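If you are interested, the source code is at https://github.com/user202729/TeXlib/blob/main/test_imperative.tex#L487 and the lines that follow. The file can be run with LuaLaTeX given the appropriate libraries. Its output is displayed in an HTML file, although nowadays I prefer to use my prettytok package to display output.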

Because of some limitations, \relax is used instead of the frozen \relax token, which makes the algorithm slower in some cases.

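About the compiled code: the output consists of some expandable macros, e.g. ... However, I let the program generate tokens that cannot be tokenized nicely under the normal catcodes, so an auxiliary function is used to reconstruct the token list.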

Example of generated code (as I mentioned before, the current code is very bloated; this will be fixed later):

[Image: screenshot of the generated code]
