Macro to print the catcodes of all tokens


Mainly for debugging purposes, I want a macro that prints all tokens of a token list together with their catcodes.

Based on Joseph Wright's answer, I wrote a macro that returns the catcode of a token.
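The core test behind such a macro is \ifcat\noexpand: \noexpand keeps an active character or macro from being expanded, and \ifcat then compares its catcode with that of a reference token. A minimal plain TeX sketch of that single test (the macro name \isletter is hypothetical, not taken from the linked answer):

% Hypothetical helper: typesets "letter" if #1 has catcode 11, "other" otherwise.
% \noexpand prevents #1 from being expanded before \ifcat compares it
% with the reference letter a.
\def\isletter#1{\ifcat a\noexpand#1letter\else other\fi}
\isletter x  % letter
\isletter 1  % other (catcode 12)
\isletter ~  % other (an unexpanded active character counts as catcode 13)
\bye

The full \getCatcode macro below simply chains one such test per catcode.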

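Based on this question, which shows three different loops, I see two fundamentally different approaches to iterating over tokens: handling explicit tokens by passing them as macro arguments, or handling implicit tokens with \let. The latter seems more promising, so I call it approach 1.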
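To see what each loop style actually receives, here is a minimal plain TeX sketch. The macro names \argloop, \letloop and \letbody are hypothetical; \relax serves as the end marker, as in the question's code.

% Explicit-token style: each token is grabbed as an undelimited argument.
\def\argloop#1{\ifx\relax#1\else[\string#1]\expandafter\argloop\fi}
% Implicit-token style: \let copies the next token, whatever it is.
\def\letloop{\afterassignment\letbody\let\next= }
\def\letbody{\ifx\next\relax\else[\meaning\next]\expandafter\letloop\fi}

\tt
\argloop a b{c}\relax   % [a][b][c]: the space is skipped, the braces are stripped
\par
\letloop a b{c}\relax   % one item per token, including the space and both braces
\bye

The argument-based loop never sees the space or the braces, which is exactly the weakness of approach 2; the \let-based loop sees every token but only through the implicit copy \next.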

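I ran into the following problems:

  • With approach 1 (implicit tokens):
    • If an implicit token does not have catcode 11 or 12 (letter or other), I don't know how to print its character.
    • Active characters are reported as catcode 16.
  • With approach 2 (explicit tokens):
    • Spaces are swallowed.
    • { and } are not printed but are used for grouping.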
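Both approach-1 problems can be reproduced in isolation. A plain TeX sketch (it reuses the name \xchar from the question's loop; everything else is illustrative):

\tt
\let\xchar= $   % \xchar is now an implicit math-shift token
\string\xchar   % typesets the name \xchar, not the character $
\par
\let\xchar= ~   % plain TeX's active ~ is a macro, so \xchar becomes that macro
\ifcat\noexpand\xchar\relax control sequence\else character\fi % control sequence
\bye

Once the active character has been copied with \let, the copy is just a macro, so \ifcat classifies it like any other control sequence (catcode 16).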

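I don't think the problems of the second approach can be solved. But I hope the first approach can be improved, or that there is a third, better approach that I have overlooked.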

\documentclass{article}
\usepackage{filecontents}

% ========== get catcode ==========
% I am not using something like \the\catcode`#1 because:
% (1) I want the catcode of the token, 
%     not the catcode which a token would get 
%     if it was created at this position
% (2) that would not work with implicit tokens

\makeatletter
\@firstofone{\let\implicitSpaceToken= }
\makeatother

% based on https://tex.stackexchange.com/a/7413/120953
\newcommand{\getCatcode}[1]{%
    %    0: escape character, no tokens of that catcode exist
    \ifcat \egroup\noexpand#1%
         1%
    \else\ifcat \bgroup\noexpand#1%
         2%
    \else\ifcat $\noexpand#1%$ (the commented out dollar sign is important for the syntax highlighting in TeXstudio)
         3%
    \else\ifcat &\noexpand#1%
         4%
    %    5: end of line, no tokens of that catcode exist
    \else\ifcat ##\noexpand#1%
         6%
    \else\ifcat ^\noexpand#1%
         7%
    \else\ifcat _\noexpand#1%
         8%
    %    9: ignored character, no tokens of that catcode exist
    \else\ifcat \implicitSpaceToken\noexpand#1%
        10%
    \else\ifcat a\noexpand#1%
        11%
    \else\ifcat 1\noexpand#1%
        12%
    \else\ifcat \noexpand~\noexpand#1%
        13%
    %   14: comment character, no tokens of that catcode exist
    %   15: invalid character, no tokens of that catcode exist
    \else\ifcat \relax\noexpand#1%
        16%
    \else
        error% this can not happen
    \fi \fi \fi \fi \fi \fi \fi \fi \fi \fi \fi \fi
}


% ========== approach 1: loop based on implicit tokens ==========
% + does not ignore spaces
% + no problems with groups
% - wrong catcode for active characters (16 instead of 13)
% - I don't see a feasible way to print the characters (other than those with catcode 11 and 12)

\begin{filecontents}{loop-implicit-tokens.tex}
\def\printtokens#1{%
%   \def\do##1{$\texttt{\string##1}_{\getCatcode##1}$}%
    \def\do##1{%
        \edef\i{\getCatcode{##1}}%
        (\i%
        \ifnum \i = 11\relax
            :\,\texttt{##1}%
        \else\ifnum \i = 12\relax
            :\,\texttt{##1}%
        \fi \fi
        )%
    }%
    \iterate#1\relax
}

% based on https://tex.stackexchange.com/q/359189/120953
\def\iterate{\afterassignment\loopbody\let\xchar= }
\def\loopbody{%
    \ifx\relax\xchar
        \let\next=\relax
    \else
        \do\xchar
        \let\next=\iterate
    \fi
    \next
}
\end{filecontents}


% ========== approach 2: loop based on explicit tokens ==========
% + prints correct catcode of active characters
% + possible to print the character
% - ignores *explicit* tokens with catcode 10 (space)
% - problems with *explicit* tokens of catcodes 1 and 2 (groups)

\begin{filecontents*}{loop-explicit-tokens.tex}
\def\printtokens#1{%
    \def\do##1{$\texttt{\string##1}_{\getCatcode##1}$}%
    \iterate#1\relax
}

% based on https://tex.stackexchange.com/q/359189/120953
\def\iterate#1{%
    \ifx\relax#1%
    \else
        \do{#1}%
        \expandafter\iterate
    \fi
}
\end{filecontents*}


% ========== main document ==========

\input{loop-implicit-tokens}
%\input{loop-explicit-tokens}

\newcommand{\printtokensinmacro}[1]{\expandafter\printtokens\expandafter{#1}}

\makeatletter
\newcommand{\test}{@ $i_\text{di}^2$&##~}
\makeatother

\begin{document}
\printtokensinmacro\test

\end{document}

Answer 1

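Here is a solution that uses only TeX primitives and plain TeX; you don't need expl3, LaTeX, etc. The result is the same as with the other \showcatcodes solution presented here.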

\def\showcatcodes#1{\showcA#1\showcA}
\def\showcA{\let\next=\showcC \futurelet\nextc\showcB}
\def\showcB{%
   \ifx\nextc\showcA \def\next##1{}\fi
   \expandafter\ifx\space\nextc \def\next{\showcD\ {10}}\fi
   \ifx\nextc{\def\next{\showcD\{{1}}\fi
   \ifx\nextc}\def\next{\showcD\}{2}}\fi
   \next
}
\def\showcC#1{{\tt\string#1}\expandafter
   \ifcat\noexpand#1\relax \showcE{16}\else \showcE{\the\catcode`#1}\fi
   \showcA
}
\def\showcD#1#2{{\tt\char`#1}\showcE{#2}\afterassignment\showcA \let\nextc= }
\def\showcE#1{${}_{#1}$\thinspace}

\showcatcodes{a~b{cd{e1}}2 3!$_ ^y\xxx} 

\end

Edit: In response to DavidCarlisle's comment, I have added a second version of my code:

\def\showcatcodes#1{\showcA#1\showcA}
\def\showcA{\let\next=\showcC \futurelet\nextc\showcB}
\def\showcB{%
   \ifx\nextc\showcA \def\next##1{}\fi
   \ifcat\space\noexpand\nextc \def\next{\showcD\ {10}}\fi
   \ifcat\noexpand\nextc{\def\next{\showcD\{{1}}\fi
   \ifcat\noexpand\nextc}\def\next{\showcD\}{2}}\fi
   \next
}
\def\showcC#1{{\tt\string#1}\showcE{%
   \ifcat\noexpand#1$3\fi \ifcat\noexpand#1&4\fi \ifcat\noexpand#1##6\fi
   \ifcat\noexpand#1^7\fi \ifcat\noexpand#1_8\fi \ifcat\noexpand#1x11\fi 
   \ifcat\noexpand#1:12\fi \ifcat\noexpand#1\noexpand~13\fi 
   \ifcat\noexpand#1\hbox16\fi
   }\showcA
}
\def\showcD#1#2{{\tt\char`#1}\showcE{#2}\afterassignment\showcA \let\nextc= }
\def\showcE#1{${}_{#1}$\thinspace}

\showcatcodes{a~b{cd{e1}}2 3!$_ ^y\xxx} 

\end

Answer 2

You can use a variant of https://tex.stackexchange.com/a/358697/4427

\documentclass{article}
\usepackage{expl3,xparse}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\ExplSyntaxOn

\NewDocumentCommand\showcatcodes { m }
 {
  \group_begin:
  \ttfamily
  \tl_set:Nn \l_tmpa_tl { #1 }
  \jakun_remove_braces:
  \regex_extract_all:nVN { . } \l_tmpa_tl \l_tmpa_seq
  \seq_map_inline:Nn \l_tmpa_seq
   { \jakun_value_catcode:n { ##1 } }
  \group_end:
 }
\cs_new_protected:Nn \jakun_remove_braces:
 {
  \regex_match:nVT { \cB. } \l_tmpa_tl
   {
    \regex_replace_all:nnN { \cB. (.*?) \cE\} } { \cO\{ \1 \cO\} } \l_tmpa_tl
    \jakun_remove_braces:
   }
 }
\cs_generate_variant:Nn \regex_extract_all:nnN { nV }
\prg_generate_conditional_variant:Nnn \regex_match:nn { nV } { T }
\cs_new_protected:Nn \jakun_value_catcode:n
 {
  \bool_lazy_and:nnTF { \tl_if_single_p:n { #1 } } { \token_if_cs_p:N #1 }
   {
    \token_to_str:N #1 \textsubscript{16}
   }
   {
    \str_if_eq:nnTF { #1 } { ~ }
     { \textvisiblespace \textsubscript{10} }
     { \token_to_str:N #1 \textsubscript{\char_value_catcode:n { `#1 }} }
   }
 }
\ExplSyntaxOff


\begin{document}
\showcatcodes{a~b{cd{e1}}2 3!$_ ^y\xxx}
\end{document}


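Here is a version that uses the actual category codes of the tokens; I think more work is needed to handle implicit characters. You can experiment with it.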

I think \tl_analysis_show:n is much better for debugging, though.
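For a quick look at a token list without writing any loop, a minimal sketch with \tl_analysis_show:n, assuming a LaTeX release from 2020 or later, where expl3 is built into the kernel (older releases need \usepackage{expl3}):

\documentclass{article}
\ExplSyntaxOn
% Describes every token of the argument (its category and character, or the
% control-sequence name) on the terminal and in the log, much like \show.
% Under \ExplSyntaxOn, ~ is read as a space token (catcode 10).
\tl_analysis_show:n { a~b{c}\relax }
\ExplSyntaxOff
\begin{document}
\end{document}

Because it reports to the log rather than typesetting, it is not affected by the font or by characters such as $ and & that are awkward to print.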

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{xparse}

\ExplSyntaxOn
\NewDocumentCommand{\showcatcodes}{sm}
 {
  \group_begin:
  \ttfamily
  \IfBooleanTF{#1}
   {
    \exp_last_unbraced:NV \jakun_showcatcodes: #2 \q_stop
   }
   {
    \jakun_showcatcodes: #2 \q_stop
   }
  \group_end:
 }

\cs_new_protected:Nn \jakun_showcatcodes:
 {
  \peek_meaning_remove:NTF \q_stop
   {
    %\unskip
   }
   {
    \peek_catcode_remove:NTF \c_space_token
     {
      \jakun_print_catcode:nn { \textvisiblespace } { 10 }
      \jakun_showcatcodes:
     }
     {
      \peek_catcode_remove:NTF \c_group_begin_token
       {
        \jakun_print_catcode:nn { \{ } { 1 }
        \jakun_showcatcodes:
       }
       {
        \peek_catcode_remove:NTF \c_group_end_token
         {
          \jakun_print_catcode:nn { \} } { 2 }
          \jakun_showcatcodes:
         }
         {
          \jakun_other_catcode:N
         }
       }
     }
   }
 }

\cs_new_protected:Nn \jakun_other_catcode:N
 {
  \token_if_cs:NTF #1
   {
    \jakun_print_catcode:nn { \token_to_str:N #1 } { 16 }
   }
   {
    \token_if_eq_catcode:NNTF \c_math_toggle_token #1
     {
      \jakun_print_catcode:nn { \token_to_str:N #1 } { 3 }
     }
     {
      \token_if_eq_catcode:NNTF \c_alignment_token #1
       {
        \jakun_print_catcode:nn { \token_to_str:N #1 } { 4 }
       }
       {
        \token_if_eq_catcode:NNTF \c_parameter_token #1
         {
          \jakun_print_catcode:nn { \token_to_str:N #1 } { 6 }
         }
         {
          \token_if_eq_catcode:NNTF \c_math_superscript_token #1
           {
            \jakun_print_catcode:nn { \token_to_str:N #1 } { 7 }
           }
           {
            \token_if_eq_catcode:NNTF \c_math_subscript_token #1
             {
              \jakun_print_catcode:nn { \token_to_str:N #1 } { 8 }
             }
             {
              \token_if_eq_catcode:NNTF \c_catcode_letter_token #1
               {
                \jakun_print_catcode:nn { \token_to_str:N #1 } { 11 }
               }
               {
                \token_if_eq_catcode:NNTF \c_catcode_other_token #1
                 {
                  \jakun_print_catcode:nn { \token_to_str:N #1 } { 12 }
                 }
                 {
                  \jakun_print_catcode:nn { \token_to_str:N #1 } { 13 }
                 }
               }
             }
           }
         }
       }
     }
   }
   \jakun_showcatcodes:
 }

\cs_new_protected:Nn \jakun_print_catcode:nn
 {
  #1\textsubscript{#2}~
 }

\ExplSyntaxOff

\begin{document}

\showcatcodes{abc x{y{z}}~&#_\xyz}

{
\catcode`z=\active
\showcatcodes{abc x{y{z}}~&#_\xyz}
\gdef\test{abc x{y{z}}~&##_\xyz}
}

\showcatcodes*{\test}

\end{document}


Answer 3

This answer uses the tokcycle package to decode catcodes. It can handle implicit, active, and \long tokens (such as \par), but it has some limitations.
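The pseudo-environments below are built with \tokcycleenvironment, which takes four directives: one each for characters, group contents, macros, and spaces. A minimal sketch of that structure, with the hypothetical environment name \tagtokens, that simply tags each token with the directive that handled it:

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{tokcycle}
% \tagtokens ... \endtagtokens is a hypothetical pseudo-environment.
% Directive order: characters, group contents, macros, spaces.
% Inside the directives, ##1 is the current token (or the group's content).
\tokcycleenvironment\tagtokens
 {\addcytoks{[char ##1]}}
 {\addcytoks{[group: }\processtoks{##1}\addcytoks{]}}
 {\addcytoks{[macro \string##1]}}
 {\addcytoks{[space]}}
\begin{document}
\ttfamily
\tagtokens ab {cd}\relax\endtagtokens
\end{document}

The \showcats environment below follows the same pattern, but replaces the tags with a catcode test for each token.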

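The package is currently set up to remember only one implicit cat-6 token at a time. If the input stream contains several implicit cat-6 tokens, it detects all of them as cat-6, but it only remembers the name of the most recently declared one. Multiple explicit cat-6 tokens pose no problem.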

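The package can handle changes to the cat-1,2 tokens. However, it cannot detect the character codes associated with such tokens; they must be specified in advance. I show an example later in this answer.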

It will never interpret a % in the input stream as cat-14. Instead, TeX treats the % as a comment character before it ever reaches the tokcycle environment.

Likewise, cat-5 end-of-line characters are intercepted by TeX before they reach tokcycle, so they arrive as explicit space tokens.
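Both effects happen while TeX reads the file, before any macro sees the tokens. A plain TeX sketch (the macro names are hypothetical) showing that a % removes the end of line, while an uncommented end of line becomes a space token:

\def\withcomment{a%
b}                    % the % hides the end of line: replacement text is "ab"
\def\withoutcomment{a
b}                    % the end of line became a space: replacement text is "a b"
\show\withcomment
\show\withoutcomment
\bye

This is also why the question's \getCatcode lists catcodes 5 and 14 as ones for which no tokens exist.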

First, an MWE that keeps the cat-1,2 tokens as { and } and has only one implicit cat-6 token, \C:

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{tokcycle,lmodern}
\tokcycleenvironment\showcats
 {\ifcatSIX\addcytoks{\thistok{##1}{6}}\else\addcytoks{\catcomp{##1}}\fi}
 {\addcytoks{\thistok{\{}{1}}\processtoks{##1}\addcytoks{\thistok{\}}{2}}}
 {\addcytoks{\catcomp{##1}}\tctestifx{\par##1}{\addcytoks{\par}}{}}
 {\ifimplicittok\addcytoks{\catcomp{##1}}\else
  \addcytoks{\thistok{\textvisiblespace}{\number\catcode`##1}}\fi}
\newcommand\thistok[2]{#1$_{#2}$\,\allowbreak}
\makeatletter
\newcommand\catcomp[1]{%
  \tctestifx{\implicitsixtok#1}{\expandafter\string#1$_{6}}{%
  \string#1$_{%
  \tctestifcatnx #1\relax{0}{%
  \tctestifcatnx #1${3}{%
  \tctestifcatnx #1&{4}{%
  \tctestifcatnx #1^{7}{%
  \tctestifcatnx #1_{8}{%
  \tctestifcatnx #1\@sptoken{10}{%
  \tctestifcatnx #1a{11}{%
  \tctestifcatnx #11{12}{%
  \tctestifcatnx #1~{13}{%
  *%
  }}}}}}}}}}%
  }$\,\allowbreak%
}
\let\deftok\tc@deftok
\makeatother
\begin{document}
\ttfamily
\let\A$% 3
\let\B&% 4
\let\C#% 6
\let\D^% 7
\let\E_% 8
\deftok\F{ }% 10
\let\G a% 11
\let\H 1% 12
\let\I~% 0, because \I is not active, 
%           but a macro that takes the same meaning as ~
\let\J\relax% 0
\def\K{xyz}% 0

\catcode`q=\active% 13
\def q{x}
\catcode`Q=\active
\let Q #% This implicit assignment makes 
%         the catcode of Q=6, rather than 13
\deftok\sptoken{ }% 10
\showcats 
\A\B\C\D\E\F\G\H\I\J\K

A9 $x_2^{y+1}$ \today &#~

\space\sptoken qQ<>
\endshowcats

\end{document}


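Note that we are testing the catcode of the actual token here, not the current \catcode value associated with its character code. So, for example, if after \let\E_ we reassign the catcode of _ to 7, the token \E will still test as cat-8, not 7.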
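A plain TeX sketch of exactly this situation, reusing the name \E from the MWE above:

\let\E_                 % \E is an implicit cat-8 token
\catcode`\_=7           % change the catcode table afterwards
\message{table: \the\catcode`\_}  % reports 7
\ifcat\E^\message{now cat 7}\else\message{still cat 8}\fi % the \else branch is taken
\bye

\ifcat treats a control sequence that has been \let to a non-active character as that character token, so \E keeps the catcode 8 it had when the assignment was made, whatever the table says later.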


Now, for the case of changed cat-1,2 tokens, I will use < and >. First, the catcode-12 versions of these characters must be saved. I use

\def\<{<}
\def\>{>}

before making any catcode changes. Then I use

\catcode`<=1
\catcode`>=2
\let\bgroup<
\let\egroup>
\settcGrouping<<#1>>
\catcode`{=12
\catcode`}=12

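The only unusual thing here is the \settcGrouping<<#1>> macro, which tells tokcycle which tokens to use for grouping in the output stream (the default is {_1}_2; here it is reset to <_1>_2). This call is not strictly necessary for this particular problem, because I am not actually passing the input tokens through to the output stream. But if I were, it would ensure that groups in the output stream are built with the updated < and > tokens.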

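For this particular setup, I explicitly tell the \Showcats pseudo-environment to display the previously defined \<_1\>_2 whenever it encounters a group.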

MWE... For fun, I did it in plain pdfTeX, since tokcycle can run in that mode:

\input tokcycle
\def\thistok#1#2{#1$_{#2}$\,\allowbreak}
\catcode`@=11
\def\textvisiblespace{\char"20}
\def\,{\kern2pt}
\long\def\catcomp#1{%
  \tctestifx{\implicitsixtok#1}{\expandafter\string#1$_{6}}{%
  \string#1$_{%
  \tctestifcatnx #1\relax{0}{%
  \tctestifcatnx #1${3}{%
  \tctestifcatnx #1&{4}{%
  \tctestifcatnx #1^{7}{%
  \tctestifcatnx #1_{8}{%
  \tctestifcatnx #1\@sptoken{10}{%
  \tctestifcatnx #1a{11}{%
  \tctestifcatnx #11{12}{%
  \tctestifcatnx #1~{13}{%
  *%
  }}}}}}}}}}%
  }\,$\allowbreak%
}
\let\deftok\tc@deftok
\deftok\@sptoken{ }% 10
\catcode`@=12

\tt
\let\A$% 3
\let\B&% 4
\let\C#% 6
\let\D^% 7
\let\E_% 8
\deftok\F{ }% 10
\let\G a% 11
\let\H 1% 12
\let\I~% 0, because \I is not active, 
%           but a macro that takes the same meaning as ~
\let\J\relax% 0
\def\K{xyz}% 0

\catcode`q=\active% 13
\def q{x}
\catcode`Q=\active
\let Q #% This implicit assignment makes 
%         the catcode of Q=6, rather than 13
\deftok\sptoken{ }% 10

\def\<{<}
\def\>{>}

\catcode`<=1
\catcode`>=2
\let\bgroup<
\let\egroup>
\settcGrouping<<#1>>
\catcode`{=12
\catcode`}=12

\tokcycleenvironment\Showcats
 <\ifcatSIX\addcytoks<\thistok<##1><6>>\else\addcytoks<\catcomp<##1>>\fi>
 <\addcytoks<\thistok<\<><1>>\processtoks<##1>\addcytoks<\thistok<\>><2>>>
 <\addcytoks<\catcomp<##1>>\tctestifx<\par##1><\addcytoks<\par>><>>
 <\ifimplicittok\addcytoks<\catcomp<##1>>\else
  \addcytoks<\thistok<\textvisiblespace><\number\catcode`##1>>\fi>

\Showcats 
\A\B\C\D\E\F\G\H\I\J\K

A9 $x_2^<y+1>$ \today &#~

\space\sptoken qQ{}
\endShowcats

\bye



While preparing this answer, I found a bug in the package: it did not correctly handle active implicit spaces. For example,

\makeatletter
\catcode`Q=\active
\tc@deftok Q{ }
\tokcycle{}{}{}{\detokenize{[#1]}}{x y zQw}

does not produce a sensible result when Q is encountered in the input stream.
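For reference, an active implicit space like the Q above can also be created without tokcycle, using the same trick plain TeX uses to define \@sptoken. A plain TeX sketch (\\ is simply redefined for the purpose):

\catcode`Q=\active
% The space after the control symbol \\ is not skipped, so \\ expands to
% "\let Q= " followed by that space: \let takes one optional space and then
% assigns the next space token to the active Q.
\def\\{\let Q= }\\ %
\show Q  % reports that Q is a blank space
\bye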

I have now implemented correct handling of this case in the package and uploaded v1.2 (2020-10-01) to CTAN for distribution.
