Detokenize without extra spaces?

Question 1

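Once TeX has converted the input to tokens, it is impossible to distinguish "\A ." from "\A.". If you need to preserve such spaces, the only solution is to read the argument verbatim.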
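
As a small illustration of why the distinction is lost, the following sketch defines two macros from the two spellings and compares them with \ifx. It also writes out what \detokenize produces, which is where the unwanted space comes from. The macro names \first and \second are made up for this example, and an e-TeX-enabled LaTeX is assumed (any current installation provides one).

\documentclass{article}
\begin{document}
% Both bodies tokenize to the control word \A followed by a period,
% because TeX skips spaces after a control word while reading the input.
\def\first{\A .}
\def\second{\A.}
\ifx\first\second
  \typeout{same tokens}
\else
  \typeout{different tokens}
\fi
% \detokenize puts a space after every control word, whether or not
% the input had one, so both spellings are written out identically.
\typeout{\detokenize{\A .}}
\typeout{\detokenize{\A.}}
Nothing to typeset; see the log.
\end{document}

The \ifx test takes the true branch, and the two \typeout lines that follow it are identical.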

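If you can live with that, the simplest approach is to update the l3kernel and l3experimental bundles (and l3packages) to the very latest version (February 2012), then use the l3regex package to add \string in front of every token in the argument, and then expand. The code below does this (replace \tl_show:N with whatever you want to do with the string).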

\documentclass{article}
\usepackage{l3regex}
\ExplSyntaxOn
\cs_new_protected:Npn \test #1
  {
    \tl_set:Nn \l_tmpa_tl {#1}
    \regex_replace_all:nnN { . } { \c{string} \0 } \l_tmpa_tl
    \tl_set:Nx \l_tmpb_tl { \l_tmpa_tl }
    % now \l_tmpb_tl contains what you want:
    \tl_show:N \l_tmpb_tl
  }
\ExplSyntaxOff
\begin{document}
  \test{\A\d{2,}.+ Hello, world!\z}
\end{document}

How does it work? \regex_replace_all:nnN performs the replacement on a stored token list, so we need to store the argument first.

\tl_set:Nn    % Set locally
  \l_tmpa_tl  % the "local temporary token list" `\l_tmpa_tl`
  {#1}        % to contain "#1" (the argument).
\regex_replace_all:nnN % Replace every occurrence of
  { . }                % any token, even braces etc.
  {                    % by
    \c{string}         %   \string
    \0                 %   what was matched (the token)
  } \l_tmpa_tl         % in \l_tmpa_tl
\tl_set:Nx        % Set locally, with expansion,
  \l_tmpb_tl      % the "local temporary token list b"
  { \l_tmpa_tl }  % to (the expansion of) `\l_tmpa_tl`
\tl_show:N    % Show the contents of
  \l_tmpb_tl  % the token list variable `\l_tmpb_tl`

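Of course, l3regex does a lot of work under the hood, so whether this is acceptable depends on how many such regular expressions you have to process.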
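
If the cost matters because the command is called many times, one possible mitigation is to compile the pattern once and reuse it. The sketch below assumes a current expl3, in which the regex module is part of the kernel and provides \regex_const:Nn and \regex_replace_all:NnN. The names \c_my_any_token_regex, \l_my_string_tl and \teststring are invented for this example. Only the pattern is precompiled; the replacement text is still parsed on every call, so the saving is likely to be modest.

\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
% Compile the "any single token" pattern once, at definition time.
\regex_const:Nn \c_my_any_token_regex { . }
\tl_new:N \l_my_string_tl
\cs_new_protected:Npn \teststring #1
  {
    \tl_set:Nn \l_my_string_tl {#1}
    % Same replacement as before, but with the precompiled pattern.
    \regex_replace_all:NnN \c_my_any_token_regex
      { \c{string} \0 } \l_my_string_tl
    % Expand so that every \string acts on the token after it.
    \tl_set:Nx \l_my_string_tl { \l_my_string_tl }
    \tl_show:N \l_my_string_tl
  }
\ExplSyntaxOff
\begin{document}
  \teststring{\A\d{2,}.+ Hello, world!\z}
\end{document}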

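Edit: a plain TeX solution for the very specific task you asked about. I assume that the string never contains the character ^^A (character code 1). The idea is to use \lowercase to change every true space token into some recognizable character, then apply \detokenize, loop through the result one character at a time (which automatically skips the spaces that \detokenize inserted), and replace each ^^A by a space.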

\catcode64=11
\long\def\test#1%
  {%
    \begingroup
      % Ensure that every character is preserved by \lowercase.
      \count@\z@
      \loop\ifnum\count@<256
        \lccode\count@\z@
        \advance\count@\@ne
      \repeat
      % Except spaces, changed to ^^A
      \lccode32=\@ne
      \lowercase
        {%
          \endgroup
          \edef\result{\expandafter\test@\detokenize{#1}\relax}%
        }%
  }
% Then map {^^A => space, space =>} onto the string.
\def\test@#1%
  {%
    \ifx#1\relax\test@end\fi
    \ifnum`#1=\@ne\space\else#1\fi
    \test@
  }
\def\test@end\fi#1\test@{\fi}
\catcode64=12
\test{ab c\d e{f} \fg }\show\result
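
To see the central trick in isolation, here is a minimal sketch of the \lowercase step alone, without the character loop. It assumes an e-TeX-enabled plain TeX (for example pdftex), and the macro names \demo and \shown are hypothetical.

% Run with an e-TeX-enabled plain TeX, for example pdftex.
\begingroup
  \lccode32=1 % spaces (code 32) lowercase to ^^A (code 1)
  \lowercase{\endgroup\def\demo{a \b c}}
% \demo now holds: a, ^^A (still catcode 10), \b, c.
% The space typed after \b never became a token at all.
\edef\shown{\expandafter\detokenize\expandafter{\demo}}
\show\shown
\bye

In \shown the original space survives as character 1, and the only code-32 space is the one that \detokenize inserted after \b. That is why dropping code-32 spaces and then mapping character 1 back to a space recovers the string.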

Question 2

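Although the question is tagged tex-core, I would like to point out that xparse has, as far as I can tell, an argument specification, v, that performs the detokenization without adding spaces. From the manual:

Arguments of type "v" are read in verbatim mode, which will result in the grabbed argument consisting of tokens of category code 12 ("other"), except spaces, which are given category code 10 ("space").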

\DeclareDocumentCommand\foo{v}{\ttfamily #1}

and using it as

\foo!\A\d{2,}.+\z!

produces the same output as

\verb!\A\d{2,}.+\z!

xparse introduces no extra spaces. In this sense the content of the argument #1 is "untouched".
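
If the string is needed later rather than typeset on the spot, the same argument type can store it in a macro. This is only a sketch: \storeregex and \storedregex are names invented here, and the delimiter | is chosen because the sample text contains an exclamation mark.

\documentclass{article}
\usepackage{xparse} % built into the kernel from LaTeX 2020-10-01 on
% \storeregex and \storedregex are hypothetical names for this sketch.
\NewDocumentCommand\storeregex{v}{\def\storedregex{#1}}
\begin{document}
% The argument is read verbatim between the two | characters.
\storeregex|\A\d{2,}.+ Hello, world!\z|
\texttt{\storedregex}
\end{document}

As with \verb, a v-type argument is read with changed category codes, so this generally does not work inside the argument of another command.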

Question 3

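A truly general solution seems hard, but here is my attempt.

Call \spaceparse with an argument to detokenize it and to parse the spaces after commands. The result of the parsing is available in the macro \result. You need to call \result to see the result.

Because \detokenize doubles hash characters, we first reverse that operation. If you do not want this default behavior, use the starred (*) form of \spaceparse.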
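
The doubling of hash characters can be observed directly with a few lines, independent of \spaceparse. The macro name \demo is hypothetical, and an e-TeX-enabled LaTeX is assumed.

\documentclass{article}
\begin{document}
% \detokenize turns the single parameter character into two
% "other" (catcode 12) hash characters.
\edef\demo{\detokenize{a#b}}
% Writes the meaning of \demo to the terminal and the log.
\typeout{\meaning\demo}
Nothing to typeset; see the log.
\end{document}

The log line shows a##b as the body of \demo, that is, two hash characters where the input had one.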

You can copy the code into a package and load that package.

\documentclass{article}
\usepackage{catoptions}
% No conflict with etoolbox.sty:
% \usepackage{etoolbox}
\makeatletter
\robust@def*\spaceparse{\cpt@testst\sp@ceparse}
\robust@def*\sp@ceparse#1{%
  \begingroup
  \edef\@tempa{\detokenize{#1}}%
  \ifboolTF{cpt@st}{}{\s@expandarg\cpt@pophash\@tempa\@tempa}%
  \edef\@tempa##1{##1\expandcsonce\@tempa\@space\cpt@nil}%
  \edef\@tempb##1{\def##1####1\@space####2\cpt@nil}%
  \@tempb\@tempb{%
    \ifblankTF{##2}{%
      \toks@\expandafter{\the\toks@##1}%
    }{%
      \countbackslash{##1}%
      \ifnum\nr=\@ne
        \xifinsetTF{\@car##2\relax\@nil}\cpt@oth@rchars{%
          \toks@\expandafter{\the\toks@##1}%
        }{%
          \cptexpanded{\toks@{\the\toks@\unexpanded{##1}\@space}}%
        }%
      \else
        \cptexpanded{\toks@{\the\toks@\unexpanded{##1}\@space}}%
      \fi
      \@tempb##2\cpt@nil
    }%
  }%
  \@tempa{\toks@{}\@tempb}%
  \edef\result{\the\toks@}%
  \postgroupdef\result\endgroup
}
\robust@def*\countbackslash#1{%
  \begingroup
  \@tempcnta\z@
  \def\@tempa##1{%
    \def\@tempa####1##1####2\@nil{%
      \ifblankTF{####2}{}{%
        \advance\@tempcnta\@ne
        \@tempa####2\@nil
      }%
    }%
    \@tempa#1##1\@nil
  }%
  \s@expandarg\@tempa\@backslashchar
  \cptexpanded{\endgroup\def\noexpand\nr{\the\@tempcnta}}%
}
\makeatother

Test:

\def\x{##1\A\d{2,}.+\z A B\\x y}
% Content of \x is already read:
\expandafter\spaceparse\expandafter{\x}
\show\result

\spaceparse{xx\x{f} \fg x}
\show\result

\spaceparse{ab c\d e{f} \fg x}
\show\result

\spaceparse{#1\A\d{2,}.+\z A B}
\show\result

\begin{document}

\end{document}
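
The suggestion to move the code into a package could look roughly like the skeleton below. The file name spaceparse.sty, the date and the description string are placeholders chosen for this sketch, not something the code above prescribes.

% spaceparse.sty (hypothetical file name)
\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{spaceparse}[2012/02/01 detokenize without extra spaces]
\RequirePackage{catoptions}
% Paste the definitions of \spaceparse, \sp@ceparse and \countbackslash
% here. Inside a package file @ is already a letter, so \makeatletter
% and \makeatother are not needed.
\endinput

A document would then load it with \usepackage{spaceparse} in place of the inline definitions.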

Answer