Separating the characters in a string

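I want to write a macro that separates the characters of a given string according to certain conditions. If I have a string abcdef<ghi>jkl<mn>opqrs, I want to access each character as a string for further use in another command, but ghi should be a single string, and likewise mn should be a single string. Can you help me achieve this?

Sorry, I am adding the following to the question. I assumed a general answer would also work for Devanagari characters under XeLaTeX, but it seems it does not, hence this edit.

My input string will consist of Devanagari characters, and I will then run XeLaTeX.

For example, my string has the form सा{परेग}नी{धम}पनी

Also, I obtain this string by operating on another string with the xstring package. I perform the following extraction:

\StrBetween{(सा{परे}न{धम}पनी)[नी{धम}पनी]}{(}{)}[\firststring]

Now I want to use \firststring as the input.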

Answer 1

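Here I use tokcycle to process the tokens of the original input, adding a comma after each character token unless it sits inside a <...> group. I can then use listofitems to read this comma-separated list, storing each item in an array \mystring. The array is accessed by item index.

If the actual tokens of an array element are needed, the call \mystring[...] must be expanded twice.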

\documentclass{article}
\usepackage{listofitems,tokcycle}
\newcounter{nestlevel}
\Characterdirective{\ifx<#1\stepcounter{nestlevel}\else
  \ifx>#1\addtocounter{nestlevel}{-1}\addcytoks{,}\else
  \addcytoks{#1}\ifnum\thenestlevel=0\relax\addcytoks{,}\fi\fi\fi}
\ignoreemptyitems
\gdef\getmystring#1{%
  \setcounter{nestlevel}{0}%
  \tokcyclexpress{#1}%
  \expandafter\readlist\expandafter\mystring\expandafter{\the\cytoks}}
\begin{document}
\getmystring{abcdef<ghi>jkl<mn>opqrs}
The number of strings is \mystringlen.

\mystring[3], \mystring[6], \mystring[7], \mystring[11], \mystring[16]
\end{document}
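
The double expansion mentioned above can be sketched as follows. This is a minimal illustration, not part of the original answer: the list a,b,ghi,c and the macro name \thirditem are made up for the example, and it relies on the statement above that two expansions of \mystring[...] yield the item's tokens.

```latex
\documentclass{article}
\usepackage{listofitems}
\begin{document}
% The default separator of listofitems is the comma.
\readlist\mystring{a,b,ghi,c}
% Three \expandafter in front of each token: \mystring[3] is expanded
% twice before \def reads its replacement text.
% \thirditem is a hypothetical name used only in this sketch.
\expandafter\expandafter\expandafter\def
\expandafter\expandafter\expandafter\thirditem
\expandafter\expandafter\expandafter{\mystring[3]}
% Shows what was actually stored in \thirditem.
\texttt{\meaning\thirditem}
\end{document}
```

Without the \expandafter chain, \thirditem would store the call \mystring[3] itself, so its value would change whenever \mystring is read again.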

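If brace grouping {...} is allowed in the input stream instead of angle-bracket grouping <...>, the construction becomes somewhat simpler while giving the same result: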

\documentclass{article}
\usepackage{listofitems,tokcycle}
\stripgroupingtrue
\ignoreemptyitems
\newcommand\getmystring[1]{%
  \tokcycle{\addcytoks{##1,}}{\addcytoks{##1,}}{}{}{#1}%
  \expandafter\readlist\expandafter\mystring\expandafter{\the\cytoks}}
\begin{document}
\getmystring{abcdef{ghi}jkl{mn}opqrs}
The number of strings is \mystringlen.

\mystring[3], \mystring[6], \mystring[7], \mystring[11], \mystring[16]
\end{document}

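Supplement

The OP commented that he would like to use this approach with the input supplied not directly but through a \def'ed string. Here I provide \getmydefstring to show how that can be done: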

\documentclass{article}
\usepackage{listofitems,tokcycle}
\stripgroupingtrue
\ignoreemptyitems
\newcommand\getmystring[1]{%
  \tokcycle{\addcytoks{##1,}}{\addcytoks{##1,}}{}{}{#1}%
  \expandafter\readlist\expandafter\mystring\expandafter{\the\cytoks}}
\newcommand\getmydefstring[1]{%
  \def\tmp{\tokcycle{\addcytoks{####1,}}{\addcytoks{####1,}}{}{}}%
  \expandafter\tmp\expandafter{#1}%
  \expandafter\readlist\expandafter\mystring\expandafter{\the\cytoks}}
\begin{document}
\def\mystring{abcdef{ghi}jkl{mn}opqrs}
\getmydefstring{\mystring}
The number of strings is \mystringlen.

\mystring[3], \mystring[6], \mystring[7], \mystring[11], \mystring[16]
\end{document}

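Answer 2

If you don't mind treating the string as a list of undelimited arguments or explicit space tokens, I can offer a macro, \ExtractKthArgOrSpace.

Note that an undelimited argument

  • either consists of a single token that is neither an explicit space token nor an explicit character token of category code 1 (begin group), e.g. {, or of category code 2 (end group), e.g. },
  • or consists of a brace-balanced set of tokens nested in curly braces.

So instead of < and > you can use { and }.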
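
As a minimal sketch of the rule above, the following document shows what TeX picks up as an undelimited argument in each case. The macro name \showfirst is made up for this illustration and does not appear in the answer's code.

```latex
\documentclass{article}
% \showfirst is a hypothetical helper: it takes one undelimited
% argument and prints it in square brackets.
\newcommand\showfirst[1]{[#1]}
\begin{document}
% A single token is grabbed: prints [a]bc
\showfirst abc

% A brace group is grabbed as a whole, outer braces removed: prints [ghi]jkl
\showfirst{ghi}jkl

% Space tokens in front of an undelimited argument are skipped: prints [x]
\showfirst   x
\end{document}
```

The last case is the reason \ExtractKthArgOrSpace needs separate handling for space tokens: plain argument grabbing would silently skip them.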

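When utf8-encoded .tex input is processed through the inputenc package on a traditional TeX engine (where the internal character encoding scheme is 8-bit ASCII), unicode characters, or substrings containing unicode characters, need to be nested in curly braces. This is because on such systems a single unicode character in the .tex input is represented by a sequence of bytes, i.e. 8-bit characters, which yields a sequence of several character tokens. The first of these character tokens is active and "looks at the following character tokens to decide which glyph to deliver to the output file".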

\makeatletter
%% Code for \ExtractKthArgOrSpace
%%=============================================================================
%% Paraphernalia:
%%    \UD@firstoftwo, \UD@secondoftwo, \UD@PassFirstToSecond, \UD@Exchange,
%%    \UD@CheckWhetherNull
%%=============================================================================
\newcommand\UD@firstoftwo[2]{#1}%
\newcommand\UD@secondoftwo[2]{#2}%
\newcommand\UD@PassFirstToSecond[2]{#2{#1}}%
\newcommand\UD@Exchange[2]{#2#1}%
%%-----------------------------------------------------------------------------
%% Check whether argument is empty:
%%.............................................................................
%% \UD@CheckWhetherNull{<Argument which is to be checked>}%
%%                     {<Tokens to be delivered in case that argument
%%                       which is to be checked is empty>}%
%%                     {<Tokens to be delivered in case that argument
%%                       which is to be checked is not empty>}%
%%
%% The gist of this macro comes from Robert R. Schneck's \ifempty-macro:
%% <https://groups.google.com/forum/#!original/comp.text.tex/kuOEIQIrElc/lUg37FmhA74J>
\newcommand\UD@CheckWhetherNull[1]{%
  \romannumeral0\expandafter\UD@secondoftwo\string{\expandafter
  \UD@secondoftwo\expandafter{\expandafter{\string#1}\expandafter
  \UD@secondoftwo\string}\expandafter\UD@firstoftwo\expandafter{\expandafter
  \UD@secondoftwo\string}\expandafter\expandafter\UD@firstoftwo{ }{}%
  \UD@secondoftwo}{\expandafter\expandafter\UD@firstoftwo{ }{}\UD@firstoftwo}%
}%
%%=============================================================================
%% Check whether brace-balanced argument's first token is a space-token
%%=============================================================================
%% \UD@CheckWhetherLeadingSpace{<Argument which is to be checked>}%
%%                             {<Tokens to be delivered in case <argument
%%                               which is to be checked>'s 1st token is a
%%                               space-token>}%
%%                             {<Tokens to be delivered in case <argument
%%                               which is to be checked>'s 1st token is not
%%                               a space-token>}%
\newcommand\UD@CheckWhetherLeadingSpace[1]{%
  \romannumeral0\UD@CheckWhetherNull{#1}%
  {\UD@firstoftwo\expandafter{} \UD@secondoftwo}%
  {\expandafter\UD@secondoftwo\string{\UD@@CheckWhetherLeadingSpace.#1 }{}}%
}%
\@ifdefinable\UD@@CheckWhetherLeadingSpace{%
  \long\def\UD@@CheckWhetherLeadingSpace#1 {%
    \expandafter\UD@CheckWhetherNull\expandafter{\UD@secondoftwo#1{}}%
    {\UD@Exchange{\UD@firstoftwo}}{\UD@Exchange{\UD@secondoftwo}}%
    {\UD@Exchange{ }{\expandafter\expandafter\expandafter\expandafter
     \expandafter\expandafter\expandafter}\expandafter\expandafter
     \expandafter}\expandafter\UD@secondoftwo\expandafter{\string}%
  }%
}%
%%=============================================================================
%% Remove a space-token
%%=============================================================================
\@ifdefinable\UD@gobblespace{\UD@firstoftwo{\def\UD@gobblespace}{} {}}%
%%=============================================================================
%% Extract K-th element of list of undelimited arguments or spaces:
%%
%% \ExtractKthArgOrSpace{<integer K>}{<list of undelimited args or spaces>} 
%% 
%% In case there is no K-th element in <list of undelimited args or spaces> : 
%%   Does not deliver any token.
%% In case there is a K-th element in <list of undelimited args or spaces> : 
%%   Does deliver that K-th element with one level of surrounding curly 
%%   braces removed if present.
%%
%% Examples:
%%
%%   \ExtractKthArgOrSpace{0}{ABCDE} yields: <nothing>
%%
%%   \ExtractKthArgOrSpace{3}{ABCDE} yields:  C
%%
%%   \ExtractKthArgOrSpace{3}{AB{CD}E} yields:  CD
%%
%%   \ExtractKthArgOrSpace{3}{AB DE} yields: <space token>
%%
%%   \ExtractKthArgOrSpace{2}{{AB} {DE}F} yields: <space token>
%%
%%   \ExtractKthArgOrSpace{4}{{001}{002}{003}{004}{005}} yields: 004
%%
%%   \ExtractKthArgOrSpace{6}{{001}{002}{003}} yields: <nothing> 
%%
%% Due to \romannumeral0-expansion the result is delivered after two expansion-
%% steps/after two "hits" by \expandafter.
%% 
%%=============================================================================
\newcommand\ExtractKthArgOrSpace[1]{%
  \romannumeral0%
  % #1: <integer number K>
  \expandafter\UD@ExtractKthArgOrSpaceCheck
  \expandafter{\romannumeral\number\number#1 000}%
}%
\newcommand\UD@ExtractKthArgOrSpaceCheck[2]{%
  \UD@CheckWhetherNull{#1}{ }{%
    \expandafter\UD@ExtractKthArgOrSpaceLoop\expandafter{\UD@firstoftwo{}#1}{#2}%
  }%
}%
\newcommand\UD@ExtractKthArgOrSpaceLoop[2]{%
  \UD@CheckWhetherNull{#2}{ }{%
    \UD@CheckWhetherNull{#1}{%
      \UD@CheckWhetherLeadingSpace{#2}{%
        \UD@ExtractFirstArgLoop{{ }#2\UD@SelDOm}%
      }{%
        \UD@ExtractFirstArgLoop{#2\UD@SelDOm}%
      }%
    }{%
      \UD@CheckWhetherLeadingSpace{#2}{%
        \expandafter\UD@PassFirstToSecond\expandafter{\UD@gobblespace#2}%
      }{%
        \expandafter\UD@PassFirstToSecond\expandafter{\UD@firstoftwo{}#2}%
      }%
      {\expandafter\UD@ExtractKthArgOrSpaceLoop\expandafter{\UD@firstoftwo{}#1}}%
    }%
  }%
}%
\newcommand\UD@RemoveTillUD@SelDOm{}%
\long\def\UD@RemoveTillUD@SelDOm#1#2\UD@SelDOm{{#1}}%
\newcommand\UD@ExtractFirstArgLoop[1]{%
  \expandafter\UD@CheckWhetherNull\expandafter{\UD@firstoftwo{}#1}%
  {\UD@firstoftwo{\expandafter}{} \UD@secondoftwo{}#1}%
  {\expandafter\UD@ExtractFirstArgLoop\expandafter{\UD@RemoveTillUD@SelDOm#1}}%
}%
%% End of code for \ExtractKthArgOrSpace.
\makeatother

\documentclass{article}
\begin{document}

\noindent The list of undelimited arguments or space-tokens is: \verb|abcdef{ghi}jkl{mn}opqr s|

\bigskip

\noindent
\verb|\ExtractKthArgOrSpace{1}{abcdef{ghi}jkl{mn}opqr s}|: \ExtractKthArgOrSpace{1}{abcdef{ghi}jkl{mn}opqr s}\\
\verb|\ExtractKthArgOrSpace{2}{abcdef{ghi}jkl{mn}opqr s}|: \ExtractKthArgOrSpace{2}{abcdef{ghi}jkl{mn}opqr s}\\
\verb|\ExtractKthArgOrSpace{3}{abcdef{ghi}jkl{mn}opqr s}|: \ExtractKthArgOrSpace{3}{abcdef{ghi}jkl{mn}opqr s}\\
\verb|\ExtractKthArgOrSpace{4}{abcdef{ghi}jkl{mn}opqr s}|: \ExtractKthArgOrSpace{4}{abcdef{ghi}jkl{mn}opqr s}\\
\verb|\ExtractKthArgOrSpace{5}{abcdef{ghi}jkl{mn}opqr s}|: \ExtractKthArgOrSpace{5}{abcdef{ghi}jkl{mn}opqr s}\\
\verb|\ExtractKthArgOrSpace{6}{abcdef{ghi}jkl{mn}opqr s}|: \ExtractKthArgOrSpace{6}{abcdef{ghi}jkl{mn}opqr s}\\
\verb|\ExtractKthArgOrSpace{7}{abcdef{ghi}jkl{mn}opqr s}|: \ExtractKthArgOrSpace{7}{abcdef{ghi}jkl{mn}opqr s}\\
\verb|\ExtractKthArgOrSpace{8}{abcdef{ghi}jkl{mn}opqr s}|: \ExtractKthArgOrSpace{8}{abcdef{ghi}jkl{mn}opqr s}\\
\verb|\ExtractKthArgOrSpace{9}{abcdef{ghi}jkl{mn}opqr s}|: \ExtractKthArgOrSpace{9}{abcdef{ghi}jkl{mn}opqr s}\\
\verb|\ExtractKthArgOrSpace{10}{abcdef{ghi}jkl{mn}opqr s}|: \ExtractKthArgOrSpace{10}{abcdef{ghi}jkl{mn}opqr s}\\
\verb|\ExtractKthArgOrSpace{11}{abcdef{ghi}jkl{mn}opqr s}|: \ExtractKthArgOrSpace{11}{abcdef{ghi}jkl{mn}opqr s}\\
\verb|\ExtractKthArgOrSpace{12}{abcdef{ghi}jkl{mn}opqr s}|: \ExtractKthArgOrSpace{12}{abcdef{ghi}jkl{mn}opqr s}\\
\verb|\ExtractKthArgOrSpace{13}{abcdef{ghi}jkl{mn}opqr s}|: \ExtractKthArgOrSpace{13}{abcdef{ghi}jkl{mn}opqr s}\\
\verb|\ExtractKthArgOrSpace{14}{abcdef{ghi}jkl{mn}opqr s}|: \ExtractKthArgOrSpace{14}{abcdef{ghi}jkl{mn}opqr s}\\
\verb|\ExtractKthArgOrSpace{15}{abcdef{ghi}jkl{mn}opqr s}|: \ExtractKthArgOrSpace{15}{abcdef{ghi}jkl{mn}opqr s}\\
Let's put the following into parentheses to make the space visible:\\
\verb|(\ExtractKthArgOrSpace{16}{abcdef{ghi}jkl{mn}opqr s})|: (\ExtractKthArgOrSpace{16}{abcdef{ghi}jkl{mn}opqr s})\\
\verb|\ExtractKthArgOrSpace{17}{abcdef{ghi}jkl{mn}opqr s}|: \ExtractKthArgOrSpace{17}{abcdef{ghi}jkl{mn}opqr s}

\bigskip

\noindent Space-tokens are taken into account, but be aware that with \LaTeX's tokenizer
consecutive spaces appearing in the .tex-input-file usually collapse into a single explicit space-token
if not skipped completely due to appearing behind something that got tokenized as control-word-token:

\bigskip

\noindent
\verb|(\ExtractKthArgOrSpace{1}{ })|: (\ExtractKthArgOrSpace{1}{ })\\
\verb|(\ExtractKthArgOrSpace{1}{ abc})|: (\ExtractKthArgOrSpace{1}{ abc})\\
\verb|(\ExtractKthArgOrSpace{1}{{ }abc})|: (\ExtractKthArgOrSpace{1}{{ }abc})\\
\verb|(\ExtractKthArgOrSpace{3}{ab c})|: (\ExtractKthArgOrSpace{3}{ab c})\\
\verb|(\ExtractKthArgOrSpace{3}{ab{ }c})|: (\ExtractKthArgOrSpace{3}{ab{ }c})

\bigskip

\noindent
Now let's define a macro after hitting \verb|\ExtractKthArgOrSpace| with \verb|\expandafter| twice:

\begin{verbatim}
\expandafter\expandafter
\expandafter            \newcommand
\expandafter\expandafter
\expandafter            \test
\expandafter\expandafter
\expandafter            {%
   \ExtractKthArgOrSpace{7}{abcdef{ghi}jkl{mn}opqr s}%
}%

\texttt{\meaning\test}
\end{verbatim}

\expandafter\expandafter
\expandafter            \newcommand
\expandafter\expandafter
\expandafter            \test
\expandafter\expandafter
\expandafter            {%
   \ExtractKthArgOrSpace{7}{abcdef{ghi}jkl{mn}opqr s}%
}%

\noindent
\texttt{\meaning\test}
\end{document}

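Answer 3

In the comments you indicated that using {} instead of <> is fine.

That makes it simple, because LaTeX already has such a loop (\@tfor), so you need no packages, just a command that defines what to do on each iteration. Here I surround each item with [] and end the paragraph each time.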

\documentclass{article}

\makeatletter
\newcommand\myloop[1]{\@tfor\zzz:=#1\do{[\zzz]\par}}
\makeatother

\begin{document}
\myloop{abcdef{ghi}jkl{mn}opqrs}
\end{document}
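
If the string is stored in a macro, as with \firststring in the question, note that \@tfor does not expand its list, so the macro has to be expanded once before \myloop sees it. A minimal sketch, in which the name \storedstring is an assumption made for this example:

```latex
\documentclass{article}

\makeatletter
\newcommand\myloop[1]{\@tfor\zzz:=#1\do{[\zzz]\par}}
\makeatother

\begin{document}
% \storedstring is a hypothetical stand-in for a string obtained elsewhere.
\newcommand\storedstring{abcdef{ghi}jkl{mn}opqrs}
% Expand \storedstring once, then hand the result to \myloop.
\expandafter\myloop\expandafter{\storedstring}
\end{document}
```

Calling \myloop{\storedstring} directly would loop over the single token \storedstring and print the whole string inside one pair of brackets.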

Or, if you want to save the items, just modify the command in the loop:

\documentclass{article}

\makeatletter
\newcount\mycount
\newcommand\myloop[1]{%
\mycount=0 %
\@tfor\zzz:= #1\do{%
\advance\mycount 1 %
\expandafter\edef\csname zzz\the\mycount\endcsname{\zzz}%
}}
\newcommand\myuse[1]{\csname zzz#1\endcsname}

\makeatother

\begin{document}
\myloop{abcdef{ghi}jkl{mn}opqrs}

7th item is \myuse{7}

11th item is \myuse{11}

13th item is \myuse{13}

\end{document}

Answer 4

It's a two-liner with expl3.

\documentclass{article}
\usepackage{xparse}

\ExplSyntaxOn

\NewExpandableDocumentCommand{\getstringitem}{mm}
 {
  \tl_item:en { #1 } { #2 }
 }
\cs_generate_variant:Nn \tl_item:nn { e }

\ExplSyntaxOff

\begin{document}

\getstringitem{abcdef{ghi}jkl{mn}opqrs}{1},
\getstringitem{abcdef{ghi}jkl{mn}opqrs}{2},
\getstringitem{abcdef{ghi}jkl{mn}opqrs}{3},
\getstringitem{abcdef{ghi}jkl{mn}opqrs}{4},
\getstringitem{abcdef{ghi}jkl{mn}opqrs}{5},
\getstringitem{abcdef{ghi}jkl{mn}opqrs}{6},
\getstringitem{abcdef{ghi}jkl{mn}opqrs}{7},
\getstringitem{abcdef{ghi}jkl{mn}opqrs}{8},
\getstringitem{abcdef{ghi}jkl{mn}opqrs}{9},
\getstringitem{abcdef{ghi}jkl{mn}opqrs}{10},
\getstringitem{abcdef{ghi}jkl{mn}opqrs}{11},
\getstringitem{abcdef{ghi}jkl{mn}opqrs}{12},
\getstringitem{abcdef{ghi}jkl{mn}opqrs}{13},
\getstringitem{abcdef{ghi}jkl{mn}opqrs}{14},
\getstringitem{abcdef{ghi}jkl{mn}opqrs}{15},
\getstringitem{abcdef{ghi}jkl{mn}opqrs}{16}

\getstringitem{\getstringitem{abcdef{ghi}jkl{mn}opqrs}{7}}{2} should be h

\newcommand{\mystring}{a{bcde}{klm}yz}

\getstringitem{\mystring}{1},
\getstringitem{\mystring}{2},
\getstringitem{\mystring}{3},
\getstringitem{\mystring}{4},
\getstringitem{\mystring}{5}

\end{document}

Note that you can even nest calls and use strings stored in macros.

You can get the last element with

\getstringitem{\mystring}{-1}

(of course, explicit strings work too).
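
In the same spirit, the number of items could be obtained with \tl_count:n. The following is only a sketch under assumptions: the command name \countstringitems is made up for this example, and the e-type variant is generated explicitly in case the installed kernel does not predefine it.

```latex
\documentclass{article}
\usepackage{xparse}

\ExplSyntaxOn

% Generating the variant is harmless if it already exists.
\cs_generate_variant:Nn \tl_count:n { e }

% \countstringitems is a hypothetical name used only in this sketch.
\NewExpandableDocumentCommand{\countstringitems}{m}
 {
  \tl_count:e { #1 }
 }

\ExplSyntaxOff

\begin{document}

\newcommand{\mystring}{a{bcde}{klm}yz}

% The items are a, bcde, klm, y, z, so this should print 5.
\countstringitems{\mystring}

\end{document}
```

Combined with \getstringitem, this makes it possible to loop over all items without knowing the length of the string in advance.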
