Extract the number and non-number parts from text

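I want to extract the leading number and the trailing text from a string. My idea was to use the xstring package to gobble characters from the right until I end up with a number or an empty string, but I wondered whether there is a simpler way. The leading number is all the text up to the first character that is not a digit, a period, or a plus or minus sign.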

I really don't need to worry about error cases such as:

  • extra + or - signs within the number
  • more digits after the first non-digit

So, once the definitions of \ExtractLeadingNumber and \ExtractTralingNonDigits are completed, I should get the following output:

[Image: the desired output table]

Code:

\documentclass[border=2pt]{standalone}
\usepackage{booktabs}

\newcommand*{\ExtractLeadingNumber}[1]{#1}%
\newcommand*{\ExtractTralingNonDigits}[1]{#1}%

% ignore #2 and #3 as those are only needed to produce the desired output
\newcommand{\Test}[3]{#1&\ExtractLeadingNumber{#1}&\ExtractTralingNonDigits{#1}\\}%
%\newcommand{\Test}[3]{#1&#2&#3\\}% This produces desired output

\begin{document}
\begin{tabular}{l r r r}
 & &Number &Non-Digits\\

\midrule
Decimal:
&\Test{ 1.01abc}{ 1.01}{abc}
&\Test{+2.01abc}{+2.01}{abc}
&\Test{-3.01abc}{-3.01}{abc}

\midrule
Integer:
&\Test{  abc}{  }{abc}
&\Test{ 5abc}{ 5}{abc}
&\Test{+6abc}{+6}{abc}
&\Test{-7abc}{-7}{abc}

\midrule
Floating Point:
&\Test{ 5.34abc}{ 5.34}{abc}
&\Test{+6.34abc}{+6.34}{abc}
&\Test{-7.34abc}{-7.34}{abc}

\midrule
Number Only:
&\Test{3}{3}{}
&\Test{3.2}{3.2}{}
&\Test{-5.1}{-5.1}{}
&\Test{+5.1}{+5.1}{}

\midrule
No Digits:
&\Test{abc}{}{abc}

\midrule
Formatted Text:
&\Test{  8$abc_1$}{  8}{$abc_1$}
&\Test{-8.2$abc_1$}{-8.2}{$abc_1$}
&\Test{+$abc_1$}{+}{$abc_1$}
&\Test{$abc_1$}{}{$abc_1$}% no digits
\end{tabular}
\end{document}

Answer 1

Here is a solution using xstring:

\documentclass[border=2pt]{standalone}
\usepackage{booktabs}
\usepackage{xstring}
\makeatletter
% first, need to fix a bug in xstring:
\@xs@newmacro\IfDecimal{}{1}{0}{%
    \@xs@formatnumber{#1}\@xs@reserved@A
    \decimalpart\z@
    \afterassignment\@xs@defafterinteger\integerpart\@xs@reserved@A\relax\@xs@nil
    \expandafter\@xs@testdot\@xs@afterinteger\@xs@nil
    \ifx\@empty\@xs@afterdecimal\expandafter\@firstoftwo\else\expandafter\@secondoftwo\fi}

\newcommand*\Test[1]{%
    \IfBeginWith{#1}{ }{\StrBehind{#1}{ }[\temp@@]}{\def\temp@@{#1}}%
    \IfDecimal\temp@@
        {\def\temp@{#1&#1&}}
        {\def\temp@{#1&}%
        \StrBefore{#1}\@xs@afterdecimal[\temp@@]%
        \expandafter\g@addto@macro\expandafter\temp@\expandafter{\temp@@&}%
        \expandafter\g@addto@macro\expandafter\temp@\expandafter{\@xs@afterdecimal}%
        }%
    \temp@\\}
\makeatother
\begin{document}
\begin{tabular}{l r r r}
 & &Number &Non-Digits\\
\midrule
Decimal:
&\Test{ 1.01abc}
&\Test{+2.01abc}
&\Test{-3.01abc}

\midrule
Integer:
&\Test{  abc}
&\Test{ 5abc}
&\Test{+6abc}
&\Test{-7abc}

\midrule
Floating Point:
&\Test{ 5.34abc}
&\Test{+6.34abc}
&\Test{-7.34abc}

\midrule
Number Only:
&\Test{3}
&\Test{3.2}
&\Test{-5.1}
&\Test{+5.1}

\midrule
No Digits:
&\Test{abc}

\midrule
Formatted Text:
&\Test{  8$abc_1$}
&\Test{-8.2$abc_1$}
&\Test{+$abc_1$}
&\Test{$abc_1$}
\end{tabular}
\end{document}

Edit: here is how to do it with separate \ExtractLeadingNumber and \ExtractTralingNonDigits macros:

\makeatletter
% first, need to fix a bug in xstring:
\@xs@newmacro\IfDecimal{}{1}{0}{%
    \@xs@formatnumber{#1}\@xs@reserved@A
    \decimalpart\z@
    \afterassignment\@xs@defafterinteger\integerpart\@xs@reserved@A\relax\@xs@nil
    \expandafter\@xs@testdot\@xs@afterinteger\@xs@nil
    \ifx\@empty\@xs@afterdecimal\expandafter\@firstoftwo\else\expandafter\@secondoftwo\fi}

\newcommand*\ExtractLeadingNumber[1]{%
    \IfBeginWith{#1}{ }{\StrBehind{#1}{ }[\temp@@]}{\def\temp@@{#1}}%
    \IfDecimal\temp@@{#1}{\StrBefore{#1}\@xs@afterdecimal}%
}
\newcommand*\ExtractTralingNonDigits[1]{%
    \IfBeginWith{#1}{ }{\StrBehind{#1}{ }[\temp@@]}{\def\temp@@{#1}}%
    \IfDecimal\temp@@{}\@xs@afterdecimal
}
\makeatother

\newcommand*\Test[1]{#1&\ExtractLeadingNumber{#1}&\ExtractTralingNonDigits{#1}\\}

Answer 2

An approach using the l3regex module of LaTeX3:

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{array,booktabs,expl3,l3regex}
\ExplSyntaxOn
\regex_new:N \l_extract_regex
\regex_set:Nn \l_extract_regex { ^\s*([+-]?\d*\.?\d*)\s*(.*) }
\seq_new:N \l_extract_seq
\tl_new:N \NumberValue
\tl_new:N \OtherValue
\cs_new_protected:Npn \extract_number:n #1
  {
    \regex_extract_once:NnN  \l_extract_regex {#1} \l_extract_seq
    \tl_gset:Nx \NumberValue { \seq_item:Nn \l_extract_seq { 2 } }
    \tl_gset:Nx \OtherValue { \seq_item:Nn \l_extract_seq { 3 } }
  }
\cs_new_protected:Npn \Test #1
  {
    \extract_number:n {#1}
    & \detokenize{#1} & \NumberValue & \OtherValue
  }
\ExplSyntaxOff
\begin{document}
\begin{tabular}{l>{\ttfamily}r>{\ttfamily}r>{\ttfamily}r}
  \toprule
             & \multicolumn{1}{r}{Input} & 
               \multicolumn{1}{r}{Digit} & \multicolumn{1}{r}{Non-digit} \\
  \midrule
   Decimal:  \Test{ 1.01abc}               \\
             \Test{+2.01abc}               \\ 
             \Test{-3.01abc}               \\
  \midrule
   Integer:  \Test{  abc}                  \\
             \Test{ 5abc}                  \\ 
             \Test{+6abc}                  \\
             \Test{-7abc}                  \\
  \midrule
   Floating Point: \Test{ 5.34abc}         \\
                   \Test{+6.34abc}         \\
                   \Test{-7.34abc}         \\
  \midrule
   Number Only:    \Test{3}                \\
                   \Test{3.2}              \\ 
                   \Test{-5.1}             \\
                   \Test{+5.1}             \\
  \midrule
   No Digits:      \Test{abc}              \\
  \midrule
   Formatted Text: \Test{  8$abc_1$}       \\ 
                   \Test{-8.2$abc_1$}      \\ 
                   \Test{+$abc_1$}         \\
                   \Test{$abc_1$}          \\
  \bottomrule
\end{tabular}
\end{document}

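At present the module is "experimental", so it has to be loaded separately from expl3, but I hope it will move into the "kernel" in the near future (before the end of the year).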

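This works because, when we do a regex match, the capture groups are stored in order, indexed from 0 (the full match) upwards. Since \seq_item:Nn counts from 1, item 1 of the sequence is the full match, item 2 (the first capture group) is the number part and item 3 (the second capture group) is the non-digit part. Notice that I've also used \s* to strip any leading spaces from both groups: if you leave that out, the spaces become part of the matches.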
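To see the indexing in isolation, here is a minimal sketch. It assumes a LaTeX release from 2020 or later, where expl3 (including l3regex) is preloaded, so no package is loaded; it uses the inline-regex form \regex_extract_once:nnN, and the command name \ShowGroups is a hypothetical one made up for this illustration.

```latex
\documentclass{article}
% Assumes LaTeX 2020-02 or later: expl3, including l3regex, is preloaded
\ExplSyntaxOn
\seq_new:N \l_demo_seq
% Hypothetical demo command: typesets the first three items of the match
\cs_new_protected:Npn \ShowGroups #1
  {
    \regex_extract_once:nnN { ^\s*([+-]?\d*\.?\d*)\s*(.*) } {#1} \l_demo_seq
    % \seq_item:Nn counts from 1: item 1 is group 0 (the whole match),
    % item 2 is group 1 (the number), item 3 is group 2 (the rest)
    [ \seq_item:Nn \l_demo_seq { 1 } ] ~
    [ \seq_item:Nn \l_demo_seq { 2 } ] ~
    [ \seq_item:Nn \l_demo_seq { 3 } ]
  }
\ExplSyntaxOff
\begin{document}
\ShowGroups{-8.2xyz} % typesets [-8.2xyz] [-8.2] [xyz]
\end{document}
```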

Note also that the results here are detokenized, so if you want formatted text you'll need to \scantokens the result (something like \scantokens\expandafter{\OtherValue} would do).

Answer 3

If you can use LuaTeX, you can write a proper parser (the code below is in ConTeXt, only because I don't know all the details of using LuaTeX from LaTeX).

 \startluacode
  local P, R, S, V, match = lpeg.P, lpeg.R, lpeg.S, lpeg.V, lpeg.match
  local Ct, C, Cs, Cc = lpeg.Ct, lpeg.C, lpeg.Cs, lpeg.Cc

  local format = string.format

  local digit    = R("09")
  local sign     = S('+-')
  local integer  = sign^0 * digit^0 -- NOTE: I'd rather use digit^1, but
                                    -- the requirements want to capture a
                                    --  single sign as well
  local float    = sign^0 * digit^0 * P('.') * digit^1
  local space    = P(" ")^0

  local number   = Cs(float + integer)
  local any      = Cs(P(1)^0)

  local number_value = Cc("\\global\\def\\NumberValue{%s}") * number / format
  local other_value  = Cc("\\global\\def\\OtherValue{%s}")  * any    / format
  local parser = Cs(space * number_value * other_value)

  function commands.extract_number(s)
      context(match(parser,s))
  end
\stopluacode

\unprotect
\def\extract#1%
    {\let\NumberValue\relax
     \let\OtherValue \relax
     \ctxcommand{extract_number(\!!bs\detokenize{#1}\!!es)}}
\protect

You can then use it as follows:

\def\Test#1%
    {\extract{#1}%
     #1 \NC \NumberValue \NC \OtherValue}

\starttext

\starttabulate[|l|r|r|r|]
  \HL
  \NC           \NC Input \NC Digit \NC Non-Digit \NC \NR
  \HL
  \NC Decimal:  \NC \Test{ 1.01abc}               \NC \NR
  \NC           \NC \Test{+2.01abc}               \NC \NR 
  \NC           \NC \Test{-3.01abc}               \NC \NR
  \HL
  \NC Integer:  \NC \Test{  abc}                  \NC \NR
  \NC           \NC \Test{ 5abc}                  \NC \NR 
  \NC           \NC \Test{+6abc}                  \NC \NR
  \NC           \NC \Test{-7abc}                  \NC \NR
  \HL
  \NC Floating Point: \NC \Test{ 5.34abc}         \NC \NR
  \NC                 \NC \Test{+6.34abc}         \NC \NR
  \NC                 \NC \Test{-7.34abc}         \NC \NR
  \HL
  \NC Number Only:    \NC \Test{3}                \NC \NR
  \NC                 \NC \Test{3.2}              \NC \NR 
  \NC                 \NC \Test{-5.1}             \NC \NR
  \NC                 \NC \Test{+5.1}             \NC \NR
  \HL
  \NC No Digits:      \NC \Test{abc}              \NC \NR
  \HL
  \NC Formatted Text: \NC \Test{  8$abc_1$}       \NC \NR 
  \NC                 \NC \Test{-8.2$abc_1$}      \NC \NR 
  \NC                 \NC \Test{+$abc_1$}         \NC \NR
  \NC                 \NC \Test{$abc_1$}          \NC \NR
  \HL
\stoptabulate
\stoptext

This gives:

[Image: the resulting table]
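For LaTeX users, here is a hedged sketch of the same LPEG idea under LuaLaTeX, assuming the luacode package. The global Lua function name extract_number is hypothetical, tex.sprint takes the place of ConTeXt's context(), and plain captures (C) replace the Cs/Cc/format machinery. Unlike the ConTeXt grammar, sign^-1 accepts at most one sign.

```latex
% Compile with lualatex
\documentclass{article}
\usepackage{luacode}
\begin{luacode*}
-- Same grammar idea as the ConTeXt answer, using plain captures
local P, R, S, C = lpeg.P, lpeg.R, lpeg.S, lpeg.C
local digit   = R("09")
local sign    = S("+-")
local space   = P(" ")^0
-- sign^-1: at most one sign (the ConTeXt code's sign^0 allows any number)
local integer = sign^-1 * digit^0
local float   = sign^-1 * digit^0 * P(".") * digit^1
-- skip leading spaces, capture the number, then capture everything else
local parser  = space * C(float + integer) * C(P(1)^0)

-- hypothetical helper: globally defines \NumberValue and \OtherValue
function extract_number(s)
  local number, rest = lpeg.match(parser, s)
  tex.sprint("\\gdef\\NumberValue{" .. number .. "}")
  tex.sprint("\\gdef\\OtherValue{" .. rest .. "}")
end
\end{luacode*}
% \gdef is used above because every tabular cell is a group
\newcommand*\extract[1]{%
  \directlua{extract_number("\luaescapestring{\detokenize{#1}}")}}
\newcommand*\Test[1]{\extract{#1}#1 & \NumberValue & \OtherValue}
\begin{document}
\begin{tabular}{lrr}
  \Test{ 1.01abc}    \\
  \Test{+6abc}       \\
  \Test{3.2}         \\
  \Test{-8.2$abc_1$} \\
  \Test{$abc_1$}     \\
\end{tabular}
\end{document}
```

As in the ConTeXt version, the input is detokenized on the way into Lua and rescanned with the current catcodes when tex.sprint hands the \gdef back to TeX, so $abc_1$ is typeset as math again.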

Answer 4

For completeness, here is a plain TeX solution to the problem.

\def\separeparts#1{\def\firstpart{}\def\listchars{0123456789.}\separepartsA#1\end}
\def\separepartsA#1{\isinlist{+-}#1%
   \iftrue
      \def\firstpart{#1}\expandafter\separepartsB
   \else 
      \def\next{\separepartsB#1}\expandafter\next
   \fi
}
\def\separepartsB#1{\isinlist\listchars#1%
   \iftrue
      \addto\firstpart#1%
      \ifx.#1\def\listchars{0123456789}\fi
      \expandafter\separepartsB
   \else
      \def\next{\separepartsC#1}\expandafter\next
   \fi
}
\def\separepartsC#1\end{\def\secondpart{#1}}

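Note that the list of allowed characters, \listchars, includes the decimal point, but once a decimal point has been found, \listchars is redefined without it, because a second decimal point is not allowed.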

This code doesn't need any package; it only uses two macros from opmac.tex. You can copy and paste them from opmac.tex or from here:

\def\isinlist#1#2#3{\def\tmp##1#2##2\end{\def\tmp{##2}%
   \ifx\tmp\empty \csname iffalse\expandafter\endcsname \else
                  \csname iftrue\expandafter\endcsname \fi}% end of \def\tmp
   \expandafter\tmp#1\endlistsep#2\end
}
\long\def\addto#1#2{\expandafter\def\expandafter#1\expandafter{#1#2}}

Now you can test it:

\def\test#1{\separeparts{#1}
   \immediate\write16{"#1" = "\firstpart" and "\secondpart"}
}

\test{+2.01abc}   % output: "+2.01abc" = "+2.01" and "abc"
\test{-3.01abc}   % output: "-3.01abc" = "-3.01" and "abc"

\test{  abc}      % output: " abc" = "" and "abc"
\test{ 5abc}      % output: " 5abc" = "5" and "abc"
\test{+6abc}      % output: "+6abc" = "+6" and "abc"

\test{1.23.36abc} % output: "1.23.36abc" = "1.23" and ".36abc"

You can use this code in plain TeX, LaTeX or ConTeXt; it doesn't matter, because the code is based only on TeX primitives.
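As a sketch of the LaTeX case, the macros can be pasted unchanged into a LaTeX preamble, and the question's two commands become thin wrappers around \separeparts. The wrapper definitions are an addition for illustration, not part of the answer. One caution: babel also defines a command called \addto, so in a document that loads babel the \def of \addto here would silently replace babel's version; renaming it (consistently, including inside \separepartsB) avoids the clash.

```latex
\documentclass{article}
% Macros from this answer, copied unchanged
\def\separeparts#1{\def\firstpart{}\def\listchars{0123456789.}\separepartsA#1\end}
\def\separepartsA#1{\isinlist{+-}#1%
   \iftrue
      \def\firstpart{#1}\expandafter\separepartsB
   \else
      \def\next{\separepartsB#1}\expandafter\next
   \fi
}
\def\separepartsB#1{\isinlist\listchars#1%
   \iftrue
      \addto\firstpart#1%
      \ifx.#1\def\listchars{0123456789}\fi
      \expandafter\separepartsB
   \else
      \def\next{\separepartsC#1}\expandafter\next
   \fi
}
\def\separepartsC#1\end{\def\secondpart{#1}}
\def\isinlist#1#2#3{\def\tmp##1#2##2\end{\def\tmp{##2}%
   \ifx\tmp\empty \csname iffalse\expandafter\endcsname \else
                  \csname iftrue\expandafter\endcsname \fi}% end of \def\tmp
   \expandafter\tmp#1\endlistsep#2\end
}
% Caution: babel also defines \addto; this \def would overwrite it
\long\def\addto#1#2{\expandafter\def\expandafter#1\expandafter{#1#2}}
% Hypothetical wrappers providing the interface asked for in the question
\newcommand*\ExtractLeadingNumber[1]{\separeparts{#1}\firstpart}
\newcommand*\ExtractTralingNonDigits[1]{\separeparts{#1}\secondpart}
\begin{document}
\begin{tabular}{rr}
  \ExtractLeadingNumber{-8.2$abc_1$} & \ExtractTralingNonDigits{-8.2$abc_1$} \\
  \ExtractLeadingNumber{ 5abc}       & \ExtractTralingNonDigits{ 5abc}       \\
  \ExtractLeadingNumber{abc}         & \ExtractTralingNonDigits{abc}         \\
\end{tabular}
\end{document}
```

Each wrapper reruns \separeparts in its own cell, so the two columns do not depend on each other.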
