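I'd like to extract the leading number and the trailing text from a string. I had the idea of using the xstring package to eat characters from the right until I end up with either a number or an empty string, but I wonder whether there is a simpler way. The leading number is all the text up to the first character that is not a digit, a period, or a plus or minus sign.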
I don't really need to worry about error cases such as:
- extra + or - within the number
- more digits after the first non-digit
So I should get the following output, once the definitions of \ExtractLeadingNumber and \ExtractTralingNonDigits are completed:
Code:
\documentclass[border=2pt]{standalone}
\usepackage{booktabs}
\newcommand*{\ExtractLeadingNumber}[1]{#1}%
\newcommand*{\ExtractTralingNonDigits}[1]{#1}%
% ignore #2 and #3 as those are only needed to produce the desired output
\newcommand{\Test}[3]{#1&\ExtractLeadingNumber{#1}&\ExtractTralingNonDigits{#1}\\}%
%\newcommand{\Test}[3]{#1\\}% This produces desired output
\begin{document}
\begin{tabular}{l r r r}
& &Number &Non-Digits\\
\midrule
Decimal:
&\Test{ 1.01abc}{ 1.01}{abc}
&\Test{+2.01abc}{+2.01}{abc}
&\Test{-3.01abc}{-3.01}{abc}
\midrule
Integer:
&\Test{ abc}{ }{abc}
&\Test{ 5abc}{ 5}{abc}
&\Test{+6abc}{+6}{abc}
&\Test{-7abc}{-7}{abc}
\midrule
Floating Point:
&\Test{ 5.34abc}{ 5.34}{abc}
&\Test{+6.34abc}{+6.34}{abc}
&\Test{-7.34abc}{-7.34}{abc}
\midrule
Number Only:
&\Test{3}{3}{}
&\Test{3.2}{3.2}{}
&\Test{-5.1}{-5.1}{}
&\Test{+5.1}{+5.1}{}
\midrule
No Digits:
&\Test{abc}{}{abc}
\midrule
Formatted Text:
&\Test{ 8$abc_1$}{ 8}{$abc_1$}
&\Test{-8.2$abc_1$}{-8.2}{$abc_1$}
&\Test{+$abc_1$}{+}{$abc_1$}
&\Test{$abc_1$}{}{$abc_1$}% no digits
\end{tabular}
\end{document}
Answer 1
Here is a solution with xstring:
\documentclass[border=2pt]{standalone}
\usepackage{booktabs}
\usepackage{xstring}
\makeatletter
% first, need to fix a bug in xstring:
\@xs@newmacro\IfDecimal{}{1}{0}{%
\@xs@formatnumber{#1}\@xs@reserved@A
\decimalpart\z@
\afterassignment\@xs@defafterinteger\integerpart\@xs@reserved@A\relax\@xs@nil
\expandafter\@xs@testdot\@xs@afterinteger\@xs@nil
\ifx\@empty\@xs@afterdecimal\expandafter\@firstoftwo\else\expandafter\@secondoftwo\fi}
\newcommand*\Test[1]{%
\IfBeginWith{#1}{ }{\StrBehind{#1}{ }[\temp@@]}{\def\temp@@{#1}}%
\IfDecimal\temp@@
{\def\temp@{#1&}}
{\def\temp@{#1&}%
\StrBefore{#1}\@xs@afterdecimal[\temp@@]%
\expandafter\g@addto@macro\expandafter\temp@\expandafter{\temp@@&}%
\expandafter\g@addto@macro\expandafter\temp@\expandafter{\@xs@afterdecimal}%
}%
\temp@\\}
\makeatother
\begin{document}
\begin{tabular}{l r r r}
& &Number &Non-Digits\\
\midrule
Decimal:
&\Test{ 1.01abc}
&\Test{+2.01abc}
&\Test{-3.01abc}
\midrule
Integer:
&\Test{ abc}
&\Test{ 5abc}
&\Test{+6abc}
&\Test{-7abc}
\midrule
Floating Point:
&\Test{ 5.34abc}
&\Test{+6.34abc}
&\Test{-7.34abc}
\midrule
Number Only:
&\Test{3}
&\Test{3.2}
&\Test{-5.1}
&\Test{+5.1}
\midrule
No Digits:
&\Test{abc}
\midrule
Formatted Text:
&\Test{ 8$abc_1$}
&\Test{-8.2$abc_1$}
&\Test{+$abc_1$}
&\Test{$abc_1$}
\end{tabular}
\end{document}
Edit: here is how to define \ExtractLeadingNumber and \ExtractTralingNonDigits:
\makeatletter
% first, need to fix a bug in xstring:
\@xs@newmacro\IfDecimal{}{1}{0}{%
\@xs@formatnumber{#1}\@xs@reserved@A
\decimalpart\z@
\afterassignment\@xs@defafterinteger\integerpart\@xs@reserved@A\relax\@xs@nil
\expandafter\@xs@testdot\@xs@afterinteger\@xs@nil
\ifx\@empty\@xs@afterdecimal\expandafter\@firstoftwo\else\expandafter\@secondoftwo\fi}
\newcommand*\ExtractLeadingNumber[1]{%
\IfBeginWith{#1}{ }{\StrBehind{#1}{ }[\temp@@]}{\def\temp@@{#1}}%
\IfDecimal\temp@@{#1}{\StrBefore{#1}\@xs@afterdecimal}%
}
\newcommand*\ExtractTralingNonDigits[1]{%
\IfBeginWith{#1}{ }{\StrBehind{#1}{ }[\temp@@]}{\def\temp@@{#1}}%
\IfDecimal\temp@@{}\@xs@afterdecimal
}
\makeatother
\newcommand*\Test[1]{#1&\ExtractLeadingNumber{#1}&\ExtractTralingNonDigits{#1}\\}
Answer 2
An approach using the LaTeX3 l3regex module:
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{array,booktabs,expl3,l3regex}
\ExplSyntaxOn
\tl_new:N \l_extract_tl
\regex_set:Nn \l_extract_tl { ^\s*([+-]?\d*\.?\d*)\s*(.*) }
\seq_new:N \l_extract_seq
\tl_new:N \NumberValue
\tl_new:N \OtherValue
\cs_new_protected:Npn \extract_number:n #1
{
\regex_extract_once:NnN \l_extract_tl {#1} \l_extract_seq
\tl_gset:Nx \NumberValue { \seq_item:Nn \l_extract_seq { 2 } }
\tl_gset:Nx \OtherValue { \seq_item:Nn \l_extract_seq { 3 } }
}
\cs_new_protected:Npn \Test #1
{
\extract_number:n {#1}
& \detokenize{#1} & \NumberValue & \OtherValue
}
\ExplSyntaxOff
\begin{document}
\begin{tabular}{l>{\ttfamily}r>{\ttfamily}r>{\ttfamily}r}
\toprule
& \multicolumn{1}{r}{Input} &
\multicolumn{1}{r}{Digit} & \multicolumn{1}{r}{Non-digit} \\
\midrule
Decimal: \Test{ 1.01abc} \\
\Test{+2.01abc} \\
\Test{-3.01abc} \\
\midrule
Integer: \Test{ abc} \\
\Test{ 5abc} \\
\Test{+6abc} \\
\Test{-7abc} \\
\midrule
Floating Point: \Test{ 5.34abc} \\
\Test{+6.34abc} \\
\Test{-7.34abc} \\
\midrule
Number Only: \Test{3} \\
\Test{3.2} \\
\Test{-5.1} \\
\Test{+5.1} \\
\midrule
No Digits: \Test{abc} \\
\midrule
Formatted Text: \Test{ 8$abc_1$} \\
\Test{-8.2$abc_1$} \\
\Test{+$abc_1$} \\
\Test{$abc_1$} \\
\bottomrule
\end{tabular}
\end{document}
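At present the module is "experimental", so it has to be loaded separately from expl3, but I hope it will move to the "kernel" in the near future (before the end of the year).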
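This works because, when a regex match is made, the capture groups are stored in order, indexed from 0 (the full match) upwards. So I take the first capture group as the number part and the second capture group as the non-digit part; since \l_extract_seq is indexed from 1, these are items 2 and 3. Note that I also use \s* to strip any leading spaces from both groups: if you leave this out, you will also get the spaces as part of the match.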
Also note that the results here are detokenized, so if you want formatted text you will need to \scantokens the result. (Something like \scantokens\expandafter{\OtherValue} would do here.)
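To see the indexing for yourself, here is a minimal sketch that runs the same regex on one fixed input and writes the resulting sequence to the log file. It assumes a recent LaTeX (2020 or later), where expl3 and l3regex are already part of the format; with an older installation, load them as in the answer above. \l_demo_seq is a hypothetical name used only for this demo.

\documentclass{article}
\ExplSyntaxOn
% Hypothetical demo variable, not part of the answer above
\seq_new:N \l_demo_seq
% Same pattern as \l_extract_tl, applied to a fixed input
\regex_extract_once:nnN { ^\s*([+-]?\d*\.?\d*)\s*(.*) } { -8.2abc } \l_demo_seq
% Writes the items to the .log file: item 1 is the whole match (group 0),
% item 2 is group 1 (-8.2) and item 3 is group 2 (abc)
\seq_log:N \l_demo_seq
\ExplSyntaxOff
\begin{document}
\end{document}

Swapping -8.2abc for other inputs from the table is a quick way to check what each capture group receives.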
Answer 3
If you can use LuaTeX, you can use a proper parser (the code below is in ConTeXt, only because I don't know all the details of using LuaTeX in LaTeX).
\startluacode
local P, R, S, V, match = lpeg.P, lpeg.R, lpeg.S, lpeg.V, lpeg.match
local Ct, C, Cs, Cc = lpeg.Ct, lpeg.C, lpeg.Cs, lpeg.Cc
local format = string.format
local digit = R("09")
local sign = S('+-')
local integer = sign^0 * digit^0 -- NOTE: I'd rather use digit^1, but
-- the requirements want to capture a
-- single sign as well
local float = sign^0 * digit^0 * P('.') * digit^1
local space = P(" ")^0
local number = Cs(float + integer)
local any = Cs(P(1)^0)
local number_value = Cc("\\global\\def\\NumberValue{%s}") * number / format
local other_value = Cc("\\global\\def\\OtherValue{%s}") * any / format
local parser = Cs(space * number_value * other_value)
function commands.extract_number(s)
context(match(parser,s))
end
\stopluacode
\unprotect
\def\extract#1%
{\let\NumberValue\relax
\let\OtherValue \relax
\ctxcommand{extract_number(\!!bs\detokenize{#1}\!!es)}}
\protect
Then you can use it as follows:
\def\Test#1%
{\extract{#1}%
#1 \NC \NumberValue \NC \OtherValue}
\starttext
\starttabulate[|l|r|r|r|]
\HL
\NC \NC Input \NC Digit \NC Non-Digit \NC \NR
\HL
\NC Decimal: \NC \Test{ 1.01abc} \NC \NR
\NC \NC \Test{+2.01abc} \NC \NR
\NC \NC \Test{-3.01abc} \NC \NR
\HL
\NC Integer: \NC \Test{ abc} \NC \NR
\NC \NC \Test{ 5abc} \NC \NR
\NC \NC \Test{+6abc} \NC \NR
\NC \NC \Test{-7abc} \NC \NR
\HL
\NC Floating Point: \NC \Test{ 5.34abc} \NC \NR
\NC \NC \Test{+6.34abc} \NC \NR
\NC \NC \Test{-7.34abc} \NC \NR
\HL
\NC Number Only: \NC \Test{3} \NC \NR
\NC \NC \Test{3.2} \NC \NR
\NC \NC \Test{-5.1} \NC \NR
\NC \NC \Test{+5.1} \NC \NR
\HL
\NC No Digits: \NC \Test{abc} \NC \NR
\HL
\NC Formatted Text: \NC \Test{ 8$abc_1$} \NC \NR
\NC \NC \Test{-8.2$abc_1$} \NC \NR
\NC \NC \Test{+$abc_1$} \NC \NR
\NC \NC \Test{$abc_1$} \NC \NR
\HL
\stoptabulate
\stoptext
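Since the answer above is written for ConTeXt, here is a minimal sketch of how the same idea might be carried over to LuaLaTeX. It assumes the luacode package (its luacode* environment passes backslashes to Lua unchanged); the Lua function extract_number and the macros \Extract and \Test are hypothetical names, not the answerer's code. Instead of the float + integer alternation it uses one optional fractional part, which should split the inputs from the question the same way.

% Compile with lualatex. Hypothetical LuaLaTeX port of the ConTeXt answer above.
\documentclass{article}
\usepackage{luacode}
\begin{luacode*}
local P, R, S, C = lpeg.P, lpeg.R, lpeg.S, lpeg.C
local digit  = R("09")
local sign   = S("+-")
local space  = P(" ")^0
-- optional sign, digits, then an optional "." followed by at least one digit
local number = C(sign^-1 * digit^0 * (P(".") * digit^1)^-1)
local rest   = C(P(1)^0)
local parser = space * number * rest
-- hypothetical global helper: defines \NumberValue and \OtherValue globally
function extract_number(s)
  local num, other = lpeg.match(parser, s)
  tex.sprint("\\gdef\\NumberValue{" .. num .. "}\\gdef\\OtherValue{" .. other .. "}")
end
\end{luacode*}
% \detokenize turns the argument into characters; \luaescapestring makes them a safe Lua string
\newcommand*\Extract[1]{\directlua{extract_number("\luaescapestring{\detokenize{#1}}")}}
% \gdef (not \def) so the values survive the cell boundaries of the tabular
\newcommand*\Test[1]{\Extract{#1}#1 & \NumberValue & \OtherValue \\}
\begin{document}
\begin{tabular}{lrr}
\Test{ 1.01abc}
\Test{+6abc}
\Test{3.2}
\Test{abc}
\Test{-8.2$abc_1$}
\Test{+$abc_1$}
\end{tabular}
\end{document}

Because tex.sprint reads the strings back with the current catcodes, $abc_1$ in \OtherValue should be typeset as math rather than as detokenized text.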
Answer 4
For completeness, here is a plain TeX solution to the problem.
\def\separeparts#1{\def\firstpart{}\def\listchars{0123456789.}\separepartsA#1\end}
\def\separepartsA#1{\isinlist{+-}#1%
\iftrue
\def\firstpart{#1}\expandafter\separepartsB
\else
\def\next{\separepartsB#1}\expandafter\next
\fi
}
\def\separepartsB#1{\isinlist\listchars#1%
\iftrue
\addto\firstpart#1%
\ifx.#1\def\listchars{0123456789}\fi
\expandafter\separepartsB
\else
\def\next{\separepartsC#1}\expandafter\next
\fi
}
\def\separepartsC#1\end{\def\secondpart{#1}}
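Note that the list of allowed characters, \listchars, includes the decimal point, but once a decimal point is found, \listchars is redefined without it, because a second decimal point is not allowed.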
This code needs no packages; it uses only two macros from opmac.tex. You can copy and paste these macros from opmac.tex or from here:
\def\isinlist#1#2#3{\def\tmp##1#2##2\end{\def\tmp{##2}%
\ifx\tmp\empty \csname iffalse\expandafter\endcsname \else
\csname iftrue\expandafter\endcsname \fi}% end of \def\tmp
\expandafter\tmp#1\endlistsep#2\end
}
\long\def\addto#1#2{\expandafter\def\expandafter#1\expandafter{#1#2}}
Now you can test it:
\def\test#1{\separeparts{#1}
\immediate\write16{"#1" = "\firstpart" and "\secondpart"}
}
\test{+2.01abc} % output: "+2.01abc" = "+2.01" and "abc"
\test{-3.01abc} % output: "-3.01abc" = "-3.01" and "abc"
\test{ abc} % output: " abc" = "" and "abc"
\test{ 5abc} % output: " 5abc" = "5" and "abc"
\test{+6abc} % output: "+6abc" = "+6" and "abc"
\test{1.23.36abc} % output: "1.23.36abc" = "1.23" and ".36abc"
You can use this code in plain TeX, LaTeX or ConTeXt; it makes no difference. The code is based only on TeX primitives.
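A note on the calling pattern of \isinlist used in \separepartsA and \separepartsB: its third parameter never appears in its body, so it simply swallows the \iftrue written after the character, while \isinlist itself leaves \iftrue or \iffalse behind to open the real conditional. Writing the dummy \iftrue also keeps \if...\fi nesting balanced if the test sits inside a conditional that TeX is skipping. This plain TeX sketch (run with tex; the two test characters are illustrative) shows the pattern in isolation.

% Plain TeX demo; \isinlist is copied from the answer above
\def\isinlist#1#2#3{\def\tmp##1#2##2\end{\def\tmp{##2}%
  \ifx\tmp\empty \csname iffalse\expandafter\endcsname \else
  \csname iftrue\expandafter\endcsname \fi}% end of \def\tmp
  \expandafter\tmp#1\endlistsep#2\end
}
\def\listchars{0123456789.}
% The \iftrue after the character is read as #3 and discarded;
% the conditional is opened by the \iftrue/\iffalse that \isinlist leaves behind.
\isinlist\listchars 7\iftrue \message{7 is allowed}\else \message{7 is not allowed}\fi
% prints: 7 is allowed
\isinlist\listchars a\iftrue \message{a is allowed}\else \message{a is not allowed}\fi
% prints: a is not allowed
\bye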