How can I read the lines of a file, select lines with a regular expression, and extract and evaluate the numbers they contain?

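Is it possible to generalize this so that a line is selected by some kind of regular expression rather than only by its index, and then to extract and evaluate the numbers at the beginning or end of the selected line, split on an arbitrary, changeable delimiter (for example a space)? Thanks.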

% Sample output:

\ReadFile{\myarray}{somefile.txt}

\myarray{text:}{end}[]
11
\myarray{3}{end}[]
11

\myarray{text:}{beginning}[]
NA
\myarray{13}{beginning}[]
NA

\myarray{_top_}{end}[]
NA
\myarray{_top_}{beginning}[]
46 %45+1+0

\myarray{new stuff}{end}[]
NA
\myarray{three}{beginning}[]
93 %90+3+0

% Sample LaTeX code, file.tex

\documentclass{article}
    \usepackage{xparse}
    
    \ExplSyntaxOn
    \ior_new:N \g_hringriin_file_stream
    
    \NewDocumentCommand{\ReadFile}{mm}
     {
      \hringriin_read_file:nn { #1 } { #2 }
      \cs_new:Npn #1 ##1
       {
        \str_if_eq:nnTF { ##1 } { * }
          { \seq_count:c { g_hringriin_file_ \cs_to_str:N #1 _seq } }
          { \seq_item:cn { g_hringriin_file_ \cs_to_str:N #1 _seq } { ##1 } }
       }
     }
    
    \cs_new_protected:Nn \hringriin_read_file:nn
     {
      \ior_open:Nn \g_hringriin_file_stream { #2 }
      \seq_gclear_new:c { g_hringriin_file_ \cs_to_str:N #1 _seq }
      \ior_map_inline:Nn \g_hringriin_file_stream
       {
        \seq_gput_right:cx 
         { g_hringriin_file_ \cs_to_str:N #1 _seq }
         { \tl_trim_spaces:n { ##1 } }
       }
      \ior_close:N \g_hringriin_file_stream
     }

% Sample data, somearray.txt

File: file.tex
Encoding: ascii
Words in text: 11
Words in headers: 12
Words outside text (captions, etc.): 13
Number of headers: 31
Number of floats/tables/figures: 32
Number of math inlines: 33
Number of math displayed: 34
Subcounts:
  text+headers+captions (#headers/#floats/#inlines/#displayed)
  45+1+0 (1/0/0/0) _top_
  56+1+0 (1/0/1/0) Section: Beginning
  67+2+0 (1/0/0/0) Section: Main part
  78+5+0 (1/0/1/1) Subsection: Sub part one new stuff
  894+5+0 (1/0/2/0) Subsection: Sub part two old stuff 
  90+3+0 (1/0/0/0) Subsection: Sub part three
  12+1+0 (1/0/0/0) Section: End

Answer 1

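You do not need regular expressions here. You can split each line with macros that take custom-delimited arguments, and read the stored values back with small accessor macros.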
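As a minimal illustration of the delimited-argument idea on a single line of the data (the macro name `\parsepair` is hypothetical and used only for this sketch; it is not part of the code below):

```latex
\documentclass{article}
% \parsepair is a hypothetical helper used only for this illustration.
% Argument 1 runs up to the first colon, argument 2 up to \relax.
\def\parsepair#1:#2\relax{key=[#1], value=[#2]}
\begin{document}
\parsepair Words in text: 11\relax
\end{document}
```

The value keeps the space that follows the colon, because TeX does not strip spaces from delimited arguments.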





\documentclass{article}

\newread\fl
\openin\fl=somearray.txt

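% Lines before "Subcounts:" are split at the first colon into key and value.
% Once the key is "Subcounts", later lines are handed to \splitlineB instead.
\def\splitlineA#1:#2\relax{%
\def\lastline{#1}%
\ifx\lastline\Subcountstest
\let\splitline\splitlineB
\else
\expandafter\gdef\csname FL-#1\endcsname{#2}%
\fi
}
% Subcount lines have the form a+b+c (d/e/f/g) label; store the seven numbers under the label.
\def\splitlineB#1+#2+#3 (#4/#5/#6/#7)#8\relax{%
\expandafter\gdef\csname FL-#8\endcsname{{#1}{#2}{#3}{#4}{#5}{#6}{#7}}%
}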
 
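\def\Subcountstest{Subcounts}
% Read the file inside a group so the catcode and \endlinechar changes stay local.
{
\catcode`\#=12 % treat the hash sign in the data as an ordinary character
\let\splitline\splitlineA
\makeatletter
\endlinechar=-1 % an empty line or the end of the file now gives an empty \tmp
\loop
\read\fl to \tmp
\ifx\@empty\tmp
\else
\expandafter\splitline\tmp\relax
\repeat
}
\closein\fl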

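% \FLval{<key>} returns the text stored for a "key: value" line.
\def\FLval#1{\csname FL-#1\endcsname}
% \FLsubcount{<label>}{<n>} returns the n-th of the seven numbers stored for a subcount line.
% The space after FL- is deliberate: the stored label keeps the space that follows ")" in the file.
\def\FLsubcount#1#2{%
  \def\tmp##1##2##3##4##5##6##7{###2}%
  \expandafter\expandafter\expandafter\tmp\csname FL- #1\endcsname}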

\begin{document}

\textbf{Words in headers} = \FLval{Words in headers}

\textbf{Number of headers} = \FLval{Number of headers}

\textbf{text} in \textbf{Subsection: Sub part three} = \FLsubcount{Subsection: Sub part three}{1}

\textbf{inlines} in \textbf{Subsection: Sub part two old stuff}  = \FLsubcount{Subsection: Sub part two old stuff}{6}



\end{document}

The version requested in the comments adds #1+#2+#3 and discards the four numbers that follow.
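The summation relies on `\numexpr`. A minimal sketch of that step on its own, using the numbers from the "Sub part three" line (the macro name `\demosum` is hypothetical):

```latex
\documentclass{article}
% \demosum is a hypothetical macro name used only for this illustration.
% \edef expands \the\numexpr immediately, so \demosum stores the digits of the result.
\edef\demosum{\the\numexpr 90+3+0\relax}
\begin{document}
90+3+0 = \demosum
\end{document}
```

This typesets 93, the value the question expects for that line.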

\documentclass{article}

\makeatletter
\newread\fl
\openin\fl=somearray.txt

\def\splitlineA#1:#2\relax{%
\def\lastline{#1}%
\ifx\lastline\Subcountstest
\let\splitline\splitlineB
\else
\expandafter\gdef\csname FL-#1\endcsname{#2}%
\fi
}
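% Skip the "text+headers+captions" heading line, then store the sum of the first three numbers.
% \@firstofone drops the space that precedes the label in the file.
\def\splitlineB#1+#2+#3 (#4/#5/#6/#7)#8\relax{%
\def\tmpa{#1}\ifx\tmpa\texttest\else
\expandafter\xdef\csname FL-\@firstofone#8\endcsname{\the\numexpr#1+#2+#3}%
\fi
}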

\def\texttest{text}
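\def\Subcountstest{Subcounts}
% Read the file inside a group so the catcode and \endlinechar changes stay local.
{
\catcode`\#=12 % treat the hash sign in the data as an ordinary character
\let\splitline\splitlineA
\endlinechar=-1 % an empty line or the end of the file now gives an empty \tmp
\loop
\read\fl to \tmp
\ifx\@empty\tmp
\else
\expandafter\splitline\tmp\relax
\repeat
}
\closein\fl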

\makeatother

\def\FLval#1{\csname FL-#1\endcsname}


\begin{document}

\textbf{Words in headers} = \FLval{Words in headers}

\textbf{Number of headers} = \FLval{Number of headers}

\textbf{Subsection: Sub part three} = \FLval{Subsection: Sub part three}

\textbf{Subsection: Sub part two old stuff}  = \FLval{Subsection: Sub part two old stuff}



\end{document}
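
If a regular expression is still preferred, as the question asks, the same sum can be sketched with the expl3 regex functions. This is only a rough sketch, not part of the answer above: the names `\LeadingSum` and `\l_demo_match_seq` are hypothetical, and it assumes a LaTeX release recent enough to provide `\NewDocumentCommand` and the regex module in the kernel (on older systems, loading `xparse` would be needed).

```latex
\documentclass{article}
\ExplSyntaxOn
\seq_new:N \l_demo_match_seq
% \LeadingSum and \l_demo_match_seq are hypothetical names for this sketch.
\NewDocumentCommand \LeadingSum { m }
  {
    % Match three plus-separated integers at the start of the line.
    \regex_extract_once:nnNTF { \A \s* (\d+) \+ (\d+) \+ (\d+) } { #1 } \l_demo_match_seq
      {
        % Item 1 is the whole match; items 2 to 4 are the captured groups.
        \int_eval:n
          {
            \seq_item:Nn \l_demo_match_seq { 2 }
            + \seq_item:Nn \l_demo_match_seq { 3 }
            + \seq_item:Nn \l_demo_match_seq { 4 }
          }
      }
      { NA }
  }
\ExplSyntaxOff
\begin{document}
\LeadingSum{90+3+0 (1/0/0/0) Subsection: Sub part three}

\LeadingSum{Words in text: 11}
\end{document}
```

The first call should typeset 93; the second has no leading sum, so it falls through to the NA branch, mirroring the NA entries in the question's sample output.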
