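I just wrote some code that uses a simple regular expression, but it is far slower than a plain string search (at least 100 times slower). To optimize it, I tried compiling the regular expression, but unfortunately that raises a compilation error: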
Braced quantifier '{' may not be followed by '_'
Any idea what I am doing wrong?
MWE:
\documentclass[]{article}
\begin{document}
\ExplSyntaxOn
\cs_generate_variant:Nn \regex_extract_all:nnN { VVN, nVN }
\str_new:N \l__robExt_my_str
\str_set:Nn \l__robExt_my_str {I like \vegetable and \fruit.}
% Without compilation, this is fine
\regex_extract_all:nVN { \\[A-Za-z]+ } \l__robExt_my_str \l__robExt_output_seq
\show\l__robExt_output_seq
% We compile the regex: we get an error
\regex_const:Nn \l__robExt_macro_regex { \\[A-Za-z]+ }
\regex_extract_all:VVN \l__robExt_macro_regex \l__robExt_my_str \l__robExt_output_seq
\show\l__robExt_output_seq
\ExplSyntaxOff
\end{document}
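Answer 1
I found the solution: I need to generate the NVN variant from \regex_extract_all:NnN rather than from \regex_extract_all:nnN. It is perhaps 25% faster, but still much slower than I expected. If anyone knows why regular expressions are so slow, I would be happy to hear it.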
\documentclass[]{article}
\begin{document}
\ExplSyntaxOn
\cs_generate_variant:Nn \regex_extract_all:nnN { VVN, nVN }
\cs_generate_variant:Nn \regex_extract_all:NnN { NVN }
\str_new:N \l__robExt_my_str
\str_set:Nn \l__robExt_my_str {I like \vegetable and \fruit.}
% Without compilation, this is fine
\regex_extract_all:nVN { \\[A-Za-z]+ } \l__robExt_my_str \l__robExt_output_seq
\show\l__robExt_output_seq
% We compile the regex: with the NVN variant this now works
\regex_const:Nn \l__robExt_macro_regex { \\[A-Za-z]+ }
\regex_show:N \l__robExt_macro_regex
\regex_extract_all:NVN \l__robExt_macro_regex \l__robExt_my_str \l__robExt_output_seq
\show\l__robExt_output_seq
\ExplSyntaxOff
\end{document}
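A likely explanation for the original error, offered as a sketch rather than a confirmed diagnosis: a V-type argument passes the value of a variable, so a VVN variant of \regex_extract_all:nnN hands the internal code of the compiled regex to a function that expects a pattern written as text and tries to parse it again. The N-type function takes the regex variable itself. The names \c__robExt_demo_regex, \l__robExt_demo_str and \l__robExt_demo_seq below are hypothetical and only serve the illustration.

```latex
\documentclass{article}
\begin{document}
\ExplSyntaxOn
% Hypothetical names, for illustration only.
\regex_const:Nn \c__robExt_demo_regex { \\[A-Za-z]+ }
\str_new:N \l__robExt_demo_str
\seq_new:N \l__robExt_demo_seq
\str_set:Nn \l__robExt_demo_str { I~like~\fruit. }
% The base function \regex_extract_all:NnN takes a compiled regex variable
% as its first argument; only the second argument needs V-type expansion.
% Generating the variant is harmless if it already exists.
\cs_generate_variant:Nn \regex_extract_all:NnN { NVN }
\regex_extract_all:NVN \c__robExt_demo_regex \l__robExt_demo_str \l__robExt_demo_seq
\seq_show:N \l__robExt_demo_seq
% By contrast, a VVN variant of \regex_extract_all:nnN would first replace
% \c__robExt_demo_regex by its value and then treat that as a pattern.
\ExplSyntaxOff
\end{document}
```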
EDIT
I do not know why, but in my tests a regex search with \regex_match:nVTF or \regex_extract_all:NVN seems much slower than \str_if_in:NnTF. Here is my benchmark file:
\documentclass{article}
\usepackage{amsmath}
\usepackage{forest}
% grab latest .sty file from https://github.com/leo-colisson/robust-externalize/
\usepackage{robust-externalize}
\robExtConfigure{
% If you do not want to enable shell-escape, just manually
% inspect benchmark-robExt-compile-missing-figures.sh
% and run "bash benchmark-robExt-compile-missing-figures.sh"
enable fallback to manual mode,
compile in parallel after=3,
}
\cacheEnvironment{forest}{
latex,
add to preamble={
\usepackage{forest}
},
% Uncomment one line at a time to compare efficiency:
% (ordered by faster -> slower)
% 2.86s (adding one "if matches" seems to add around 0.01s, try to add dummy matches to test)
%if matches={mainName}{forward=\mainName},
% 3.37s (adding one "if matches" seems to add around 1s, if you uncomment the last the compilation will go to 14s)
%if matches regex={mainName}{forward=\mainName},
% 5.21s ==> that's the time I'd like to optimize the most.
auto forward,
%%% Just to see that matches takes a really different time from matches regex:
% Adds ~10s!
% if matches regex={xxxxA}{},
% if matches regex={xxxxB}{},
% if matches regex={xxxxC}{},
% if matches regex={xxxxD}{},
% if matches regex={xxxxE}{},
% if matches regex={xxxxF}{},
% if matches regex={xxxxG}{},
% if matches regex={xxxxH}{},
% if matches regex={xxxxI}{},
% if matches regex={xxxxJ}{},
}
\NewDocumentCommandAutoForward{\mainName}{}{John}
\begin{document}
\foreach \j in {0,...,200}{
% This is always the same picture, this mostly drastically reduces the first compilation time
% without significantly changing the next runs, which is what we try to optimize right now
\begin{forest}
[\mainName
[\mainName [\mainName]]
[\mainName
[\mainName [\mainName]]
[\mainName[\mainName]]
[\mainName[D[a]][NP[\mainName]]]
]
]
\end{forest}\\
}
\end{document}
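To separate the cost of the regex engine from everything else robust-externalize does, the comparison could also be reduced to a standalone timing test. The following is a minimal sketch, assuming the l3benchmark package (from l3experimental) is installed and the engine provides a timer; the names \l__robExt_bench_str and \c__robExt_bench_regex are hypothetical, and the search string "fruit" is an arbitrary choice.

```latex
\documentclass{article}
% Assumption: l3benchmark (part of l3experimental) is available.
\usepackage{l3benchmark}
\begin{document}
\ExplSyntaxOn
% Hypothetical names, for illustration only.
\str_new:N \l__robExt_bench_str
\str_set:Nn \l__robExt_bench_str { I~like~\vegetable and~\fruit. }
\regex_const:Nn \c__robExt_bench_regex { fruit }
% Harmless if these variants already exist in the installed expl3.
\cs_generate_variant:Nn \regex_match:nnTF { nVTF }
\cs_generate_variant:Nn \regex_match:NnTF { NVTF }
% Each \benchmark:n call reports its timing in the terminal.
% 1. Plain string search.
\benchmark:n { \str_if_in:NnTF \l__robExt_bench_str { fruit } { } { } }
% 2. Regex given as text, so it is compiled on every call.
\benchmark:n { \regex_match:nVTF { fruit } \l__robExt_bench_str { } { } }
% 3. Regex compiled once beforehand.
\benchmark:n { \regex_match:NVTF \c__robExt_bench_regex \l__robExt_bench_str { } { } }
\ExplSyntaxOff
\end{document}
```

Compiling this file once and comparing the three reported times shows how much of the slowdown comes from compiling the pattern and how much from the matching itself.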