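I want to run some code when certain words occur in a LaTeX3 string. I came up with my own implementation, which basically loops with \str_map_inline and keeps track of the last part of the current word with \str_put_right, but it turned out to be about 500 times slower than I expected (compared with \str_if_in:NnTF, which performs roughly the same number of operations). This makes my whole library 20% slower because of one tiny operation. Any idea what I am doing wrong?
MWE: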
\documentclass{article}
\usepackage{l3benchmark}
\begin{document}
Test
\ExplSyntaxOn
%%%%%%%%%%%%%% Library to make more efficient
% \__robExt_auto_forward_words:NN \commandToRunOnEachWord \stringToSearchOn
\cs_set:Nn \__robExt_auto_forward_words:NN {
% \l_tmpa_str will contain the current word read so far
\str_set:Nn \l_tmpa_str {}%
\str_map_inline:Nn #2 {
% \token_case_charcode:NnTF ##1 {} {} {}
\__robExt_if_letter:nTF {##1} {
\str_put_right:Nn \l_tmpa_str {##1}
}{
\str_if_empty:NTF \l_tmpa_str { } {
% if the string is not empty, we run the command on the string
#1 \l_tmpa_str%
\str_set:Nn \l_tmpa_str {}% we reset its value
}
}
}
}
%% \__robExt_if_letter:nTF {char} {true} {false} tests if an element is a letter
%% https://tex.stackexchange.com/a/700864/116348
\prg_new_conditional:Npnn \__robExt_if_letter:n #1 { TF }
{
\bool_lazy_or:nnTF
{
\bool_lazy_and_p:nn
{ \int_compare_p:nNn { `#1 } > { `a - 1 } }
{ \int_compare_p:nNn { `#1 } < { `z + 1 } }
}
{
\bool_lazy_and_p:nn
{ \int_compare_p:nNn { `#1 } > { `A - 1 } }
{ \int_compare_p:nNn { `#1 } < { `Z + 1 } }
}
\prg_return_true:
\prg_return_false:
}
% \robExt_register_match_word {namespace that defaults to empty} {word} {code to run if word is present}
\cs_set:Nn \robExt_register_match_word:nnn {
\cs_set:cn {l__robExt_execute_if_word_present_#1_#2:} {#3}
}
% \robExt_try_to_execute_if_match_word:nn {namespace} {word}
\cs_set:Nn \robExt_try_to_execute_if_match_word:nn {
\cs_if_exist:cTF {l__robExt_execute_if_word_present_#1_#2:} {%
\cs_if_exist:cTF {l__robExt_execute_if_word_present_#1_#2__already_forwarded:}{\message{Already forwarded}}{
\use:c {l__robExt_execute_if_word_present_#1_#2:}%
% define it so that we do not import twice next time
\cs_set:cx {l__robExt_execute_if_word_present_#1_#2__already_forwarded:} {}
}
} { }
}
\cs_generate_variant:Nn \robExt_try_to_execute_if_match_word:nn { nV }
%%%%%%%%%%%%%% Usage
\robExt_register_match_word:nnn {} {grapes} {I~like~grapes.\\}
\robExt_register_match_word:nnn {} {grapefruits} {I~hate~grapefruits.\\}
%% This string is already created for other reasons, so you can safely assume it exists
\str_new:N \l_my_str
\str_set:Nn \l_my_str {In~the~market~you~can~find~some~grapes~and~grapefruits.}
My~string~is~''\l_my_str''.\newline
\NewDocumentCommand{\testAutoForward}{}{
\cs_set:Nn \__robExt_tmp_fct:N {
\message{I will try to run ##1}
\robExt_try_to_execute_if_match_word:nV {} ##1
}
\__robExt_auto_forward_words:NN \__robExt_tmp_fct:N \l_my_str
}
\cs_new:Nn \robExt_benchmark_me:n {
\benchmark:n {#1}
Number~of~operations~taken~by:\par\texttt{\detokenize{#1}}\par~is~\fp_to_scientific:N\g_benchmark_ops_fp.
Time~taken~by:\par\texttt{\detokenize{#1}}\par is~\fp_to_scientific:N\g_benchmark_time_fp.
}
\fp_new:N \l_robExt_fp
\fp_set_eq:NN \l_robExt_fp \g_benchmark_time_fp
\robExt_benchmark_me:n {\testAutoForward}
\par Second test (reference time I'd like to reach):\par
\robExt_benchmark_me:n {
\str_if_in:NnTF \l_my_str {grapes}{%
% Not sure why I cannot print this without getting "TeX capacity exceeded", I guess because it repeats it a lot?
% I~like~grapes.
}{}
\str_if_in:NnTF \l_my_str {grapefruits}{}{}
}
% Not sure why this prints "ERROR: Use of \??? doesn't match its definition."
% The~reference~implementation~is~\fp_eval:n{(\g_benchmark_time_fp) / (\l_robExt_fp)}~times~faster.
\ExplSyntaxOff
\end{document}
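EDIT
To answer the comments more precisely: I have a string \mystring (a LaTeX3 string, i.e. I assume every character has catcode "other" or "space"), and I want to extract all words ([a-zA-Z]+) in order to run the some code that may have been registered for the corresponding word via \registerWord{myWord}{some code}. So if \mystring contains:
In the market you can find some grapes, apples, and grapefruits.
and I run \registerWord{grapes}{\message{I like grapes}}, then running \extractAndExecuteWords \mystring should run \message{I like grapes}.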
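My first attempt in plain LaTeX (it has several problems: spaces in the string are removed, and I could not find how to insert braces into the macro, so I inserted \bgroup tokens instead, which is not equivalent; the question "How can I add a single curly brace to a macro?" gave me strange errors):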
\documentclass{article}
\begin{document}
\ExplSyntaxOn
\str_new:N \l_my_str
\str_set:Nn \l_my_str {In~the~market~you~can~find~some~grapes, apples,~and~grapefruits.}
\let\myString\l_my_str
\ExplSyntaxOff
\makeatletter
% \autoForwardWords \stringToSearchOn
\def\autoForwardWords#1#2{%
\def\robExt@tmp@word{}%
\let\robExt@cmd@to@run#1%
\message{AAAAAAAAA #2}%
\edef\robExt@list@of@commands{%
\noexpand\robExt@cmd@to@run\noexpand\bgroup%
\expandafter\autoForwardWords@aux#2\robExt@end@of@string% \robExt@end@of@string marks the end of the string
}%
%% This shows the command to run, with two issues:
%% 1) it removed spaces in the string
%% 2) I can't find how to add braces instead of bgroups.
%% I tried https://tex.stackexchange.com/questions/506613/how-can-i-add-a-single-curly-brace-to-a-macro
%% but I was getting errors.
%%\show\robExt@list@of@commands
\robExt@list@of@commands
}
\def\autoForwardWords@aux#1{%
\ifx#1\robExt@end@of@string% We arrived at the end of the string
\noexpand\bgroup%
\else%
\ifnum`#1>\numexpr `a-1\relax%
\ifnum`#1<\numexpr `z+1\relax%
#1%
\else%
\noexpand\egroup\noexpand\robExt@cmd@to@run\noexpand\bgroup%
\fi%
\else%
\ifnum`#1>\numexpr `A-1\relax%
\ifnum`#1<\numexpr `Z+1\relax%
#1%
\else%
\noexpand\egroup\noexpand\robExt@cmd@to@run\noexpand\bgroup%
\fi%
\else%
\noexpand\egroup\noexpand\robExt@cmd@to@run\noexpand\bgroup%
\fi%
\fi%
\expandafter\autoForwardWords@aux% let it grab the next character
\fi%
}
\def\robExt@end@of@string{}
\def\printWord#1{I saw --((#1))--.}
\autoForwardWords\printWord\myString
\makeatother
\end{document}
Answer 1
You can try this code:
\let\ea=\expandafter
\def\scanmacro#1{%
\bgroup \settocomma { };:."?!@+=\{\}\relax
\lowercase\ea{\ea\gdef\ea#1\ea{#1}}%
\edef#1{\detokenize\ea{#1}}%
% \message{\string#1: \meaning#1} % prints the modified format of the scanned macro
\ea\egroup
\ea\wordscan#1,\relax,%
}
\def\settocomma #1{\ifx\relax#1\else \lccode`#1=`, \ea\settocomma\fi}
\def\wordscan#1,{\ifx\relax#1\empty\else
% \message{{#1}} % prints each scanned "word"
\ifcsname doword:#1\endcsname \csname doword:#1\endcsname \fi
\ea\wordscan\fi
}
\def\regword#1#2{\ea\gdef\csname doword:\string#1\endcsname{#2}}
\regword {grapes} {\message{I like grapes.}}
\regword {find} {\message{We are searching somewhat.}}
\def\mystring{In the {market} you can find some grapes, apples? and grapefruits.}
\scanmacro\mystring % runs \message{We are searching somewhat.}
% and \message{I like grapes.}
We use \lowercase to replace all occurrences of non-letter characters by commas, and \detokenize to reset the catcode of these commas to that of a "normal comma". So the macro
In the {market} you can find some grapes, apples? and grapefruits.
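is modified to look like this:
in,the,,market,,you,can,find,some,grapes,,apples,,and,grapefruits,
Then \wordscan scans this macro with a comma-delimited argument #1 and processes each scanned word individually. Note that there are several "empty words". This is not a problem, because the empty word is not registered. Removing the ,, occurrences before running \wordscan would only add more useless computation time.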
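You must write every non-letter character that you expect to occur in the scanned macro into the list of characters that follows \settocomma and is terminated by \relax. Note that the first { } stands for a space and the final \{\} stand for { and }, so these are replaced by commas too.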
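The replacement step can be seen in isolation in the following minimal plain TeX sketch. It is not part of the code above: the macro name \demo and the sample text are made up for illustration, and only ? and ! are mapped to commas.

```tex
% Minimal sketch of the \lccode + \lowercase trick (plain TeX).
% \demo and its content are hypothetical, for illustration only.
\begingroup
  \lccode`\?=`\,   % \lowercase will turn every "?" into ","
  \lccode`\!=`\,   % ... and every "!" into ","
  % \lowercase rewrites the character tokens of its argument while the
  % \lccode settings are active; \endgroup then restores the old values
  % and the definition itself takes effect outside the group.
  \lowercase{\endgroup \def\demo{apples?pears!plums}}
\message{\meaning\demo}% shows macro:->apples,pears,plums
\bye
```

\lowercase changes only character codes, never category codes, so a comma produced from a brace still behaves like a brace. This is why the answer's code applies \detokenize afterwards.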
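In this code, only control sequences inside the scanned macro (\mystring in the example) are not handled. We assume that none are present. If there are some, you must add a second
\edef#1{\detokenize\ea{#1}}%
just before \lowercase. You can then decide whether \word should be interpreted as word (add \\ to the list of "to comma" characters) or left as it is (do not add \\ to that list). In the second case you can register \word as something different from word.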
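To see what the extra \detokenize step does to a control sequence, here is a small plain TeX sketch. The names \sample and \word are hypothetical and serve only as an illustration.

```tex
% Sketch: \detokenize turns a control sequence into plain characters.
\def\sample{some \word here}
% \expandafter expands \sample once so that \detokenize sees its content;
% \edef stores the resulting string of characters back into \sample.
\edef\sample{\detokenize\expandafter{\sample}}
\message{\meaning\sample}% shows macro:->some \word here
\bye
```

After this step the backslash is an ordinary character of category 12, so it can be listed after \settocomma like any other non-letter character.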
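Edit
Following your comment about preserving uppercase letters, I created another approach that does not use \lowercase but runs a macro on each token in order to replace non-letter characters by commas. The advantage of this approach is that you do not need to run a macro over a list of "other characters" (which can be very large), nor over a list of all uppercase letters (which can also be very large in the Unicode set). The disadvantage is that processing each token with a macro may be less efficient than \lowercase.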
\let\ea=\expandafter
\def\scanmacro#1{%
\bgroup
\edef#1{\detokenize\ea{#1}}%
\edef#1{\ea\replspace#1 \relax}% replaces spaces to comma
\edef#1{\ea\replothers#1\relax}% replaces other characters to comma
% \message{\string#1: \meaning#1} % prints the modified format
\ea\egroup \ea\wordscan#1,\relax,%
}
\def\replspace#1 #2{#1\ifx#2\relax \else ,#2\ea\replspace\fi}
\def\replothers#1{\ifx#1\relax\else \ifnum\lccode`#1=0 ,\else #1\fi \ea\replothers\fi}
\def\wordscan#1,{\ifx\relax#1\empty\else
% \message{{#1}} % prints each scanned "word"
\ifcsname doword:#1\endcsname \csname doword:#1\endcsname \fi
\ea\wordscan\fi
}
\def\regword#1#2{\ea\gdef\csname doword:\string#1\endcsname{#2}}
\regword {grapes} {\message{I like grapes}}
\regword {find} {\message{We are searching somewhat}}
\def\mystring{In the {market} you can find some grapes, apples? and grapefruits.}
\scanmacro\mystring % runs \message{We are searching somewhat}
% and \message{I like grapes}
\bye
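The main concept is the same: we replace spaces and non-letter characters by commas and run \wordscan. We recognize non-letter characters as those whose \lccode is equal to zero.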
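The \lccode test can be tried on its own with the following plain TeX sketch. The helper \isletter is hypothetical, and the result assumes the default plain TeX \lccode table. Other formats or language setups may assign a nonzero \lccode to some non-letter characters, which this test would then treat as letters.

```tex
% Sketch: classify a character by its \lccode (plain TeX defaults assumed).
% \isletter is a hypothetical helper, not part of the answer's code.
\def\isletter#1{\ifnum\lccode`#1=0 no\else yes\fi}
\message{a:\isletter a, Z:\isletter Z, ?:\isletter ?}
% shows a:yes, Z:yes, ?:no
\bye
```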