LaTeX3 programming and TeX primitives: how to "execute after" instead of expandafter

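This question is a bit long. I hope it is not too specific.

I am trying to scan some TeX documents and want to produce a text file directly. I do this by scanning the TeX input token by token. Characters with category code 11 or 12 are written to a string. Macros are expanded until something unexpandable is reached, and then scanning continues.

However, my code fails when it encounters an unexpandable primitive (probably because it goes into an infinite loop, with the primitive expanding to itself). What I want to do is pass unexpandable primitives, together with their arguments, to the TeX engine and then continue scanning. I am just not sure how to achieve that. One approach might be to define a separate macro for each primitive that collects the primitive and its arguments, then executes the primitive and continues scanning. But perhaps there is a simpler solution.

The first example document works and demonstrates what I want to do.

The second example fails because of the \def primitive in the document body.

First test document, testdoc.tex: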

\documentclass{article}

\usepackage{madtohtml}

\def\HELLO#1{HELLO #1}

\begin{document}
Lorem ipsum dolor sit amet, consectetur adipiscing elit. \if ab A IS B \else A IS NOT B \fi. Nullam et volutpat nulla. \if cc C IS C \else C IS NOT C \fi. Suspendisse ultrices tortor eu elit hendrerit tristique. \HELLO{WORLD}. Praesent ut viverra mauris.
\end{document}

Result for the first test document, testdoc.txt:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. A IS NOT B . Nullam et volutpat nulla. C IS C . Suspendisse ultrices tortor eu elit hendrerit tristique. HELLO WORLD. Praesent ut viverra mauris.

Second test document, testdoc-2.tex:

\documentclass{article}

\usepackage{madtohtml}

\begin{document}
Suspendisse ultrices tortor eu elit hendrerit tristique. \def\HELLO#1{HELLO #1} \HELLO{WORLD}. Praesent ut viverra mauris.
\end{document}

Excerpt from the style file madtohtml.sty:

\ProvidesPackage{madtohtml}

\RequirePackage{xparse}
\RequirePackage{etoolbox}

\ExplSyntaxOn




%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Initialise html string and create methods for appending stuff %%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\str_new:N\g_madtohtml_htmlstring_str
\cs_new:Npn\madtohtml_htmlstring_append:n #1 {\str_gput_right:Nn\g_madtohtml_htmlstring_str{#1}}
\cs_generate_variant:Nn\madtohtml_htmlstring_append:n{x}
\NewDocumentCommand{\@madtohtml@htmlstring@append@n}{m}{\madtohtml_htmlstring_append:n{#1}}
\NewDocumentCommand{\@madtohtml@htmlstring@append@x}{m}{\madtohtml_htmlstring_append:x{#1}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Start and stop scanning of tex input %%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\NewDocumentCommand{\MYSTARTSCAN}{}{\madtohtml_scanner_scan_start:}
\NewDocumentCommand{\MYENDSCAN}{}{ENDENDENDENDENDEND}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Scanner: scan tex input token by token %%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\cs_new:Nn\madtohtml_scanner_scan_start:{\madtohtml_scanner_scan_start_a:\madtohtml_scanner_main:}
\cs_new:Nn\madtohtml_scanner_scan_start_a:{SCAN\c_space_tl STARTS}% Just for seeing if it works
\cs_new:Nn\madtohtml_scanner_scan_stop:{\madtohtml_scanner_scan_stop_a:N}
\cs_new:Nn\madtohtml_scanner_scan_stop_a:N{SCAN\c_space_tl STOPS}% Just for seeing if it works

\cs_new:Nn\madtohtml_scanner_main:{\peek_after:Nw\madtohtml_scanner_main_a:}%check what is next token
\cs_new:Nn\madtohtml_scanner_main_a:{\madtohtml_scanner_tests:n{\l_peek_token}}
\cs_new:Nn\madtohtml_scanner_tests:n{%
    \token_if_eq_meaning:NNTF{#1}{\MYENDSCAN}{\madtohtml_scanner_scan_stop:}{%
        \token_if_primitive:NTF{#1}{%
            \token_if_expandable:NTF{#1}%
                {\madtohtml_scanner_doexpprim:n{#1}}%
                {\madtohtml_scanner_dounexpprim:n{#1}}%
        }{%
            \token_if_macro:NTF{#1}{\madtohtml_scanner_domacro:n{#1}}{%\def*ed or similar, or active char
            \token_if_eq_catcode:NNTF{#1}{\c_catcode_other_token}{\madtohtml_scanner_dochar:n{#1}}{%
            \token_if_eq_catcode:NNTF{#1}{\c_catcode_letter_token}{\madtohtml_scanner_dochar:n{#1}}{%
            \token_if_space:NTF{#1}{\madtohtml_scanner_dospace:n{#1}}{%
        NONE}}}}}%
    }
}

\cs_new:Nn\madtohtml_scanner_doexpprim:n{\madtohtml_scanner_domacro:n{#1}}
\cs_new:Nn\madtohtml_scanner_dounexpprim:n{\madtohtml_scanner_domacro:n{#1}}
\cs_new:Nn\madtohtml_scanner_domacro:n{\madtohtml_scanner_domacro_a:N}
\cs_new:Nn\madtohtml_scanner_domacro_a:N{\exp_after:wN\madtohtml_scanner_main:#1}
\cs_new:Nn\madtohtml_scanner_dochar:n{\madtohtml_scanner_dochar_a:N}
\cs_new:Nn\madtohtml_scanner_dochar_a:N{#1\madtohtml_scanner_dochar_b:n{#1}\madtohtml_scanner_main:}
\cs_new:Nn\madtohtml_scanner_dochar_b:n{\madtohtml_htmlstring_append:n{#1}}
\cs_new:Nn\madtohtml_scanner_dospace:n{\madtohtml_scanner_dospace_a:N}
\cs_new:Nn\madtohtml_scanner_dospace_a:N{\c_space_tl\madtohtml_scanner_dospace_b:n{\c_space_tl}\madtohtml_scanner_main:#1}
\cs_new:Nn\madtohtml_scanner_dospace_b:n{\madtohtml_htmlstring_append:x{#1}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%%%%%%%%%%%%%%%%%%%%%%%%%
%% Write html-file %%%%%%
\ExplSyntaxOn
\iow_new:N\madtohtml_writeout_html_iow
\NewDocumentCommand{\@madtohtml@writeout@writehtml}{}{%
    \iow_open:Nn\madtohtml_writeout_html_iow{\c_sys_jobname_str .txt}
    \iow_now:Nx\madtohtml_writeout_html_iow{\str_use:N\g_madtohtml_htmlstring_str}
    \iow_close:N\madtohtml_writeout_html_iow
}
\ExplSyntaxOff
\AfterEndDocument{
    \@madtohtml@writeout@writehtml
}
%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%


\AtBeginDocument{\everypar{\def\par{\MYENDSCAN\@@par}\MYSTARTSCAN}}
\let\@@enddocument\enddocument
\def\enddocument{\MYENDSCAN\def\par{\@@par}\@@enddocument}


\endinput

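You might guess from my code that my final goal is to produce HTML. Note that I am well aware of LaTeXML and Tralics. This is more about learning LaTeX3 programming and seeing what can be achieved with it.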
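
For illustration, the per-primitive approach mentioned above could look roughly like this for \def. It is only a sketch and not part of madtohtml.sty: the names \HandleDef, \HandleDefBody and \ContinueScan are hypothetical, and \ContinueScan merely stands in for the scanner's main loop. The idea is to grab the macro name, the parameter text up to the opening brace (using a #{ delimiter) and the body, hand them to \def, and then resume scanning.

```latex
\documentclass{article}

% Hypothetical stand-in for the scanner's main loop.
\newcommand\ContinueScan{\typeout{scanner resumes here}}

% \HandleDef must be followed by \def. It grabs the macro name (#1) and
% the parameter text (#2), which is delimited by the opening brace.
\def\HandleDef\def#1#2#{\HandleDefBody{#1}{#2}}

% #3 is the replacement text. The collected pieces are given back to
% the \def primitive, which TeX then executes before scanning goes on.
\def\HandleDefBody#1#2#3{%
  \def#1#2{#3}%
  \ContinueScan}

\begin{document}
\HandleDef\def\HELLO#1{HELLO #1}
\HELLO{WORLD}.
\end{document}
```

Every primitive has its own argument syntax, so a handler of this kind would presumably have to be written separately for each primitive the scanner should support.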

Answer 1

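I would use LuaLaTeX and hook into pre_linebreak_filter. At this point everything has been expanded, but no hyphenation points have been inserted. I then walk over the nodes in the current paragraph and write the UTF-8 representation of every glyph to a file. Glue is simply replaced by a space. The output is written to a file named export.txt.

At the moment it does not descend recursively, i.e. nested hlists and vlists are not processed. I leave that as an exercise.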

\documentclass{article}

\usepackage{luacode}
\begin{luacode*}
local local_par_id = node.id("local_par")
local glue_id = node.id("glue")
local glyph_id = node.id("glyph")

local export_file = io.open("export.txt", "w")
local first_par = true

local function export(head)
    local n = head
    while n do
        if n.id == local_par_id and not first_par then
            export_file:write("\n\n")
        elseif n.id == glyph_id then
            export_file:write(utf8.char(n.char))
        elseif n.id == glue_id then
            export_file:write(" ")
        end
        n = n.next
        first_par = false
    end
end

luatexbase.add_to_callback("pre_linebreak_filter",
                           function(head)
                               export(head)
                               return head
                           end,
                           "export")
luatexbase.add_to_callback("finish_pdffile",
                           function() io.close(export_file) end,
                           "export")
\end{luacode*}

\def\HELLO#1{HELLO #1}

\begin{document}
Lorem ipsum dolor sit amet, consectetur adipiscing elit. \if ab A IS B \else A IS NOT B \fi. Nullam et volutpat nulla. \if cc C IS C \else C IS NOT C \fi. Suspendisse ultrices tortor eu elit hendrerit tristique. \HELLO{WORLD}. Praesent ut viverra mauris.

Lorem ipsum dolor sit amet, consectetur adipiscing elit. \if ab A IS B \else A IS NOT B \fi. Nullam et volutpat nulla. \if cc C IS C \else C IS NOT C \fi. Suspendisse ultrices tortor eu elit hendrerit tristique. \HELLO{WORLD}. Praesent ut viverra mauris.
\end{document}

Contents of export.txt:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. A IS NOT B . Nullam et volutpat nulla.  C IS C . Suspendisse ultrices tortor eu elit hendrerit tristique. HELLO WORLD. Praesent ut viverra mauris. 

Lorem ipsum dolor sit amet, consectetur adipiscing elit. A IS NOT B . Nullam et volutpat nulla.  C IS C . Suspendisse ultrices tortor eu elit hendrerit tristique. HELLO WORLD. Praesent ut viverra mauris. 
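
The exercise mentioned in the answer, descending into nested hlists and vlists, could be approached along the following lines. This is only a sketch: the helper name export_list and the test sentence are made up for illustration, and it assumes a LuaTeX version in which the contents of a box node are available as its head field. Constructs that build their own paragraphs (for example \parbox) trigger pre_linebreak_filter themselves, so their text may end up in the file twice; the sketch does not try to prevent that.

```latex
\documentclass{article}

\usepackage{luacode}
\begin{luacode*}
local local_par_id = node.id("local_par")
local glue_id      = node.id("glue")
local glyph_id     = node.id("glyph")
local hlist_id     = node.id("hlist")
local vlist_id     = node.id("vlist")

local export_file = io.open("export.txt", "w")
local first_par = true

-- export_list is a hypothetical name; it walks one node list and
-- calls itself for every nested box it meets.
local function export_list(head)
    local n = head
    while n do
        if n.id == local_par_id then
            -- separate paragraphs with a blank line, except before the first
            if not first_par then export_file:write("\n\n") end
            first_par = false
        elseif n.id == glyph_id then
            export_file:write(utf8.char(n.char))
        elseif n.id == glue_id then
            export_file:write(" ")
        elseif n.id == hlist_id or n.id == vlist_id then
            -- n.head is the first node inside the box (nil for an empty box)
            export_list(n.head)
        end
        n = n.next
    end
end

luatexbase.add_to_callback("pre_linebreak_filter",
                           function(head)
                               export_list(head)
                               return head
                           end,
                           "export")
luatexbase.add_to_callback("finish_pdffile",
                           function() io.close(export_file) end,
                           "export")
\end{luacode*}

\begin{document}
Text outside a box, \mbox{text inside an hbox}, and more text.
\end{document}
```

Compared with the loop in the answer, the only structural change is the extra branch for hlist and vlist nodes; glyphs and glue are handled the same way.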
