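This question is a bit long. I hope it is not too specific.
I am trying to scan some TeX documents and produce a text file directly from them. I do this by scanning the TeX input token by token. Characters with category code 11 or 12 are written to a string. Macros are expanded until only unexpandable tokens remain, and scanning then continues.
However, my code fails when it encounters an unexpandable primitive (probably because it enters an infinite loop in which the primitive "expands" to itself). What I want is to pass unexpandable primitives, together with their arguments, to the TeX engine and then continue scanning. I am just not sure how to achieve this. One approach might be to define a separate macro for each primitive that collects the primitive and its arguments, executes the primitive, and then continues scanning. But perhaps there is a simpler solution.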
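To make the per-primitive idea concrete, here is a minimal sketch for \def only. It is not part of madtohtml.sty: the names \demo_handle_def:w, \demo_handle_def_aux:nnn, \demo_continue: and \DemoHandleDef are made up for illustration, \demo_continue: merely stands in for resuming the scanner, and prefixes such as \long or \global are not handled.

```latex
\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
% Hypothetical handler for \def: #1 is the macro being defined, #2 its
% parameter text (everything up to the opening brace of the body).
\cs_new_protected:Npn \demo_handle_def:w \def #1 #2 #
  { \demo_handle_def_aux:nnn {#1} {#2} }
% #3 is the body. Execute the primitive, then hand control back.
\cs_new_protected:Npn \demo_handle_def_aux:nnn #1 #2 #3
  {
    \def #1 #2 {#3}
    \demo_continue:
  }
% Stand-in for "continue scanning"; here it only prints a marker.
\cs_new_protected:Npn \demo_continue: { [scan~resumes] }
\cs_new_eq:NN \DemoHandleDef \demo_handle_def:w
\ExplSyntaxOff
\begin{document}
\DemoHandleDef \def\HELLO#1{HELLO #1} \HELLO{WORLD}.
\end{document}
```

In a real scanner the handler would be called when the peeked token is \def, and \demo_continue: would be replaced by the scanner's main loop. The obvious drawback is that every primitive with its own argument syntax needs its own handler.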
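The first example document works and demonstrates what I want to do.
The second example fails because of the \def primitive in the document body.
First test document, testdoc.tex: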
\documentclass{article}
\usepackage{madtohtml}
\def\HELLO#1{HELLO #1}
\begin{document}
Lorem ipsum dolor sit amet, consectetur adipiscing elit. \if ab A IS B \else A IS NOT B \fi. Nullam et volutpat nulla. \if cc C IS C \else C IS NOT C \fi. Suspendisse ultrices tortor eu elit hendrerit tristique. \HELLO{WORLD}. Praesent ut viverra mauris.
\end{document}
Result for the first test document, testdoc.txt:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. A IS NOT B. Nullam et volutpat nulla. C IS C. Suspendisse ultrices tortor eu elit hendrerit tristique. HELLO WORLD. Praesent ut viverra mauris.
Second test document, testdoc-2.tex:
\documentclass{article}
\usepackage{madtohtml}
\begin{document}
Suspendisse ultrices tortor eu elit hendrerit tristique. \def\HELLO#1{HELLO #1} \HELLO{WORLD}. Praesent ut viverra mauris.
\end{document}
Excerpt from the style file madtohtml.sty:
\ProvidesPackage{madtohtml}
\RequirePackage{xparse}
\RequirePackage{etoolbox}
\ExplSyntaxOn
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Initialise html string and create methods for appending stuff %%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\str_new:N\g_madtohtml_htmlstring_str
\cs_new:Npn\madtohtml_htmlstring_append:n #1 {\str_gput_right:Nn\g_madtohtml_htmlstring_str{#1}}
\cs_generate_variant:Nn\madtohtml_htmlstring_append:n{x}
\NewDocumentCommand{\@madtohtml@htmlstring@append@n}{m}{\madtohtml_htmlstring_append:n{#1}}
\NewDocumentCommand{\@madtohtml@htmlstring@append@x}{m}{\madtohtml_htmlstring_append:x{#1}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Start and stop scanning of tex input %%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\NewDocumentCommand{\MYSTARTSCAN}{}{\madtohtml_scanner_scan_start:}
\NewDocumentCommand{\MYENDSCAN}{}{ENDENDENDENDENDEND}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Scanner: scan tex input token by token %%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\cs_new:Nn\madtohtml_scanner_scan_start:{\madtohtml_scanner_scan_start_a:\madtohtml_scanner_main:}
\cs_new:Nn\madtohtml_scanner_scan_start_a:{SCAN\c_space_tl STARTS}% Just for seeing if it works
\cs_new:Nn\madtohtml_scanner_scan_stop:{\madtohtml_scanner_scan_stop_a:N}
\cs_new:Nn\madtohtml_scanner_scan_stop_a:N{SCAN\c_space_tl STOPS}% Just for seeing if it works
\cs_new:Nn\madtohtml_scanner_main:{\peek_after:Nw\madtohtml_scanner_main_a:}%check what is next token
\cs_new:Nn\madtohtml_scanner_main_a:{\madtohtml_scanner_tests:n{\l_peek_token}}
\cs_new:Nn\madtohtml_scanner_tests:n{%
\token_if_eq_meaning:NNTF{#1}{\MYENDSCAN}{\madtohtml_scanner_scan_stop:}{%
\token_if_primitive:NTF{#1}{%
\token_if_expandable:NTF{#1}%
{\madtohtml_scanner_doexpprim:n{#1}}%
{\madtohtml_scanner_dounexpprim:n{#1}}%
}{%
\token_if_macro:NTF{#1}{\madtohtml_scanner_domacro:n{#1}}{%\def*ed or similar, or active char
\token_if_eq_catcode:NNTF{#1}{\c_catcode_other_token}{\madtohtml_scanner_dochar:n{#1}}{%
\token_if_eq_catcode:NNTF{#1}{\c_catcode_letter_token}{\madtohtml_scanner_dochar:n{#1}}{%
\token_if_space:NTF{#1}{\madtohtml_scanner_dospace:n{#1}}{%
NONE}}}}}%
}
}
\cs_new:Nn\madtohtml_scanner_doexpprim:n{\madtohtml_scanner_domacro:n{#1}}
\cs_new:Nn\madtohtml_scanner_dounexpprim:n{\madtohtml_scanner_domacro:n{#1}}
\cs_new:Nn\madtohtml_scanner_domacro:n{\madtohtml_scanner_domacro_a:N}
\cs_new:Nn\madtohtml_scanner_domacro_a:N{\exp_after:wN\madtohtml_scanner_main:#1}
\cs_new:Nn\madtohtml_scanner_dochar:n{\madtohtml_scanner_dochar_a:N}
\cs_new:Nn\madtohtml_scanner_dochar_a:N{#1\madtohtml_scanner_dochar_b:n{#1}\madtohtml_scanner_main:}
\cs_new:Nn\madtohtml_scanner_dochar_b:n{\madtohtml_htmlstring_append:n{#1}}
\cs_new:Nn\madtohtml_scanner_dospace:n{\madtohtml_scanner_dospace_a:N}
\cs_new:Nn\madtohtml_scanner_dospace_a:N{\c_space_tl\madtohtml_scanner_dospace_b:n{\c_space_tl}\madtohtml_scanner_main:#1}
\cs_new:Nn\madtohtml_scanner_dospace_b:n{\madtohtml_htmlstring_append:x{#1}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%
%% Write html-file %%%%%%
\ExplSyntaxOn
\iow_new:N\madtohtml_writeout_html_iow
\NewDocumentCommand{\@madtohtml@writeout@writehtml}{}{%
\iow_open:Nn\madtohtml_writeout_html_iow{\c_sys_jobname_str .txt}
\iow_now:Nx\madtohtml_writeout_html_iow{\str_use:N\g_madtohtml_htmlstring_str}
\iow_close:N\madtohtml_writeout_html_iow
}
\ExplSyntaxOff
\AfterEndDocument{
\@madtohtml@writeout@writehtml
}
%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%
\AtBeginDocument{\everypar{\def\par{\MYENDSCAN\@@par}\MYSTARTSCAN}}
\let\@@enddocument\enddocument
\def\enddocument{\MYENDSCAN\def\par{\@@par}\@@enddocument}
\endinput
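You may have guessed from my code that my ultimate goal is to generate HTML. Note that I am well aware of LaTeXML and Tralics. This is more about learning LaTeX3 programming and seeing what it can do.
Answer 1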
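I would use LuaLaTeX and hook into pre_linebreak_filter. At that point everything has been expanded, but the paragraph has not yet been broken into lines, so no hyphens appear in the text. I then walk through the nodes of the current paragraph and write the UTF-8 representation of every glyph to a file. Glue is simply replaced by a space. The output is written to a file named export.txt.
The code is currently not recursive, i.e. nested hlists and vlists are not processed. I leave that as an exercise.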
\documentclass{article}
\usepackage{luacode}
\begin{luacode*}
-- Node type ids used while walking the paragraph
local local_par_id = node.id("local_par")
local glue_id = node.id("glue")
local glyph_id = node.id("glyph")
local export_file = io.open("export.txt", "w")
local first_par = true

-- Write the text of one paragraph (a flat node list) to export.txt
local function export(head)
  local n = head
  while n do
    if n.id == local_par_id and not first_par then
      export_file:write("\n\n") -- blank line between paragraphs
    elseif n.id == glyph_id then
      export_file:write(utf8.char(n.char))
    elseif n.id == glue_id then
      export_file:write(" ")
    end
    n = n.next
    first_par = false
  end
end

luatexbase.add_to_callback("pre_linebreak_filter",
  function(head)
    export(head)
    return head
  end,
  "export")

-- Close the output file once the PDF is finished
luatexbase.add_to_callback("finish_pdffile",
  function() io.close(export_file) end,
  "export")
\end{luacode*}
\def\HELLO#1{HELLO #1}
\begin{document}
Lorem ipsum dolor sit amet, consectetur adipiscing elit. \if ab A IS B \else A IS NOT B \fi. Nullam et volutpat nulla. \if cc C IS C \else C IS NOT C \fi. Suspendisse ultrices tortor eu elit hendrerit tristique. \HELLO{WORLD}. Praesent ut viverra mauris.
Lorem ipsum dolor sit amet, consectetur adipiscing elit. \if ab A IS B \else A IS NOT B \fi. Nullam et volutpat nulla. \if cc C IS C \else C IS NOT C \fi. Suspendisse ultrices tortor eu elit hendrerit tristique. \HELLO{WORLD}. Praesent ut viverra mauris.
\end{document}
Contents of export.txt:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. A IS NOT B . Nullam et volutpat nulla. C IS C . Suspendisse ultrices tortor eu elit hendrerit tristique. HELLO WORLD. Praesent ut viverra mauris.
Lorem ipsum dolor sit amet, consectetur adipiscing elit. A IS NOT B . Nullam et volutpat nulla. C IS C . Suspendisse ultrices tortor eu elit hendrerit tristique. HELLO WORLD. Praesent ut viverra mauris.
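For the exercise mentioned above, one possible way to descend into nested boxes is sketched below. It is meant as a replacement for the luacode* block in the preamble, not as a tested solution. The helper name collect is my own, and, unlike the original, this variant writes the blank line after each paragraph instead of looking for local_par nodes.

```latex
\begin{luacode*}
-- Sketch of a recursive variant (assumes LuaTeX's node library).
local hlist_id = node.id("hlist")
local vlist_id = node.id("vlist")
local glue_id  = node.id("glue")
local glyph_id = node.id("glyph")
local export_file = io.open("export.txt", "w")

-- Append the text of a node list to the table `out`,
-- descending into nested hlists and vlists.
local function collect(head, out)
  local n = head
  while n do
    if n.id == glyph_id then
      out[#out + 1] = utf8.char(n.char)
    elseif n.id == glue_id then
      out[#out + 1] = " "
    elseif n.id == hlist_id or n.id == vlist_id then
      collect(n.head, out) -- box contents hang off the `head` field
    end
    n = n.next
  end
  return out
end

luatexbase.add_to_callback("pre_linebreak_filter",
  function(head)
    -- One paragraph per call; separate paragraphs by a blank line.
    export_file:write(table.concat(collect(head, {})), "\n\n")
    return head
  end,
  "export")

luatexbase.add_to_callback("finish_pdffile",
  function() export_file:close() end,
  "export")
\end{luacode*}
```

With this variant, text inside an \mbox should end up in export.txt as well. One caveat I have not handled: paragraphs inside a \parbox or minipage trigger the callback on their own, so their text may show up twice.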