I want to convert the file below to MathML. The MathML output needs a semantics tag that carries the original LaTeX code.
MWE:
\documentclass{article}
\usepackage[T1]{fontenc}
\begin{document}
\article{Article Title Here}
\author{Author Name Here}
\maketitle
\section{Introduction}
This is the sample paragraph.
\begin{equation}\label{eq1-11}
T\,^{\prime}_{\mu \nu} = \left( \frac{\partial \xi^\alpha} {\partial\xi^{\prime\mu}}\right) \left( \frac{\partial \xi^\beta}{\partial \xi^{\prime\nu}} \right) T_{\alpha \beta}
\end{equation}
Please refer the equations \ref{eq1-11} for the further testing.
\end{document}
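Answer 1
There are several possible ways to achieve this:
- configure TeX4ht to capture all math content and typeset it twice: once as MathML and a second time as verbatim text
- parse the MathML content and convert it back to LaTeX code
- preprocess the input TeX file and modify it so that it is easier to process
The first method can reuse the code that TeX4ht uses for the MathJax option; see the file mathjax-latex-4ht.4ht for details.
The second method will not produce the same LaTeX code as the original input, which may be a problem for you. LuaXML can be used for the transformation.
I will present the third method in my answer. It consists of two components: an input filter, which parses the math content in the input LaTeX file and marks it up with some additional macros, and a make4ht DOM filter, which modifies the generated HTML file to produce the correct MathML structure.
This is the input filter. It reads the input from the standard input and prints the modified output.
File altmath.lua: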
-- insert environments that should be handled by the script here
local math_environments = {
  equation = true,
  displaymath = true,
  ["equation*"] = true,
}
-- macros that will be inserted into the updated document
local macros = [[
\NewDocumentCommand\inlinemath {mv} {\HCode{<span class="inlinemath">}#1\HCode{<span class="alt">}\NoFonts #2\EndNoFonts\HCode{</span></span>}}
\NewDocumentEnvironment{altdisplaymath}{} {\ifvmode\IgnorePar\fi\EndP\HCode{<div class="altmath">}} {\ifvmode\IgnorePar\fi\EndP\HCode{</div>}}
]]
-- we will insert the macros before the second control sequence (we assume that the first one is \documentclass)
local cs_counter = 0
-- we will handle inline and display math differently
local inline = 1
local display = 2
local function handle_math(input, nexts, stop, buffer, mathtype)
  local content = input:sub(nexts, stop)
  local format = "\\inlinemath{%s}{%s}" -- format used to insert math content back to the doc
  -- set format for display math
  if mathtype == display then
    format = [[
\begin{altdisplaymath}
%s
\begin{verbatim}
%s
\end{verbatim}
\end{altdisplaymath}
]]
  end
  buffer[#buffer + 1] = string.format(format, content, content)
end
local function find_next(input, start, buffer)
  -- find next cs or math start
  local nexts, stop = input:find("[$\\]", start)
  local mathtype
  if nexts then
    -- save current text chunk from the input buffer
    buffer[#buffer+1] = input:sub(start, nexts - 1)
    local kind, nextc = input:match("(.)(.)", nexts)
    if kind == "\\" then -- handle cs
      -- insert our custom TeX macros before second control sequence
      cs_counter = cs_counter + 1
      if cs_counter == 2 then
        buffer[#buffer+1] = macros
      end
      if nextc == "(" then -- inline math
        _, stop = input:find("\\)", nexts)
        mathtype = inline
      elseif nextc == "[" then -- display math
        _, stop = input:find("\\]", nexts)
        mathtype = display
      else -- maybe environment?
        -- find environment name
        local env_name = input:match("^begin%s*{(.-)}", nexts+1)
        -- it must be enabled as math environment
        if env_name and math_environments[env_name] then
          -- escape pattern magic characters, so that names like "equation*" match literally
          local env_pattern = env_name:gsub("%p", "%%%0")
          _, stop = input:find("\\end%s*{" .. env_pattern .. "}", nexts)
          mathtype = display
        else -- not math environment
          buffer[#buffer+1] = "\\" -- save backslash that was eaten by the processor
          return stop + 1 -- return back to the main loop
        end
      end
    else -- handle $
      if nextc == "$" then -- display math
        _, stop = input:find("%$%$", nexts + 1)
        mathtype = display
      else -- inline math
        _, stop = input:find("%$", nexts + 1)
        mathtype = inline
      end
    end
    if not stop then -- something failed, move one char next
      -- keep the character that started the failed match, otherwise it would be lost
      buffer[#buffer+1] = input:sub(nexts, nexts)
      return nexts + 1
    end
    -- save math content to the buffer
    handle_math(input, nexts, stop, buffer, mathtype)
  else
    -- if we cannot find any more cs or math, we need to insert rest of the input
    -- to the output buffer
    buffer[#buffer+1] = input:sub(start, string.len(input))
    return nil
  end
  return stop + 1
end
-- process the input buffer, detect inline and display math and also math environments
local function process(input)
  local buffer = {} -- buffer where text chunks are stored
  local start = 1
  start = find_next(input, start, buffer)
  while start do
    start = find_next(input, start, buffer)
  end
  return table.concat(buffer) -- convert output buffer to string
end
local content = io.read("*all")
print(process(content))
You can test it with the following command:
texlua altmath.lua < sample.tex
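As a much smaller illustration of what the filter does to inline math, the following sketch wraps each $...$ chunk in \inlinemath{...}{...} with a single gsub call. It is not the script above: wrap_inline is a hypothetical helper, and it assumes that the text contains only single-dollar inline math, no $$ display math and no escaped \$.

```lua
-- Minimal sketch (hypothetical helper, not part of altmath.lua):
-- duplicate every $...$ chunk as \inlinemath{math}{math}.
-- Assumption: only single-dollar inline math, no $$ and no escaped \$.
local function wrap_inline(text)
  -- "%$(.-)%$" captures the shortest run between two dollar signs;
  -- a function replacement is used, so "%" in the math is not interpreted
  return (text:gsub("%$(.-)%$", function(body)
    local math = "$" .. body .. "$"
    return "\\inlinemath{" .. math .. "}{" .. math .. "}"
  end))
end

print(wrap_inline("Energy $E=mc^2$ and area $a=b^2$."))
```

The full script scans character by character instead, because it also has to recognize \( \), \[ \], $$ and named environments, which a single pattern cannot do reliably.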
This is the modified version of the original TeX file:
\documentclass{article}
\NewDocumentCommand\inlinemath {mv} {\HCode{<span class="inlinemath">}#1\HCode{<span class="alt">}\NoFonts #2\EndNoFonts\HCode{</span></span>}}
\NewDocumentEnvironment{altdisplaymath}{} {\ifvmode\IgnorePar\fi\EndP\HCode{<div class="altmath">}} {\ifvmode\IgnorePar\fi\EndP\HCode{</div>}}
\usepackage[T1]{fontenc}
\begin{document}
\title{Article Title Here}
\author{Author Name Here}
\maketitle
\section{Introduction}
This is the sample paragraph with \inlinemath{$a=b^2$}{$a=b^2$} inline math. Different \inlinemath{\(a=c^2\)}{\(a=c^2\)} type of math.
\begin{altdisplaymath}
\begin{equation}\label{eq1-11}
T\,^{\prime}_{\mu \nu} = \left( \frac{\partial \xi^\alpha} {\partial\xi^{\prime\mu}}\right) \left( \frac{\partial \xi^\beta}{\partial \xi^{\prime\nu}} \right) T_{\alpha \beta}
\end{equation}
\begin{verbatim}
\begin{equation}\label{eq1-11}
T\,^{\prime}_{\mu \nu} = \left( \frac{\partial \xi^\alpha} {\partial\xi^{\prime\mu}}\right) \left( \frac{\partial \xi^\beta}{\partial \xi^{\prime\nu}} \right) T_{\alpha \beta}
\end{equation}
\end{verbatim}
\end{altdisplaymath}
Please refer the equations \ref{eq1-11} for the further testing.
\end{document}
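You can see that it inserted the macro definitions after the \documentclass command. They define the \inlinemath command and the altdisplaymath environment. These definitions contain code that inserts HTML tags directly into the converted file, so they are meant only for use with TeX4ht.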
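The filter places the macros by counting control sequences. A simpler variant, sketched here only as an illustration, puts a preamble block right after the \documentclass line. The helper name insert_after_documentclass and the sample macro \foo are hypothetical, and the sketch assumes that \documentclass and its arguments sit on a single line.

```lua
-- Hypothetical helper: insert a block of preamble macros right after
-- the \documentclass line. Assumes the whole \documentclass command
-- fits on one line that ends with a newline.
local function insert_after_documentclass(source, preamble)
  -- capture the \documentclass line including its newline; a function
  -- replacement keeps "%" and "#" in the preamble untouched
  local result, count = source:gsub("(\\documentclass[^\n]*\n)", function(line)
    return line .. preamble
  end, 1)
  -- the second return value tells whether a \documentclass line was found
  return result, count == 1
end

local tex = "\\documentclass{article}\n\\begin{document}\nHi\n\\end{document}\n"
local out, ok = insert_after_documentclass(tex, "\\newcommand\\foo{bar}\n")
print(out)
```

Returning a flag alongside the text lets the caller decide what to do when the input has no \documentclass line, for example when a fragment is processed.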
You can convert the file to HTML using:
texlua altmath.lua < sample.tex | make4ht -j sample - "mathml"
It produces the following code:
<span class='inlinemath'><!-- l. 14 --><math xmlns='http://www.w3.org/1998/Math/MathML' display='inline'><mi>a</mi> <mo class='MathClass-rel'>=</mo> <msup><mrow><mi>b</mi></mrow><mrow><mn>2</mn></mrow></msup></math><span class='alt'>$a=b^2$</span></span>
or
<div class='altmath'> <!-- tex4ht:inline --><table class='equation'><tr><td>
<!-- l. 16 --><math xmlns='http://www.w3.org/1998/Math/MathML' display='block' class='equation'>
<mstyle class='label' id='x1-1001r1'></mstyle><!-- endlabel --><mi>T</mi><msubsup><mrow><mspace width='0.17em' class='thinspace'></mspace></mrow><mrow><mi mathvariant='italic'>μν</mi></mrow><mrow><mi>′</mi></mrow></msubsup> <mo class='MathClass-rel'>=</mo> <mrow><mo form='prefix' fence='true'> (</mo><mrow> <mfrac><mrow><mi>∂</mi><msup><mrow><mi>ξ</mi></mrow><mrow><mi>α</mi></mrow></msup></mrow>
<mrow><mi>∂</mi><msup><mrow><mi>ξ</mi></mrow><mrow><mi mathvariant='italic'>′μ</mi></mrow></msup></mrow></mfrac> </mrow><mo form='postfix' fence='true'>)</mo></mrow> <mrow><mo form='prefix' fence='true'> (</mo><mrow> <mfrac><mrow><mi>∂</mi><msup><mrow><mi>ξ</mi></mrow><mrow><mi>β</mi></mrow></msup></mrow>
<mrow><mi>∂</mi><msup><mrow><mi>ξ</mi></mrow><mrow><mi mathvariant='italic'>′ν</mi></mrow></msup></mrow></mfrac> </mrow><mo form='postfix' fence='true'>)</mo></mrow> <msub><mrow><mi>T</mi></mrow><mrow><mi mathvariant='italic'>αβ</mi></mrow></msub>
</math></td><td class='eq-no'>(1)</td></tr></table>
<!-- l. 18 --><p class='nopar'>
</p>
<pre id='verbatim-1' class='verbatim'>
\begin{equation}\label{eq1-11}
T\,^{\prime}_{\mu \nu} = \left( \frac{\partial \xi^\alpha} {\partial\xi^{\prime\mu}}\right) \left( \frac{\partial \xi^\beta}{\partial \xi^{\prime\nu}} \right) T_{\alpha \beta}
\end{equation}
</pre>
<!-- l. 23 --><p class='nopar'> </p></div>
We need to use a make4ht DOM filter to create the correct MathML structure. Save the following file as build.lua:
local domfilter = require "make4ht-domfilter"
-- find mathml and insert TeX as an alternative annotation
local function update_mathml(element, class)
  local alt_element_t = element:query_selector(class)
  -- stop if there is no alternative element ("or" is needed here: with "and",
  -- an empty result table would pass the test and fail on the next lines)
  if not alt_element_t or not alt_element_t[1] then return nil end
  -- save alt element contents and remove it from the document
  local alt_contents = alt_element_t[1]:get_children()
  alt_element_t[1]:remove_node()
  -- create a new structure of the mathml element ->
  -- mathml
  --   semantics
  --     mrow -> math content
  --     annotation -> saved TeX
  local mathml = element:query_selector("math")[1]
  local mathml_contents = mathml:get_children()
  local semantics = mathml:create_element("semantics")
  local mrow = semantics:create_element("mrow")
  mrow._children = mathml_contents -- this trick places saved original mathml content into a new <mrow>
  semantics:add_child_node(mrow)
  local annotation = semantics:create_element("annotation", {encoding="application/x-tex"})
  annotation._children = alt_contents
  semantics:add_child_node(annotation)
  mathml._children = {semantics}
end
local process = domfilter {
  function(dom)
    for _, inline in ipairs(dom:query_selector(".inlinemath")) do
      update_mathml(inline, ".alt")
    end
    for _, display in ipairs(dom:query_selector(".altmath")) do
      update_mathml(display, ".verbatim")
    end
    return dom
  end
}
-- register the filter for the generated HTML files; without this call the filter is never executed
Make:match("html$", process)
It parses the HTML file for our custom <span> and <div> elements, gets the alternative text, and inserts it as an <annotation> element of the MathML code.
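The target structure can also be shown without the DOM library. The following sketch builds the same <semantics> wrapper from plain strings; xml_escape and add_tex_annotation are hypothetical helpers, and the sketch assumes that the MathML children are already serialized and that only &, < and > need escaping in the annotation text.

```lua
-- escape characters that are special in XML text content
local function xml_escape(s)
  return (s:gsub("[&<>]", { ["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;" }))
end

-- Hypothetical helper: wrap already serialized MathML children in
-- <semantics>, with the TeX source as an <annotation> sibling of <mrow>.
local function add_tex_annotation(mathml_children, tex)
  return table.concat {
    "<semantics><mrow>", mathml_children, "</mrow>",
    "<annotation encoding='application/x-tex'>", xml_escape(tex), "</annotation>",
    "</semantics>",
  }
end

print(add_tex_annotation("<mi>a</mi><mo>&lt;</mo><mi>b</mi>", "$a<b$"))
```

The escaping step matters for formulas such as $a<b$, where a raw < inside the annotation would make the document ill-formed. The DOM filter above does not need it, because it moves nodes that were already parsed from the HTML.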
This is the result:
<h3 class='sectionHead'><span class='titlemark'>1 </span> <a id='x1-10001'></a>Introduction</h3>
<!-- l. 14 --><p class='noindent'>This is the sample paragraph with
<span class='inlinemath'><!-- l. 14 --><math display='inline' xmlns='http://www.w3.org/1998/Math/MathML'><semantics><mrow><mi>a</mi> <mo class='MathClass-rel'>=</mo> <msup><mrow><mi>b</mi></mrow><mrow><mn>2</mn></mrow></msup></mrow><annotation encoding='application/x-tex'>$a=b^2$</annotation></semantics></math></span> inline math.
Different <span class='inlinemath'><!-- l. 14 --><math display='inline' xmlns='http://www.w3.org/1998/Math/MathML'><semantics><mrow><mrow><mi>a</mi> <mo class='MathClass-rel'>=</mo> <msup><mrow><mi>c</mi></mrow><mrow><mn>2</mn></mrow></msup></mrow></mrow><annotation encoding='application/x-tex'>\(a=c^2\)</annotation></semantics></math></span>
type of math. </p><div class='altmath'> <!-- tex4ht:inline --><table class='equation'><tr><td>
<!-- l. 16 --><math class='equation' xmlns='http://www.w3.org/1998/Math/MathML' display='block'><semantics><mrow>
<mstyle id='x1-1001r1' class='label'></mstyle><!-- endlabel --><mi>T</mi><msubsup><mrow><mspace width='0.17em' class='thinspace'></mspace></mrow><mrow><mi mathvariant='italic'>μν</mi></mrow><mrow><mi>′</mi></mrow></msubsup> <mo class='MathClass-rel'>=</mo> <mrow><mo fence='true' form='prefix'> (</mo><mrow> <mfrac><mrow><mi>∂</mi><msup><mrow><mi>ξ</mi></mrow><mrow><mi>α</mi></mrow></msup></mrow>
<mrow><mi>∂</mi><msup><mrow><mi>ξ</mi></mrow><mrow><mi mathvariant='italic'>′μ</mi></mrow></msup></mrow></mfrac> </mrow><mo fence='true' form='postfix'>)</mo></mrow> <mrow><mo fence='true' form='prefix'> (</mo><mrow> <mfrac><mrow><mi>∂</mi><msup><mrow><mi>ξ</mi></mrow><mrow><mi>β</mi></mrow></msup></mrow>
<mrow><mi>∂</mi><msup><mrow><mi>ξ</mi></mrow><mrow><mi mathvariant='italic'>′ν</mi></mrow></msup></mrow></mfrac> </mrow><mo fence='true' form='postfix'>)</mo></mrow> <msub><mrow><mi>T</mi></mrow><mrow><mi mathvariant='italic'>αβ</mi></mrow></msub>
</mrow><annotation encoding='application/x-tex'>
\begin{equation}\label{eq1-11}
T\,^{\prime}_{\mu \nu} = \left( \frac{\partial \xi^\alpha} {\partial\xi^{\prime\mu}}\right) \left( \frac{\partial \xi^\beta}{\partial \xi^{\prime\nu}} \right) T_{\alpha \beta}
\end{equation}
</annotation></semantics></math></td><td class='eq-no'>(1)</td></tr></table>
<!-- l. 18 --><p class='nopar'>
</p>
<!-- l. 23 --><p class='nopar'> </p></div>
Answer 2
The MWE contains many LaTeX coding errors, which I have fixed. The corrected file is as follows:
\documentclass{article}
\usepackage[T1]{fontenc}
\begin{document}
\title{Article Title Here}
\author{Author Name Here}
\maketitle
\section{Introduction}
This is the sample paragraph.
\begin{equation}\label{eq1-11}
T\,^{\prime}_{\mu \nu} = \left( \frac{\partial \xi^{\alpha}}
{\partial\xi^{\prime\mu}}\right) \left( \frac{\partial \xi^{\beta}}{\partial \xi^{\prime\nu}} \right) T_{\alpha \beta}
\end{equation}
Please refer the equations \ref{eq1-11} for the further testing.
\end{document}
After correcting the errors, I ran the command
htlatex test "xhtml,mathml,mathml-" " -cunihtf" "-cvalidate -p"
It converted fine...
Edit
If you need the LaTeX code to appear in the converted HTML, use the following .cfg file:
conversion.cfg
\RequirePackage{verbatim,etoolbox}
\Preamble{xhtml}
\def\AltMathOne#1${\HCode{\detokenize{\(#1\)}}$}
\Configure{$}{}{}{\expandafter\AltMathOne}
\def\AltlMath#1\){\HCode{\detokenize{\(#1\)}}\)}
\Configure{()}{\AltlMath}{}
\def\AltlDisplay#1\]{\HCode{\detokenize{\[#1\]}}\]}
\Configure{[]}{\AltlDisplay}{}
\def\AltDisplayOne#1#2$${#1\HCode{\detokenize{$$#2$$}}$$}
\Configure{$$}{}{}{\AltDisplayOne}{}{}
\newcommand\VerbMath[1]{%
\ifcsdef{#1}{%
\renewenvironment{#1}{%
\NoFonts%
\Configure{verbatim}{}{} % suppress <br /> tags
\texttt{\string\begin\{#1\}}\HCode{\Hnewline}% we need to use \texttt to get all characters right
\verbatim}{\endverbatim\texttt{\string\end\{#1\}}\EndNoFonts}%
}{}%
}
\VerbMath{align}
\VerbMath{equation}
\VerbMath{equation*}
\begin{document}
\EndPreamble
Then run the command:
htlatex sample "conversion" " " "-cvalidate -p"