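I am working on writing LaTeX to XML. While compiling with pdflatex, we generate the XML file using tagging commands such as \immediate\write{text}. But how can I write normal text into the XML file? Could anybody explain with an example?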
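For context, the mechanism mentioned above can be sketched as follows. This is only a minimal sketch, assuming that \immediate\write{text} is shorthand for writing to an output stream allocated with \newwrite and opened with \openout. The stream name \xmlfile, the helper \xmltext, the file name suffix -out.xml, and the element names <doc> and <p> are all hypothetical and chosen purely for illustration.

```latex
\documentclass{article}

% Allocate an output stream; the name \xmlfile is hypothetical.
\newwrite\xmlfile
% Open <jobname>-out.xml immediately (the file name is an assumption).
\immediate\openout\xmlfile=\jobname-out.xml

% Hypothetical helper: write one line to the XML file right away.
% The argument is expanded by \write, so macros inside it are expanded,
% and special characters such as the percent sign need extra care.
\newcommand{\xmltext}[1]{\immediate\write\xmlfile{#1}}

\begin{document}
\xmltext{<?xml version="1.0" encoding="UTF-8"?>}
\xmltext{<doc>}
Plain text that is typeset in the PDF.
\xmltext{<p>Plain text that is typeset in the PDF.</p>}
\xmltext{</doc>}
\immediate\closeout\xmlfile
\end{document}
```

Running pdflatex on such a file should leave a <jobname>-out.xml file next to the PDF. Note that the text has to be given twice here, once for typesetting and once for the XML file; if macros inside the written text should not be expanded, wrapping the argument in \unexpanded{...} is one possible way to prevent that.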
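Answer 1
I don't know of a possible solution using LaTeX and the pdftex engine, but ConTeXt MkIV (which uses the LuaTeX engine) supports an XML backend that is used for generating e-books and tagged PDF.
To get XML output from a file, you need to add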
\setupbackend[export=yes]
As an example, consider a simple file containing a figure, some math, and a list.
\setupbackend[export=yes]
\setuppapersize[A5]
\starttext
\startsection[title={Sample Section}]
\startplacefigure
[location=right, title={A sample figure}]
\externalfigure[cow][width=2cm]
\stopplacefigure
\input knuth
\placeformula[eq:1]
\startformula
E = mc^2
\stopformula
Einstein gave the expression~(\in[eq:1]).
\startitemize[n]
\startitem
First point
\stopitem
\startitem
Second point
\stopitem
\stopitemize
\stopsection
\stoptext
This generates the PDF output.
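In addition, it generates the following XML file, \jobname.export (note that all the structural information is preserved and that the math is exported as MathML):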
<?xml version='1.0' encoding='UTF-8' standalone='yes' ?>
<!-- input filename : test -->
<!-- processing date : Tue Dec 4 00:21:55 2012 -->
<!-- context version : 2012.11.16 23:51 -->
<!-- exporter version : 0.30 -->
<document language="en" file="test" date="Tue Dec 4 00:21:55 2012" context="2012.11.16 23:51" version="0.30" xmlns:m="http://www.w3.org/1998/Math/MathML">
<section detail="section" location='aut:1'>
<sectionnumber>1</sectionnumber>
<sectiontitle>Sample Section</sectiontitle>
<sectioncontent>
<float detail="figure" location='aut:2'>
<floatcontent><image name="cow" id='image-1' width='2.000cm' height='1.455cm'></image></floatcontent>
<floatcaption><floatlabel detail="figure">Figure </floatlabel><floatnumber detail="figure">1</floatnumber> <floattext>A sample figure</floattext></floatcaption>
</float>
Thus, I came to the conclusion that the designer of a new system must not only be the implementer and first large--scale user; the designer should also write the first user manual.
<break/>
The separation of any of these four components would have hurt TEX significantly. If I had not participated fully in all these activities, literally hundreds of improvements would never have been made, because I would never have thought of them or perceived why they were important.
<break/>
But a system cannot be successful if it is too strongly influenced by a single person. Once the initial design is complete and fairly robust, the real test begins as people with many different viewpoints undertake their own experiments.
<formula>
<formulacontent>
<m:math display="block">
<m:mrow>
<m:mi>
Answer 2
I think there is another way: LaTeXML, a LaTeX to XML converter.
Once it is installed, you can proceed as follows.
Consider the following MWE, test_xml.tex:
\documentclass[a4paper,11pt]{article}
\usepackage{graphicx}
\begin{document}
Here is some text that precedes the image.
\begin{figure}
\includegraphics[scale=0.5]{ctan_lion} % http://www.ctan.org/lion.html
\end{figure}
Here is a formula:
\begin{equation}
e=mc^2
\end{equation}
\end{document}
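We have one external module, graphicx, which can be bound directly: see page 5 of the manual (Loading Bindings) and Appendix B. So we only need to run the following in a terminal: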
latexml --preload=graphicx.sty --preload=LaTeX.pool --destination=test_xml.xml test_xml
The result, test_xml.xml, is:
<?xml version="1.0" encoding="UTF-8"?>
<?latexml searchpaths=".,//home/claudio/Scrivania/prova/"?>
<?latexml package="graphicx"?>
<?latexml options="a4paper,11pt" class="article"?>
<?latexml package="graphicx"?>
<?latexml RelaxNGSchema="LaTeXML"?>
<document xmlns="http://dlmf.nist.gov/LaTeXML">
<para xml:id="p1">
<p>Here is some text that precedes the image.</p>
</para>
<figure refnum="1" xml:id="S0.F1">
<graphics graphic="ctan_lion" options="scale=0.5"/>
<!-- %http://www.ctan.org/lion.html -->
</figure>
<para xml:id="p2">
<p>Here is a formula:</p>
<equation refnum="1" xml:id="S0.E1">
<Math mode="display" tex="e=mc^{2}" xml:id="S0.E1.m1" text="e = m * c ^ 2">
<XMath>
<XMApp>
<XMTok meaning="equals" role="RELOP">=</XMTok>
<XMTok role="UNKNOWN" font="italic">e</XMTok>
<XMApp>
<XMTok meaning="times" role="MULOP"></XMTok>
<XMTok role="UNKNOWN" font="italic">m</XMTok>
<XMApp>
<XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
<XMTok role="UNKNOWN" font="italic">c</XMTok>
<XMTok meaning="2" role="NUMBER">2</XMTok>
</XMApp>
</XMApp>
</XMApp>
</XMath>
</Math>
</equation>
</para>
</document>
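Now, it is also possible to do some post-processing to obtain, for example, .xhtml or .html files (and not only these, of course; see the manual for reference).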
For the .xhtml file:
latexmlpost --graphicimages --destination=test_xml.xhtml test_xml
For the .html file:
latexmlpost --format=html --graphicimages --destination=test_xml.html test_xml
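These operations automatically convert the formulas and the images (the latter because of the --graphicimages option). The result will look something like this: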