Converting LaTeX to XML using pdflatex

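I am writing LaTeX out to XML. When compiling with pdflatex, we generate the XML file for markup commands using \immediate\write{text}. But how can I write ordinary text into the XML file? Could someone explain this with an example?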
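The mechanism the question alludes to can be sketched as follows. This is only a minimal illustration, assuming pdflatex with the e-TeX extensions (needed for \detokenize); the stream name \xmlout, the helper macro \xmlwrite, and the output file name \jobname-out.xml are hypothetical names chosen for this sketch and do not come from the original post.

```latex
\documentclass{article}

% Hypothetical names: \xmlout (output stream) and \xmlwrite (helper macro).
\newwrite\xmlout
\immediate\openout\xmlout=\jobname-out.xml

% \write expands its argument; \detokenize stops that expansion, so both
% markup and ordinary words are written to the file as they were typed.
\newcommand{\xmlwrite}[1]{\immediate\write\xmlout{\detokenize{#1}}}

\begin{document}
\xmlwrite{<?xml version="1.0" encoding="UTF-8"?>}
\xmlwrite{<document>}
\xmlwrite{<p>Ordinary text written from LaTeX.</p>}
\xmlwrite{</document>}
\immediate\closeout\xmlout

This sentence is typeset in the PDF only.
\end{document}
```

Each \xmlwrite call emits one line to the file. Characters that TeX treats specially while reading the argument, such as % and #, still need extra care, and the XML is only as well-formed as the calls that produce it.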


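I do not know of a solution that uses LaTeX and the pdftex engine, but ConTeXt MkIV (which uses the LuaTeX engine) has a backend that can export XML (for e-books, for example) in addition to producing tagged PDF.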

To get XML output from a file, you need to add

\setupbackend[export=yes]

to it. Consider this example:

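\setupbackend[export=yes]

\starttext

\startsection[title={Sample Section}]

  \startplacefigure
      [location=right, title={A sample figure}]
    \externalfigure[cow][width=2cm]
  \stopplacefigure

  \input knuth

  \startplaceformula[reference=eq:1]
  \startformula
    E = mc^2 
  \stopformula
  \stopplaceformula

  Einstein gave the expression~(\in[eq:1]).

  \startitemize
    \startitem
      First point
    \stopitem
    \startitem
      Second point
    \stopitem
  \stopitemize

\stopsection

\stoptext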

Besides the PDF output, this also generates the following XML file, \jobname.export (note that all structural information is preserved and the math is exported as MathML):

<?xml version='1.0' encoding='UTF-8' standalone='yes' ?>

<!-- input filename   : test              -->
<!-- processing date  : Tue Dec  4 00:21:55 2012 -->
<!-- context version  : 2012.11.16 23:51  -->
<!-- exporter version : 0.30              -->

<document language="en" file="test" date="Tue Dec  4 00:21:55 2012" context="2012.11.16 23:51" version="0.30" xmlns:m="http://www.w3.org/1998/Math/MathML">
  <section detail="section" location='aut:1'>
    <sectiontitle>Sample Section</sectiontitle> 
      <float detail="figure" location='aut:2'>
        <floatcontent><image name="cow" id='image-1' width='2.000cm' height='1.455cm'></image></floatcontent>
        <floatcaption><floatlabel detail="figure">Figure </floatlabel><floatnumber detail="figure">1</floatnumber> <floattext>A sample figure</floattext></floatcaption>
Thus, I came to the conclusion that the designer of a new system must not only be the implementer and first large--scale user; the designer should also write the first user manual.
The separation of any of these four components would have hurt TEX significantly. If I had not participated fully in all these activities, literally hundreds of improvements would never have been made, because I would never have thought of them or perceived why they were important.
But a system cannot be successful if it is too strongly influenced by a single person. Once the initial design is complete and fairly robust, the real test begins as people with many different viewpoints undertake their own experiments.
          <m:math display="block">


I think there is another way: LaTeXML, a LaTeX to XML converter.


Consider the following MWE, test_xml.tex:


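\documentclass[a4paper,11pt]{article}
\usepackage{graphicx}
\begin{document}
Here is some text that precedes the image.
\begin{figure}
\includegraphics[scale=0.5]{ctan_lion} % http://www.ctan.org/lion.html
\end{figure}

Here is a formula:
\begin{equation}
e=mc^2
\end{equation}
\end{document}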

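The document loads one external package, graphicx, for which a binding is already available: see page 5 of the manual (Loading Bindings) and Appendix B. So all that remains is to run, in a terminal:

latexml --preload=graphicx.sty --preload=LaTeX.pool --destination=test_xml.xml test_xml

This gives: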


<?xml version="1.0" encoding="UTF-8"?>
<?latexml searchpaths=".,//home/claudio/Scrivania/prova/"?>
<?latexml package="graphicx"?>
<?latexml options="a4paper,11pt" class="article"?>
<?latexml package="graphicx"?>
<?latexml RelaxNGSchema="LaTeXML"?>
<document xmlns="http://dlmf.nist.gov/LaTeXML">
  <para xml:id="p1">
    <p>Here is some text that precedes the image.</p>
  <figure refnum="1" xml:id="S0.F1">
    <graphics graphic="ctan_lion" options="scale=0.5"/>
    <!-- %http://www.ctan.org/lion.html -->
  <para xml:id="p2">
    <p>Here is a formula:</p>
    <equation refnum="1" xml:id="S0.E1">
      <Math mode="display" tex="e=mc^{2}" xml:id="S0.E1.m1" text="e = m * c ^ 2">
            <XMTok meaning="equals" role="RELOP">=</XMTok>
            <XMTok role="UNKNOWN" font="italic">e</XMTok>
              <XMTok meaning="times" role="MULOP">⁢</XMTok>
              <XMTok role="UNKNOWN" font="italic">m</XMTok>
                <XMTok role="SUPERSCRIPTOP" scriptpos="post2"/>
                <XMTok role="UNKNOWN" font="italic">c</XMTok>
                <XMTok meaning="2" role="NUMBER">2</XMTok>



The XML can then be post-processed into XHTML:

latexmlpost --graphicimages --destination=test_xml.xhtml test_xml

or into HTML:

latexmlpost --format=html --graphicimages --destination=test_xml.html test_xml


