Converting a book from LaTeX to JATS XML with make4ht

How can I convert LaTeX to JATS XML using make4ht? Until now I have compiled the document with pdfLaTeX. I am using TeX Live 2022 on Windows 10.

In the command prompt/terminal I ran:

make4ht -a debug -u Sample.TeX

My MWE is:

\providecommand{\pgfsyspdfmark}[3]{}
\documentclass{acm-book}

\usepackage{showframe}
\usepackage{balance}
\usepackage{amsmath}
\usepackage{booktabs,hyperref,listings,xcolor,colortbl}
\usepackage[inactive]{fancytooltips}
\usepackage{wrapfig}
\usepackage{afterpage}
\usepackage{makeidx}

\hypersetup{
pdftitle={A Technical History},
pdfauthor={W. Trcy -- Rose-Hulman, IN, USA},
pdfkeywords={Morgan \& Claypool},%
}

\newcommand\BookSeries[1]{#1}
\newcommand\BookAffil[1]{#1}
\newcommand\HalfTitle[1]{#1}
\newcommand\Author[1]{#1}
\newcommand\Affiliation[1]{#1}
\definecolor{titlecolor}{cmyk}{0, 0.7808, 0.4429, 0.1412}


\begin{document}
\frontmatter

\BookSeries{Embracing Interference Systems}

\BookAffil{Gollakota Shyamnath, \textit{University of Washington}\\
2014}

\title{Software}

\HalfTitle{A Technical History}

\Author{\hyperref[KWT]{\textbf{Kim W. Tracy}}\\[2pt]

\Affiliation{Rose-Hulman Institute of Technology, IN, USA}}

\maketitle

\mainmatter
\chapter*{Preface}
\addtocontents{toc}{\protect\contentsline {chapter}{\color{titlecolor}Preface}{\bfseries\thepage}{page.\thepage}}
Software professionals and students are focused on creating \textit{new} technologies involving software. As a result, many may view software history as not directly relevant to their work or studies.

\markboth{Preface}{Preface}

Furthermore, legacy software systems are notoriously difficult to replace. As noted in \hyperlink{Charette:2020}{Charette} [\hyperlink{Charette:2020}{2020}] and as experienced by this author as a chief information officer, legacy systems take considerable effort and money to replace and tend to be built upon, rather than replaced. So, those working on systems for complex organizations are likely to have to deal with these existing software systems. \hyperlink{Charette:2020}{Charette} [\hyperlink{Charette:2020}{2020}] also cites examples such as the US Social Security Administration still dependencies on legacy software further entrenches its use. Other systems used by the US government have software sub-systems [\hyperlink{Charette:2020}{Charette 2020}].

\begin{quote}
But we [historians] remain largely ignorant about the origins and development of the dynamic processes running on those devices [computers], but primarily they will be histories of software.
\end{quote}

In the last couple of decades, software has gotten attention as a distinct topic from computer history. In particular there are wide-scoping works on the software industry (such as \hyperlink{CampbellKelly:2003}{CampbellKelly} [\hyperlink{CampbellKelly:2003}{2003}] and \hyperlink{Cortada:2012}{Cortada} [\hyperlink{Cortada:2012}{2012}]) (such as \hyperlink{Ensmenger:2010}{Ensmenger} [\hyperlink{Ensmenger:2010}{2010}]). There's also been work on the evolved (such as \hyperlink{Mahoney:2011}{Mahoney} [\hyperlink{Mahoney:2011}{2011}].

\section*{Use of the Book}\pdfbookmark[1]{Use of the Book}{Preface:UseoftheBook}
\addtocontents{toc}{\protect\contentsline {section}{\hskip28.5pt\noindent{Use of the Book}}{\rmfamily\bfseries\thepage}{page.\thepage}}
These two chapters introduce the overall issues and ways that the history of software is approached in this book. Chapters~\hyperref[chap:3]{3} to \hyperref[chap:8]{8} are meant to stand by themselves, with Chapters \hyperref[chap:3]{3} to \hyperref[chap:7]{7} covering software topics that are foundational in nature, generally closer to the system level. It is anticipated that later editions or volumes will add chapters related to higher-level software history such as artificial intelligence, graphics, security, enterprise applications, among others. Chapter~\hyperref[chap:8]{8} summarizes the lessons learned from earlier chapters and is intended to solidify the goals of the course.

Additional resources are available online at \href{http://software-history.net/}{software-history.net}.

\section*{Acknowledgments}\pdfbookmark[1]{Acknowledgments}{preface:Acknowledgments}
\addtocontents{toc}{\protect\contentsline {section}{\hskip28.5pt\noindent{Acknowledgments}}{\rmfamily\bfseries\thepage}{page.\thepage}}
Undoubtedly, this work does not contain every important detail about software or its development. The intent is to cover the most important details for students of software technology. Certainly, entire books could be written on each of the chapters included here or even on single topics, and some have been written.

\end{document}

make4ht reports an "undefined control sequence" error, but pdflatex runs without any errors. How can I fix this?

Answer 1

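You can get JATS output with the command make4ht -f jats. JATS support in TeX4ht is very basic, but I have updated it so that it covers at least sectioning, math, footnotes and so on. Most of the updates should already be in TeX Live, or you can use this version of jats.4ht. It is too large to post here.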

You use several custom commands for the document metadata, so you need a configuration file that provides markup for these commands:

\Preamble{xhtml}

% I don't know about an element for Book Series
\renewcommand\BookSeries[1]{}
\renewcommand\BookAffil[1]{}
\renewcommand\HalfTitle[1]{\ifvmode\IgnorePar\fi\EndP\HCode{<subtitle>}#1\HCode{</subtitle>}}
\renewcommand\Author[1]{\bgroup\HtmlParOff\renewcommand\hyperref[2][]{##2}\HCode{<contrib contrib-type="author"><name><string-name>}#1\HCode{</contrib>}\HtmlParOn\egroup}
\renewcommand\Affiliation[1]{\HCode{</string-name></name><aff>}#1\HCode{</aff>}}

\begin{document}
\EndPreamble

JATS is also quite strict about the structure of document metadata, so we need a make4ht build file that moves the elements to the correct place:

local domfilter = require "make4ht-domfilter"


-- some elements need to be moved from the document flow to the document meta
local article_meta 
local elements_to_move_to_meta = {}
local function move_to_meta(el)
  -- we don't move elements immediately, because it would prevent them from further
  -- processing in the filter. so we save them in an array, and move them once 
  -- the full DOM was processed
  table.insert(elements_to_move_to_meta, el)
end

local elements_to_move_to_title = {}
local function move_to_title_group(el)
  -- there can be only one title and subtitle
  local name = el:get_element_name()
  if not elements_to_move_to_title[name] then
    elements_to_move_to_title[name] = el
  end
end

local elements_to_move_to_contribs = {}
local function move_to_contribs(el)
  table.insert(elements_to_move_to_contribs, el)
end



local function process_moves()
  if article_meta then
    if elements_to_move_to_title["article-title"] then
      local title_group = article_meta:create_element("title_group")
      for _, name in ipairs{ "article-title", "subtitle" } do
        local v = elements_to_move_to_title[name] 
        if v then
          title_group:add_child_node(v:copy_node())
          v:remove_node()
        end
      end
      article_meta:add_child_node(title_group)
    end
    if #elements_to_move_to_contribs > 0 then
      local contrib_group = article_meta:create_element("contrib-group")
      for _, el in ipairs(elements_to_move_to_contribs) do
        contrib_group:add_child_node(el:copy_node())
        el:remove_node()
      end
      article_meta:add_child_node(contrib_group)
    end
    for _, el in ipairs(elements_to_move_to_meta) do
      -- move element's copy, and remove the original
      article_meta:add_child_node(el:copy_node())
      el:remove_node()
    end
  end
end

local function has_no_text(el)
  -- detect if element contains only whitespace
  return el:get_text():match("^%s*$")
end

local function is_xref_id(el)
  return el:get_element_name() == "xref" and el:get_attribute("id") and el:get_attribute("rid") == nil and has_no_text(el)
end
-- set id to parent element for <xref> that contain only id
local function xref_to_id(el)
  local parent = el:get_parent()
  -- set id only if it doesn't exist yet
  if parent:get_attribute("id") == nil then
    parent:set_attribute("id", el:get_attribute("id"))
    el:remove_node()
  end
end

local function make_text(el)
  local text = el:get_text():gsub("^%s*", ""):gsub("%s*$", "")
  local text_el = el:create_text_node(text)
  el._children = {text_el}
end

local function is_empty_par(el)
  return el:get_element_name() == "p" and has_no_text(el)
end



local process =  domfilter {
  function(dom)
    dom:traverse_elements(function(el)
      -- some elements need special treatment
      local el_name = el:get_element_name()
      if is_xref_id(el) then
        xref_to_id(el)
      elseif el_name == "article-meta" then
        -- save article-meta element for further processing
        article_meta = el
      elseif el_name == "article-title" then
        move_to_title_group(el)
      elseif el_name == "subtitle" then
        move_to_title_group(el)
      elseif el_name == "abstract" then
        move_to_meta(el)
      elseif el_name == "string-name" then
        make_text(el)
      elseif el_name == "contrib" then
        move_to_contribs(el)
      elseif is_empty_par(el) then
        -- remove empty paragraphs
        el:remove_node()
      elseif el_name == "div" and el:get_attribute("class") == "maketitle" then
        el:remove_node()
      end

    end)
    -- move elements that are marked for move
    process_moves()
    return dom
  end, "joincharacters"
}

filter_settings("joincharacters", {charclasses = {italic=true, bold=true}})

Make:match("xml$", process)
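The build file records elements while the DOM is being traversed and only moves them afterwards. The same two-pass idea can be sketched outside make4ht. The example below uses Python's standard xml.etree.ElementTree, not the make4ht DOM API; the function name move_to_meta and the sample markup are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified input: metadata elements sit in the body.
SAMPLE = (
    "<article><front><article-meta/></front>"
    "<body><subtitle>A Technical History</subtitle>"
    "<p>Text</p><abstract>Summary</abstract></body></article>"
)

def move_to_meta(root, names=("subtitle", "abstract")):
    """Collect matching elements first, then move them into article-meta."""
    meta = root.find("./front/article-meta")
    if meta is None:
        return root
    # Pass 1: only record (parent, child) pairs; the tree is not modified
    # while it is being walked.
    pending = []
    for parent in root.iter():
        if parent is meta:
            continue
        for child in parent:
            if child.tag in names:
                pending.append((parent, child))
    # Pass 2: the walk is finished, so detaching and re-attaching is safe.
    for parent, child in pending:
        parent.remove(child)
        child.tail = None
        meta.append(child)
    return root

root = move_to_meta(ET.fromstring(SAMPLE))
print(ET.tostring(root, encoding="unicode"))
```

Deferring the moves keeps the traversal stable: an element that is detached in the middle of the walk could otherwise be skipped or miss later processing, which is the reason given in the build file's own comments.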

Compile with:

make4ht -c config.cfg -e build.lua -f jats -a debug Sample.tex

This is the result:

<?xml version='1.0' encoding='UTF-8' ?> 
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20130915//EN" "http://jats.nlm.nih.gov/archiving/1.3/JATS-archivearticle1-mathml3.dtd"> 
<article dtd-version='1.3' xml:lang='en-US' xmlns:mml='http://www.w3.org/1998/Math/MathML' xmlns:xlink='http://www.w3.org/1999/xlink'> 
<front>  <article-meta><title_group><article-title></article-title><subtitle>A Technical History</subtitle></title_group><contrib-group><contrib contrib-type='author'><name><string-name>Kim W. Tracy</string-name></name><aff>Rose-Hulman Institute of Technology, IN, USA</aff></contrib></contrib-group></article-meta></front><body>
       <!-- l. 46 -->
         <sec> 
<title id='x1-1000'> Preface</title>
       <!-- l. 48 --><p>     Software professionals and students are focused on creating <italic>new</italic> technologies involving software.
       As a result, many may view software history as not directly relevant to their work or
       studies.
       </p><!-- l. 52 --><p>  Furthermore, legacy software systems are notoriously difficult to replace. As noted in
       <xref rid='Charette:2020'>Charette</xref> [<xref rid='Charette:2020'>2020</xref>] and as experienced by this author as a chief information officer, legacy
       systems take considerable effort and money to replace and tend to be built upon, rather than
       replaced. So, those working on systems for complex organizations are likely to have to deal
       with these existing software systems. <xref rid='Charette:2020'>Charette</xref> [<xref rid='Charette:2020'>2020</xref>] also cites examples such as the US
       Social Security Administration still dependencies on legacy software further entrenches its
       use. Other systems used by the US government have software sub-systems [<xref rid='Charette:2020'>Charette
       2020</xref>].
           </p><disp-quote>
           <!-- l. 55 --><p>But we [historians] remain largely ignorant about the origins and development of
           the dynamic processes running on those devices [computers], but primarily they
           will be histories of software.</p></disp-quote>
       <!-- l. 58 --><p>     In the last couple of decades, software has gotten attention as a distinct topic from computer
       history. In particular there are wide-scoping works on the software industry (such as <xref rid='CampbellKelly:2003'>CampbellKelly</xref>
       [<xref rid='CampbellKelly:2003'>2003</xref>] and <xref rid='Cortada:2012'>Cortada</xref> [<xref rid='Cortada:2012'>2012</xref>]) (such as <xref rid='Ensmenger:2010'>Ensmenger</xref> [<xref rid='Ensmenger:2010'>2010</xref>]). There’s also been work on the evolved
       (such as <xref rid='Mahoney:2011'>Mahoney</xref> [<xref rid='Mahoney:2011'>2011</xref>].
       </p>
         <sec> 
<title id='x1-2000'> Use of the Book</title>
       <!-- l. 62 --><p>     These two chapters introduce the overall issues and ways that the history of software is
       approached in this book. Chapters <italic>??</italic>  to <italic>??</italic>  are meant to stand by themselves, with Chapters <italic>??</italic>  to
       <italic>??</italic> covering software topics that are foundational in nature, generally closer to the system level. It is
       anticipated that later editions or volumes will add chapters related to higher-level software history
       such as artificial intelligence, graphics, security, enterprise applications, among others. Chapter <italic>??</italic>
       summarizes the lessons learned from earlier chapters and is intended to solidify the goals of the
       course.
       </p><!-- l. 64 --><p>  Additional resources are available online at <xref rid='http://software-history.net/'>software-history.net</xref>.
                                                                                                                                                            

                                                                                                                                                            
       </p><!-- l. 66 -->
         </sec> 
<sec> 
<title id='x1-3000'> Acknowledgments</title>
       <!-- l. 68 --><p id='x1-3000doc'>     Undoubtedly, this work does not contain every important detail about software or its development.
       The intent is to cover the most important details for students of software technology. Certainly, entire
       books could be written on each of the chapters included here or even on single topics, and some have
       been written.
       
       </p>
         </sec> 
</sec> 
 
</body> 
</article>
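The output above still contains unresolved cross-references (shown as ??) and xref elements whose rid has no matching id in this excerpt. A small script can list both. This is a sketch using only Python's standard library; the helper name report and the embedded sample are assumptions for illustration, and the script does not validate against the JATS DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample modelled on the output shown above.
SAMPLE = """<article><front><article-meta/></front><body>
<sec><title id='x1-1000'>Preface</title>
<p>See <xref rid='x1-1000'>Preface</xref>, <xref rid='Charette:2020'>Charette</xref>
and Chapter <italic>??</italic>.</p></sec></body></article>"""

def report(xml_text):
    """Return (dangling rid values, number of unresolved '??' references)."""
    root = ET.fromstring(xml_text)
    # Every id defined anywhere in the document.
    ids = {el.get("id") for el in root.iter() if el.get("id")}
    # rid values that point to no known id.
    dangling = sorted(
        {
            x.get("rid")
            for x in root.iter("xref")
            if x.get("rid") and x.get("rid") not in ids
        }
    )
    # LaTeX prints ?? for labels it could not resolve.
    unresolved = sum(
        1 for el in root.iter("italic") if (el.text or "").strip() == "??"
    )
    return dangling, unresolved

dangling, unresolved = report(SAMPLE)
print("dangling rid values:", dangling)
print("unresolved references:", unresolved)
```

In the MWE the ?? entries are expected, because the labels chap:3 to chap:8 and the bibliography targets are not defined in the sample document.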
