Converting to ePUB with tex4ebook

I am trying to convert my document to ePUB with tex4ebook. The document is bilingual and uses the free Kalpurush font.

Here is the MWE, file.tex:

% !TEX program = xelatex
% !BIB program = biblatex
\documentclass[12pt, twoside]{book}
% For a bilingual document    
\usepackage[banglamainfont=Kalpurush, banglattfont=Kalpurush]{latexbangla}                               
%activate polyglossia
\setdefaultlanguage[numerals=Bengali, changecounternumbering=true]{bengali}
%number all levels
\setcounter{secnumdepth}{5}
  \setotherlanguage{english}
\usepackage[autostyle]{csquotes}
% \usepackage[backend=biber, sorting=none, language=english, autolang=other, block=ragged]{biblatex}
% \addbibresource{bookbib.bib}

\usepackage{lipsum}
\usepackage{enumitem}
\setlist[itemize]{label*={\fontfamily{lmr}\selectfont\textbullet}}
\begin{document}
\tableofcontents

\chapter{First Chapter}
\section*{First Section}
পিথাগোরাস (Pythagoras)-এর উপপাদ্যটি হল,\\
``সমকোণী ত্রিভুজের অতিভুজের উপর অঙ্কিত বর্গক্ষেত্রের ক্ষেত্রফল অপর দুই বাহুর উপর অঙ্কিত বর্গক্ষেত্রের ক্ষেত্রফলের সমষ্টির সমান।" \\
অর্থাৎ কোন সমকোণী ত্রিভুজের অতিভুজ $c$ এবং অপর দুই বাহু $a$ এবং $b$ হলে,
\[c^2=a^2+b^2\]

\begin{itemize}
  \item The individual entries are indicated with a black dot, a so-called bullet.
  \item The text in the entries may be of any length.
\end{itemize}

% \nocite{*} % adds all entries in the bib file to the bibliography  % https://tex.stackexchange.com/a/13513/114006
% \printbibliography

\end{document}

I run the following command:

tex4ebook -x -f epub3 file.tex mathml

and get the following output:

[STATUS]  tex4ebook: Conversion started                                                                 
[STATUS]  tex4ebook: Input file: testing_bangla.tex                                                     
[WARNING] tocid: char-def module not found                                                              
[WARNING] tocid: cannot fix section id's                                                                
This is XeTeX, Version 3.141592653-2.6-0.999994 (MiKTeX 22.3) (preloaded format=xelatex.fmt)            
 restricted \write18 enabled.                                                                           
entering extended mode     

But no ePUB file is generated. How can I fix this? If a configuration file is needed, what should it contain?

Answer 1

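The problem is that Bengali digits are written to the auxiliary file that TeX4ht uses to store cross-references. When the cross-references are loaded back, these digits cause compilation errors because they are active characters.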

As a workaround, turn off Bengali counter numbering when compiling with TeX4ebook:

% !TEX program = xelatex
% !BIB program = biblatex
\documentclass[12pt, twoside]{book}
% For a bilingual document    
\usepackage[banglamainfont=Kalpurush, banglattfont=Kalpurush]{latexbangla}                               
%activate polyglossia
\ifdefined\HCode
\setdefaultlanguage[numerals=Bengali, changecounternumbering=false]{bengali}
\else
\setdefaultlanguage[numerals=Bengali, changecounternumbering=true]{bengali}
\fi
%number all levels
\setcounter{secnumdepth}{5}
  \setotherlanguage{english}
\usepackage[autostyle]{csquotes}
% \usepackage[backend=biber, sorting=none, language=english, autolang=other, block=ragged]{biblatex}
% \addbibresource{bookbib.bib}

\usepackage{lipsum}
\usepackage{enumitem}
\setlist[itemize]{label*={\fontfamily{lmr}\selectfont\textbullet}}
\begin{document}
\tableofcontents

\chapter{First Chapter}
\section*{First Section}
পিথাগোরাস (Pythagoras)-এর উপপাদ্যটি হল,\\
``সমকোণী ত্রিভুজের অতিভুজের উপর অঙ্কিত বর্গক্ষেত্রের ক্ষেত্রফল অপর দুই বাহুর উপর অঙ্কিত বর্গক্ষেত্রের ক্ষেত্রফলের সমষ্টির সমান।" \\
অর্থাৎ কোন সমকোণী ত্রিভুজের অতিভুজ $c$ এবং অপর দুই বাহু $a$ এবং $b$ হলে,
\[c^2=a^2+b^2\]

\begin{itemize}
  \item The individual entries are indicated with a black dot, a so-called bullet.
  \item The text in the entries may be of any length.
\end{itemize}

% \nocite{*} % adds all entries in the bib file to the bibliography  % https://tex.stackexchange.com/a/13513/114006
% \printbibliography

\end{document}

Note this code, which checks whether the document is being compiled by TeX4ht:

\ifdefined\HCode
\setdefaultlanguage[numerals=Bengali, changecounternumbering=false]{bengali}
\else
\setdefaultlanguage[numerals=Bengali, changecounternumbering=true]{bengali}
\fi

Sometimes the simplest approach is to load packages, or set their options, conditionally.

With the help of a build file (build.lua), you can still get Bengali digits in the section numbers and the table of contents:

local domfilter = require "make4ht-domfilter"
local filter    = require "make4ht-filter"
local domobject = require "luaxml-domobject"

-- we will calculate unicode character from this
local bengali_zero = 0x09E6 - 48
local uchar = utf8.char
local ubyte = utf8.codepoint

-- convert arabic number to bengali
local function arabic_to_bengali(text)
  return text:gsub("([0-9])", function(a)
    return uchar(ubyte(a) + bengali_zero)
  end)
end

local function process_children(head)
  for _, child in ipairs(head._children) do
    if child:is_text() then
      child._text = arabic_to_bengali(child._text)
    end
  end
end

local process = domfilter {
  function(dom)
    -- process section numbers
    for _, head in ipairs(dom:query_selector(".titlemark")) do
      process_children(head)
    end
    -- process TOC
    for _, toc in ipairs(dom:query_selector(".tableofcontents span,nav#toc li")) do
      process_children(toc)
    end
    return dom
  end
}

-- we must fix also the ncx file, which is used for Epub TOC
-- we must clean it first, in order to be able to process it using LuaXML
local ncx_process = filter {
  function(text)
    local text = text:gsub("^%s*", "") -- remove whitespace at the beginning
    local dom  = domobject.parse(text) -- convert text to DOM
    for _, mark in ipairs(dom:query_selector("navmark")) do -- process elements that can contain numbers
      process_children(mark)
    end
    return dom:serialize()
  end
}


Make:match("html$", process)
Make:match("ncx", ncx_process)

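It uses LuaXML to process the generated HTML files and replaces the Arabic numerals in the section numbers and the table of contents with Bengali digits.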
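The digit conversion itself can be tried outside make4ht. The following is only a minimal standalone sketch of the same offset arithmetic, assuming Lua 5.3 or later (or texlua) for the utf8 library; the sample string is made up for illustration and is not part of the build file.

```lua
-- Standalone check of the digit arithmetic used in build.lua.
-- Assumes Lua 5.3+ (or texlua) so that the utf8 library is available.
local BENGALI_ZERO = 0x09E6                     -- U+09E6 BENGALI DIGIT ZERO
local offset = BENGALI_ZERO - string.byte("0")  -- same value as 0x09E6 - 48

-- Replace every ASCII digit with the Bengali digit of the same value.
local function arabic_to_bengali(text)
  -- the outer parentheses drop gsub's second return value (the match count)
  return (text:gsub("%d", function(d)
    return utf8.char(utf8.codepoint(d) + offset)
  end))
end

-- hypothetical sample input
print(arabic_to_bengali("Chapter 12.3"))  -- prints "Chapter ১২.৩"
```

Because the Bengali digits occupy the consecutive code points U+09E6 to U+09EF, adding a single fixed offset to the code point of each ASCII digit is enough; no lookup table is needed.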

You can run it with:

tex4ebook -f epub3 -x -e build.lua file.tex "mathml"

The result looks like this:

[Image: screenshot of the result]
