Converting to ePUB with tex4ebook

Question

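The problem is that Bengali digits are written to the auxiliary files that TeX4ht uses to store cross-references. When the cross-references are loaded, these digits cause compilation errors, because they are active characters.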

The solution is to switch off Bengali counter numbering when the document is compiled with TeX4ebook:

% !TEX program = xelatex
% !BIB program = biber
\documentclass[12pt, twoside]{book}
% For a bilingual document
\usepackage[banglamainfont=Kalpurush, banglattfont=Kalpurush]{latexbangla}
%activate polyglossia
\ifdefined\HCode
\setdefaultlanguage[numerals=Bengali, changecounternumbering=false]{bengali}
\else
\setdefaultlanguage[numerals=Bengali, changecounternumbering=true]{bengali}
\fi
%number all levels
\setcounter{secnumdepth}{5}
\setotherlanguage{english}
\usepackage[autostyle]{csquotes}
% \usepackage[backend=biber, sorting=none, language=english, autolang=other, block=ragged]{biblatex}
% \addbibresource{bookbib.bib}

\usepackage{lipsum}
\usepackage{enumitem}
\setlist[itemize]{label*={\fontfamily{lmr}\selectfont\textbullet}}
\begin{document}
\tableofcontents

\chapter{First Chapter}
\section*{First Section}
পিথাগোরাস (Pythagoras)-এর উপপাদ্যটি হল,\\
``সমকোণী ত্রিভুজের অতিভুজের উপর অঙ্কিত বর্গক্ষেত্রের ক্ষেত্রফল অপর দুই বাহুর উপর অঙ্কিত বর্গক্ষেত্রের ক্ষেত্রফলের সমষ্টির সমান।'' \\
অর্থাৎ কোন সমকোণী ত্রিভুজের অতিভুজ $c$ এবং অপর দুই বাহু $a$ এবং $b$ হলে,
\[c^2=a^2+b^2\]

\begin{itemize}
  \item The individual entries are indicated with a black dot, a so-called bullet.
  \item The text in the entries may be of any length.
\end{itemize}

% \nocite{*} % adds all entries in the bib file to the bibliography  % https://tex.stackexchange.com/a/13513/114006
% \printbibliography

\end{document}

Note this code, which tests whether TeX4ht is running (\HCode is only defined during a TeX4ht run):

\ifdefined\HCode
\setdefaultlanguage[numerals=Bengali, changecounternumbering=false]{bengali}
\else
\setdefaultlanguage[numerals=Bengali, changecounternumbering=true]{bengali}
\fi

Sometimes it is easiest to load a package, or set its options, conditionally.

With a build file (build.lua), you can still get Bengali numbers in the section headings and the table of contents:

local domfilter = require "make4ht-domfilter"
local filter    = require "make4ht-filter"
local domobject = require "luaxml-domobject"

-- offset from ASCII digit code points to Bengali digits:
-- U+09E6 is BENGALI DIGIT ZERO, 48 is the code point of "0"
local bengali_zero = 0x09E6 - 48
local uchar = utf8.char
local ubyte = utf8.codepoint

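-- convert Arabic (ASCII) digits to Bengali digits
local function arabic_to_bengali(text)
  -- the parentheses keep only the new string and drop the
  -- match count that string.gsub returns as a second value
  return (text:gsub("([0-9])", function(a)
    return uchar(ubyte(a) + bengali_zero)
  end))
end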

-- convert digits in the direct text children of an element
local function process_children(head)
  for _, child in ipairs(head._children) do
    if child:is_text() then
      child._text = arabic_to_bengali(child._text)
    end
  end
end

local process = domfilter {
  function(dom)
    -- process section numbers
    for _, head in ipairs(dom:query_selector(".titlemark")) do
      process_children(head)
    end
    -- process TOC
    for _, toc in ipairs(dom:query_selector(".tableofcontents span,nav#toc li")) do
      process_children(toc)
    end
    return dom
  end
}

-- we must also fix the NCX file, which is used for the EPUB TOC;
-- leading whitespace must be removed first, so that LuaXML can parse it
local ncx_process = filter {
  function(text)
    local text = text:gsub("^%s*", "") -- remove whitespace at the beginning
    local dom  = domobject.parse(text) -- convert text to DOM
    for _, mark in ipairs(dom:query_selector("navmark")) do -- process elements that can contain numbers
      process_children(mark)
    end
    return dom:serialize()
  end
}


Make:match("html$", process)
Make:match("ncx", ncx_process)

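It uses LuaXML to process the generated HTML files (and the NCX file used for the EPUB table of contents), replacing the Arabic digits in section numbers and the table of contents with Bengali digits.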
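The conversion works because the Bengali digits U+09E6 to U+09EF are contiguous, just like the ASCII digits U+0030 to U+0039, so adding one fixed offset to each ASCII digit's code point gives its Bengali counterpart. The sketch below isolates that function so it can be tried without LuaXML or make4ht, assuming a Lua 5.3+ interpreter with the utf8 library (texlua, which runs make4ht build files, should qualify). The sample strings are only illustrations.

```lua
-- Standalone sketch of the digit conversion from build.lua.
-- Assumes Lua 5.3+ (the utf8 library is required).
local uchar = utf8.char
local ubyte = utf8.codepoint

-- U+09E6 is BENGALI DIGIT ZERO; 48 is the code point of ASCII "0".
local bengali_zero = 0x09E6 - 48

local function arabic_to_bengali(text)
  -- Replace every ASCII digit; the parentheses return only the
  -- resulting string, not the match count from string.gsub.
  return (text:gsub("([0-9])", function(a)
    return uchar(ubyte(a) + bengali_zero)
  end))
end

-- Illustrative inputs: a section number and mixed text.
print(arabic_to_bengali("1.2"))         --> ১.২
print(arabic_to_bengali("Chapter 10"))  --> Chapter ১০
```

Non-digit characters pass through unchanged, which is why the filters can run the function on whole text nodes.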

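You can run it like this (-f epub3 selects EPUB 3 output, -x compiles with XeLaTeX, -e loads the build file, and "mathml" is passed as a TeX4ht option):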

tex4ebook -f epub3 -x -e build.lua file.tex "mathml"

This is the result:

Answer 1