Strange behavior of tex4ht with Sanskrit text


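I am currently writing a text that quotes a lot of international literature, and I like to quote authentically, i.e. without Latin transliteration. This usually works well with polyglossia, but Devanagari somehow seems to behave differently. The following test works as expected in XeLaTeX, but with tex4ebook it produces no Devanagari (Sanskrit) text, produces no paragraphs, and raises an error at \begin{document}. All other foreign scripts work fine.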

\documentclass[twoside]{book}
\usepackage{fontspec}
\usepackage{etoolbox}
\usepackage{polyglossia}
\setmainlanguage{english}
\setotherlanguages{greek,hebrew,sanskrit,arabic,thai}

\newfontfamily\latinfont[Script=Latin,Ligatures=TeX]{Linux Libertine O}
\newfontfamily\devanagarifont[Script=Devanagari]{Noto Sans Devanagari}
\newfontfamily\greekfont[Ligatures=TeX,Script=Greek]{Linux Libertine O}
\newfontfamily\hebrewfont[Script=Hebrew]{Noto Sans Hebrew}
\newfontfamily\arabicfont[Script=Arabic]{Noto Sans Arabic}
\newfontfamily{\thaifont}[Script=Thai]{Noto Sans Thai}

\begin{document}

This is some English test text.

The apology of Socrate: \textgreek{ἀπολογία Σωκράτους}

And some text in Sanskrit: \textsanskrit{न चाशुश्रूषवे वाच्यं न च मां योऽभ्यसूयति}

Some Arabic: \textarabic{لكن لا بد أن أوضح لك أن كل}

Some Hebrew:  \texthebrew{בראשית ברא אלהים את השמים ואת הארץ}

Some Thai: \textthai{โปรแซลมอน เยลลี่ แพตเทิร์นสไตล์สเต็ป จังโก้ สหัสวรรษ}

\end{document}

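So with XeLaTeX all of the text is displayed. With tex4ebook everything is displayed except the Devanagari text, which is completely missing from the .html file: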

<?xml version='1.0' encoding='utf-8' ?> 
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"><html xmlns='http://www.w3.org/1999/xhtml'> 
<head><title></title> 
<meta http-equiv='Content-Type' content='text/html; charset=utf-8' /> 
<meta content='TeX4ht (https://tug.org/tex4ht/)' name='generator' /> 
<meta content='TeX4ht (https://tug.org/tex4ht/)' name='originator' /> 
<!--  xhtml,charset=utf-8,epub,uni-html4,html  --> 
<meta content='CorruptDevanagari.tex' name='src' /> 
<link type='text/css' href='CorruptDevanagari.css' rel='stylesheet' /> 
</head><body>
   This is some English test text.
   The apology of Socrate: ἀπολογία Σωκράτους
   And some text in Sanskrit:
   Some Arabic: لكن لا بد أن أوضح لك أن كل
   Some Hebrew: בראשית ברא אלהים את השמים ואת הארץ
   Some Thai: โปรแซลมอน เยลลี่ แพตเทิร์นสไตล์สเต็ป จังโก้ สหัสวรรษ   

</body></html>

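If I paste the Devanagari text into the .html file by hand, the browser displays it fine. Another puzzling observation is that the <p> tags are missing.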

Apart from that, I get the following error when I use the -a debug switch, but it seems unrelated to the Devanagari problem:

Package polyglossia Warning: Patchingbiditablefailed! on input line 25.

! You can't use `\relax' after \the.
\NoHtmlEnv ....0pt\ht:everypar {\the \ht:everypar 
                                              }
l.25 \begin{document}
                 
?

This error does not occur with XeLaTeX, and I do not know what I can do about it, or whether I need to care about it in the first place.

Answer 1

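In XeTeX mode, TeX4ht defines Unicode characters as active characters so that they can be inserted into the output as specials, which then produce the same Unicode characters in the HTML output. Because there are many thousands of Unicode characters, we declare only the characters that are needed, based on the Script declared for each font. Due to a bug in the handling of \newfontfamily, this did not work when the optional argument was given before the font name.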
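To make the two argument orders concrete, here is a minimal sketch of the same font declaration with the features placed after the font name, a form that fontspec also accepts. If the diagnosis above holds, this ordering might sidestep the bug even without a patched file, but that is an assumption and has not been tested here. The font name is taken from the question.

```latex
\documentclass{article}
\usepackage{polyglossia}% loads fontspec
\setmainlanguage{english}
\setotherlanguage{sanskrit}

% Form used in the question (features BEFORE the font name):
%   \newfontfamily\devanagarifont[Script=Devanagari]{Noto Sans Devanagari}
% Alternative form (features AFTER the font name):
\newfontfamily\devanagarifont{Noto Sans Devanagari}[Script=Devanagari]

\begin{document}
Some text in Sanskrit: \textsanskrit{न चाशुश्रूषवे वाच्यं न च मां योऽभ्यसूयति}
\end{document}
```

Either way, it is the Script=Devanagari key that tells TeX4ht which block of Unicode characters to declare, so it must be present in one of the two optional arguments.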

Here is a fixed version. Save the following code as usepackage-fontspec.4ht:

% usepackage-fontspec.4ht (2020-09-02-14:24), generated from tex4ht-4ht.tex
% Copyright 2017-2020 TeX Users Group
%
% This work may be distributed and/or modified under the
% conditions of the LaTeX Project Public License, either
% version 1.3c of this license or (at your option) any
% later version. The latest version of this license is in
%   http://www.latex-project.org/lppl.txt
% and version 1.3c or later is part of all distributions
% of LaTeX version 2005/12/01 or later.
%
% This work has the LPPL maintenance status "maintained".
%
% The Current Maintainer of this work
% is the TeX4ht Project <http://tug.org/tex4ht>.
%
% If you modify this program, changing the
% version identification would be appreciated.
\immediate\write-1{version 2020-09-02-14:24}

% \RequirePackage{expl3}% we need to disable them before loading
\ExplSyntaxOn
\seq_new:N \fontspec_ht_scripts
\gdef\texfourhtfontspecloaded{yes}% used to prevent subsequent loading of this file
\ExplSyntaxOff
\ifdefined\XeTeXversion%
\xenunidelblock{Latin-expl3}% expl3 package makes some characters active
\xeuniuseblock{Latin-expl3}% and define again
\fi%
\PassOptionsToPackage{no-math}{fontspec}
\ExplSyntaxOn
\:AtEndOfPackage{%
  \tl_gset:Nx \l__fontspec_nfss_enc_tl {T1}
  \tl_gset:Nx \g_fontspec_encoding_tl {T1}
  \tl_gset:Nx \l__fontspec_ttfamily_encoding_tl {T1}
  \tl_gset:Nx \l__fontspec_sffamily_encoding_tl {T1}
  \tl_gset:Nx \l__fontspec_rmfamily_encoding_tl {T1}
  \seq_new:N \fontspec_ht_fontfamilies
  \ifdefined\XeTeXversion
  \keys_define:nn {fontspec4ht}{
    Script .code:n = \xeuniuseblock{#1}
  }
  \else
  \keys_define:nn {fontspec4ht}{
    Script .code:n = \seq_put_right:Nn \fontspec_ht_scripts {#1}
  }
  \fi
\cs_set:Nn \fontspec_set_family:Nnn
 {
  % \tl_set:Nn \l__fontspec_family_label_tl { #1 }
  % \__fontspec_select_font_family:nn {#2}{#3}
  % \tl_set_eq:NN #1 \l_fontspec_family_tl
  \def#1{\relax}
 }


\prg_set_conditional:Nnn \fontspec_if_fontspec_font: {TF,T,F}
{
  \prg_return_false:
}

\DeclareDocumentCommand \setmainfont { O{} m O{} }
 {
   % Optional argument can be in both first and third parameter
  \keys_set_known:nn {fontspec4ht}{#1}
  \keys_set_known:nn {fontspec4ht}{#3}
  \seq_put_right:Nn \fontspec_ht_fontfamilies {#2}
  \use:x { \exp_not:n { \DeclareRobustCommand \rmfamily }
   {
    \relax
   }
  }
  \normalfont
  \ignorespaces
 }

 % define aliases for other user commands
\cs_set_eq:NN \fontspec\setmainfont
\cs_set_eq:NN \setsansfont\setmainfont
\cs_set_eq:NN \setmonofont\setmainfont
\cs_set_eq:NN \setromanfont\setmainfont
\cs_set_eq:NN \setmathrm\setmainfont
\cs_set_eq:NN \setmathsf\setmainfont
\cs_set_eq:NN \setboldmathrm\setmainfont
\cs_set_eq:NN \setmathtt\setmainfont



\DeclareDocumentCommand \newfontfamily { m O{} m O{} }
 {
  % \fontspec_set_family:cnn { g__fontspec_ \cs_to_str:N #1 _family } {#2} {#3}
  \keys_set_known:nn {fontspec4ht}{#2}
  \keys_set_known:nn {fontspec4ht}{#4}
  \seq_put_right:Nn \fontspec_ht_fontfamilies {#3}
  \use:x
   {
    \exp_not:N \DeclareRobustCommand \exp_not:N #1
     {
       \relax
     }
   }
 }
 % \tl_set:Nn \g_fontspec_encoding_tl{T1}
 %  \tl_set_eq:NN \encodingdefault\g_fontspec_encoding_tl
 \DeclareDocumentCommand \addfontfeatures {m}
 {
   \keys_set_known:nn {fontspec4ht}{#1}
   \typeout{Add font features}
 }
 \cs_set_eq:NN \addfontfeature \addfontfeatures
  \global\expandafter\let\csname [email protected]\endcsname\relax
  \global\expandafter\let\csname [email protected]\endcsname\relax
}
\ExplSyntaxOff
\edef\TivhTcats{%
  \catcode`:=12%
  \catcode`@=\the\catcode`@%
}

\endinput

After this change, you should get the correct result.

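Alternatively, you can use the -l option instead of -x. This runs LuaTeX, which handles Unicode with a different method and therefore does not suffer from the problem of undeclared Unicode characters.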
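When switching between the two options, it can help to confirm which engine actually processed the file. The following is a small sketch, assuming the iftex package is available in your distribution; the messages are only illustrative and appear in the log.

```latex
\documentclass{article}
\usepackage{iftex}% provides \ifLuaTeX and \ifXeTeX

% Report the running engine in the log file.
\ifLuaTeX
  \typeout{Engine check: LuaTeX (the -l route)}
\else
  \ifXeTeX
    \typeout{Engine check: XeTeX (the -x route)}
  \else
    \typeout{Engine check: neither LuaTeX nor XeTeX}
  \fi
\fi

\begin{document}
Engine test.
\end{document}
```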

As for the Polyglossia error, I cannot reproduce it. Which TeX distribution are you using?
