Suppressing the BOM in the HTML output of UTF-8 listings

I add some listings to my LaTeX code using the following:

\lstinputlisting[inputencoding=utf8/latin1]{listing.txt}

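The listings are taken from files encoded as UTF-8 (with BOM). The LaTeX file itself is encoded as UTF-8 (without BOM).

When I compile my code to PDF with XeLaTeX, everything works fine, but when I compile it with tex4ht, I get stray characters at the start of the actual listing (they appear to come from the BOM character U+FEFF).

Is there any way to suppress them?

Here is my MWE: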

\documentclass{article}
\usepackage[utf8]{inputenc} %Files are encoded using UTF-8
\usepackage[T1]{fontenc} %The text uses German umlauts
\usepackage{listingsutf8}       
\lstset{literate=%
    {ä}{{\"a}}1 {ë}{{\"e}}1 {ï}{{\"i}}1 {ö}{{\"o}}1 {ü}{{\"u}}1
    {Ä}{{\"A}}1 {Ë}{{\"E}}1 {Ï}{{\"I}}1 {Ö}{{\"O}}1 {Ü}{{\"U}}1
    {á}{{\'a}}1 {é}{{\'e}}1 {í}{{\'i}}1 {ó}{{\'o}}1 {ú}{{\'u}}1
    {Á}{{\'A}}1 {É}{{\'E}}1 {Í}{{\'I}}1 {Ó}{{\'O}}1 {Ú}{{\'U}}1
    {à}{{\`a}}1 {è}{{\`e}}1 {ì}{{\`i}}1 {ò}{{\`o}}1 {ù}{{\`u}}1
    {À}{{\`A}}1 {È}{{\`E}}1 {Ì}{{\`I}}1 {Ò}{{\`O}}1 {Ù}{{\`U}}1
    {â}{{\^a}}1 {ê}{{\^e}}1 {î}{{\^i}}1 {ô}{{\^o}}1 {û}{{\^u}}1
    {Â}{{\^A}}1 {Ê}{{\^E}}1 {Î}{{\^I}}1 {Ô}{{\^O}}1 {Û}{{\^U}}1
    {œ}{{\oe}}1 {Œ}{{\OE}}1 {æ}{{\ae}}1 {Æ}{{\AE}}1 {ß}{{\ss}}1
    {ű}{{\H{u}}}1 {Ű}{{\H{U}}}1 {ő}{{\H{o}}}1 {Ő}{{\H{O}}}1
    {ç}{{\c c}}1 {Ç}{{\c C}}1
    {ã}{{\~a}}1 {å}{{\r a}}1 {Å}{{\r A}}1
    {ø}{{\o}}1 {€}{{\EUR}}1 {£}{{\pounds}}1
    {~}{{\textasciitilde}}1
}
\begin{document}
\lstinputlisting[language={},inputencoding=utf8/latin1]{listing.txt}
\end{document}

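Answer 1

Probably the easiest way is to remove the BOM from the included file. Another solution is to use a Unicode TeX engine, which here means LuaTeX, because tex4ht does not currently support XeTeX. There are two possible ways to get your file working with tex4ht and LuaTeX.

Replace inputenc with luainputenc: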

\documentclass{article}
\usepackage[utf8]{luainputenc} %Files are encoded using UTF-8
\usepackage[T1]{fontenc} %The text uses German umlauts
\usepackage{listingsutf8}       
\lstset{literate=%
    {ä}{{\"a}}1 {ë}{{\"e}}1 {ï}{{\"i}}1 {ö}{{\"o}}1 {ü}{{\"u}}1
    {Ä}{{\"A}}1 {Ë}{{\"E}}1 {Ï}{{\"I}}1 {Ö}{{\"O}}1 {Ü}{{\"U}}1
    {á}{{\'a}}1 {é}{{\'e}}1 {í}{{\'i}}1 {ó}{{\'o}}1 {ú}{{\'u}}1
    {Á}{{\'A}}1 {É}{{\'E}}1 {Í}{{\'I}}1 {Ó}{{\'O}}1 {Ú}{{\'U}}1
    {à}{{\`a}}1 {è}{{\`e}}1 {ì}{{\`i}}1 {ò}{{\`o}}1 {ù}{{\`u}}1
    {À}{{\`A}}1 {È}{{\`E}}1 {Ì}{{\`I}}1 {Ò}{{\`O}}1 {Ù}{{\`U}}1
    {â}{{\^a}}1 {ê}{{\^e}}1 {î}{{\^i}}1 {ô}{{\^o}}1 {û}{{\^u}}1
    {Â}{{\^A}}1 {Ê}{{\^E}}1 {Î}{{\^I}}1 {Ô}{{\^O}}1 {Û}{{\^U}}1
    {œ}{{\oe}}1 {Œ}{{\OE}}1 {æ}{{\ae}}1 {Æ}{{\AE}}1 {ß}{{\ss}}1
    {ű}{{\H{u}}}1 {Ű}{{\H{U}}}1 {ő}{{\H{o}}}1 {Ő}{{\H{O}}}1
    {ç}{{\c c}}1 {Ç}{{\c C}}1
    {ã}{{\~a}}1 {å}{{\r a}}1 {Å}{{\r A}}1
    {ø}{{\o}}1 {€}{{\EUR}}1 {£}{{\pounds}}1
    {~}{{\textasciitilde}}1
}
\begin{document}
\lstinputlisting[language={},inputencoding=utf8/latin1]{listing.txt}
\end{document}

Use a trick to get fontspec working with tex4ht. This way, the same document works with tex4ht under LuaTeX and even with XeTeX:

\documentclass{article}
% \usepackage[utf8]{luainputenc} %Files are encoded using UTF-8
% \usepackage[T1]{fontenc} %The text uses German umlauts
\usepackage{alternative4ht}
\altusepackage{fontspec}
% \usepackage{listingsutf8}       
\usepackage{listings}
\lstset{literate=%
    {ä}{{\"a}}1 {ë}{{\"e}}1 {ï}{{\"i}}1 {ö}{{\"o}}1 {ü}{{\"u}}1
    {Ä}{{\"A}}1 {Ë}{{\"E}}1 {Ï}{{\"I}}1 {Ö}{{\"O}}1 {Ü}{{\"U}}1
    {á}{{\'a}}1 {é}{{\'e}}1 {í}{{\'i}}1 {ó}{{\'o}}1 {ú}{{\'u}}1
    {Á}{{\'A}}1 {É}{{\'E}}1 {Í}{{\'I}}1 {Ó}{{\'O}}1 {Ú}{{\'U}}1
    {à}{{\`a}}1 {è}{{\`e}}1 {ì}{{\`i}}1 {ò}{{\`o}}1 {ù}{{\`u}}1
    {À}{{\`A}}1 {È}{{\`E}}1 {Ì}{{\`I}}1 {Ò}{{\`O}}1 {Ù}{{\`U}}1
    {â}{{\^a}}1 {ê}{{\^e}}1 {î}{{\^i}}1 {ô}{{\^o}}1 {û}{{\^u}}1
    {Â}{{\^A}}1 {Ê}{{\^E}}1 {Î}{{\^I}}1 {Ô}{{\^O}}1 {Û}{{\^U}}1
    {œ}{{\oe}}1 {Œ}{{\OE}}1 {æ}{{\ae}}1 {Æ}{{\AE}}1 {ß}{{\ss}}1
    {ű}{{\H{u}}}1 {Ű}{{\H{U}}}1 {ő}{{\H{o}}}1 {Ő}{{\H{O}}}1
    {ç}{{\c c}}1 {Ç}{{\c C}}1
    {ã}{{\~a}}1 {å}{{\r a}}1 {Å}{{\r A}}1
    {ø}{{\o}}1 {€}{{\EUR}}1 {£}{{\pounds}}1
    {~}{{\textasciitilde}}1
}
\begin{document}
\lstinputlisting[language={},inputencoding=utf8/latin1]{listing.txt}
\end{document}

In both cases, compile with

make4ht -ul filename

Result:

[Image: screenshot of the resulting HTML output]
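
As for the simplest option mentioned at the start of this answer, removing the BOM from the included file, this can be done in most editors by re-saving the file as UTF-8 without BOM. If the listing files are generated or there are many of them, a small script may be more convenient. The following is only a minimal sketch in Python, not part of the original setup: the function name `strip_utf8_bom` is made up for illustration, and it assumes the file really is UTF-8, so that a BOM shows up as the three leading bytes EF BB BF.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path):
    """Remove a leading UTF-8 BOM (bytes EF BB BF) from a file, in place.

    Returns True if a BOM was found and removed, False if the file
    did not start with one (in which case it is left untouched).
    The name of this helper is hypothetical.
    """
    file = Path(path)
    data = file.read_bytes()

    # codecs.BOM_UTF8 is the byte string b"\xef\xbb\xbf".
    if not data.startswith(codecs.BOM_UTF8):
        return False

    # Write back everything after the BOM; the rest is not re-encoded.
    file.write_bytes(data[len(codecs.BOM_UTF8):])
    return True
```

Calling `strip_utf8_bom("listing.txt")` once before running tex4ht should leave a file that `\lstinputlisting` can read without the stray characters. The function works on raw bytes on purpose, so the umlauts in the listing are not touched.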