Can someone help me? After running tex4ht (htlatex), which file do I need to update to remove the line break/space after the tag name in the html file, as shown below:
From:
<span
class="cmr-9">
To:
<span class="cmr-9">
Thanks, Prasad
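Answer 1
The line breaks are usually inserted directly by the tex4ht configurations, probably to prevent possible hyphenation. You can use the tidy command to clean up the html.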
Because you didn't provide an example, here is a small file, sample.tex:
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\begin{document}
Helo \textit{world}, \texttt{hello} again.
Příliš \textit{žluťoučký kůň úpěl} ďábelské ódy.
\end{document}
The default conversion:
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!--http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd-->
<html xmlns="http://www.w3.org/1999/xhtml"
>
<head><title></title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="generator" content="TeX4ht (http://www.tug.org/tex4ht/)" />
<meta name="originator" content="TeX4ht (http://www.tug.org/tex4ht/)" />
<!-- xhtml,charset=utf-8,html -->
<meta name="src" content="sample.tex" />
<meta name="date" content="2015-05-12 14:47:00" />
<link rel="stylesheet" type="text/css" href="sample.css" />
</head><body
>
<!--l. 7--><p class="noindent" >Helo <span
class="ecti-1000">world</span>, <span
class="ectt-1000">hello </span>again.
</p><!--l. 9--><p class="indent" > Příliš <span
class="ecti-1000">žlu</span><span
class="ecti-1000">ťou</span><span
class="ecti-1000">čk</span><span
class="ecti-1000">ý ků</span><span
class="ecti-1000">ň </span><span
class="ecti-1000">úp</span><span
class="ecti-1000">ěl </span>ďábelské ódy. </p>
</body></html>
And the conversion with make4ht, using this build file, sample.mk4:
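-- load the make4ht filter module
local filter = require "make4ht-filter"
-- filter chain: merge adjacent spans, decompose ligatures, turn \hrule underscores into <hr>
local process = filter{"cleanspan-nat", "fixligatures", "hruletohr"}
-- run the htlatex compilation twice
Make:htlatex()
Make:htlatex()
-- apply the filters, then tidy, to every generated html file
Make:match("html$",process)
Make:match("html$", "tidy -m -asxhtml -utf8 -q -i ${filename}")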
Compile with
make4ht -u sample.tex
The result:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!--http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd-->
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content=
"HTML Tidy for Linux (vers 25 March 2009), see www.w3.org" />
<title></title>
<meta http-equiv="Content-Type" content=
"text/html; charset=utf-8" />
<meta name="generator" content=
"TeX4ht (http://www.tug.org/tex4ht/)" />
<meta name="originator" content=
"TeX4ht (http://www.tug.org/tex4ht/)" />
<!-- xhtml,charset=utf-8,html -->
<meta name="src" content="sample.tex" />
<meta name="date" content="2015-05-12 14:49:00" />
<link rel="stylesheet" type="text/css" href="sample.css" />
</head>
<body>
<!--l. 7-->
<p class="noindent">Helo <span class="ecti-1000">world</span>,
<span class="ectt-1000">hello</span> again.</p><!--l. 9-->
<p class="indent">Příliš <span class="ecti-1000">žluťoučký kůň
úpěl</span> ďábelské ódy.</p>
</body>
</html>
You can run the command
tidy -m -asxhtml -utf8 -q -i filename.html
directly if you don't want to use make4ht.
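If tidy is not available, the specific break from the question (a newline right after the tag name, as in `<span` followed by `class=...` on the next line) can also be collapsed with a small post-processing script. The following is only a minimal sketch, not part of tex4ht or make4ht: the names `join_tag_breaks` and `clean_file` are hypothetical, and it uses a regular expression rather than a real HTML parser, so it assumes no `<name` followed by a newline occurs inside comments, scripts or preformatted text. Breaks that come later in a tag, such as the one before the closing `>` of `<body`, are left untouched.

```python
import re
from pathlib import Path

# "<tagname", optional blanks, a newline, optional blanks, then the start of
# an attribute. The lookahead rejects ">" so "<body\n>" is not matched.
TAG_BREAK = re.compile(r"<([A-Za-z][A-Za-z0-9]*)[ \t]*\r?\n[ \t]*(?=[^\s>])")


def join_tag_breaks(html: str) -> str:
    """Replace the line break after an opening tag name with one space."""
    return TAG_BREAK.sub(r"<\1 ", html)


def clean_file(name: str) -> None:
    """Rewrite one html file in place (hypothetical helper, assumes UTF-8)."""
    path = Path(name)
    text = path.read_text(encoding="utf-8")
    path.write_text(join_tag_breaks(text), encoding="utf-8")
```

Calling `clean_file("sample.html")` after the conversion would rewrite the file in place. Unlike tidy, this does not re-indent the document or merge the split spans; it only joins the tag name with its first attribute.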