I can't remove the unnecessary spaces or line breaks after HTML tags in the HTML file

Can anyone tell me which file I need to update, after using tex4ht (htlatex), to remove the line breaks/spaces after HTML tags in the HTML file, as shown below:

From:

span
class="cmr-9"

To:

span

Thanks, Prasad

Answer 1

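The line breaks are usually inserted directly by the tex4ht configurations, probably to prevent possible hyphenation. Since they sit inside the tags, they do not change how the page renders, but you can use the tidy command to clean up the HTML.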

Because you didn't provide an example, here is a small file, sample.tex:

\documentclass{article}

\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\begin{document}

Helo \textit{world}, \texttt{hello} again.

Příliš \textit{žluťoučký kůň úpěl} ďábelské ódy.
\end{document}

Default conversion:

<?xml version="1.0" encoding="utf-8" ?> 
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" 
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">  
<!--http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd-->  
<html xmlns="http://www.w3.org/1999/xhtml"  
> 
<head><title></title> 
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> 
<meta name="generator" content="TeX4ht (http://www.tug.org/tex4ht/)" /> 
<meta name="originator" content="TeX4ht (http://www.tug.org/tex4ht/)" /> 
<!-- xhtml,charset=utf-8,html --> 
<meta name="src" content="sample.tex" /> 
<meta name="date" content="2015-05-12 14:47:00" /> 
<link rel="stylesheet" type="text/css" href="sample.css" /> 
</head><body 
>
<!--l. 7--><p class="noindent" >Helo <span 
class="ecti-1000">world</span>, <span 
class="ectt-1000">hello </span>again.
</p><!--l. 9--><p class="indent" >   Příliš <span 
class="ecti-1000">žlu</span><span 
class="ecti-1000">ťou</span><span 
class="ecti-1000">čk</span><span 
class="ecti-1000">ý ků</span><span 
class="ecti-1000">ň </span><span 
class="ecti-1000">úp</span><span 
class="ecti-1000">ěl </span>ďábelské ódy. </p> 
</body></html> 

And the conversion using make4ht with this build file, sample.mk4:

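local filter = require "make4ht-filter"
-- chain of make4ht filters; cleanspan-nat joins the spurious <span>
-- elements that tex4ht produces around accented characters
local process = filter{"cleanspan-nat", "fixligatures", "hruletohr"}
-- two LaTeX runs (e.g. to resolve cross-references)
Make:htlatex()
Make:htlatex()
-- post-process every generated HTML file: filters first, then tidy
Make:match("html$",process)
Make:match("html$", "tidy -m -asxhtml -utf8 -q -i ${filename}")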

Compile with:

make4ht -u sample.tex

Result:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!--http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd-->

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <meta name="generator" content=
  "HTML Tidy for Linux (vers 25 March 2009), see www.w3.org" />

  <title></title>
  <meta http-equiv="Content-Type" content=
  "text/html; charset=utf-8" />
  <meta name="generator" content=
  "TeX4ht (http://www.tug.org/tex4ht/)" />
  <meta name="originator" content=
  "TeX4ht (http://www.tug.org/tex4ht/)" />
  <!-- xhtml,charset=utf-8,html -->
  <meta name="src" content="sample.tex" />
  <meta name="date" content="2015-05-12 14:49:00" />
  <link rel="stylesheet" type="text/css" href="sample.css" />
</head>

<body>
  <!--l. 7-->

  <p class="noindent">Helo <span class="ecti-1000">world</span>,
  <span class="ectt-1000">hello</span> again.</p><!--l. 9-->

  <p class="indent">Příliš <span class="ecti-1000">žluťoučký kůň
  úpěl</span> ďábelské ódy.</p>
</body>
</html>

If you don't want to use make4ht, you can run tidy directly on the generated file:

tidy -m -asxhtml -utf8 -q -i filename.html
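If tidy is not available, the same kind of cleanup can be approximated with a small script. The following is only a minimal Python sketch, not part of tex4ht or make4ht: the function name `collapse_tag_whitespace` is hypothetical, and it assumes that attribute values contain no `<` or `>` characters and no significant line breaks. It only normalizes whitespace inside opening tags; it does not merge the adjacent `span` elements the way the cleanspan-nat filter does.

```python
import re

# Opening or self-closing tags only: "<" followed by a letter.
# Comments (<!--...-->), <?xml ...?>, <!DOCTYPE ...> and closing tags are skipped.
TAG = re.compile(r"<[A-Za-z][^<>]*>")


def collapse_tag_whitespace(html: str) -> str:
    """Replace line breaks and space runs inside opening tags with one space."""

    def fix(match: re.Match) -> str:
        tag = match.group(0)
        # Any run of whitespace (including newlines) becomes a single space.
        tag = re.sub(r"\s+", " ", tag)
        # Drop the space left before the closing ">" (but keep " />" as is).
        tag = re.sub(r"\s+>$", ">", tag)
        return tag

    return TAG.sub(fix, html)
```

To apply it to the output of htlatex, read the file with `encoding="utf-8"`, pass the contents through the function, and write the result back. Text outside the tags is left untouched, so the visible content should not change.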
