pictureenv in tex4ht changes some letters in a listing inside a table

I am using pictureenv to work around a tex4ht problem with math inside and outside a table when using SVG.

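When a second column containing a listings environment is added to the table, the characters [ and _ anywhere in the listing are changed to the letter x for some reason.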

Here is an MWE:

\documentclass[11pt]{article}
\usepackage{amsmath,mathtools,amssymb}   

\usepackage{listings}
\usepackage{pictureenv}

\begin{document} 

\ifdefined\HCode
\begin{pictureenv}
\fi                     
\begin{tabular}{|p{3in}|p{2.5in}|}\hline    
${\frac {d}{{d}x}}y \left( x \right) = \left( -2+x \right) ^{2}$&
\begin{lstlisting}
[_quadrature_]
\end{lstlisting}\\\hline
\end{tabular}
\ifdefined\HCode
\end{pictureenv}
\fi                     

\end{document}

Compiled with

 make4ht -ulm default -f html5+dvisvgm_hashes T.tex "htm,pic-m,pic-align,svg,p-width"

I also tried

 make4ht -ulm default -f html5+dvisvgm_hashes T.tex "htm,pic-tabular,pic-align,svg,p-width"

which gives

[image: Mathematica graphics]

Compare this with the output when pictureenv is not used:

\documentclass[11pt]{article}
\usepackage{amsmath,mathtools,amssymb}   

\usepackage{listings}
\usepackage{pictureenv}

\begin{document} 

\begin{tabular}{|p{3in}|p{2.5in}|}\hline    
${\frac {d}{{d}x}}y \left( x \right) = \left( -2+x \right) ^{2}$&
\begin{lstlisting}
[_quadrature_]
\end{lstlisting}\\\hline
\end{tabular}

\end{document}

With the same compile command, it now gives

[image: Mathematica graphics]

TL 2018 on Linux Ubuntu

Answer 1

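What you see is a result of the Unicode support in tex4ht. It works by inserting two codes into the document. The first is a special instruction that tells tex4ht to replace the next character with the Unicode value stored in the instruction; the second is the character to be replaced. It is usually x, but it can be any character. It is used only to pass font information to tex4ht, so that it can render the Unicode character as bold, italic, and so on.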

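The issue is with pictures, because they are produced by external commands, usually dvisvgm or dvipng. These don't know how to handle the tex4ht specials, so the specials are ignored and only the x is displayed.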
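
The mechanism described above can be modelled in a few lines of plain Lua. This is only a toy sketch: the `stream` table, the `render` function and the `t4ht@&#x...;` payloads are simplified assumptions made for illustration, not the real DVI structures, and the model handles ASCII code points only.

```lua
-- Toy model of the output stream: a "special" carries the real code point,
-- and the glyph that follows it is only a placeholder.
-- The payload format is a simplified assumption.
local stream = {
  { kind = "special", data = "t4ht@&#x5B;" }, -- stands for "["
  { kind = "glyph",   char = "x" },           -- placeholder glyph
  { kind = "glyph",   char = "q" },           -- ordinary glyph
  { kind = "special", data = "t4ht@&#x5F;" }, -- stands for "_"
  { kind = "glyph",   char = "x" },           -- placeholder glyph
}

-- Render the stream either like a consumer that understands the specials
-- or like an external tool that silently skips them.
local function render(items, understands_specials)
  local out, pending = {}, nil
  for _, item in ipairs(items) do
    if item.kind == "special" then
      if understands_specials then
        local hex = item.data:match("^t4ht@&#x(%x+);$")
        pending = hex and tonumber(hex, 16) or nil
      end
      -- a renderer that ignores specials does nothing here
    elseif pending then
      -- replace the placeholder glyph (ASCII only in this toy model)
      out[#out + 1] = string.char(pending)
      pending = nil
    else
      out[#out + 1] = item.char
    end
  end
  return table.concat(out)
end

local as_tex4ht  = render(stream, true)
local as_dvisvgm = render(stream, false)
print(as_tex4ht, as_dvisvgm)
```

In this model the consumer that understands the specials recovers `[` and `_`, while the one that ignores them shows the placeholder x, which is the symptom from the question.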

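We can try to fix this using LuaTeX. A node callback can process the document nodes, detect pictures and replace the characters manually. This is not as easy as it sounds, because we cannot just set the Unicode value as the replaced character; the correct glyph number must be set instead. There is no universal mapping between Unicode and the glyphs in a particular TeX font. Fortunately, tex4ht provides such mappings for most TeX fonts in the form of HTF files. A Lua library can search for the HTF files and parse the mappings.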
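
To make the idea of such a mapping concrete, here is a minimal sketch in plain Lua. The `htf` string is an invented excerpt written in the style of an HTF file (a header line with the font name and the first and last slot, one quoted entry per slot, and the header repeated at the end), and `unicode_to_slot` is a hypothetical helper that handles only hexadecimal entities.

```lua
-- Invented excerpt in the style of an HTF file; "toyfont" is not a real font.
local htf = [[
toyfont 0 2
'&#x0393;' ''
'&#x0394;' ''
'&#x0398;' ''
toyfont 0 2
]]

-- Build a table that maps Unicode code points to font slots.
local function unicode_to_slot(content)
  local lines = {}
  for line in content:gmatch("[^\n]+") do
    lines[#lines + 1] = line
  end
  local name, first, last = (lines[1] or ""):match("^(%S+)%s+(%d+)%s+(%d+)")
  if not name then return nil, "cannot parse the header line" end
  first, last = tonumber(first), tonumber(last)
  local map = {}
  for slot = first, last do
    -- the entry for a slot sits (slot - first + 1) lines after the header
    local entry = lines[slot - first + 2] or ""
    local str = entry:match("^%s*'(.-)'")
    local hex = str and str:match("^&#x(%x+);$")
    if hex then
      map[tonumber(hex, 16)] = slot
    end
  end
  return name, map
end

local name, map = unicode_to_slot(htf)
print(name, map[0x0394])
```

With such a table, a code point taken from a special can be translated into the slot number that has to be stored in the glyph node.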

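It turned out to be quite a complicated matter, and I must admit that I found a serious issue in picture generation. Sometimes the mapping between a Unicode value and a font glyph doesn't exist. For example, the \textellipsis command doesn't work even with this method. This shouldn't be a problem in practice, because this limitation of pictures has existed for a long time and nobody has complained. It is just a limitation I found, and I cannot find a solution at the moment.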

Enough introduction; we can now move on to the code.

First, we need the HTF file library, htffontreader.lua:

kpse.set_program_name "luatex"
local entities = require "luaxml-entities"
local texmfdist = kpse.expand_var("$TEXMFDIST")
local default_paths = {
  texmfdist .. "/tex4ht/ht-fonts/mozilla/",
  texmfdist .. "/tex4ht/ht-fonts/unicode/",
  texmfdist .. "/tex4ht/ht-fonts/ascii/",
  texmfdist .. "/tex4ht/ht-fonts/alias/"
}

local function str_to_table(str)
  local characters = {}
  str:gsub(".", function(a) table.insert(characters, a) end)
  return characters
end

-- convert the .4ht string field to a Unicode codepoint
local function get_char(str)
  -- it is necessary to decode XML entities first
  local newstr = entities.decode(str)
  -- get Unicode codepoints of the string
  local chars = {}
  -- the string.utfvalues is LuaTeX extension 
  for codepoint in string.utfvalues(newstr) do
    chars[#chars+1] = codepoint
  end
  -- return whole string if there is more than one codepoint
  -- it is useless in tex4ht char to node.char mapping
  if #chars > 1 then return newstr end
  return chars[1]
end


local function read_file(filename)
  local f = io.open(filename, "r")
  if not f then return nil, "Cannot open file " .. filename end
  local content = f:read("*all")
  f:close()
  return content
end

local function traverse_htf_files(dir, addresses)
  -- local addresses = addresses or {}
  for file in lfs.dir(dir) do
    -- skip current and parent dir links
    if file ~= "." and file ~=".." then
      local current_path  =  dir .. file
      local attr = lfs.attributes(current_path)
      if attr.mode == "directory" then
        traverse_htf_files(current_path .. "/", addresses)
      elseif attr.mode == "file" then
        if file:match("htf$") then
          file = file:gsub(".htf$", "")
          -- print(current_path, attr.mode)
          addresses[file] = current_path
        end
      end
    end
  end
  return addresses
end

-- find all .htf and .4hf files in list of directories
local function find_htf_files(directories)
  local addresses = {}
  for _, dir in ipairs(directories) do
    addresses = traverse_htf_files(dir, addresses)
  end
  return addresses
end

-- the htf files may contain only part of the font file name
-- we must build graph for efficient lookup for the correct
-- corresponding htf file
local function make_lookup_table(addresses)
  local function step(characters, lookup)
    if #characters > 0 then
      local char = table.remove(characters,1)
      local subtab = lookup[char] or {}
      lookup[char] = step(characters, subtab)
    end
    return lookup
  end
  local lookup = {}
  for file, _ in pairs(addresses) do
    -- get individual characters as a table
    local characters = str_to_table(file)
    lookup = step(characters, lookup)

  end
  return lookup
end

local function lookup_font(font_name, lookup_table)
  local function lookup(characters, tbl)
    if #characters < 1 then return "" end
    local char = table.remove(characters, 1) 
    local subtab = tbl[char]
    if not subtab then return "" end
    return char .. lookup(characters, subtab)
  end
  local characters = str_to_table(font_name)
  return lookup(characters, lookup_table)
end

local function get_htf_css(content)
  local htfcss = {}
  for name, style in content:gmatch("htfcss:%s*([%w]+)%s*([^\n]+)") do
    htfcss[name] = style
  end
  return htfcss
end

local function parse_htf_line(line)
  -- details about the htf file: https://tug.org/applications/tex4ht/mn-htf.html
  -- from the manual:
  --   The ‘string’ field may include any sequence of characters, except for
  --   its delimiters. The backslash character ‘\’ acts there as an escaped
  --   character. It may act as a delimiter for a character code, or be
  --   followed by another backslash (that is, ‘\\’ represents the character
  --   ‘\’ ). 
  --   In the string part, use ‘&lt;’ for the character ‘<’, ‘&gt;’ for ‘>’, and ‘&amp;’ for ‘&’; 
  local escape = function(str)
    local str = str or ""
    str = str:gsub("\\\\", "\\"):gsub("\\'","'")
    return str
  end
  local str, class = line:match("^%s*'(.-)'%s+'([0-9]*)'")
  -- from the manual: 
  --   A ‘class’ specified by an odd integer value asks for a
  --   pictorial character. An even integer number asks for a non-pictorial
  --   character, specified in the ‘string’ field. An empty class field is
  --   treated as a zero value. 
  if not str then return nil, "Cannot parse htf line: " .. line end
  class = class or "" -- add default value
  class = tonumber(class) or 0 -- convert empty class to zero
  return escape(str), class
end

local function parse_htf_glyphs(content, addresses)
  local map = {}
  local backmap = {}
  local readpos = 0
  local function readline()
    -- "line" is declared local so that it does not leak into the global scope
    local start, line
    start, readpos, line = content:find("([^\n]-)\n", readpos)
    -- print(readpos, line)
    readpos = readpos + 1
    return line
  end
  -- first detect if the htf file isn't only link to another one
  local link = content:match("^%s*%.([^%s]+)")
  if link then
    local newfile = addresses[link]
    if not newfile then return nil, "Cannot load htf file for ".. link end
    local content = read_file(newfile)
    return parse_htf_glyphs(content, addresses)
  end
  -- read htf name, start char and end char
  local firstline = readline()
  local name, start, finish = firstline:match("^([^%s]+)%s+([%d]+)%s+([%d]+)")
  if not name then return nil, "cannot parse htf file" end
  -- convert the values to numbers
  local start, finish = tonumber(start), tonumber(finish)
  -- calculate number of lines to be read: the htf file holds one entry
  -- for every position from start to finish, inclusive
  local count = finish - start + 1
  for i = 1, count do
    local line = readline()
    -- char may be character code or list of character codes
    local str, class = parse_htf_line(line)
    local char = get_char(str) 
    -- print(start, line)
    -- print(start, str, class, char)
    -- map character code to the tfm font position
    if char then
      map[char] = start
    end
    -- map tfm position to tex4ht character class and the replacement string
    backmap[start] = {class = class, str = str}
    start = start + 1
  end

  print("Parse htf font", name, start, finish)
  return map, backmap
end

local function load_font(font_name, addresses)
  --- todo: continue here
  local content, msg = read_file(font_name)
  if not content then return nil, msg end
  local htfcss = get_htf_css(content)
  -- return two tables, one from unicode to font positions, the other in the other direction
  local map, backmap = parse_htf_glyphs(content, addresses)
  return {htfcss = htfcss, map = map, backmap = backmap}
end

local function get_font(font_name, lookup_table, addresses)
  local htf_name = lookup_font(font_name, lookup_table)
  if htf_name and htf_name ~= "" then
    local font_file = addresses[htf_name]
    -- this shouldn't happen
    if not font_file then return nil, "Cannot find font file: " .. htf_name end
    return load_font(font_file, addresses)

  else
    return nil, "Cannot find HTF font: " .. font_name
  end
end

local function htfobject(paths)
  local paths = paths or default_paths
  local htfont = {}
  htfont.font_cache = {}
  local msg
  htfont.addresses, msg = find_htf_files(paths)
  if not htfont.addresses then return nil, msg end
  htfont.lookup_table = make_lookup_table(htfont.addresses)
  function htfont:get_font(fontname)
    local f = self.font_cache[fontname] or get_font(fontname, self.lookup_table, self.addresses)
    self.font_cache[fontname] = f
    return f
  end
  htfont.__index = htfont
  return setmetatable({}, htfont)
end


-- some testing
if arg[0] == "htffontreader.lua" then
  local htfx = htfobject()

  local cmsy = htfx:get_font("rm-lmr10")
  -- print(get_font("cmsy10", lookup_table, addresses))
  -- print(get_font("cmmi10", lookup_table, addresses))
  -- print(get_font("lm-ec1000", lookup_table, addresses))
  local cmss = htfx:get_font("cmss")
  for name, style in pairs(cmss.htfcss) do
    print(name, style)
  end
end

local M = {}
M.htfobject = htfobject
return M
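
The comment above `make_lookup_table` notes that an HTF file name may cover only the beginning of a font name. As a simplified illustration of that matching, the following sketch tries successively shorter prefixes of the font name against a table of known HTF names. This is a swapped-in technique, not the character-graph lookup used in the library, and the `addresses` entries and the `longest_prefix` helper are hypothetical.

```lua
-- Hypothetical table of HTF base names and their paths.
local addresses = {
  cmtt = "/fake/path/cmtt.htf",
  cmr  = "/fake/path/cmr.htf",
}

-- Return the longest prefix of font_name that is a known HTF name,
-- or nil when no prefix matches.
local function longest_prefix(font_name, names)
  for len = #font_name, 1, -1 do
    local prefix = font_name:sub(1, len)
    if names[prefix] then
      return prefix
    end
  end
  return nil
end

print(longest_prefix("cmtt10", addresses))
print(longest_prefix("cmr12", addresses))
```

Unlike a plain walk through a character graph, this variant only ever returns names that really exist in the table, at the cost of one table lookup per prefix length.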

The picture-processing callback lives in the fixpictures4ht.lua library:

local htffontreader = require "htffontreader"
local hlist_id = node.id "hlist"
local vlist_id = node.id "vlist"
local whatsit_id = node.id "whatsit"
local glyph_id = node.id "glyph"
-- get the special subtype
local whatsits = node.whatsits()
local special_id  

-- font database object
local fontdb = htffontreader.htfobject()

local supported_htf_fonts

-- from Luaotfload documentation
local function unsafe_getfont (id)
  local tfmdata = font.getfont (id)
  if not tfmdata then
    tfmdata = font.fonts[id]
  end
  return tfmdata
end

local font_infos = {}
local function get_font_info(id)
  local info = font_infos[id]
  if info then return info end
  local tfmdata = unsafe_getfont(id)
  local name = tfmdata.name
  local format = tfmdata.properties.format
  font_infos[id] = name
  print("Loading htf file for " .. name)
  return name
end



local utfchar = unicode.utf8.char
local in_picture = false
local function execute_tex4ht(head, n)
  local was_tex4ht = false
  local t4ht, data = n.data:match("(t4ht)(.+)")
  if t4ht == "t4ht" then was_tex4ht = true end
  if was_tex4ht then
    if in_picture then
      -- the tex4ht.sty definition of the \Picture(+|*) commands redefines the \ht:special command to prepend t4ht+ in front of
      -- the special code. I guess that the tex4ht command then somehow handles that, but I didn't investigate it. Anyway,
      -- we need to remove the spurious +t4ht part
      data = data:gsub("^%+t4ht","")
    end
    if in_picture and data:match("^@") then
      -- interpolate tex4ht escaped entities
      data = data:gsub("{([0-9]+)}", function(x) return string.char(x) end)
      -- detect hexadecimal entities
      local char = data:match("%&%#x([0-9a-fA-F]+);") 
      if char then
        char = tonumber(char, 16)
      else
        -- decimal entity
        -- the closing parenthesis of the capture was missing in the second pattern
        char = data:match("^@([0-9]+)") or data:match("^@%&%#([0-9]+);")
        if char then
          char = tonumber(char)
        end
      end
      if char then 
        -- we must replace the next glyph char with contents of this special
        local nextnode = n.next
        if nextnode.id == glyph_id then
          -- it is necessary to do new kerning
          local font_name = get_font_info(nextnode.font)
          local fontdata = fontdb:get_font(font_name)
          -- the htf file may be missing or unparsable, so guard the lookup
          local nextchar = fontdata and fontdata.map and fontdata.map[char]
          if nextchar then
            nextnode.char = nextchar
          else
            -- the character is not available in the htf file. why?
            -- one possibility is the non breaking space
            if char == 160 or char==32 then
              -- replace it with ordinary space?
              local glue = node.new("glue")
              glue.width = tex.sp(".6em")
              n.next = glue
              glue.next = nextnode.next
            end
          end
        end
      else 
        print("data", data)
      end
    elseif data:match("%+%+") then
      local picture_name = data:match("%+%+(.+)")
      -- sometimes we match something different than filename
      -- so try to detect that it is really a filename (we check that it ends
      -- with extension)
      if picture_name:match("%.[a-zA-Z]-$") then
        print("start picture", picture_name)
        in_picture = true
        -- pagelist[picture_name] = tex.count[ "c@page" ]
      end
    elseif data == "+" then
      print "end picture"
      in_picture = false
    end
  end
  return head, was_tex4ht
end

local function process(head)
  for n in node.traverse(head) do
    local id = n.id
    if id == hlist_id or id == vlist_id  then
      n.head = process(n.head)
    elseif id == whatsit_id and (n.subtype == special_id or whatsits[n.subtype] == "special")  then
      special_id = n.subtype
      -- act on the special node and detect if it was tex4ht special
      local was_tex4ht 
      head, was_tex4ht= execute_tex4ht(head, n)
    end
  end
  return head
end

local M = {}
M.process = process
return M
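
The decoding step inside `execute_tex4ht` can be tried in isolation, without LuaTeX. The sketch below mirrors its pattern matching in plain Lua; `decode_special` is a hypothetical helper, the payload strings in the calls are assumed examples, and the `{number}` interpolation done by the real callback is left out for brevity.

```lua
-- Extract the code point from the data part of a tex4ht special,
-- following the same patterns as the callback above.
local function decode_special(data)
  -- inside pictures the payload carries an extra "+t4ht" prefix
  data = data:gsub("^%+t4ht", "")
  -- only specials that start with "@" describe a character replacement
  if not data:match("^@") then return nil end
  -- hexadecimal entity, such as @&#x5B;
  local hex = data:match("&#x(%x+);")
  if hex then return tonumber(hex, 16) end
  -- plain decimal number or decimal entity
  local dec = data:match("^@(%d+)") or data:match("^@&#(%d+);")
  return dec and tonumber(dec) or nil
end

print(decode_special("@&#x5B;"))
print(decode_special("+t4ht@&#x5F;"))
print(decode_special("++picture.svg"))
```

Keeping this logic in a small pure function makes it easy to check new payload forms before touching the node-processing code.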

The callback must be installed, which can be done in a redefined version of the tuenc-luatex.4ht file:

% tuenc-luatex.4ht, generated from tex4ht-4ht.tex
% Copyright 2017 TeX Users Group
%
% This work may be distributed and/or modified under the
% conditions of the LaTeX Project Public License, either
% version 1.3c of this license or (at your option) any
% later version. The latest version of this license is in
%   http://www.latex-project.org/lppl.txt
% and version 1.3c or later is part of all distributions
% of LaTeX version 2005/12/01 or later.
%
% This work has the LPPL maintenance status "maintained".
%
% This Current Maintainer of this work
% is the TeX4ht Project <[email protected]>.
%
% If you modify this program, changing the
% version identification would be appreciated.
\immediate\write-1{version 2017-01-24-15:21}

\RequirePackage{luatexbase}
\RequirePackage{luacode}

\begin{luacode*}
  local fontspec = require "fontspec-4ht"
  local fixfonts = require "fixpictures4ht"
  luatexbase.add_to_callback("pre_linebreak_filter", fontspec.char_to_entity, "Char to entity")
  luatexbase.add_to_callback("hpack_filter", fontspec.char_to_entity, "hpack-char-to-entity")
  luatexbase.add_to_callback("pre_linebreak_filter", fixfonts.process, "Fix unicode in pictures")
\end{luacode*}
\Hinput{tuenc-luatex}
\endinput

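Another issue is that the default configuration for listings is quite complicated and redefines a lot of things. You don't want that in picture mode, so we must configure the pictureenv environment to ignore most of it: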

\Preamble{xhtml}
\ConfigureEnv{pictureenv}{%
\Configure{listings-init}{\special{t4ht@(}\ttfamily}{\special{t4ht@)}}
\ConfigureEnv{lstlisting}{}{}{}{}
\Configure{listings}{{\leavevmode}}{}{}{\newline}
\Picture*{}}{\EndPicture}{}{}
\begin{document}
\EndPreamble

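The configuration

\Configure{listings}{{\leavevmode}}{}{}{\newline}

is especially important for multi-line listings, because the default configuration would collapse them into a single line.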

I have prepared a more extensive example:

\documentclass[11pt]{article}
\usepackage{amsmath,mathtools,amssymb}   

\usepackage{listings}
\usepackage{pictureenv}

\begin{document} 

\begin{pictureenv}
\begin{tabular}{|p{3in}|p{2.5in}|}\hline    
${\frac {d}{{d}x}}y \left( x \right) = \left( -2+x \right) ^{2}$&
\begin{lstlisting}
[_quadrature_]
\end{lstlisting}\\\hline
\end{tabular}
\end{pictureenv}


divný příliš~žluťoučký kůň \textunderscore 
\begin{pictureenv}
\begin{lstlisting}
\verb|now_|@/$
  some spaces
no spaces
\end{lstlisting}


divný příliš~žluťoučký kůň \textunderscore 

\begin{verbatim}
\verb|now_|@/$
\end{verbatim}

\end{pictureenv}

\end{document}

This is the default rendering (without the listings configuration!):

[image]

And this is the result after processing with fixpictures4ht.lua:

[image]
