make4ht -ul and "pic-tabular"

I am trying to generate an HTML file from a tex file that uses Devanagari script inside a table, with

make4ht -ul book.tex "pic-tabular"

The MWE for book.tex is:

\documentclass[11pt, oneside, onecolumn, openright, final]{article}

\usepackage{alternative4ht}
\usepackage{polyglossia}

\makeindex
\setmainlanguage[numerals=Devanagari]{hindi}
\setmainfont[Script=Devanagari, BoldFont={Sahadeva}]{Nakula}
\newfontfamily\devanagarifont[Script=Devanagari, BoldFont={Sahadeva}]{Nakula}


\begin{document}

\begin{tabular}{c}
स
\end{tabular}

\end{document}

However, both the generated image and the HTML file contain the character X instead of स.

It looks like UTF-8 gets lost somewhere, possibly in

t4ht book.dvi 
....
Entering book.lg
System call: dvipng -T tight -D 144 -bg Transparent -pp 1:1 book.idv -o book0x.png

Is there a way to make dvipng or t4ht handle Unicode?

Any help with this would be much appreciated.

Thanks

Answer 1

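This issue isn't easy to fix. The well-known problem is that tex4ht doesn't support OpenType fonts, so loading them would make the compilation fail. To work around this, we use a hacked version of the fontspec package for LuaLaTeX and XeLaTeX, which prevents the fonts from being loaded; normal TFM fonts are used instead. To get Unicode character support, we use different tricks for LuaTeX and XeTeX, but in both cases we insert a \special command which instructs tex4ht to replace the next character with the Unicode value stored in the \special. We use "x" as this placeholder character, which is replaced in the end.

The problem is that the various DVI processors used to generate pictures from regions of your document don't understand the tex4ht specials, so they just render this "x" character. In addition, the font information is lost, so even with the correct Unicode character they couldn't render it correctly, because the TFM font doesn't contain that character. And even if we used the correct character and the correct font, not all DVI processors support OpenType fonts, just as tex4ht doesn't.

So it seems a bit hopeless. Fortunately, I've found a solution. It may be a bit complicated, but it seems to work: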

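  • we can compile the document with LuaLaTeX in PDF mode, together with tex4ht
  • the tex4ht configuration for fontspec must be suppressed in this mode, because we want the OpenType fonts to be loaded
  • we provide a special configuration which outputs each picture on its own page
  • we can insert a Lua callback which finds the pages that contain pictures and saves this information for later use
  • we must suppress the DVI processing and DVI image generation by tex4ht in this mode
  • we execute a script which processes the picture pages and converts them to PNG using Ghostscript, or to SVG using pdf2svg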

I've created a special make4ht build file, a LaTeX package with configuration for tex4ht, and a few Lua libraries that do the heavy lifting. First, the build file, sample.mk4:

local pdftoimg4ht = require "pdftoimg4ht"
local function write_empty_file(filename)
  local f = io.open(filename, "w")
  f:write("")
  f:close()
end

Make:add("pdftoimg", pdftoimg4ht.run)
Make:add("fakefontspec", function()
  -- block execution of these files
  write_empty_file("fontspec.4ht")
  write_empty_file("polyglossia.4ht")
  write_empty_file("usepackage-fontspec.4ht")
end)


set_settings {t4ht_par = "-p"}
Make:add("removefontspec", function()
  os.remove("fontspec.4ht")
  os.remove("usepackage-fontspec.4ht")
  os.remove("polyglossia.4ht")
end)

if mode=="images" then
  -- disable the default tex4ht support for fontspec
  Make:fakefontspec {}
  -- to suppress compilation error
  settings_add{ tex4ht_sty_par =  ",new-accents" }
  Make:htlatex {htlatex="lualatex", packages= "\\usepackage{save4htimages}"  }
  Make:pdftoimg {}
  -- disable the image conversion from t4ht
  Make:removefontspec {}
end

It defines a new mode called images. In this mode, make4ht generates the pictures but no HTML file. It must be run before the HTML generation. It can be executed as follows:

 make4ht -e sample.mk4 -ulnm images book.tex "pic-tabular"

The -uln options request Unicode output, select LuaLaTeX, and disable the tex4ht DVI processing, which isn't needed in PDF mode. -m images selects the images mode.

For the HTML generation, use

 make4ht -e sample.mk4 -ul book.tex "pic-tabular"

It is important not to use the -n and -m images options in this case.

Some additional files are needed:

save4htimages.sty

\AtBeginDocument{%
  % Configure Picture commands to output their contents on a new page
  \Configure{Picture+}{\newpage}{\newpage}
  \Configure{Picture*}{\newpage}{\newpage}
  % install Lua callbacks to remove tex4ht specials
  \directlua{
    local t4htcallback = require "fontspec4ht-images"
    luatexbase.add_to_callback("pre_linebreak_filter", t4htcallback.process, "remove tex4ht specials")
    luatexbase.add_to_callback("hpack_filter", t4htcallback.process, "remove tex4ht specials")
    luatexbase.add_to_callback("vpack_filter", t4htcallback.process, "remove tex4ht specials")
    % this callback will save the image pages
    luatexbase.add_to_callback("finish_pdffile", t4htcallback.save_pages, "save image pages")
  }
}

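It configures tex4ht to place each picture on its own page, and it installs a callback which saves the page number and destination filename of every picture. This information is saved in the file \jobname-pagelist.lua.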
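To make the shape of that file concrete, here is a minimal standalone sketch that builds the text with the same string format the save_pages function in fontspec4ht-images.lua uses, and then loads it back. The page number and file name are made-up sample values, and the `loadstring or load` fallback is only there so that the sketch runs on both Lua 5.1 and later versions.

```lua
-- Build the page list text the same way save_pages does, then load it back.
-- The page number and file name below are made-up sample values.
local pagelist = { ["book0x.png"] = 2 }

local lines = { "return {" }
for name, page in pairs(pagelist) do
  -- each entry maps a page number to the image file rendered from that page
  lines[#lines + 1] = string.format("[%s] = '%s',", page, name)
end
lines[#lines + 1] = "}"
local source = table.concat(lines, "\n")
print(source)

-- loadstring (Lua 5.1) or load (Lua 5.2+) turns the text back into a table
local loader = loadstring or load
local pages = loader(source)()
for page, name in pairs(pages) do
  print(page, name)
end
```

The keys are page numbers and the values are file names, which is the orientation that convert_pagelist in pdftoimg4ht.lua iterates over.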

The callbacks are defined in fontspec4ht-images.lua:

local M = {}
local hlist_id = node.id "hlist"
local vlist_id = node.id "vlist"
local whatsit_id = node.id "whatsit"
local glyph_id = node.id "glyph"
-- get the special subtype
local whatsits = node.whatsits()
local special_id  

local pagelist = {}


local function execute_tex4ht(head, n)
  local was_tex4ht = false
  local t4ht, data = n.data:match("(t4ht)(.+)")
  if t4ht == "t4ht" then was_tex4ht = true end
  if was_tex4ht then
    if data:match("@%+") then
      -- detect unicode characters
      local char = data:match("%{35%}x([0-9a-fA-F]+)%{59%}")
      if char then 
        -- we must replace the next glyph char with contents of this special
        local nextnode = n.next
        if nextnode and nextnode.id == glyph_id then
          nextnode.char = tonumber(char, 16)
        end
      end
    elseif data:match("%+%+") then
      local picture_name = data:match("%+%+(.+)")
      -- sometimes we match something different than filename
      -- so try to detect that it is really a filename (we check that it ends
      -- with extension)
      if picture_name and picture_name:match("%.[a-zA-Z]+$") then
        pagelist[picture_name] = tex.count[ "c@page" ]
      end
    end
  end
  return head, was_tex4ht
end

local function process(head)
  for n in node.traverse(head) do
    local id = n.id
    if id == hlist_id or id == vlist_id then
      n.head = process(n.head)
    elseif id == whatsit_id and (n.subtype == special_id or whatsits[n.subtype] == "special")  then
      special_id = n.subtype
      -- act on the special node and detect if it was tex4ht special
      local was_tex4ht 
      head, was_tex4ht= execute_tex4ht(head, n)
      if was_tex4ht then
        -- remove the special node
        head = node.remove(head, n)
      end
    end
  end
  return head
end

local function save_pages()
  local pagefile = tex.jobname .. "-pagelist.lua"
  local f = io.open(pagefile, "w")
  -- we will write the page list as Lua module consisting only from table
  local t = {"return {"}
  for k,v in pairs(pagelist) do
    t[#t+1] = string.format("[%s] = '%s',", v, k)
    -- print("save page", k, v)
  end
  t[#t+1] = "}"
  f:write(table.concat(t, "\n"))
  f:close()
end

M.process = process
M.pagelist = pagelist
M.save_pages = save_pages

return M
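The character replacement done in execute_tex4ht can be tried outside LuaTeX. The following is a minimal sketch, assuming Lua 5.3 or newer for utf8.char. The sample payload is an assumption: it is made up to fit the patterns used above, reading {35} and {59} as the character codes of "#" and ";", and codepoint_from_special is a hypothetical helper name.

```lua
-- Standalone sketch (Lua 5.3+): extract the code point from a tex4ht-style
-- special payload. The sample payload is made up to fit the patterns above.
local function codepoint_from_special(data)
  -- only specials that carry the "t4ht" prefix are of interest
  local t4ht, rest = data:match("(t4ht)(.+)")
  if not t4ht or not rest:match("@%+") then return nil end
  -- the hexadecimal digits between "{35}x" and "{59}" hold the code point
  local hex = rest:match("%{35%}x([0-9a-fA-F]+)%{59%}")
  return hex and tonumber(hex, 16)
end

local sample = "t4ht@+&{35}x0938{59}"
local cp = codepoint_from_special(sample)
-- cp is the number that the callback assigns to the next glyph node
print(cp, utf8.char(cp))
```

In the real callback the number is written to the char field of the glyph node that follows the special, so the placeholder "x" is typeset with the intended character from the OpenType font.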

The images are generated by the following code in pdftoimg4ht.lua:

kpse.set_program_name "luatex"
local mkutils = require "mkutils"

local M = {}
local ghostscript = "gs"
-- output file resolution
local r = 72 * 4 
-- downscaling
local DownScaleFactor = 2 
local function get_bboxes(pdfname)
  -- the bounding box info is written to stderr, so we must redirect it to a temp file
  local tmpfile = pdfname .. "-tmp"
  local cmd = string.format("%s -q -sDEVICE=bbox -dNOPAUSE -dBATCH %s 2> %s", ghostscript, pdfname, tmpfile)
  print(cmd)
  local bboxes = {}
  os.execute(cmd)
  local executed = io.open(tmpfile, "r")
  local output = executed:read("*all")
  executed:close()
  os.remove(tmpfile)
  local page = 1
  -- find bounding boxes in the tmp file
  for x, y, x1, y1 in output:gmatch("%%BoundingBox:%s*(%d+)%s*(%d+)%s*(%d+)%s(%d+)") do
    print(page, x,y,x1, y1)
    bboxes[page] = {x,y, x1, y1}
    page = page + 1
  end
  return bboxes
end

local function scale_point(point)
  return math.ceil(point * (r/72))
end
-- calculate dimensions for PDF cropping from the page bounding box
local function get_gs_dimension(x,y, x1, y1)
  local width = x1 - x
  local height = y1 - y
  return x, y, width, height
end

local function get_page_dimensions(bboxes, page)
  local bbox = bboxes[page] or {}
  return get_gs_dimension(bbox[1], bbox[2], bbox[3], bbox[4])
end

local function get_gs_page_options(bboxes, page)
  local x, y, width, height = get_page_dimensions(bboxes, page)
  print(x, y, width, height)
  return string.format('-r%d -dDownScaleFactor=%d -g%dx%d -c "<</Install {-%d -%d translate}>> setpagedevice" -dFirstPage=%d -dLastPage=%d', r, DownScaleFactor, scale_point(width)+1, scale_point(height)+1, x, y, page, page)
end

local function convert_png(filename, outputfile, bboxes, page)
  local options = get_gs_page_options(bboxes, page)
  local cmd = string.format("%s -q -sDEVICE=pngalpha -o %s %s %s", ghostscript, outputfile, options, filename)
  print(cmd)
  os.execute(cmd)
end

local function convert_svg(filename, outputfile, bboxes, page)
  local tmpname = os.tmpname() .. ".pdf"
  local options = get_gs_page_options(bboxes, page)
  local cmd = string.format("%s -q -sDEVICE=pdfwrite -o %s %s %s", ghostscript, tmpname, options, filename)
  print(cmd)
  os.execute(cmd)
  local pdf2svg = string.format("pdf2svg %s %s", tmpname, outputfile)
  print(pdf2svg)
  os.execute(pdf2svg)
  os.remove(tmpname)
end

local function convert_pagelist(filename, pagelist)
  local bboxes = get_bboxes(filename)
  for page, outputfile in pairs(pagelist) do
    if outputfile:match("png$") then
      convert_png(filename, outputfile, bboxes, page)
    elseif outputfile:match("svg$") then
      convert_svg(filename, outputfile, bboxes, page)
    else 
      print("unsupported output file format: ".. outputfile)
    end
  end
end

local function run(par)
  -- get options from the extension settings
  local ext_options = mkutils.get_filter_settings "pdftoimg4ht" or {}
  ghostscript = par.ghostscript or ext_options.ghostscript or ghostscript
  r = par.r or ext_options.r or r
  DownScaleFactor = par.DownScaleFactor or ext_options.DownScaleFactor or DownScaleFactor
  local pdffile = par.input .. ".pdf"
  local pagelist_file = par.pagelist_file or ext_options.pagelist_file or par.input .. "-pagelist.lua"
  local pagelist = require(pagelist_file)
  -- print(r, DownScaleFactor, pdffile, pagelist_file)
  convert_pagelist(pdffile, pagelist)
end




M.run = run
M.get_bboxes = get_bboxes
M.convert_png = convert_png
M.convert_svg = convert_svg
M.convert_pagelist = convert_pagelist

-- local pagelist = require "sample-pagelist"
-- convert_pagelist("sample.pdf", pagelist)

return M
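The bounding box handling can be checked without running Ghostscript. This is a minimal sketch: the sample text follows the format printed by Ghostscript's bbox device, but the numbers are made up, and parse_bboxes and pixel_size are hypothetical helper names that mirror get_bboxes and the size computation in get_gs_page_options.

```lua
-- Sample text in the format printed by Ghostscript's bbox device
-- (the numbers are made up for illustration)
local output = [[
%%BoundingBox: 148 651 214 688
%%HiResBoundingBox: 148.500000 651.300000 213.700000 687.900000
]]

local r = 72 * 4 -- rendering resolution, same as the module default

local function parse_bboxes(text)
  local bboxes = {}
  -- "%%" in a Lua pattern matches one literal percent sign, so the
  -- HiResBoundingBox lines are skipped
  for x, y, x1, y1 in text:gmatch("%%BoundingBox:%s*(%d+)%s+(%d+)%s+(%d+)%s+(%d+)") do
    bboxes[#bboxes + 1] = { tonumber(x), tonumber(y), tonumber(x1), tonumber(y1) }
  end
  return bboxes
end

local function pixel_size(bbox)
  local width, height = bbox[3] - bbox[1], bbox[4] - bbox[2]
  -- scale from PostScript points (72 per inch) to device pixels, plus one pixel
  return math.ceil(width * r / 72) + 1, math.ceil(height * r / 72) + 1
end

local bboxes = parse_bboxes(output)
local w, h = pixel_size(bboxes[1])
print(#bboxes, w, h)
```

The two sizes are what ends up in the -g option, while the lower left corner of the box is used for the translate call that moves the picture to the page origin.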

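A few things can be configured in the Make:pdftoimg {} call in the build file, such as the output resolution or the name of the Ghostscript command. For example, the following may work on Windows: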

Make:pdftoimg {ghostscript = "gswin32c"}

That's it. I've modified your example to illustrate the conversion better:

\documentclass[11pt, oneside, onecolumn, openright, final]{article}

\usepackage{alternative4ht}
\usepackage{polyglossia}

\makeindex
\setmainlanguage[numerals=Devanagari]{hindi}
\setmainfont[Script=Devanagari, BoldFont={Sahadeva}]{Nakula}
\newfontfamily\devanagarifont[Script=Devanagari, BoldFont={Sahadeva}]{Nakula}


\begin{document}

hello world

\begin{tabular}{c | c}
  \hline
  स & hello\\
  \hline
\end{tabular}

\end{document}

The result looks like this:

[image]

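By the way, this proves that tex4ht can be used in PDF mode, and that Lua callbacks could be adapted to write the HTML and CSS files directly. So this is quite an important experiment. I need to think about how to merge all of this into make4ht and tex4ht.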
