LuaLaTeX: how to get the ASCII text from a paragraph, change it, and render it again

Question 1

Proof of concept:

\documentclass{article}
\usepackage{luacode}
\usepackage{libertine}
\begin{document}
\begin{luacode*}
function count_lines (head)
  local linecount = 0
  while head do
    if head.id == 0 then linecount = linecount + 1 end
    head = head.next
  end
  return linecount
end

function mknodes( text )
  local current_font = font.current()
  local font_parameters = font.getfont(current_font).parameters
  local n, head, last
  -- we should insert the paragraph indentation at the beginning
  head = node.new("glue")
  head.spec = node.new("glue_spec")
  head.spec.width = 20 * 2^16
  last = head

  for s in string.utfvalues( text ) do
    local char = unicode.utf8.char(s)
    if unicode.utf8.match(char,"%s") then
      -- it's a space
      n = node.new("glue")
      n.spec = node.new("glue_spec")
      n.spec.width   = font_parameters.space
      n.spec.shrink  = font_parameters.space_shrink
      n.spec.stretch = font_parameters.space_stretch
    else -- a glyph
      n = node.new("glyph")
      n.font = current_font
      n.subtype = 1
      n.char = s
      n.lang = tex.language
      n.uchyph = 1
      n.left = tex.lefthyphenmin
      n.right = tex.righthyphenmin
    end

    last.next = n
    last = n
  end

  -- now add the final parts: a penalty and the parfillskip glue
  local penalty = node.new("penalty")
  penalty.penalty = 10000

  local parfillskip = node.new("glue")
  parfillskip.spec = node.new("glue_spec")
  parfillskip.spec.stretch = 2^16
  parfillskip.spec.stretch_order = 2

  last.next = penalty
  penalty.next = parfillskip

  -- just to create the prev pointers for tex.linebreak
  node.slide(head)
  return head
end

local txt = "A wonderful serenity has taken possession of my entire soul, like these sweet mornings of spring which I enjoy with my whole heart. I am alone, and feel the charm of existence in this spot, which was created for the bliss of souls like mine."

tex.baselineskip = node.new("glue_spec")
tex.baselineskip.width = 14 * 2^16

local head = mknodes(txt)
lang.hyphenate(head)
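-- build ligatures first, then insert kerns (the order TeX itself uses);
-- kerning first would put kern nodes between the glyphs of a ligature
head = node.ligaturing(head)
head = node.kerning(head)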

local vbox
local size = 90
local lines = 0
local lines_goal = 6

while lines < lines_goal do
  texio.write_nl(string.format("Formatting text to %d mm",size))
  local copy_of_head = node.copy_list(head)
  vbox = tex.linebreak(copy_of_head,{ hsize = tex.sp(string.format("%dmm",size))})
  size = size - 10
  lines = count_lines(vbox)
  texio.write_nl(string.format("lines=%d",lines))
end

node.write(vbox)

\end{luacode*}
\end{document}

This creates a paragraph on the Lua side and re-typesets it until it has 6 lines (or more). Starting at 90mm, each iteration reduces hsize by 10mm.
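
The shrink-and-retry loop can also be looked at on its own, outside LuaTeX. What follows is only a minimal sketch in plain Lua. It assumes a hypothetical callback count_lines_at(width) that stands in for the tex.linebreak plus count_lines step; the name shrink_until, the fake line counter, and the guard against non-positive widths are illustrative additions and not part of the code above.

```lua
-- Hypothetical helper: narrow the width step by step until the
-- paragraph reaches at least `goal` lines.
-- count_lines_at(width) is an assumed callback returning the number
-- of lines the paragraph occupies at the given width.
local function shrink_until(count_lines_at, start, step, goal)
  local width = start
  while width > 0 do
    local lines = count_lines_at(width)
    if lines >= goal then
      return width, lines
    end
    width = width - step
  end
  return nil  -- goal not reachable with a positive width
end

-- Stand-in line counter for demonstration only: pretend the text
-- is 400 mm long when set on a single line.
local function fake_count(width)
  return math.ceil(400 / width)
end

local width, lines = shrink_until(fake_count, 90, 10, 6)
print(width, lines)
```

Unlike the loop above, this version stops once the width would drop to zero, so a goal that can never be reached does not loop forever.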

Question 2

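Here is a small ConTeXt module that writes characters and spaces to a text file. First, the Lua code lives in a separate file, charsperline.lua. It contains a simple callback (a "node finalizer" in ConTeXt terms) that walks the hlists making up a paragraph, looking for characters and inter-word glue.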

thirddata                  = thirddata or { }
thirddata.chars_per_line   = thirddata.chars_per_line or { }

local stringformat         = string.format
local tableconcat          = table.concat
local utfchar              = utf.char

local traverse_nodetype    = node.traverse_id
local traverse_nodelist    = node.traverse

local nodecodes            = nodes.nodecodes
local listcodes            = nodes.listcodes
local skipcodes            = nodes.skipcodes

local hlist_t              = nodecodes.hlist
local vlist_t              = nodecodes.vlist
local glue_t               = nodecodes.glue
local glyph_t              = nodecodes.glyph
local line_t               = listcodes.line
local userskip_t           = skipcodes.userskip

local tasks                = nodes.tasks
local enableaction         = tasks.enableaction
local disableaction        = tasks.disableaction

local linedata             = { }

local resolve_ligatures
resolve_ligatures = function (lst, hd)
  for n in traverse_nodetype (glyph_t, hd) do
    local components = n.components
    if components then
      lst = resolve_ligatures (lst, components)
    else
      lst[#lst+1] = utfchar (n.char)
    end
  end
  return lst
end

local collect = function (hd, groupcode)
  if groupcode == "vbox" then
    return hd
  end

  for current in traverse_nodetype (hlist_t, hd) do

    if current.subtype == line_t then
      local chars, has_glyphs = { }, false

      for n in traverse_nodelist (current.list) do

        local ntype, nsubtype = n.id, n.subtype

        -- we care only for glyphs’n’glue
        if ntype == glyph_t then
          has_glyphs = true
          if n.components then
            chars = resolve_ligatures (chars, n.components)
          else
            chars[#chars+1] = utfchar (n.char)
          end
        elseif ntype == glue_t and nsubtype == userskip_t then
          chars[#chars+1] = " "
        end

      end

      if has_glyphs then
        linedata[#linedata+1] = chars
      end
    end
  end

  return hd
end

thirddata.chars_per_line.collect = collect

tasks.appendaction ("finalizers", "before",
                    "thirddata.chars_per_line.collect")
-- start disabled; \startdumplines switches the collector on
tasks.disableaction("finalizers",
                    "thirddata.chars_per_line.collect")

local write_stats = function (...) texio.write_nl(stringformat(...)) end

local datafile = "./linedata.txt"

write_linedata = function (filename)
  filename = filename or datafile

  local result = { }
  for i = 1, #linedata do local line = linedata[i]
    result[#result+1] = stringformat ("%q,%d",
                                      tableconcat (line),
                                      #line)
  end

  io.savedata (filename, result, "\n")
end

local active --- callback state

commands.start_chars_per_line = function ()
  if not active then
    enableaction("finalizers",
                 "thirddata.chars_per_line.collect")
    active = true
  end
end

commands.stop_chars_per_line = function ()
  if active then
    disableaction("finalizers",
                  "thirddata.chars_per_line.collect")
    active = false
  end
end

commands.write_linedata = write_linedata
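
To see what the ligature handling does without running ConTeXt, here is a minimal sketch in plain Lua. It assumes nodes are plain tables with char, components, and next fields, as a stand-in for real LuaTeX glyph nodes; the helpers link, resolve, and to_char are hypothetical names for this illustration.

```lua
-- Mock glyph nodes as plain tables:
--   { char = code point, components = list or nil, next = node or nil }
-- string.char is only correct for ASCII; utf8.char is used where available.
local to_char = (utf8 and utf8.char) or string.char

-- Chain an array of nodes into a linked list and return its head.
local function link(nodes)
  for i = 1, #nodes - 1 do
    nodes[i].next = nodes[i + 1]
  end
  return nodes[1]
end

-- Append the characters of the list starting at hd to lst,
-- expanding ligatures recursively through their components.
local function resolve(lst, hd)
  local n = hd
  while n do
    if n.components then
      resolve(lst, n.components)
    else
      lst[#lst + 1] = to_char(n.char)
    end
    n = n.next
  end
  return lst
end

-- An "fi" ligature (U+FB01) that still carries its original glyphs.
local fi = { char = 0xFB01, components = link{ { char = 102 }, { char = 105 } } }
local word = link{ fi, { char = 110 }, { char = 100 } }

print(table.concat(resolve({ }, word)))  -- prints "find"
```

The ligature's own code point never reaches the output; only the characters it was built from do, which is why the dumped text stays close to what was typed.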

The user interface is defined in the module t-charsperline.mkvi. Besides the usual \start.../\stop... environment, it also sets up a call to write_linedata() at the end of the TeX run.

\startmodule [charsperline]

\unprotect

\ctxloadluafile{charsperline}

\def\startdumplines{\ctxcommand{start_chars_per_line ()}}

\def\stopdumplines{\endgraf\ctxcommand{stop_chars_per_line ()}}

\prependtoks \charsperline_dump \to \everystoptext

\def\charsperline_dump{\ctxcommand{write_linedata ()}}

\protect

\stopmodule \endinput

Now you can use the macros \startdumplines/\stopdumplines in a regular document by loading the module:

\usemodule[charsperline]
\setuplayout[width=5cm]

\starttext
  \startdumplines
    \input knuth
  \stopdumplines
\stoptext

The output is written to the file linedata.txt in the current directory. Each line follows the CSV schema "<line content>",<character count>:

"Thus, I came to the conclu-",27
"sion that the designer of a",27
"new system must not only",24
"be the implementer and first",28
"large  scale user; the designer",31
"should also write the first user",32
"manual.",7
"The separation of any of these",30
"four components would have",26

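Reading the file back in is the first step toward changing the text. The following is a minimal sketch in plain Lua, assuming each record was produced by string.format("%q,%d", ...) as above and contains no line break; parse_record is a hypothetical helper name.

```lua
-- Parse one record of the form "<line content>",<character count>.
-- Returns the content and the count, or nil if the record does not match.
local function parse_record(record)
  local content, count = record:match('^"(.*)",(%d+)$')
  if not content then
    return nil
  end
  -- string.format("%q") backslash-escapes quotes and backslashes; undo that.
  content = content:gsub("\\(.)", "%1")
  return content, tonumber(count)
end

local content, count = parse_record('"sion that the designer of a",27')
print(content, count)
```

Note that the stored count is the number of collected characters and spaces, so for lines with non-ASCII characters it can differ from the byte length #content.
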
Link to the Gist.

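By the way, whether you get ASCII output depends on the glyphs the document contains; if you want a strict solution that removes all code points outside the ASCII range, let me know.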
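
One possible reading of such a strict filter, offered only as a sketch and not as the author's solution: since the collected text is UTF-8, every byte of a non-ASCII code point is 128 or above, so dropping those bytes removes exactly the non-ASCII characters. The function name strip_non_ascii is hypothetical.

```lua
-- Remove every byte >= 128. In UTF-8, all bytes of a non-ASCII code
-- point are >= 128, so this drops exactly the non-ASCII characters.
local function strip_non_ascii(s)
  -- parentheses discard gsub's second return value (the match count)
  return (s:gsub("[\128-\255]", ""))
end

print(strip_non_ascii("na\195\175ve caf\195\169"))  -- prints "nave caf"
```

Under that assumption it could be applied to the concatenated line in write_linedata before the record is formatted; the character count would then need to be recomputed from the filtered string.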

Answer