Automatically replacing s with ſ using LuaTeX

Answer 1

This leaves some edge cases unhandled, but you can start from:

\fontfam[LM Fonts]

\directlua{
--[[ A list of characters which indicate a word break ]]
local non_word_chars = {
  [utf8.codepoint'.'] = true,
  [utf8.codepoint','] = true,
  [utf8.codepoint';'] = true,
  --[[ Add more here as appropriate ]]
}
--[[ A list of node types which indicate a word break ]]
local non_word_ids = {
  [node.id'glue'] = true,
  [node.id'rule'] = true,
  --[[ Add more here as appropriate ]]
}
local s = utf8.codepoint's'
local long_s = utf8.codepoint'ſ'
function replace_s_with_long_s(head)
  local after_s = false
  for n in node.traverse(head) do
    local char, id = node.is_char(n)
    if char == s then
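      local after = n.next
      --[[ Only query the next node when there is one; is_char expects a node ]]
      local after_char, after_id
      if after then after_char, after_id = node.is_char(after) end
      local is_end_of_word = after == nil or non_word_chars[after_char] or non_word_ids[after_id]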
      if not (after_s or is_end_of_word) then
        n.char = long_s
      end
      after_s = true
    elseif char or non_word_ids[id] then
      after_s = false
    end
  end
  return true
end
callback.add_to_callback("pre_shaping_filter", replace_s_with_long_s)}

\lipsum[1]

\bye

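The basic idea: avoid process_input_buffer, because that callback transforms the input rather than the typeset text, which leads to problems such as \lipsum turning into \lipſum. Instead, iterate over the node list in the pre_shaping_filter callback, which runs directly before font processing. There you only need to find the character nodes whose .char field is set to the Unicode code point of s. A simple boolean is enough to track whether you are directly after an s, but recognising the end of a word is trickier: this code uses a simple heuristic based on which node type or character follows, but the lists should probably be extended, and depending on the language the end of a word may be more complicated.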
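To experiment with the rule without running TeX, the same decision logic can be modelled on a plain UTF-8 string. This is only a sketch under assumptions: `long_s_model` and the `breaks` table are hypothetical names, a space stands in for glue, and every non-s character resets the flag, which is simpler than what the node-based code does.

```lua
-- Hypothetical stand-alone model of the callback's rule (needs Lua 5.3+ for utf8).
-- Characters that end a word; the space stands in for a glue node.
local breaks = { [" "] = true, ["."] = true, [","] = true, [";"] = true }

function long_s_model(text)
  -- Collect the code points first so the next character can be inspected.
  local cps = {}
  for _, cp in utf8.codes(text) do cps[#cps + 1] = cp end

  local out, after_s = {}, false
  for i, cp in ipairs(cps) do
    local ch = utf8.char(cp)
    if ch == "s" then
      local nxt = cps[i + 1]
      local is_end_of_word = nxt == nil or breaks[utf8.char(nxt)] == true
      -- Use the long s unless this s follows another s or ends the word.
      if not (after_s or is_end_of_word) then ch = "ſ" end
      after_s = true
    else
      after_s = false
    end
    out[#out + 1] = ch
  end
  return table.concat(out)
end

print(long_s_model("less is missing stress."))
```

The second s of a double s stays round because `after_s` is still set, and a word-final s stays round because the next character is in `breaks`.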

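Three remaining issues:

What is the end of a word? https://www.unicode.org/reports/tr29/#Word_Boundaries is the standard Unicode algorithm for determining this, but implementing it is somewhat involved.
How should combining marks be handled? For example, ś should be replaced as well. Currently the result depends on how the letter is input (precomposed or decomposed), but both forms should probably be handled consistently.
An s inside an explicit \discretionary is ignored.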
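For the combining-mark issue, one possible direction is to bring both input forms into the decomposed shape and to look past combining marks when checking what follows an s. The helpers below are a hypothetical sketch in plain Lua, not part of the answer's code: the table covers only three precomposed letters, and `is_combining_mark` checks only the Combining Diacritical Marks block (U+0300 to U+036F).

```lua
-- Hypothetical helpers (Lua 5.3+); names and coverage are assumptions.
-- Maps a precomposed letter to the combining mark that follows a plain s.
local decomposed_s = {
  [0x015B] = 0x0301, -- ś = s + combining acute accent
  [0x015D] = 0x0302, -- ŝ = s + combining circumflex accent
  [0x0161] = 0x030C, -- š = s + combining caron
}

-- True only for the Combining Diacritical Marks block.
function is_combining_mark(cp)
  return cp >= 0x0300 and cp <= 0x036F
end

-- Return the code points of `text`, splitting the precomposed letters above.
function decompose_s(text)
  local cps = {}
  for _, cp in utf8.codes(text) do
    local mark = decomposed_s[cp]
    if mark then
      cps[#cps + 1] = 0x73 -- plain s
      cps[#cps + 1] = mark
    else
      cps[#cps + 1] = cp
    end
  end
  return cps
end

-- Index of the first code point after position i that is not a
-- combining mark, or nil if the sequence ends first.
function next_base_index(cps, i)
  local j = i + 1
  while cps[j] and is_combining_mark(cps[j]) do j = j + 1 end
  return cps[j] and j or nil
end
```

With these, an end-of-word check would inspect `cps[next_base_index(cps, i)]` instead of `cps[i + 1]`, so precomposed and decomposed input reach the same decision. Carrying this over to glyph nodes is left open here.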
