How do I disable hyphenation in lualatex?


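I am using this answer to generate a plain-text version of a fairly complex document, for spell checking. This is my first attempt at using lualatex, so plenty may be wrong, but for the most part it does what I want: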

\documentclass{article}
\usepackage{luatexbase}
\usepackage{lipsum}
\usepackage{filecontents}
\usepackage{ifluatex}

\begin{filecontents*}{luaFunctions.lua}
-- clear the file
file = io.open("output.txt", "w")
file:write()

exportParagraph = false

function exportText (head)

    if exportParagraph == false then
        --if you return nil no pdf will be created
        -- return nil 
        return head        
    end

    -- open the file in append-modus
    local out = io.open("output.txt", "a")
    local wordCounter = 0

    -- loop over all hboxes in the current paragraph
    for line in node.traverse_id (node.id("hlist"), head) do

        -- loop over each element in the line
        for item in node.traverse (line.list) do
            -- check if the element is a char
            if item.id == node.id("glyph") then
                out:write(string.char(item.char))
            -- check if the element is a 'space'
            elseif item.id == node.id("glue") then
                wordCounter = wordCounter + 1
                out:write(" ")
            end
        end
        -- a newline in the file after each (tex)line
        out:write("\n")        
    end

    wordCounter = wordCounter - 1
    out:write("Words: "..wordCounter.."\n")

    -- a newline in the file after each paragraph 
    out:write("\n")  

    assert(out:close())  
    exportParagraph = false  

    --if you return nil no pdf will be created
    -- return nil
    return head    
end


function disableLigatures(head)
    -- disable ligatures
end

function SetExportParagraph(export)
    exportParagraph = export
end

luatexbase.add_to_callback("ligaturing", disableLigatures, "disableLigatures")
luatexbase.add_to_callback("post_linebreak_filter", exportText, "exportText")
\end{filecontents*}

\ifluatex
    \directlua{dofile("luaFunctions.lua")}
\fi

\def\exportParagraph{%
    \ifluatex
        \directlua{SetExportParagraph(true)}
    \fi
}

\begin{document}
\exportParagraph 
ff fi Lorem ipsum dolor sit amet, \textbf{consectetuer adipiscing elit. Ut purus elit,
vestibulum ut, placerat ac, adipiscing vitae, felis.} Curabitur dictum gravida
mauris. Nam arcu libero, nonummy eget, consectetuer id, vulputate a, magna.
Donec vehicula augue eu neque. Pellentesque habitant morbi tristique senectus
et netus et malesuada fames ac turpis egestas. Mauris ut leo. Cras viverra
metus rhoncus sem. Nulla et lectus vestibulum urna fringilla ultrices. Phasellus
eu tellus sit amet tortor gravida placerat. Integer sapien est, iaculis in, pretium
quis, viverra ac, nunc. Praesent eget sem vel leo ultrices bibendum. Aenean
faucibus. Morbi dolor nulla, malesuada eu, pulvinar at, mollis ac, nulla. Cur-
abitur auctor semper nulla. Donec varius orci eget risus. Duis nibh mi, congue
eu, accumsan eleifend, sagittis quis, diam. Duis eget orci sit amet orci dignissim
rutrumrutrumrutrumrutrumrutrumrutrumrutrumrutrumrutrumrutrumrutrumrutrumrutrumrutrum.

Nam dui ligula, fringilla a, euismod sodales, sollicitudin vel, wisi. Morbi
auctor lorem non justo. Nam lacus libero, pretium at, lobortis vitae, ultricies et,
tellus. Donec aliquet, tortor sed accumsan bibendum, erat ligula aliquet magna,
vitae ornare odio metus a mi. Morbi ac orci et nisl hendrerit mollis. Suspendisse
ut massa. Cras nec ante. Pellentesque a nulla. Cum sociis natoque penatibus et
magnis dis parturient montes, nascetur ridiculus mus. Aliquam tincidunt urna.
Nulla ullamcorper vestibulum turpis. Pellentesque cursus luctus mauris.

\exportParagraph
Nulla malesuada porttitor diam. Donec felis erat, congue non, volutpat at,
tincidunt tristique, libero. Vivamus viverra fermentum felis. Donec nonummy
pellentesque ante. Phasellus adipiscing semper elit. Proin fermentum massa
ac quam. Sed diam turpis, molestie vitae, placerat a, molestie nec, leo. Mae-
cenas lacinia. 

Nam ipsum ligula, eleifend at, accumsan nec, suscipit a, ipsum.
Morbi blandit ligula feugiat magna. Nunc eleifend consequat lorem. Sed lacinia
nulla vitae enim. Pellentesque tincidunt purus vel magna. Integer non enim.
Praesent euismod nunc eu purus. Donec bibendum quam in tellus. Nullam cur-
sus pulvinar lectus. Donec et mi. Nam vulputate metus eu enim. Vestibulum
pellentesque felis eu massa.
\end{document}

In the generated output, the nonsense word rutrumrutrumrutrumrutrumrutrumrutrumrutrumrutrumrutrumrutrumrutrumrutrumrutrumrutrum at the end of the first paragraph gets hyphenated:

[...]
ac, nulla. Cur- abitur auctor semper nulla. Donec varius orci eget risus. Duis 
nibh mi, congue eu, accumsan eleifend, sagittis quis, diam. Duis eget orci sit 
amet orci dignissim rutrumrutrumrutrumrutrumrutrumrutrumrutrumrutrum- 
rutrumrutrumrutrumrutrumrutrumrutrum.  
Words: 134

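This happens throughout my text and makes spell checking rather difficult. Is there a way to disable hyphenation completely for this hack (I hesitate to call it a solution)?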

Answer 1

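LuaTeX has several node-processing callbacks. post_linebreak_filter is not well suited to your purpose, because there you have to process a node list that has already been broken into lines. A better fit is pre_linebreak_filter, which is called before line breaking.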

I also found some errors in your code, which showed up when I tried it with the fontspec package and some non-ASCII characters. First, here is the modified file:

\documentclass{article}
\usepackage{luatexbase}
\usepackage{fontspec}
%\setmainfont{TeX Gyre Schola}
\usepackage{lipsum}
\usepackage{filecontents}
\usepackage{ifluatex}

\begin{filecontents*}{luaFunctions.lua}
-- clear the file
local file = io.open("output.txt", "w")
file:write()
file:close()

local char = unicode.utf8.char
exportParagraph = false

function exportText (head, listtype)

  --[[
  -- it is better to solve this using attributes
  if exportParagraph == false then
  --if you return nil no pdf will be created
  -- return nil 
  return head        
  end --]]

  -- open the file in append-modus
  local out = io.open("output.txt", "a")
  local wordCounter = 0
  local charcount = 0
  local function traverse(h)
    local word = false
    for item in node.traverse (h) do
      local skip = node.has_attribute(item, 
      luatexbase.attributes.wordcounton) 
      if skip == 2 then
        -- check if the element is a char
        if item.id == node.id("glyph") then
          if node.is_node(item.components) then
            traverse(item.components)
          else
            out:write(char(item.char))
            charcount = charcount  + 1
            word = true
          end
        elseif
          item.id == node.id("hlist")
          or item.id == node.id("vlist")
          or item.id == node.id("insert")
          or item.id == node.id("adjust")
        then
          -- out:write(item.id..","..item.subtype.."[")
          traverse(item.head)
          -- out:write "]"
        -- check if the element is a 'glue'. this means not only space
        elseif item.id == node.id("glue") and item.subtype == 0 then
          -- glue nodes doesn't have to be spaces, count only after word
          if word then
            wordCounter = wordCounter + 1
            charcount = charcount + 1
          end
          word = false
          out:write(" ")
        end
      end
    end
    -- if word then wordCounter = wordCounter + 1 end
  end

  -- loop over all hboxes in the current paragraph
  --for line in node.traverse_id (node.id("hlist"), head) do
  -- loop over each element in the line
  traverse(head)
  -- a newline in the file after each (tex)line
  out:write("\n")
  --end

  -- wordCounter = wordCounter - 1
  out:write("Words: "..wordCounter)
  out:write(", characters: "..charcount)
  out:write(", list type: "..listtype.."\n")

  -- a newline in the file after each paragraph
  out:write("\n")

  assert(out:close())
  --exportParagraph = false

  --if you return nil no pdf will be created
  -- return nil
  return head
end


function disableLigatures(head)
  -- disable ligatures
end

function SetExportParagraph(export)
  exportParagraph = export
end

luatexbase.add_to_callback("ligaturing", disableLigatures, "disableLigatures")
luatexbase.add_to_callback("pre_linebreak_filter", exportText, "exportText")
\end{filecontents*}

\ifluatex
    \newluatexattribute\wordcounton
    \directlua{dofile("luaFunctions.lua")}
\fi

\def\startExportParagraph{%
    \ifluatex
      \wordcounton = 2
        %\directlua{SetExportParagraph(true)}
    \fi
}

\def\stopExportParagraph{%
    \ifluatex
      \wordcounton = 1
    \fi
}


\begin{document}
\startExportParagraph 
\noindent
ff fi Lorem ipsum dolor sit amet, příliš žluťoučký text s diakritikou 
dash\footnote{you should test some options}  -- \hbox{how does that work?} 
\textbf{consectetuer adipiscing elit. Ut purus elit,
vestibulum ut, placerat ac, adipiscing vitae, felis.} Curabitur dictum gravida
mauris. Nam arcu libero, nonummy eget, consectetuer id, vulputate a, magna.
Donec vehicula augue eu neque. Pellentesque habitant morbi tristique senectus
et netus et malesuada fames ac turpis egestas. Mauris ut leo. Cras viverra
metus rhoncus sem. Nulla et lectus vestibulum urna fringilla ultrices. Phasellus
eu tellus sit amet tortor gravida placerat. Integer sapien est, iaculis in, pretium
quis, viverra ac, nunc. Praesent eget sem vel leo ultrices bibendum. Aenean
faucibus. Morbi dolor nulla, malesuada eu, pulvinar at, mollis ac, nulla. Curabitur auctor semper nulla. Donec varius orci eget risus. Duis nibh mi, congue
eu, accumsan eleifend, sagittis quis, diam. Duis eget orci sit amet orci dignissim
rutrumrutrumrutrumrutrumrutrumrutrumrutrumrutrumrutrumrutrumrutrumrutrumrutrumrutrum.

\begin{tabular}{ll}
        what & about\\
        tables&?
\end{tabular}

\begin{itemize}
        \item you also want to save itemize
        \item items
\end{itemize}

You can \stopExportParagraph stop word counting in the middle of \startExportParagraph the paragraph.

Nam dui ligula, fringilla a, euismod sodales, sollicitudin vel, wisi. Morbi
auctor lorem non justo. Nam lacus libero, pretium at, lobortis vitae, ultricies et,
tellus. Donec aliquet, tortor sed accumsan bibendum, erat ligula aliquet magna,
vitae ornare odio metus a mi. Morbi ac orci et nisl hendrerit mollis. Suspendisse
ut massa. Cras nec ante. Pellentesque a nulla. Cum sociis natoque penatibus et
magnis dis parturient montes, nascetur ridiculus mus. Aliquam tincidunt urna.
Nulla ullamcorper vestibulum turpis. Pellentesque cursus luctus mauris.

\stopExportParagraph
Nulla malesuada porttitor diam. Donec felis erat, congue non, volutpat at,
tincidunt tristique, libero. Vivamus viverra fermentum felis. Donec nonummy
pellentesque ante. Phasellus adipiscing semper elit. Proin fermentum massa
ac quam. Sed diam turpis, molestie vitae, placerat a, molestie nec, leo. Mae-
cenas lacinia. 

Nam ipsum ligula, eleifend at, accumsan nec, suscipit a, ipsum.
Morbi blandit ligula feugiat magna. Nunc eleifend consequat lorem. Sed lacinia
nulla vitae enim. Pellentesque tincidunt purus vel magna. Integer non enim.
Praesent euismod nunc eu purus. Donec bibendum quam in tellus. Nullam cur-
sus pulvinar lectus. Donec et mi. Nam vulputate metus eu enim. Vestibulum
pellentesque felis eu massa.
\end{document}

Your code used a global variable named file, which clashed with something in fontspec. All private variables should be local! The file was also never closed.

When working with Unicode characters we cannot use the string.char function; we have to use unicode.utf8.char instead.
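To illustrate the difference, here is a minimal sketch. It assumes a LuaTeX build based on Lua 5.3 or later, where the standard utf8 library is available alongside the unicode.utf8 library used in the file above; the code point 382 (ž) is only a sample value.

```latex
\documentclass{article}
\begin{document}
% string.char only accepts byte values 0..255, so it cannot encode
% code point 382 (U+017E). utf8.char returns the UTF-8 byte sequence.
% Note: no Lua comments inside \directlua, because TeX joins the lines.
\directlua{
  local cp = 382
  texio.write_nl("term and log",
    "UTF-8 for code point " .. cp .. ": " .. utf8.char(cp))
}
\end{document}
```

Calling string.char(cp) with the same value would raise an error instead, which is why the original code broke as soon as accented characters appeared.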

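Then I rewrote the node-traversal loop as a recursive function, because sublists can occur inside a node list and we have to process those too. See the traverse function.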

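I changed the document interface and introduced two macros: \startExportParagraph and \stopExportParagraph. They use LuaTeX's node attribute mechanism, which makes it possible to switch counting on and off more flexibly, even in the middle of a paragraph. I also added a character count.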

I added some test cases:

ff fi Lorem ipsum dolor sit amet, příliš žluťoučký text s diakritikou 
dash\footnote{you should test some options}  -- \hbox{how does that work?} 
\textbf{consectetuer adipiscing elit. Ut purus elit,...

\begin{tabular}{ll}
        what & about\\
        tables&?
\end{tabular}

\begin{itemize}
        \item you also want to save itemize
        \item items
\end{itemize}

You can \stopExportParagraph stop word counting in the middle of \startExportParagraph the paragraph.

This is saved to output.txt as:

1you should test some options
Words: 4, characters: 29, list type: insert

ff fi Lorem ipsum dolor sit amet, příliš žluťoučký text s diakritikou dash1 -- how does that work? consectetuer adipiscing elit. 

  what    about    tables    ?
Words: 4, characters: 20, list type:

     • you also want to save itemize
Words: 7, characters: 30, list type:

     • items
Words: 2, characters: 6, list type:

You can the paragraph.
Words: 4, characters: 22, list type:

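As you can see, a footnote produces its own paragraph, which is written out before the paragraph it appears in. Ligatures are split into their parts, so ff and fi are counted correctly. But this also causes the dash to be split into --. The bullets in the itemize environment are counted as words; I still have to look into how to solve that. The character count is also wrong.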

Answer 2

If you are not looking for a Lua solution (which is no doubt possible), you can use the classic TeX version:

 \begin{document}\language-1

which will turn off hyphenation.
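A related classic technique, different from the \language setting shown above, is to make breaks at hyphens infinitely expensive through TeX's penalty parameters. The following is a minimal sketch, assuming the standard article class; the sample text and the \emergencystretch value are illustrative choices, not part of the original answer.

```latex
\documentclass{article}
\begin{document}
% Forbid line breaks at automatically inserted hyphenation points.
\hyphenpenalty=10000
% Forbid line breaks after explicit hyphens typed in the source.
\exhyphenpenalty=10000
% Allow some extra stretch so paragraphs can still be justified.
\emergencystretch=3em
Duis eget orci sit amet orci dignissim
rutrumrutrumrutrumrutrumrutrumrutrumrutrumrutrumrutrumrutrum.
\end{document}
```

These parameters are read when a paragraph is broken into lines, so they need to be in effect at the end of every paragraph that should stay unhyphenated. A word that is wider than the line will then produce an overfull box rather than being split.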
