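Is there a quick GUI way to extract the content of a .tex file (stripping all markup and metadata)? Which TeX editor supports exporting the content to a plain .txt file?
(I found some command-line tools and some RegExp solutions, but they are not very convenient to use. This seems like a basic need, and I wonder why there is no well-known solution.)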
Answer 1
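I use the following approach to extract individual paragraphs from a tex file and write them to a separate text file (with a word count). With LuaLaTeX it is easy to set up a hook that runs after TeX has finished all the work of building a paragraph (that is, after all macros have been expanded, the line breaks computed, and the whole paragraph assembled). In the hook's callback function, every word that appears in the resulting PDF can then be written to a separate text file. For plain text it works well, but I don't know how it handles special characters, tables, footnotes, ... Maybe one of the LaTeX or LuaLaTeX experts can check and improve it. With ligatures disabled, it did the job for me.
The approach works only with LuaLaTeX, but I use a switch (\ifluatex) to activate the feature. When I need my PDF, I compile the document with pdfLaTeX, and when I need the extracted paragraphs in a separate text file, I compile it with LuaLaTeX.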
\documentclass{article}
\usepackage{lipsum}
\usepackage{filecontents}
\usepackage{ifluatex}
\begin{filecontents*}{luaFunctions.lua}
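-- clear the output file at the start of each run
local file = assert(io.open("output.txt", "w"))
file:close()

-- UTF-8 encoder: utf8.char where the Lua 5.3 utf8 library is available,
-- otherwise LuaTeX's unicode.utf8.char
local utf8char = (utf8 and utf8.char) or unicode.utf8.char

-- node type ids, looked up once
local HLIST = node.id("hlist")
local GLYPH = node.id("glyph")
local GLUE  = node.id("glue")

-- set to true by \exportParagraph; reset after each exported paragraph
local exportParagraph = false

function exportText(head)
  if not exportParagraph then
    -- if you return nil, no PDF will be created
    return head
  end
  -- open the file in append mode
  local out = assert(io.open("output.txt", "a"))
  local wordCounter = 0
  -- loop over all hboxes (lines) in the current paragraph
  for line in node.traverse_id(HLIST, head) do
    -- loop over each element in the line
    for item in node.traverse(line.list) do
      if item.id == GLYPH then
        -- a character: write it UTF-8 encoded
        out:write(utf8char(item.char))
      elseif item.id == GLUE then
        -- glue is treated as a space between words
        wordCounter = wordCounter + 1
        out:write(" ")
      end
    end
    -- a newline in the file after each (TeX) line
    out:write("\n")
  end
  -- one glue too many is counted (presumably the glue that ends the paragraph)
  wordCounter = wordCounter - 1
  out:write("Words: " .. wordCounter .. "\n")
  -- an empty line in the file after each paragraph
  out:write("\n")
  assert(out:close())
  exportParagraph = false
  -- if you return nil, no PDF will be created
  return head
end

function disableLigatures(head)
  -- intentionally empty: replacing TeX's own ligaturing pass with a no-op
  -- disables the ligatures that pass would build
end

function SetExportParagraph(export)
  exportParagraph = export
end

-- LaTeX's kernel (ltluatex) rejects direct callback.register calls,
-- so use luatexbase.add_to_callback when it is available
if luatexbase and luatexbase.add_to_callback then
  luatexbase.add_to_callback("ligaturing", disableLigatures, "disableLigatures")
  luatexbase.add_to_callback("post_linebreak_filter", exportText, "exportText")
else
  callback.register("ligaturing", disableLigatures)
  callback.register("post_linebreak_filter", exportText)
end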
\end{filecontents*}
\ifluatex
\directlua{dofile("luaFunctions.lua")}
\fi
\def\exportParagraph{%
\ifluatex
\directlua{SetExportParagraph(true)}
\fi
}
\begin{document}
\exportParagraph
ff fi Lorem ipsum dolor sit amet, \textbf{consectetuer adipiscing elit. Ut purus elit,
vestibulum ut, placerat ac, adipiscing vitae, felis.} Curabitur dictum gravida
mauris. Nam arcu libero, nonummy eget, consectetuer id, vulputate a, magna.
Donec vehicula augue eu neque. Pellentesque habitant morbi tristique senectus
et netus et malesuada fames ac turpis egestas. Mauris ut leo. Cras viverra
metus rhoncus sem. Nulla et lectus vestibulum urna fringilla ultrices. Phasellus
eu tellus sit amet tortor gravida placerat. Integer sapien est, iaculis in, pretium
quis, viverra ac, nunc. Praesent eget sem vel leo ultrices bibendum. Aenean
faucibus. Morbi dolor nulla, malesuada eu, pulvinar at, mollis ac, nulla.
Curabitur auctor semper nulla. Donec varius orci eget risus. Duis nibh mi, congue
eu, accumsan eleifend, sagittis quis, diam. Duis eget orci sit amet orci dignissim
rutrum.

Nam dui ligula, fringilla a, euismod sodales, sollicitudin vel, wisi. Morbi
auctor lorem non justo. Nam lacus libero, pretium at, lobortis vitae, ultricies et,
tellus. Donec aliquet, tortor sed accumsan bibendum, erat ligula aliquet magna,
vitae ornare odio metus a mi. Morbi ac orci et nisl hendrerit mollis. Suspendisse
ut massa. Cras nec ante. Pellentesque a nulla. Cum sociis natoque penatibus et
magnis dis parturient montes, nascetur ridiculus mus. Aliquam tincidunt urna.
Nulla ullamcorper vestibulum turpis. Pellentesque cursus luctus mauris.

\exportParagraph
Nulla malesuada porttitor diam. Donec felis erat, congue non, volutpat at,
tincidunt tristique, libero. Vivamus viverra fermentum felis. Donec nonummy
pellentesque ante. Phasellus adipiscing semper elit. Proin fermentum massa
ac quam. Sed diam turpis, molestie vitae, placerat a, molestie nec, leo.
Maecenas lacinia.
Nam ipsum ligula, eleifend at, accumsan nec, suscipit a, ipsum.
Morbi blandit ligula feugiat magna. Nunc eleifend consequat lorem. Sed lacinia
nulla vitae enim. Pellentesque tincidunt purus vel magna. Integer non enim.
Praesent euismod nunc eu purus. Donec bibendum quam in tellus. Nullam
cursus pulvinar lectus. Donec et mi. Nam vulputate metus eu enim. Vestibulum
pellentesque felis eu massa.
\end{document}
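The engine switch that this answer relies on can be tried in isolation. The following is only a minimal sketch, not part of the original answer. It assumes the `iftex` package, swapped in here for `ifluatex` (it provides the same `\ifluatex` test), and the file name `switchtest.tex` is hypothetical.

```latex
% switchtest.tex (hypothetical file name)
\documentclass{article}
\usepackage{iftex} % assumption: used here in place of ifluatex
\begin{document}
\ifluatex
  % This branch is taken only under LuaLaTeX, so \directlua is safe here.
  Compiled with LuaLaTeX
  (engine version \directlua{tex.sprint(status.luatex_version)}):
  the export hook could be activated at this point.
\else
  % Any other engine, for example pdfLaTeX, ends up here.
  Compiled with another engine: only the PDF is produced.
\fi
\end{document}
```

Compiling this file once with `lualatex` and once with `pdflatex` should show which branch each engine takes. This is the same mechanism that lets the main example load `luaFunctions.lua` only under LuaLaTeX.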
Answer 2
I found a GUI by Tom Ford.
Unfortunately, its GUI version produced an error on my 64-bit Windows 7, and its command-line version cannot handle included documents.