Extracting the content of a .tex file


Is there a quick GUI way to extract the content of a .tex file, stripping all tags and metadata?

Which TeX editor can export the content to a plain .txt file?

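(I found some command-line tools and some RegExp solutions, but they are not very convenient to use. This seems like a basic need, and I wonder why there is no well-known solution.)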

Answer 1

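I use the following method to extract individual paragraphs from a .tex file and write them to a separate text file, together with a word count. With LuaLaTeX it is easy to hook into the point where TeX has finished building a paragraph, that is, after all macros have been expanded, the line breaks have been computed and the whole paragraph has been assembled; the code below uses the post_linebreak_filter callback for this. In that callback, every word that will appear in the resulting PDF can be written to a separate text file. It works well for plain text, but I don't know how it copes with special characters, tables, footnotes and so on; maybe one of the LaTeX/LuaLaTeX experts can check and improve it. With ligatures disabled (so that, for example, "fi" stays two glyphs instead of becoming one ligature glyph), it did the job for me.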

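The method works only with LuaLaTeX, but I use a switch (\ifluatex) to activate the feature: when I need my PDF, I compile the document with pdfLaTeX, and when I need the extracted paragraphs in a separate text file, I compile it with LuaLaTeX.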

\documentclass{article}
\usepackage{lipsum}
\usepackage{filecontents}
\usepackage{ifluatex}

\begin{filecontents*}{luaFunctions.lua}
-- clear the output file once at the start of the run
io.open("output.txt", "w"):close()

exportParagraph = false

function exportText (head)

    if exportParagraph == false then
        --if you return nil no pdf will be created
        -- return nil 
        return head        
    end

    -- open the file in append-modus
    local out = io.open("output.txt", "a")
    local wordCounter = 0

    -- loop over all hboxes in the current paragraph
    for line in node.traverse_id (node.id("hlist"), head) do

        -- loop over each element in the line
        for item in node.traverse (line.list) do
            -- check if the element is a char
            if item.id == node.id("glyph") then
                -- utf8.char (Lua 5.3) also encodes characters above 255, which string.char cannot
                out:write(utf8.char(item.char))
            -- check if the element is a 'space'
            elseif item.id == node.id("glue") then
                wordCounter = wordCounter + 1
                out:write(" ")
            end
        end
        -- a newline in the file after each (tex)line
        out:write("\n")        
    end

    wordCounter = wordCounter - 1
    out:write("Words: "..wordCounter.."\n")

    -- a newline in the file after each paragraph 
    out:write("\n")  

    assert(out:close())  
    exportParagraph = false  

    --if you return nil no pdf will be created
    -- return nil
    return head    
end


function disableLigatures(head)
    -- disable ligatures
end

function SetExportParagraph(export)
    exportParagraph = export
end

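-- the LaTeX kernel (ltluatex) blocks direct callback.register calls,
-- so the functions are registered through luatexbase instead
luatexbase.add_to_callback("ligaturing", disableLigatures, "disableLigatures")
luatexbase.add_to_callback("post_linebreak_filter", exportText, "exportText")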
\end{filecontents*}

\ifluatex
    \directlua{dofile("luaFunctions.lua")}
\fi

\def\exportParagraph{%
    \ifluatex
        \directlua{SetExportParagraph(true)}
    \fi
}

\begin{document}
\exportParagraph 
ff fi Lorem ipsum dolor sit amet, \textbf{consectetuer adipiscing elit. Ut purus elit,
vestibulum ut, placerat ac, adipiscing vitae, felis.} Curabitur dictum gravida
mauris. Nam arcu libero, nonummy eget, consectetuer id, vulputate a, magna.
Donec vehicula augue eu neque. Pellentesque habitant morbi tristique senectus
et netus et malesuada fames ac turpis egestas. Mauris ut leo. Cras viverra
metus rhoncus sem. Nulla et lectus vestibulum urna fringilla ultrices. Phasellus
eu tellus sit amet tortor gravida placerat. Integer sapien est, iaculis in, pretium
quis, viverra ac, nunc. Praesent eget sem vel leo ultrices bibendum. Aenean
faucibus. Morbi dolor nulla, malesuada eu, pulvinar at, mollis ac, nulla.
Curabitur auctor semper nulla. Donec varius orci eget risus. Duis nibh mi, congue
eu, accumsan eleifend, sagittis quis, diam. Duis eget orci sit amet orci dignissim
rutrum.

Nam dui ligula, fringilla a, euismod sodales, sollicitudin vel, wisi. Morbi
auctor lorem non justo. Nam lacus libero, pretium at, lobortis vitae, ultricies et,
tellus. Donec aliquet, tortor sed accumsan bibendum, erat ligula aliquet magna,
vitae ornare odio metus a mi. Morbi ac orci et nisl hendrerit mollis. Suspendisse
ut massa. Cras nec ante. Pellentesque a nulla. Cum sociis natoque penatibus et
magnis dis parturient montes, nascetur ridiculus mus. Aliquam tincidunt urna.
Nulla ullamcorper vestibulum turpis. Pellentesque cursus luctus mauris.

\exportParagraph
Nulla malesuada porttitor diam. Donec felis erat, congue non, volutpat at,
tincidunt tristique, libero. Vivamus viverra fermentum felis. Donec nonummy
pellentesque ante. Phasellus adipiscing semper elit. Proin fermentum massa
ac quam. Sed diam turpis, molestie vitae, placerat a, molestie nec, leo.
Maecenas lacinia.

Nam ipsum ligula, eleifend at, accumsan nec, suscipit a, ipsum.
Morbi blandit ligula feugiat magna. Nunc eleifend consequat lorem. Sed lacinia
nulla vitae enim. Pellentesque tincidunt purus vel magna. Integer non enim.
Praesent euismod nunc eu purus. Donec bibendum quam in tellus. Nullam
cursus pulvinar lectus. Donec et mi. Nam vulputate metus eu enim. Vestibulum
pellentesque felis eu massa.
\end{document}
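The "Words: N" figure is derived from counting glue nodes, so it can be worth cross-checking it against the extracted text itself. A minimal Python sketch, assuming output.txt has exactly the layout written by exportText (the lines of a paragraph, then a "Words: N" line, then an empty line); the names ExportedParagraph, parse_export and report are made up for this example and are not part of the answer's code:

```python
import re
from dataclasses import dataclass


@dataclass
class ExportedParagraph:
    text: str      # the paragraph's lines joined with single spaces
    reported: int  # the number from the "Words: N" line written by the Lua code
    recount: int   # a plain whitespace-based recount of `text`


# the summary line exportText writes after each paragraph
WORDS_RE = re.compile(r"Words: (-?\d+)")


def parse_export(content: str) -> list[ExportedParagraph]:
    """Split the contents of output.txt into exported paragraphs.

    Assumes the layout written by exportText: the paragraph's lines,
    then a "Words: N" line, then an empty line.
    """
    paragraphs = []
    lines = []
    for raw in content.splitlines():
        match = WORDS_RE.fullmatch(raw.strip())
        if match:
            # collapse the TeX line breaks and repeated spaces into single spaces
            text = " ".join(" ".join(lines).split())
            paragraphs.append(ExportedParagraph(text, int(match.group(1)), len(text.split())))
            lines = []
        elif raw.strip():
            lines.append(raw)
    return paragraphs


def report(path: str = "output.txt") -> None:
    """Print the Lua word count next to the whitespace recount for each paragraph."""
    with open(path, encoding="utf-8", errors="replace") as f:
        for number, p in enumerate(parse_export(f.read()), start=1):
            note = "" if p.reported == p.recount else "  (differs)"
            print(f"paragraph {number}: Lua={p.reported} recount={p.recount}{note}")
```

After a LuaLaTeX run, calling report("output.txt") prints both numbers per paragraph; a mismatch is only a hint that one of the two heuristics treats that paragraph differently, not proof of which one is wrong.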

Answer 2

I found a GUI tool by Tom Ford.

Unfortunately, its GUI version produces errors on my Windows 7 (64-bit), and its command-line version cannot handle included documents.
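If the command-line version only fails on included files, one workaround is to flatten the document into a single file first and pass that to the tool; latexpand, which ships with TeX Live, does this more thoroughly. A minimal Python sketch, assuming plain \input{...} and \include{...} commands whose paths are relative to the main file's directory; flatten_tex is a hypothetical helper, not part of Tom Ford's tool:

```python
import re
from pathlib import Path

# \input{name} or \include{name}; the brace-less form "\input name" is not handled
INCLUDE_RE = re.compile(r"\\(?:input|include)\s*\{([^}]+)\}")
# an unescaped % starts a comment that runs to the end of the line
COMMENT_RE = re.compile(r"(?<!\\)%.*")


def flatten_tex(path, root=None, ancestors=()):
    """Return the text of a .tex file with every \\input/\\include replaced
    by the (recursively flattened) text of the included file.

    Names are resolved relative to `root`, the main file's directory, as TeX
    does; ".tex" is appended when the name as written is not an existing file.
    Comments are stripped so that commented-out includes are not expanded.
    """
    path = Path(path)
    root = path.parent if root is None else root
    resolved = path.resolve()
    if resolved in ancestors:
        raise ValueError(f"circular inclusion of {path}")
    text = COMMENT_RE.sub("", path.read_text(encoding="utf-8"))

    def expand(match):
        name = match.group(1).strip()
        child = root / name
        if not child.is_file():
            child = root / (name + ".tex")
        # ancestors holds only the current chain, so including a file twice is allowed
        return flatten_tex(child, root, ancestors + (resolved,))

    return INCLUDE_RE.sub(expand, text)
```

For example, flatten_tex("thesis.tex") returns one string that can be written to a file and handed to the command-line version.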
