Consider the following pseudocode:
\documentclass{article}%
\usepackage{etoolbox}%
\usepackage{enumitem}%
% BASIC QUALIFICATIONS
\listadd\qualifications{A degree}%
\listadd\qualifications{Skill}%
\listadd\qualifications{Common sense}%
\listadd\qualifications{Enthusiasm}%
\begin{document}
% duties
\section*{Basic Qualifications}
% opening call for function which facilitates export of section (or subsection) contents
% <here>
% \begin{export}[include section title=true]{qualifications.txt}
The successful candidate will have the following basic qualifications:
\begin{itemize}[topsep=2mm]
\forlistloop{\item}{\qualifications}
\end{itemize}
% closing call for
% \end{export}
\end{document}
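The goal is to export the contents of a section (or subsection, subsubsection, etc.) to a plain .txt file without any markup. Any suggestions for a suitable starting point would be greatly appreciated.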
Answer 1
TeX4ht can split the generated HTML into a separate HTML file for each section or subsection. These HTML files can then be converted to TXT with w3m or another text-based browser.
To automate this, you can use the following Lua build file:
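-- build.lua: make4ht build file
local domfilter = require "make4ht-domfilter"
local mkutils = require "mkutils"

-- DOM filter: delete the navigation blocks (class "crosslinks") that
-- TeX4ht inserts into every split HTML page
local process = domfilter {
  function(dom)
    for _, crosslinks in ipairs(dom:query_selector(".crosslinks")) do
      crosslinks:remove_node()
    end
    return dom
  end
}

-- run the filter on every generated HTML file
Make:match("html$", process)

-- then dump each HTML file to a plain-text file with the same base name
Make:match("html", function(filename, settings)
  local output_name = filename:gsub("html$", "txt")
  mkutils.execute("w3m -dump " .. filename .. " > " .. output_name)
end)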
It removes the links to the previous and next files that TeX4ht inserts automatically, and then converts each HTML file to TXT using w3m.
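The conversion step in the build file is plain string handling plus one shell command. As a rough illustration, the standalone Lua sketch below reproduces that step as a dry run: it derives the .txt name and prints the w3m command instead of executing it, so it runs with an ordinary lua interpreter and needs neither make4ht nor w3m. The helper names txt_name, shell_quote and w3m_command are hypothetical, and two details deliberately differ from the build file: the pattern "%.html$" also matches the dot, and both file names are single-quoted so that names containing spaces or quotes would survive the shell.

```lua
-- Dry-run sketch of the per-file step from build.lua: compute the output
-- name and build the w3m command, but only print it.
-- txt_name, shell_quote and w3m_command are hypothetical helpers.

-- Replace a trailing ".html" with ".txt". The outer parentheses keep only
-- the first result of gsub (the new string) and drop the match count.
local function txt_name(html)
  return (html:gsub("%.html$", ".txt"))
end

-- Wrap a string in single quotes for a POSIX shell; an embedded single
-- quote becomes '\'' (close quote, escaped quote, reopen quote).
local function shell_quote(s)
  return "'" .. (s:gsub("'", "'\\''")) .. "'"
end

-- Same command shape that build.lua passes to mkutils.execute,
-- with both file names quoted.
local function w3m_command(html)
  return "w3m -dump " .. shell_quote(html) .. " > " .. shell_quote(txt_name(html))
end

-- The two HTML files produced for the sample document
for _, f in ipairs({ "sample.html", "BasicQualifications.html" }) do
  print(w3m_command(f))
end
-- prints:
-- w3m -dump 'sample.html' > 'sample.txt'
-- w3m -dump 'BasicQualifications.html' > 'BasicQualifications.txt'
```

Assuming these helpers are pasted at the top of build.lua, the last Make:match callback could call mkutils.execute(w3m_command(filename)) instead of concatenating the raw names.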
Compile your file with:
make4ht -e build.lua sample.tex "3,sec-filename"
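This produces two TXT files, "sample.txt" and "BasicQualifications.txt". The file names of the sections are derived from the section titles. "sample.txt" contains only the table of contents, and "BasicQualifications.txt" looks like this: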
Basic Qualifications
The successful candidate will have the following basic qualifications:
* A degree
* Skill
* Common sense
* Enthusiasm