Export section contents to .txt

Consider the following pseudocode:

\documentclass{article}%

\usepackage{etoolbox}%
\usepackage{enumitem}%

% BASIC QUALIFICATIONS
\listadd\qualifications{A degree}%
\listadd\qualifications{Skill}%
\listadd\qualifications{Common sense}%
\listadd\qualifications{Enthusiasm}%

\begin{document}

% duties
\section*{Basic Qualifications}

% opening call for function which facilitates export of section (or subsection) contents
% <here>
% \begin{export}[include section title=true]{qualifications.txt}


The successful candidate will have the following basic qualifications:

\begin{itemize}[topsep=2mm]
\forlistloop{\item}{\qualifications}
\end{itemize}

% closing call for the export function
% \end{export}

\end{document}

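The goal is to export the contents of a section (or subsection, subsubsection, etc.) to a .txt file without markup. Any suggestions for a suitable starting point would be much appreciated.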

Answer 1

TeX4ht can split the generated HTML into a separate HTML file for each section or subsection. These HTML files can then be converted to TXT using w3m or another text-based browser.

To automate this, you can use the following Lua build script:

local domfilter = require "make4ht-domfilter"
local mkutils = require "mkutils"

-- DOM filter: drop the previous/next navigation blocks added by TeX4ht
local process = domfilter {
function(dom)
  for _, crosslinks in ipairs(dom:query_selector(".crosslinks")) do
    crosslinks:remove_node()
  end
  return dom
end
}

-- clean every generated HTML file first
Make:match("html$", process)
-- then dump each cleaned HTML file to a TXT file with the same base name
Make:match("html$", function(filename, settings)
  local output_name = filename:gsub("html$", "txt")
  mkutils.execute("w3m -dump " .. filename .. " > " .. output_name)
end)

It removes the links to the previous and next files that TeX4ht inserts automatically, and converts each HTML file to TXT using w3m.

Compile your file using:

make4ht -e build.lua main.tex "3,sec-filename"

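This produces two TXT files, "sample.txt" and "BasicQualifications.txt". The first is named after the input file (sample.tex in this example; with main.tex, as in the command above, it would be main.txt) and contains only the table of contents. The file names for the sections are based on the section titles. "BasicQualifications.txt" looks like this: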

Basic Qualifications

The successful candidate will have the following basic qualifications:

  * A degree
  * Skill
  * Common sense
  * Enthusiasm
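
If w3m is not installed, the two steps performed by the build script (dropping the `.crosslinks` navigation and dumping the rest as plain text) can be approximated with Python's standard library. The following is only a rough sketch, not part of make4ht: the names `TextDump` and `html_to_text` are made up for illustration, it assumes the HTML is well formed, and it puts each block on its own line rather than reproducing w3m's layout with blank lines.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they must not change nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
# Elements whose content is never shown in the text dump.
SKIP = {"head", "script", "style"}
# Elements that start a new output line.
BLOCK = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "ul", "ol", "li"}


class TextDump(HTMLParser):
    """Collect visible text, ignoring anything inside class="crosslinks"."""

    def __init__(self):
        super().__init__()
        self.blocks = [["", []]]     # each entry: [prefix, text fragments]
        self.skip_depth = 0          # > 0 while inside a skipped element
        self.bullet_pending = False  # set when an <li> has just been opened

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.skip_depth or "crosslinks" in classes or tag in SKIP:
            self.skip_depth += 1
        elif tag in BLOCK:
            self.blocks.append(["", []])
            if tag == "li":
                self.bullet_pending = True

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.skip_depth:
            self.skip_depth -= 1
        elif tag in BLOCK:
            self.blocks.append(["", []])

    def handle_data(self, data):
        if self.skip_depth:
            return
        block = self.blocks[-1]
        if self.bullet_pending and data.strip():
            block[0] = "  * "        # mark the first text of a list item
            self.bullet_pending = False
        block[1].append(data)

    def text(self):
        lines = []
        for prefix, fragments in self.blocks:
            body = " ".join("".join(fragments).split())  # collapse whitespace
            if body:
                lines.append(prefix + body)
        return "\n".join(lines) + "\n"


def html_to_text(markup):
    """Return a plain-text rendering of an HTML string."""
    parser = TextDump()
    parser.feed(markup)
    parser.close()
    return parser.text()
```

To use it, read one of the generated files (for example BasicQualifications.html), pass its contents to `html_to_text`, and write the result to a .txt file with the same base name.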
