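How can I generate both a PDF and a plain-text version from a single .tex file? The requirements for the text version are described below.
Input
For example, given this .tex file (compiled with the texlive.net online PDF generator):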
\documentclass[11pt,a4paper]{article}
% I doubt this will affect the PDF to text solution,
% but I've included it to make it as similar as possible to my real document.
\usepackage[
paperheight=11.00in,
paperwidth=8.50in,
margin=1.00in,
top=1.00in,
left=1.00in,
bottom=1.00in
]{geometry}
\usepackage[hidelinks]{hyperref}
% This part is \input from another file.
% Included inline for your convenience.
\hypersetup{
pdfinfo={
Author={tfstwbbnb},
}
}
\newcommand{\authorName}{tfstwbbnb}
% xelatex required, pdflatex does not work
% \setmainfont{Ubuntu Light}[
% ItalicFont=Ubuntu Light Italic,
% BoldFont=Ubuntu,
% BoldItalicFont=Ubuntu Italic,
% ]
\setlength\parindent{0pt}
\pagenumbering{gobble}
\usepackage{xcolor}
\newcommand{\gray}[1]{\textcolor{gray}{#1}}
\usepackage{setspace}
\setstretch{1.10}
% https://tex.stackexchange.com/a/50510
\newcommand{\fitline}[1]{\makebox[\linewidth][s]{#1}}
\newcommand{\myInnerSpacing}{0.40\baselineskip}
\hypersetup{
pdfinfo={
Title={tfstwbbnb demo},
}
}
\newcommand{\optionalOne}{optionalOne}
\newcommand{\optionalTwo}{optionalTwo}
% Links should appear as link text ("requiredOne") in text version.
\newcommand{\requiredOne}{\href{mailto:[email protected]}{requiredOne}}
\newcommand{\requiredTwo}{requiredTwo}
\begin{document}
% Alignment in text version does not matter to me. Can be left-justified or centered.
\begin{center}
\LARGE{\textbf{Title}}
\end{center}
\vspace{\myInnerSpacing}
\optionalOne \\
% Optionals might be commented out like so:
% \optionalTwo \\
\requiredOne \\
\gray{\requiredTwo} \\
\vspace{\myInnerSpacing}
% Text formatting should be stripped in text version.
\textbf{Lorem ipsum dolor sit amet}, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. \\
Purus semper eget duis at tellus at. Tellus cras adipiscing enim eu turpis egestas pretium aenean. \\
Felis donec, \\
tfstwbbnb
\end{document}
Expected
How can I get it to output this (as plain text):
Title
optionalOne
requiredOne
requiredTwo
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Purus semper eget duis at tellus at. Tellus cras adipiscing enim eu turpis egestas pretium aenean.
Felis donec,
tfstwbbnb
Copy and paste
Opening the PDF in a viewer and copying/pasting the text gives:
TitleoptionalOnerequiredOnerequiredTwoLorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt utlabore et dolore magna aliqua.Purus semper eget duis at tellus at. Tellus cras adipiscing enim eu turpis egestas pretium aenean.Felis donec,tfstwbbnb
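pdftotext
pdftotext gives a better result, but it is still not what I want (some line breaks are missing, there are too many line breaks elsewhere, and there is an extra 0x0c character at the end):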
Title
optionalOne
requiredOne
requiredTwo
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut
labore et dolore magna aliqua.
Purus semper eget duis at tellus at. Tellus cras adipiscing enim eu turpis egestas pretium aenean.
Felis donec,
tfstwbbnb
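PDF to text summary
Essentially, the plain-text output should:
- ignore all colors (\textcolor{...})
- ignore all font sizes (\LARGE, \small)
- show all links (requiredOne) as their link text
- keep all explicit line breaks (e.g., between optionalOne and requiredOne)
- keep all paragraph breaks (e.g., between Lorem ipsum ... and Purus semper ...)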
Answer 1
You can add a visible paragraph marker and then remove it again:
pdflatex '\AddToHook{para/after}{\hbox{PARA}}\input' cc873
pdftotext cc873.pdf
sed -i -e 's/^PARA//' -e 's/[\f]//' cc873.txt
cat cc873.txt
which produces
Title
optionalOne
requiredOne
requiredTwo
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut
labore et dolore magna aliqua.
Purus semper eget duis at tellus at. Tellus cras adipiscing enim eu turpis egestas pretium aenean.
Felis donec,
tfstwbbnb
(If it is a problem, you could also use sed to delete the trailing blank lines; here I only removed the ^L.)
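The same cleanup, including the trailing blank lines mentioned above, can be sketched in Python for anyone who would rather not depend on GNU sed. This is only an illustration: the function name clean_pdftotext and the choice to collapse runs of blank lines are my own assumptions, and like the sed command it will also strip the marker from any real line that happens to start with PARA.

```python
import re


def clean_pdftotext(text: str, marker: str = "PARA") -> str:
    """Tidy pdftotext output that contains paragraph markers.

    `marker` is the visible string added via the para/after hook
    (PARA in the answer above); the name of this helper is hypothetical.
    """
    # Drop form feed (0x0c) characters, which the question reports
    # at the end of the pdftotext output.
    text = text.replace("\f", "")
    # Remove the marker wherever it starts a line, like sed 's/^PARA//'.
    text = re.sub(rf"^{re.escape(marker)}", "", text, flags=re.MULTILINE)
    # Assumption: collapse runs of blank lines into a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    # Remove trailing blank lines but keep one final newline.
    return text.rstrip("\n") + "\n"


if __name__ == "__main__":
    sample = "Title\n\nPARA\n\nLorem ipsum\nPARA\n\n\f"
    print(clean_pdftotext(sample), end="")
```

Picking a marker that cannot occur in the document text (for example a longer made-up word) makes the replacement safer.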
Answer 2
Pandoc gets it almost right:
pandoc yourfile.tex -f latex -t plain -o yourfile.txt --wrap=none
The result is
Title
optionalOne
requiredOne
requiredTwo
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Purus semper eget duis at tellus at. Tellus cras adipiscing enim eu turpis egestas pretium aenean.
Felis donec,
tfstwbbnb
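The only problem is the missing blank line between the Lorem ipsum sentences. However, a \\ followed by a blank line is not very clean LaTeX, so I can understand Pandoc declining to convert it :)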
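If editing the source by hand is not convenient, the pattern of a line-ending \\ directly followed by a blank line can be found mechanically. The following is a rough sketch, not a LaTeX parser: the helper name is hypothetical, it ignores forms such as \\[2pt] or \\*, and it would also rewrite matches inside comments or verbatim material. It is meant for a throwaway copy of the source that is fed to Pandoc.

```python
import re

# A "\\" at the end of a line (optionally surrounded by spaces/tabs)
# whose next line is blank, i.e. a forced break right before a paragraph end.
_BREAK_BEFORE_BLANK = re.compile(r"[ \t]*\\\\[ \t]*\n(?=[ \t]*\n)")


def drop_breaks_before_blank_lines(source: str) -> str:
    """Remove line-ending "\\\\" commands that precede a blank line.

    Hypothetical helper: forced breaks inside a paragraph are kept,
    only those immediately before a paragraph break are dropped.
    """
    return _BREAK_BEFORE_BLANK.sub("\n", source)


if __name__ == "__main__":
    demo = "optionalOne \\\\\nrequiredOne \\\\\n\nLorem ipsum \\\\\n\nPurus semper\n"
    print(drop_breaks_before_blank_lines(demo), end="")
```

The lookahead leaves the blank line itself untouched, so paragraph breaks survive and only the redundant \\ disappears.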
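If you instead create the gaps between paragraphs with \usepackage[parfill]{parskip} and leave out the \\ at the end of each paragraph, Pandoc converts the paragraph breaks perfectly. LaTeX code: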
\documentclass[11pt,a4paper]{article}
% I doubt this will affect the PDF to text solution,
% but I've included it to make it as similar as possible to my real document.
\usepackage[
paperheight=11.00in,
paperwidth=8.50in,
margin=1.00in,
top=1.00in,
left=1.00in,
bottom=1.00in
]{geometry}
\usepackage[hidelinks]{hyperref}
\usepackage[parfill]{parskip}
% This part is \input from another file.
% Included inline for your convenience.
\hypersetup{
pdfinfo={
Author={tfstwbbnb},
}
}
\newcommand{\authorName}{tfstwbbnb}
% xelatex required, pdflatex does not work
% \setmainfont{Ubuntu Light}[
% ItalicFont=Ubuntu Light Italic,
% BoldFont=Ubuntu,
% BoldItalicFont=Ubuntu Italic,
% ]
\setlength\parindent{0pt}
\pagenumbering{gobble}
\usepackage{xcolor}
\newcommand{\gray}[1]{\textcolor{gray}{#1}}
\usepackage{setspace}
\setstretch{1.10}
% https://tex.stackexchange.com/a/50510
\newcommand{\fitline}[1]{\makebox[\linewidth][s]{#1}}
\newcommand{\myInnerSpacing}{0.40\baselineskip}
\hypersetup{
pdfinfo={
Title={tfstwbbnb demo},
}
}
\newcommand{\optionalOne}{optionalOne}
\newcommand{\optionalTwo}{optionalTwo}
% Links should appear as link text ("requiredOne") in text version.
\newcommand{\requiredOne}{\href{mailto:[email protected]}{requiredOne}}
\newcommand{\requiredTwo}{requiredTwo}
\begin{document}
% Alignment in text version does not matter to me. Can be left-justified or centered.
\begin{center}
\LARGE{\textbf{Title}}
\end{center}
\optionalOne \\
% Optionals might be commented out like so:
% \optionalTwo \\
\requiredOne \\
\gray{\requiredTwo}

% Text formatting should be stripped in text version.
\textbf{Lorem ipsum dolor sit amet}, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.

Purus semper eget duis at tellus at. Tellus cras adipiscing enim eu turpis egestas pretium aenean.

Felis donec, \\
tfstwbbnb
\end{document}
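The PDF produced by this code is almost identical to the one in the question, except that the gaps between paragraphs are slightly smaller. If you want larger gaps, use
\usepackage[skip=\baselineskip]{parskip}
instead.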
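To get both outputs from the single .tex file in one step, the two commands can be wrapped in a small script. The sketch below assumes that the TeX engine and pandoc are on the PATH. The function names and the engine parameter are my own, and -interaction=nonstopmode is added only so that a failing run does not wait for input.

```python
import subprocess
from pathlib import Path


def build_commands(tex_file: str, engine: str = "pdflatex") -> list:
    """Return the command lines for the PDF and the plain-text version.

    Hypothetical helper: the pandoc options are the ones used in this
    answer; pass engine="xelatex" if the document needs it.
    """
    tex = Path(tex_file)
    txt = tex.with_suffix(".txt")
    return [
        # PDF version (written to the current working directory).
        [engine, "-interaction=nonstopmode", str(tex)],
        # Plain-text version, without re-wrapping long lines.
        ["pandoc", str(tex), "-f", "latex", "-t", "plain",
         "--wrap=none", "-o", str(txt)],
    ]


def build_all(tex_file: str, engine: str = "pdflatex") -> None:
    """Run both commands and stop at the first failure."""
    for cmd in build_commands(tex_file, engine):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, as a dry run.
    for cmd in build_commands("yourfile.tex"):
        print(" ".join(cmd))
```

Calling build_all("yourfile.tex") runs both tools; a Makefile or latexmk rule would do the same job if you prefer to stay outside Python.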