Converting a two-column layout to a single-column layout with the same line breaks

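I have a document that I want to run through pdftotext -layout and then parse the resulting text. The problem is that the text is set in two columns, and the pdftotext output sometimes interleaves the two columns (because the baselines of the two columns are not on a grid).

Is there a way to output the same PDF, but with the two columns treated as separate pages (so that pdftotext -layout never reads two columns on the same page)? A simple example is below. I considered simply changing the margins so that the page becomes a single column in \textwidth, but the real document involves floats, a switch to one column plus a longtable mid-document, and things like page references and ibid. The intended use of pdftotext -layout is to check whether words are repeated on consecutive lines, so the text and the layout need to stay the same.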

For example,

\documentclass{article}
\usepackage[twocolumn]{geometry}

\usepackage{lipsum}

\usepackage{parskip}

\begin{document}
\lipsum
\end{document}

This produces:

Lorem ipsum dolor sit amet, consectetuer adip- vulputate metus eu enim. Vestibulum pellen-
iscing elit. Ut purus elit, vestibulum ut, plac- tesque felis eu massa.
erat ac, adipiscing vitae, felis. Curabitur dic-
                                                      Quisque ullamcorper placerat ipsum. Cras nibh.
tum gravida mauris. Nam arcu libero, nonummy
                                                      Morbi vel justo vitae lacus tincidunt ultrices.
eget, consectetuer id, vulputate a, magna. Donec
                                                      Lorem ipsum dolor sit amet, consectetuer adip-
vehicula augue eu neque. Pellentesque habitant
                                                      iscing elit. In hac habitasse platea dictumst.
morbi tristique senectus et netus et malesuada
                                                      Integer tempus convallis augue. Etiam facili-
fames ac turpis egestas. Mauris ut leo. Cras
                                                      sis. Nunc elementum fermentum wisi. Aenean
viverra metus rhoncus sem. Nulla et lectus vestibu-
                                                      placerat. Ut imperdiet, enim sed gravida sollic-
lum urna fringilla ultrices. Phasellus eu tellus sit
                                                      itudin, felis odio placerat quam, ac pulvinar elit
amet tortor gravida placerat. Integer sapien est,
                                                      purus eget enim. Nunc vitae tortor. Proin tem-
iaculis in, pretium quis, viverra ac, nunc. Prae-
                                                      pus nibh sit amet nisl. Vivamus quis tortor vitae
sent eget sem vel leo ultrices bibendum. Aenean
                                                      risus porta vehicula.
faucibus. Morbi dolor nulla, malesuada eu, pul-
vinar at, mollis ac, nulla. Curabitur auctor sem- Fusce mauris. Vestibulum luctus nibh at lectus.
per nulla. Donec varius orci eget risus. Duis Sed bibendum, nulla a faucibus semper, leo velit
nibh mi, congue eu, accumsan eleifend, sagittis ultricies tellus, ac venenatis arcu wisi vel nisl.

Answer 1

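Since this is only for extracting text, the result isn't very pretty, but you can simply stack the columns one after the other instead of setting them side by side.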

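The code below copies a fairly large macro (\@outputdblcol), but the only change is that the hbox and vrule become a vbox and hrule. The paper height is also doubled so that the two stacked columns fit on one page.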

\documentclass{article}
\usepackage[twocolumn]{geometry}

\paperheight=2\paperheight

\makeatletter

\def\@outputdblcol{%
  \if@firstcolumn
    \global\@firstcolumnfalse
    \global\setbox\@leftcolumn\copy\@outputbox
    \splitmaxdepth\maxdimen
    \vbadness\maxdimen
     \setbox\@outputbox\vbox{\unvbox\@outputbox\unskip}%
     \setbox\@outputbox\vsplit\@outputbox to\maxdimen
    \toks@\expandafter{\topmark}%
    \xdef\@firstcoltopmark{\the\toks@}%
    \toks@\expandafter{\splitfirstmark}%
    \xdef\@firstcolfirstmark{\the\toks@}%
    \ifx\@firstcolfirstmark\@empty
      \global\let\@setmarks\relax
    \else
      \gdef\@setmarks{%
        \let\firstmark\@firstcolfirstmark
        \let\topmark\@firstcoltopmark}%
    \fi
  \else
    \global\@firstcolumntrue
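    \setbox\@outputbox\vbox{%
      \hb@xt@\textwidth{%
        \vbox{% stack the two columns instead of setting them side by side
          \hb@xt@\columnwidth{\box\@leftcolumn \hss}%
%         \hfil
%         {\normalcolor\vrule \@width\columnseprule}%
%         \hfil
          \hrule % horizontal rule between the stacked columns (replaces the \vrule)
          \hb@xt@\columnwidth{\box\@outputbox \hss}%
        }%
      }%
    }%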
  \@combinedblfloats
    \@setmarks
    \@outputpage
    \begingroup
      \@dblfloatplacement
      \@startdblcolumn
      \@whilesw\if@fcolmade \fi{\@outputpage
     \@startdblcolumn}%
    \endgroup
  \fi}%
\makeatother

\usepackage{lipsum}

\usepackage{parskip}

\begin{document}
\lipsum
\end{document}
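
The hbox-to-vbox change can also be seen in isolation. The following is only a minimal sketch, not part of the output routine above: the box names \demoleft and \demoright are hypothetical and exist purely for illustration. It sets the same two boxes once side by side with a \vrule between them, and once stacked with an \hrule between them.

```latex
\documentclass{article}

% Hypothetical box names, used only for this illustration.
\newsavebox{\demoleft}
\newsavebox{\demoright}

\begin{document}

\savebox{\demoleft}{\parbox{3cm}{Text standing in for the left column.}}
\savebox{\demoright}{\parbox{3cm}{Text standing in for the right column.}}

% Side by side: an \hbox with a vertical rule between the two boxes.
\noindent\hbox{\usebox{\demoleft}\hskip 1em\vrule\hskip 1em\usebox{\demoright}}

\bigskip

% Stacked: a \vbox with a horizontal rule between the two boxes.
\noindent\vbox{\hbox{\usebox{\demoleft}}\hrule\hbox{\usebox{\demoright}}}

\end{document}
```

In the stacked version each box sits on its own set of lines, so a layout-preserving text extractor has no second column on the same baseline to interleave with.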
