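Some books on amazon.com offer an excerpt (the first few pages plus the appendix) so that we can get a rough idea of the content, layout, and so on before buying. Statistically speaking, I think an excerpt would give a better picture if it were built by picking pages at random from the book (30-50 pages is probably enough), sorting them in ascending page order, and bundling them into a new PDF.
My question: how can this be done in LaTeX?
Minimal working example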
% compile with pdflatex -shell-escape
% =============================================================================================
\def\NoticeThatIAmUsingThisPackageToExtractSomePagesFromAnExternalPDFFileInMyComputer{pdfpages}
% =============================================================================================
\documentclass{article}
\usepackage{filecontents}
\begin{filecontents*}{book.tex}
\documentclass{book}
\usepackage{blindtext}
\begin{document}
\Blinddocument
\end{document}
\end{filecontents*}
\immediate\write18{pdflatex book.tex}
\immediate\write18{pdflatex book.tex}
\usepackage{\NoticeThatIAmUsingThisPackageToExtractSomePagesFromAnExternalPDFFileInMyComputer}
\def\NumberOfPagesOfExcerpt{50}
\begin{document}
% do randomization, sorting and bundling here!
% \includepdf[pages=-]{book}
\end{document}
Answer 1
A lualatex approach:
\documentclass{article}
\usepackage{luatextra}
\usepackage{filecontents}
\begin{filecontents*}{book.tex}
\documentclass{book}
\usepackage{blindtext}
\begin{document}
\Blinddocument
\end{document}
\end{filecontents*}
\begin{luacode}
function get_random_pages(randPages,totalPages, randSeed)
--[[--
Constructs a sorted list of randPages random page numbers within a range 1..totalPages
@Parameter: randPages
The number of random pages to extract
@Parameter: totalPages
Total number of pages in a pdf file
@Parameter: randSeed
Random seed
--]]--
local pagesLeft= {}
local pageList = {}
for pageNo=1, totalPages, 1 do
table.insert(pagesLeft,pageNo)
end
math.randomseed (randSeed)
local r
for i=1, randPages do
r=math.random(#pagesLeft)
table.insert(pageList,pagesLeft[r])
table.remove(pagesLeft,r)
end
table.sort(pageList)
local s="\\includepdf[pages={"
s=s..pageList[1]
for i=2, randPages do
s=s..","..pageList[i]
end
s=s.."}]{book}"
tex.print(s)
end
\end{luacode}
\immediate\write18{pdflatex book.tex}
\immediate\write18{pdflatex book.tex}
\usepackage{pdfpages}
\def\NumberOfPagesOfExcerpt{9}
\def\NumberOfPagesInPdf{17}
\def\randomSeed{27449}
\begin{document}
% do randomization, sorting and bundling here!
\directlua{get_random_pages(\NumberOfPagesOfExcerpt,\NumberOfPagesInPdf,\randomSeed)}
\end{document}
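Process with lualatex -shell-escape random_pages.tex.
Edit: the code now uses the standard function table.concat, as suggested by @Aditya; a \randomPages command is defined with an optional random-seed argument; and the number of pages in the PDF is obtained through a pdfTeX primitive.
random_pages.tex: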
\documentclass{article}
\usepackage{luatextra}
\usepackage{filecontents}
\begin{filecontents*}{book.tex}
\documentclass{article}
\usepackage{geometry}
\geometry{
paperwidth=74mm,
paperheight=105mm,
margin=2em,
bottom=9ex,
nohead
}
\usepackage{blindtext}
\begin{document}
\Blinddocument
\end{document}
\end{filecontents*}
\begin{luacode}
function get_random_pages(randPages,totalPages, randSeed)
--[[--
Constructs a sorted list of randPages random page numbers within a range 1..totalPages
@Parameter: randPages
The number of random pages to extract
@Parameter: totalPages
Total number of pages in a pdf file
@Parameter: randSeed
Random seed: used only if >0
--]]--
local pagesLeft= {}
local pageList = {}
for pageNo=1, totalPages, 1 do
table.insert(pagesLeft,pageNo)
end
if randSeed>0 then math.randomseed(randSeed) end
local r
for i=1, math.min(randPages,totalPages) do
r=math.random(#pagesLeft)
table.insert(pageList,pagesLeft[r])
table.remove(pagesLeft,r)
end
table.sort(pageList)
local s="\\includepdf[pages={"
s=s..table.concat(pageList,",")
s=s.."}]{book}"
tex.print(s)
end
\end{luacode}
\immediate\write18{pdflatex book.tex}
\immediate\write18{pdflatex book.tex}
\usepackage{pdfpages}
\def\NumberOfPagesOfExcerpt{42}
\def\randomSeed{27449}
\makeatletter
\newcommand\@randomPages[3]{%
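% Note: LuaTeX 0.85 and later rename these primitives to \saveimageresource and
% \lastsavedimageresourcepages; loading the luatex85 package restores the pdfTeX names.
\pdfximage{#2}%
\def\NumberOfPagesInPdf{\the\pdflastximagepages}%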
\directlua{get_random_pages(#1,\NumberOfPagesInPdf,#3)}%
}
\def\randomPages{%
\@ifnextchar[{\@with}{\@without}}%
\def\@with[#1]#2#3{%
\@randomPages{#2}{#3}{#1}%
}%
\def\@without#1#2{%
\@randomPages{#1}{#2}{0}%
}%
\makeatother
\begin{document}
% do randomization, sorting and bundling here!
% \randomPages[\randomSeed]{10}{book.pdf} % supposed to produce a fixed set of pages every time
\randomPages{10}{book.pdf} % supposed to produce a different set of pages every time
\end{document}
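For readers who want to check the logic outside TeX, here is a minimal Python sketch of what get_random_pages does: draw pages from a shrinking pool so that none repeats, sort them, and build the \includepdf line. The function name include_pdf_line and its pdf_name parameter are hypothetical and only for illustration; Python's generator will not reproduce the page numbers that Lua's math.random gives for the same seed.

```python
import random


def include_pdf_line(rand_pages, total_pages, seed=0, pdf_name="book"):
    """Return an \\includepdf line for a sorted random sample of pages.

    As in the Lua version, the seed is only used when it is > 0.
    """
    rng = random.Random(seed) if seed > 0 else random.Random()
    pages_left = list(range(1, total_pages + 1))
    picked = []
    # Never ask for more pages than the PDF has.
    for _ in range(min(rand_pages, total_pages)):
        # Remove a uniformly chosen remaining page, so no page repeats.
        picked.append(pages_left.pop(rng.randrange(len(pages_left))))
    picked.sort()
    page_spec = ",".join(str(p) for p in picked)
    return "\\includepdf[pages={" + page_spec + "}]{" + pdf_name + "}"


print(include_pdf_line(9, 17, seed=27449))
```

With a positive seed the same line comes out on every run; with the default seed of 0 the selection changes from run to run.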
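Answer 2
Here is a solution written as a ConTeXt Lua document. Adjust the parameters filename and n as needed (I will post a version that takes command-line arguments later).
Save it as filter.cld (note the extension!) and process it with context filter.cld.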
local random = math.random
local format = string.format
-- Sample n items out of m without replacement
function reservoirsample (n, m)
local sampledlist = {}
if n == 0 then return sampledlist end
for i = 1, m do
-- Take the first n samples
if i <= n then
sampledlist[i] = i
else
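-- Replace a random slot with probability n/i.
-- random(i) returns 1..i, so the test must be j <= n;
-- with j < n the last slot would never be replaced.
local j = random(i)
if j <= n then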
sampledlist[j] = i
end
end
end
table.sort(sampledlist)
return sampledlist
end
local filename="fonts-mkiv.pdf"
local n = 20
context.starttext()
-- Example taken from grph-inc.lua
local fig = figures.push { name = filename }
figures.identify()
figures.check()
local nofpages = fig.used.pages
figures.pop()
selected = reservoirsample(n, nofpages)
print(format("::: File %s has %d pages, selecting %d", filename, nofpages, n))
print(format("::: %s", table.concat(selected, ", ")))
for i = 1,#selected do
context.startTEXpage()
context.externalfigure( {filename}, {page=selected[i]} )
context.stopTEXpage()
end
context.stoptext()
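The sampling step on its own can be sketched in Python as follows. This is a plain rendering of reservoir sampling (Algorithm R), not ConTeXt code; the name reservoir_sample and the rng parameter are assumptions made for the example. The point to notice is the inclusive comparison: item i must replace a slot with probability n/i, so all n slots have to be eligible.

```python
import random


def reservoir_sample(n, m, rng=random):
    """Pick n of the integers 1..m uniformly, without replacement."""
    # Start with the first n items (or all m of them if n > m).
    sample = list(range(1, min(n, m) + 1))
    for i in range(n + 1, m + 1):
        j = rng.randint(1, i)  # uniform on 1..i, both ends included
        if j <= n:             # true with probability n/i
            sample[j - 1] = i  # slots are 1-based in the description
    return sorted(sample)


print(reservoir_sample(20, 100, random.Random(1)))
```

Only one pass over 1..m is needed and the reservoir never holds more than n entries, which is why the method suits inputs whose length is not known in advance.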
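Answer 3
Here is a solution that uses only the math and looping parts of pgf. It borrows some code that Mark Wibrow posted to the pgf-users mailing list a while ago for shuffling pgfmath lists. Lists in pgfmath are implemented as hashes rather than as a single token list.
To get a random k-element sublist of the list {1,...,N}, I create the list {1,...,N} and Knuth-shuffle it. Then I bubble-sort the first k elements. Finally, for each i from 1 to k, I include the page of the PDF whose number is the i-th element of the list.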
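The same three steps (shuffle, sort the first k entries, read them off) look like this as a small Python sketch. It is an illustration of the idea only; the name excerpt_pages is hypothetical. In a Knuth (Fisher-Yates) shuffle the swap partner for position i is drawn from the positions i..N that have not been fixed yet, which is what makes every ordering equally likely.

```python
import random


def excerpt_pages(k, n, rng=random):
    """Shuffle 1..n, then return the first k entries in ascending order."""
    pages = list(range(1, n + 1))
    # Knuth / Fisher-Yates shuffle: partner index j satisfies i <= j < n.
    for i in range(n - 1):
        j = rng.randrange(i, n)
        pages[i], pages[j] = pages[j], pages[i]
    head = pages[:k]
    # Bubble sort on the first k entries only.
    for end in range(len(head) - 1, 0, -1):
        for a in range(end):
            if head[a] > head[a + 1]:
                head[a], head[a + 1] = head[a + 1], head[a]
    return head


print(excerpt_pages(9, 17, random.Random(7)))
```

Bubble sort costs on the order of k squared comparisons, which is harmless for an excerpt of a few dozen pages.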
\documentclass{article}
\usepackage{pgf,pgffor}
\usepackage{pdfpages}
\makeatletter
% declare a list by its elements
% e.g., \pgfmathdeclarelist{mylist}{{foo}{bar}{baz}}
\def\pgfmathdeclarelist#1#2{%
\def\pgfmath@list@name{#1}%
\c@pgfmath@counta=0%
\pgfmath@declarelistlist#2{\pgfmath@stop}%
}%
\def\pgfmath@declarelistlist#1{%
\ifx#1\pgfmath@stop%
\expandafter\edef\csname pgfmath@list@\pgfmath@list@name @length\endcsname{\the\c@pgfmath@counta}%
\else%
\advance\c@pgfmath@counta by1\relax%
\pgfutil@namedef{pgfmath@list@\pgfmath@list@name @\the\c@pgfmath@counta}{#1}%
\expandafter\pgfmath@declarelistlist%
\fi%
}
% get a list item
% \pgfmathgetlistitem{\cs}{mylist}{3} lets \cs be the 3rd item of mylist
\def\pgfmathgetlistitem#1#2#3{%
\expandafter\let\expandafter#1\expandafter=\csname pgfmath@list@#2@#3\endcsname%
}
% set a list item
% \pgfmathsetlistitem{mylist}{3}{foo} defines the 3rd item of mylist to be foo
% caution - you may need the 3rd argument expanded first.
\def\pgfmathsetlistitem#1#2#3{%
\pgfutil@namedef{pgfmath@list@#1@#2}{#3}%
}
% get the length of a list
% \pgfmathgetlistlength{\mylistlength}{mylist} lets \mylistlength be the length of the list.
\def\pgfmathgetlistlength#1#2{%
\expandafter\let\expandafter#1\expandafter=\csname pgfmath@list@#2@length\endcsname%
}
% set the length of a list
% \pgfmathsetlistlength{mylist}{length} defines the length of mylist to be length
\def\pgfmathsetlistlength#1#2{%
\expandafter\edef\csname pgfmath@list@#1@length\endcsname{#2}
}
\def\pgfmathknuthshuffle#1{%
\pgfmathgetlistlength\pgfmath@len{#1}%
\pgfmathloop%
\ifnum\pgfmathcounter>\pgfmath@len%
\else%
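% Draw the swap partner from the positions not yet fixed (Knuth/Fisher-Yates);
% drawing from 1..length at every step would bias the shuffle.
\pgfmathrandominteger\pgfmath@temp{\pgfmathcounter}{\pgfmath@len}%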
\pgfmathgetlistitem\pgfmath@@temp{#1}{\pgfmathcounter}%
\pgfmathgetlistitem\pgfmath@@@temp{#1}{\pgfmath@temp}%
\def\pgfmath@marshal{\pgfmathsetlistitem{#1}}%
\expandafter\pgfmath@marshal\expandafter{\expandafter\pgfmath@temp\expandafter}\expandafter{\pgfmath@@temp}%
\expandafter\pgfmath@marshal\expandafter{\expandafter\pgfmathcounter\expandafter}\expandafter{\pgfmath@@@temp}%
\repeatpgfmathloop%
}
\def\NumberOfPagesOfExcerpt{9}
\def\NumberOfPagesInPdf{17}
% Populate page list. Rather than use \pgfmathdeclarelist we allocate the list and assign in a loop.
% sorry for the \global... pgf's \foreach creates a group.
\def\s@pagelist{pagelist} % makes expansion easier
\pgfmathsetlistlength{pagelist}{\NumberOfPagesInPdf}
\foreach \i in {1,...,\NumberOfPagesInPdf}{
\global\expandafter\pgfmathsetlistitem\expandafter\s@pagelist\expandafter\i\expandafter{\i}
}
\pgfmathknuthshuffle{pagelist}
% now a bubble sort on the first \NumberOfPagesOfExcerpt items in the list.
\pgfmathtruncatemacro{\n}{\NumberOfPagesOfExcerpt-1}
\foreach \j in {1,...,\n}{
\pgfmathtruncatemacro{\k}{\NumberOfPagesOfExcerpt-\j}
\foreach \i in {1,...,\k}{
\pgfmathtruncatemacro{\iplusone}{\i+1}
\pgfmathgetlistitem{\pagei}{pagelist}{\i}
\pgfmathgetlistitem{\pageiplusone}{pagelist}{\iplusone}
\ifnum\pagei>\pageiplusone
\global\expandafter\pgfmathsetlistitem\expandafter\s@pagelist\expandafter\i\expandafter{\pageiplusone}
\global\expandafter\pgfmathsetlistitem\expandafter\s@pagelist\expandafter\iplusone\expandafter{\pagei}
\fi
}
}
\makeatother
\begin{document}
\foreach \i in {1,...,\NumberOfPagesOfExcerpt}{
\pgfmathgetlistitem{\pagei}{pagelist}{\i}
\includepdf[pages=\pagei]{book.pdf}
}
\end{document}
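As you can see, it is a bit messy, but it needs neither Lua nor external scripts. IANACS, so I don't know how efficient it is. But if you wanted efficiency, you wouldn't be doing this in TeX. :-)
Answer 4
This calls an external randomizer named excerpting.exe from within LaTeX.
LaTeX code: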
% compile with pdflatex -shell-escape
\documentclass{article}
\usepackage{pdfpages}
\def\bookfilename{status-lua}% http://chat.stackexchange.com/transcript/41?m=8712421#8712421
\def\take{30}
\def\seeder{1}
\def\auxiliaryfilename{random.txt}
\pdfximage{\bookfilename.pdf}
\immediate\write18{excerpting \the\pdflastximagepages\space \take\space \seeder\space \auxiliaryfilename}
\begin{document}
\newread\reader
\openin\reader=\auxiliaryfilename\relax
\loop
\read\reader to \data
\unless\ifeof\reader
\includepdf[pages=\data]{\bookfilename}
\repeat
\closein\reader
\end{document}
C# (Fisher-Yates shuffle):
// excerpting.cs
using System;
using System.IO;
using System.Linq;
namespace Excerpting
{
class Program
{
static void Main(string[] args)
{
int total = int.Parse(args[0]);
int take = int.Parse(args[1]);
int seeder = int.Parse(args[2]);
string filename = args[3];
int[] array = Enumerable.Range(1, total).ToArray();
Random random = new Random(seeder);
for (int i = total - 1; i > 0; i--)
{
int j = random.Next(i+1);
int temp = array[i];
array[i] = array[j];
array[j] = temp;
}
File.WriteAllLines(filename, array.Take(take).OrderBy(x => x).Select(x => x.ToString()));
}
}
}
C# (random sort):
Some people claim that random sort yields a uniform distribution, but I have not verified this.
// excerpting.cs
using System;
using System.IO;
using System.Linq;
namespace Excerpting
{
class Program
{
static void Main(string[] args)
{
int total = int.Parse(args[0]);
int take = int.Parse(args[1]);
int seeder = int.Parse(args[2]);
string filename = args[3];
Random random = new Random(seeder);
string[] array = Enumerable.Range(1, total)
.OrderBy(x => random.Next())
.Take(take)
.OrderBy(x => x)
.Select(x => x.ToString())
.ToArray();
File.WriteAllLines(filename, array);
}
}
}
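The uniformity claim for the random-sort variant can at least be sanity-checked empirically. The Python sketch below sorts by independent random keys, takes the first few items, and counts how often each possible subset turns up; random.random() stands in for C#'s Random.Next(), and the function names are made up for this example. By symmetry, independent keys that are all distinct make every ordering equally likely, so the counts should come out roughly equal; ties between keys are the one case where a stable sort would favour the original order, and they are rare with a large key range. This is a check, not a proof.

```python
import random
from collections import Counter
from itertools import combinations


def random_sort_sample(total, take, rng):
    """Sort 1..total by independent random keys and keep the first `take`."""
    keyed = sorted(range(1, total + 1), key=lambda _: rng.random())
    return tuple(sorted(keyed[:take]))


def subset_frequencies(total, take, trials, seed=1):
    """Count how often each subset is selected over many trials."""
    rng = random.Random(seed)
    return Counter(random_sort_sample(total, take, rng) for _ in range(trials))


freq = subset_frequencies(total=5, take=2, trials=20000)
for subset in combinations(range(1, 6), 2):
    # Each of the 10 subsets should appear about 2000 times.
    print(subset, freq[subset])
```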