I use pdfcrop to remove the margins from a 10 MB, 400-page PDF. The margins are removed correctly, but the resulting PDF is 51 MB. Any suggestions?
Answer 1
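Here is my improved version of pdfcrop.
The default operation removes the white margins from the PDF input, optionally leaving a user-defined extra margin (option -m ...).
The alternative operation trims the page edges by user-defined amounts (option -t ...).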
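pdfcrop.sh uses gs (Ghostscript) to determine the tight bounding box page by page, pdftk to uncompress/compress the PDF file and to obtain the page order (which need not be linear), and perl to add to each PDF page its own /CropBox entry representing the newly found tight bounding box.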
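To illustrate the first step, here is a minimal Python sketch of how the per-page bounding boxes could be collected from the text that Ghostscript's bbox device prints. The function name parse_bboxes and the sample text are made up for illustration; the script itself does this with grep and a Perl regular expression.

```python
import re

# Matches lines such as "%%BoundingBox: 36 48 559 806" and the
# "%%HiResBoundingBox: ..." variant printed by gs -sDEVICE=bbox.
BBOX_RE = re.compile(r"^%%(HiRes)?BoundingBox:\s+([\d.\s]+\d)\s*$")


def parse_bboxes(gs_output, hires=False):
    """Return one [llx, lly, urx, ury] list per page (hypothetical helper)."""
    boxes = []
    for line in gs_output.splitlines():
        match = BBOX_RE.match(line.strip())
        # Keep only the requested flavour (plain or HiRes) of each page's box.
        if match and bool(match.group(1)) == hires:
            boxes.append([float(v) for v in match.group(2).split()])
    return boxes


# Sample text in the shape the bbox device prints (values are invented).
sample = """%%BoundingBox: 36 48 559 806
%%HiResBoundingBox: 36.5 48.25 558.75 805.5
%%BoundingBox: 40 50 550 800
%%HiResBoundingBox: 40.1 50.2 549.9 799.8
"""

print(parse_bboxes(sample))
print(parse_bboxes(sample, hires=True))
```

Each page contributes one box, in the order Ghostscript renders the pages, which is why the script also needs the page numbers reported by pdftk to match boxes to page objects.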
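Unlike the original pdfcrop, the bash script below preserves the original interactive parts of the PDF (links, annotations, etc.). The output file size is about the same as before.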
Update: added the -two option for two-sided page layouts.
Usage examples:
#getting help
pdfcrop.sh -help
#default operation
pdfcrop.sh orig.pdf cropped.pdf
pdfcrop.sh -m 10 orig.pdf cropped.pdf
pdfcrop.sh -hires orig.pdf cropped.pdf
#trimming pages
pdfcrop.sh -t "10 20 30 40" orig.pdf trimmed.pdf
#same for two-sided layout
pdfcrop.sh -t "10 20 30 40" -two orig.pdf trimmed.pdf
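The -m and -t arguments in the examples above take between one and four numbers. According to the script's help text and option parsing, a single number applies to all four sides, and a "<left> <bottom>" pair is reused for right and top. The following Python sketch mirrors that expansion; expand_sides is a hypothetical name, not part of the script.

```python
def expand_sides(arg):
    """Expand a '-m'/'-t' style argument into [left, bottom, right, top]."""
    values = [float(v) for v in arg.split()]
    if not values:
        raise ValueError("expected at least one number")
    if len(values) == 1:
        values.append(values[0])    # bottom defaults to left
    if len(values) < 4:
        values = values[:2] * 2     # right/top default to left/bottom
    return values[:4]


for arg in ("10", "10 20", "10 20 30 40"):
    print(arg, "->", expand_sides(arg))
```

Note that, as in the script, three numbers are treated like two: the third value is discarded because right and top are only honoured when both are given.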
Contents of pdfcrop.sh:
#!/bin/bash

function usage () {
  echo "Usage: `basename $0` [Options] <input.pdf> [<output.pdf>]"
  echo
  echo " * Removes white margins from every page in the file. (Default operation)"
  echo " * Trims page edges by given amounts. (Alternative operation)"
  echo
  echo "If only <input.pdf> is given, it is overwritten with the cropped output."
  echo
  echo "Options:"
  echo
  echo " -m \"<left> [<bottom> [<right> <top>]]\""
  echo " adds extra margins in default operation mode. Unit is bp. A single number"
  echo " is used for all margins, two numbers \"<left> <bottom>\" are applied to the"
  echo " right and top margins alike."
  echo
  echo " -t \"<left> [<bottom> [<right> <top>]]\""
  echo " trims outer page edges by the given amounts. Unit is bp. A single number"
  echo " is used for all trims, two numbers \"<left> <bottom>\" are applied to the"
  echo " right and top trims alike."
  echo
  echo " -two"
  echo " to be used for documents with two-sided page layout; the meaning of <left>"
  echo " and <right> changes to <inner> and <outer> for options -m and -t"
  echo
  echo " -hires"
  echo " %%HiResBoundingBox is used in default operation mode."
  echo
  echo " -help"
  echo " prints this message."
}

c=0                            # 0: remove white margins (default), 1: trim page edges
mar=(0 0 0 0); tri=(0 0 0 0)
bbtype=BoundingBox
two=0

# getopts only handles single-letter options, so "-two" arrives as -t with
# argument "wo", "-hires" as -h with "ires", and "-help" as -h with "elp".
while getopts m:t:h: opt
do
  case $opt
  in
    m)
      eval mar=($OPTARG)
      [[ -z "${mar[1]}" ]] && mar[1]=${mar[0]}
      [[ -z "${mar[2]}" || -z "${mar[3]}" ]] && mar[2]=${mar[0]} && mar[3]=${mar[1]}
      c=0
    ;;
    t)
      if [[ "$OPTARG" == "wo" ]]
      then
        two=1
      else
        eval tri=($OPTARG)
        [[ -z "${tri[1]}" ]] && tri[1]=${tri[0]}
        [[ -z "${tri[2]}" || -z "${tri[3]}" ]] && tri[2]=${tri[0]} && tri[3]=${tri[1]}
        c=1
      fi
    ;;
    h)
      if [[ "$OPTARG" == "ires" ]]
      then
        bbtype=HiResBoundingBox
      else
        usage 1>&2; exit 0
      fi
    ;;
    \?)
      usage 1>&2; exit 1
    ;;
  esac
done
shift $((OPTIND-1))

[[ -z "$1" ]] && echo "`basename $0`: missing filename" 1>&2 && usage 1>&2 && exit 1
input=$1;output=$1;shift;
[[ -n "$1" ]] && output=$1 && shift;

(
  [[ "$c" -eq 0 ]] && gs -dNOPAUSE -q -dBATCH -sDEVICE=bbox "$input" 2>&1 | grep "%%$bbtype"
  pdftk "$input" output - uncompress
) | perl -w -n -s -e '
  BEGIN {@m=split /\s+/, $mar; @t=split /\s+/, $tri; @mb=(); $p=-1;}
  sub insCropBox {
    if($c){
      if($two && $p%2) {
        $mb[0]+=$t[2];$mb[1]+=$t[1];$mb[2]-=$t[0];$mb[3]-=$t[3];
      }
      else {
        $mb[0]+=$t[0];$mb[1]+=$t[1];$mb[2]-=$t[2];$mb[3]-=$t[3];
      }
      print "/CropBox [", join(" ", @mb), "]\n";
    } else {
      @bb=split /\s+/, $bbox[$p];
      if($two && $p%2) {
        $bb[0]+=$mb[0];$bb[1]+=$mb[1];$bb[2]+=$mb[0];$bb[3]+=$mb[1];
        $bb[0]-=$m[2];$bb[1]-=$m[1];$bb[2]+=$m[0];$bb[3]+=$m[3];
      }
      else {
        $bb[0]+=$mb[0];$bb[1]+=$mb[1];$bb[2]+=$mb[0];$bb[3]+=$mb[1];
        $bb[0]-=$m[0];$bb[1]-=$m[1];$bb[2]+=$m[2];$bb[3]+=$m[3];
      }
      print "/CropBox [", join(" ", @bb), "]\n";
    }
  }
  if (/BoundingBox:\s+([\d\.\s]+\d)/) { push @bbox, $1; next;}
  elsif (/\/CropBox\s+\[([\d\.\s]+\d)\]/) {next;}
  elsif (/\/MediaBox\s+\[([\d\.\s]+\d)\]/) {
    @mb=split /\s+/, $1; next if($p<0);
    insCropBox; @mb=(); $p=-1;
  }
  elsif (/pdftk_PageNum\s+(\d+)/) {
    $p=$1-1; next unless(@mb);
    insCropBox; @mb=(); $p=-1;
  }
  print;
' -- -mar="${mar[*]}" -tri="${tri[*]}" -c=$c -two=$two | pdftk - output "$output" compress
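The arithmetic inside insCropBox may be easier to follow outside the Perl one-liner. Below is a Python sketch of the same calculation as I read it from the script: in default mode the bounding box is shifted by the MediaBox origin and widened by the margins, in trim mode the MediaBox itself is shrunk, and with -two the left and right values are swapped on every second page. The function name crop_box and its parameters are illustrative only.

```python
def crop_box(media_box, bbox=None, sides=(0, 0, 0, 0),
             page_index=0, two_sided=False):
    """Compute [llx, lly, urx, ury] for a page's /CropBox.

    media_box  -- the page's MediaBox as [llx, lly, urx, ury]
    bbox       -- tight bounding box from gs, or None for trim mode
    sides      -- (left, bottom, right, top) margins or trims, in bp
    page_index -- zero-based page index, like $p in the Perl code
    """
    left, bottom, right, top = sides
    if two_sided and page_index % 2:
        left, right = right, left           # inner/outer edges swap sides
    x0, y0, x1, y1 = media_box
    if bbox is None:
        # Trim mode: cut the given amounts off the MediaBox edges.
        return [x0 + left, y0 + bottom, x1 - right, y1 - top]
    # Default mode: offset the bounding box by the MediaBox origin,
    # then widen it by the requested extra margins.
    return [bbox[0] + x0 - left, bbox[1] + y0 - bottom,
            bbox[2] + x0 + right, bbox[3] + y0 + top]


a4 = [0, 0, 595, 842]
print(crop_box(a4, bbox=[36, 48, 559, 806], sides=(10, 10, 10, 10)))
print(crop_box(a4, sides=(10, 20, 30, 40)))
print(crop_box(a4, sides=(10, 20, 30, 40), page_index=1, two_sided=True))
```

Because only a /CropBox entry is written and the page content is left untouched, the output stays close to the input size, which is the point of this approach.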
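Answer 2
I use a Python script found here: http://www.mobileread.com/forums/showthread.php?t=25565 It has the following features:
- the output has a reasonable size, as you asked for
- it supports absolute cropping (in case the automatically computed bounding box is of no use, for example when the pages have a horizontal footer or header rule)
- it is very fast: it goes through 200 pages in less than a second!
Of course, you need to install pyPdf beforehand. Since the link may go dead, I am pasting the source code here: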
#! /usr/bin/python

import getopt, sys
from pyPdf import PdfFileWriter, PdfFileReader

def usage ():
    print """sjvr767\'s PDF Cropping Script.
Example:
my_pdf_crop.py -s -p 0.5 -i input.pdf -o output.pdf
my_pdf_crop.py --skip --percent 0.5 --input input.pdf --output output.pdf
\n
REQUIRED OPTIONS:
-p\t--percent
The factor by which to crop. Must be positive and less than or equal to 1.
-i\t--input
The path to the file to be cropped.
\n
OPTIONAL:
-s\t--skip
Skip the first page. Output file will not contain the first page of the input file.
-o\t--output
Specify the name and path of the output file. If none specified, the script appends \'cropped\' to the file name.
-m\t--margin
Specify additional absolute cropping, for fine tuning results.
\t-m "left top right bottom"
"""
    sys.exit(0)

def cut_length(dictionary, key, factor):
    cut_factor = 1-factor
    cut = float(dictionary[key])*cut_factor
    cut = cut / 4
    return cut

def new_coords(dictionary, key, cut, margin, code = "tl"):
    if code == "tl":
        if key == "x":
            return abs(float(dictionary[key])+(cut+margin["l"]))
        else:
            return abs(float(dictionary[key])-(cut+margin["t"]))
    elif code == "tr":
        if key == "x":
            return abs(float(dictionary[key])-(cut+margin["r"]))
        else:
            return abs(float(dictionary[key])-(cut+margin["t"]))
    elif code == "bl":
        if key == "x":
            return abs(float(dictionary[key])+(cut+margin["l"]))
        else:
            return abs(float(dictionary[key])+(cut+margin["b"]))
    else:
        if key == "x":
            return abs(float(dictionary[key])-(cut+margin["r"]))
        else:
            return abs(float(dictionary[key])+(cut+margin["b"]))

try:
    opts, args = getopt.getopt(sys.argv[1:], "sp:i:o:m:", ["skip", "percent=", "input=", "output=", "margin="])
except getopt.GetoptError, err:
    # print help information and exit:
    print str(err) # will print something like "option -a not recognized"
    usage()
    sys.exit(2)

skipone = 0
for a in opts[:]:
    if a[0] == '-s' or a[0]=='--skip':
        skipone = 1

factor = 0.8 #default scaling factor
for a in opts[:]:
    # the long option registered with getopt above is --percent
    if a[0] == '-p' or a[0]=='--percent':
        if a[1] != None:
            try:
                factor = float(a[1])
            except (TypeError, ValueError):
                print "Factor must be a number."
                sys.exit(2) #exit if the factor is not a number

input_file = None #no default input file
for a in opts[:]:
    if a[0] == '-i' or a[0]=='--input':
        if a[1] != None:
            try:
                if a[1][-4:]=='.pdf':
                    input_file = a[1]
                else:
                    print "Input file must be a PDF."
                    sys.exit(2) #exit if no appropriate input file
            except TypeError:
                print "Input file must be a PDF."
                sys.exit(2) #exit if no appropriate input file
            except IndexError:
                print "Input file must be a PDF."
                sys.exit(2) #exit if no appropriate input file
        else:
            print "Please specify an input file."
            sys.exit(2) #exit if no appropriate input file

if input_file is None:
    # -i/--input was not given at all; the slice below would fail on None
    print "Please specify an input file."
    sys.exit(2)

output_file = "%s_cropped.pdf" %input_file[:-4] #default output
for a in opts[:]:
    if a[0] == '-o' or a[0]=='--output':
        if a[1]!= None:
            try:
                if a[1][-4:]=='.pdf':
                    output_file = a[1]
                else:
                    print "Output file must be a PDF."
            except TypeError:
                print "Output file must be a PDF."
            except IndexError:
                print "Output file must be a PDF."

margin = {"l": 0, "t": 0, "r": 0, "b": 0}
for a in opts[:]:
    if a[0] == '-m' or a[0]=='--margin':
        if a[1]!= None:
            m_temp = a[1].strip("\"").split()
            margin["l"] = float(m_temp[0])
            margin["t"] = float(m_temp[1])
            margin["r"] = float(m_temp[2])
            margin["b"] = float(m_temp[3])
        else:
            print "Error"

input1 = PdfFileReader(file(input_file, "rb"))
output = PdfFileWriter()
outputstream = file(output_file, "wb")
pages = input1.getNumPages()

# Page dimensions are read from the second page (index 1), so the input
# needs at least two pages.
top_right = {'x': input1.getPage(1).mediaBox.getUpperRight_x(), 'y': input1.getPage(1).mediaBox.getUpperRight_y()}
top_left = {'x': input1.getPage(1).mediaBox.getUpperLeft_x(), 'y': input1.getPage(1).mediaBox.getUpperLeft_y()}
bottom_right = {'x': input1.getPage(1).mediaBox.getLowerRight_x(), 'y': input1.getPage(1).mediaBox.getLowerRight_y()}
bottom_left = {'x': input1.getPage(1).mediaBox.getLowerLeft_x(), 'y': input1.getPage(1).mediaBox.getLowerLeft_y()}
print('Page dim.\t%f by %f' %(top_right['x'], top_right['y']))

# The same cut, derived from the page width, is applied to all four edges.
cut = cut_length(top_right, 'x', factor)
new_tr = (new_coords(top_right, 'x', cut, margin, code = "tr"), new_coords(top_right, 'y', cut, margin, code = "tr"))
new_br = (new_coords(bottom_right, 'x', cut, margin, code = "br"), new_coords(bottom_right, 'y', cut, margin, code = "br" ))
new_tl = (new_coords(top_left, 'x', cut, margin, code = "tl"), new_coords(top_left, 'y', cut, margin, code = "tl"))
new_bl = (new_coords(bottom_left, 'x', cut, margin, code = "bl"), new_coords(bottom_left, 'y', cut, margin, code = "bl"))

if skipone == 0:
    for i in range(0, pages):
        page = input1.getPage(i)
        page.mediaBox.upperLeft = new_tl
        page.mediaBox.upperRight = new_tr
        page.mediaBox.lowerLeft = new_bl
        page.mediaBox.lowerRight = new_br
        output.addPage(page)
else:
    for i in range(1, pages):
        page = input1.getPage(i)
        page.mediaBox.upperLeft = new_tl
        page.mediaBox.upperRight = new_tr
        page.mediaBox.lowerLeft = new_bl
        page.mediaBox.lowerRight = new_br
        output.addPage(page)

output.write(outputstream)
outputstream.close()
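The effect of the percent factor is not obvious from the script, so here is a small standalone Python sketch of the arithmetic as I understand it: a quarter of the removed fraction of the page width is cut from each edge, and the absolute margins are added on top. It deliberately avoids pyPdf, omits the abs() calls of the original, and uses a made-up function name, cropped_box.

```python
def cropped_box(media_box, factor=0.8, margin=(0, 0, 0, 0)):
    """Return the new (llx, lly, urx, ury) for one page.

    media_box -- (llx, lly, urx, ury) of the reference page
    factor    -- the -p/--percent value, 0 < factor <= 1
    margin    -- extra cropping as (left, top, right, bottom), the same
                 order as the script's -m option
    """
    llx, lly, urx, ury = media_box
    left, top, right, bottom = margin
    # Same rule as cut_length(): based on the upper-right x coordinate.
    cut = float(urx) * (1 - factor) / 4
    return (llx + cut + left, lly + cut + bottom,
            urx - cut - right, ury - cut - top)


print(cropped_box((0, 0, 600, 800), factor=0.5))
print(cropped_box((0, 0, 600, 800), factor=0.5, margin=(5, 10, 15, 20)))
```

With a 600 bp wide page and a factor of 0.5, each edge moves inwards by 75 bp, so the width shrinks by a quarter rather than by half. Since the cut is derived from the width only, the vertical edges move by the same absolute amount.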
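Answer 3
I really like Alexander Grahn's script, but I was missing a feature that allows a small margin. I made a small modification to the script so that it allows this margin, just as the original pdfcrop does.
Since I am new to this part of Stack Exchange, I cannot comment, so I will post the whole script here. Unfortunately I am not very good with bash, so I wasted some time trying to make the margin optional but eventually gave up. I left the margin declaration outside the Perl script, so it should be doable with a bit more bash-foo.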
#!/bin/bash

MARGIN=10

(
  gs -dNOPAUSE -q -dBATCH -sDEVICE=bbox "$1" 2>&1 | grep '%%BoundingBox'
  pdftk "$1" output - uncompress
) | perl -w -n -e '
  $margin = '$MARGIN';
  if (/BoundingBox:\s+(\d+\s+\d+\s+\d+\s+\d+)/) {
    push @bbox, $1; next;
  }
  elsif (/pdftk_PageNum\s+(\d+)/) {
    # Split the sizes
    @sizes = split(/ /, $bbox[$1-1]);
    # Add or subtract the margin size
    $j = 0;
    foreach(@sizes) {
      if($j < 2) {
        $_ = $_ - $margin;
      } else {
        $_ = $_ + $margin;
      }
      $j++;
    }
    # Print the box
    print "/MediaBox [" .join(" ", @sizes) . "]\n";
  }
  elsif (/MediaBox/) {
    next;
  }
  print;
' | pdftk - output "$2" compress
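Answer 4
I found this project to be a good alternative to pdfcrop: https://github.com/abarker/pdfCropMargins
It is a Python package with a large number of command-line options. An optional GUI is also available.
For example, the command: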
$ pdf-crop-margins -u -s in.pdf
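crops in.pdf so that all pages are set to the same size and the amount cropped is uniform across all pages, retaining 10% of the existing margins by default. The output file is about the same size as the input file, and links and annotations are preserved as well.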