Collating scanned PDFs from 2 folders

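I have scans of the fronts and backs of about 1000 different documents, stored in 2 separate folders. I would like to create a batch operation that merges each front scan with its corresponding back scan into a single document.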

Edit: I'm using Windows XP and the scans are PDFs. The fronts are in one folder and the backs are in another. The files are named 1-1-NAME, 1-2-NAME, where NAME is a four-letter identifier.

Answer 1

Are you looking for ImageMagick's montage? ImageMagick can handle PDFs.

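If you want something more flexible than ImageMagick montage, you can also use the pyPdf library in Python. pyPdf can merge PDF pages and apply basic transformations (e.g. translation, rotation, scaling). Sample script: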

# Note: pyPdf is a Python 2 library, hence the print statements below.
import errno
import os
import sys

import pyPdf


def merge_horizontal(out_filename, left_filename, right_filename):
    """Merge the first page of two PDFs side by side."""

    # PDFs are binary files: open them in binary mode ('rb'/'wb'),
    # otherwise newline translation on Windows corrupts the data
    with open(left_filename, 'rb') as left_file, \
            open(right_filename, 'rb') as right_file, \
            open(out_filename, 'wb') as output_file:
        left_pdf = pyPdf.PdfFileReader(left_file)
        right_pdf = pyPdf.PdfFileReader(right_file)
        output = pyPdf.PdfFileWriter()

        # get the first page from each pdf
        left_page = left_pdf.pages[0]
        right_page = right_pdf.pages[0]

        # start a new blank page with a size that can fit the merged pages side by side
        page = output.addBlankPage(
            width=left_page.mediaBox.getWidth() + right_page.mediaBox.getWidth(),
            height=max(left_page.mediaBox.getHeight(), right_page.mediaBox.getHeight()),
        )

        # draw the pages on that new page
        page.mergeTranslatedPage(left_page, 0, 0)
        page.mergeTranslatedPage(right_page, left_page.mediaBox.getWidth(), 0)

        # write to file
        output.write(output_file)


def mkdir_p(path):
    """Create path like `mkdir -p`: no error if the directory already exists."""
    try:
        os.makedirs(path)
    except OSError as exc:
        if not (exc.errno == errno.EEXIST and os.path.isdir(path)):
            raise


if __name__ == '__main__':
    output_folder_name = sys.argv[1]
    left_folder_name = sys.argv[2]
    right_folder_name = sys.argv[3]
    left_files = set(os.listdir(left_folder_name))
    right_files = set(os.listdir(right_folder_name))
    mkdir_p(output_folder_name)

    # for every file name that exists in both the left and the right folder
    for f in left_files.intersection(right_files):
        output_file_name = os.path.join(output_folder_name, f)
        left_file_name = os.path.join(left_folder_name, f)
        right_file_name = os.path.join(right_folder_name, f)
        print 'merging %s and %s into %s' % (left_file_name, right_file_name, output_file_name)
        merge_horizontal(output_file_name, left_file_name, right_file_name)

    # files without a counterpart in the other folder are reported, not merged
    print 'Only in left folder: ', left_files - right_files
    print 'Only in right folder: ', right_files - left_files

Then call the script like this:

python merge.py output_folder left_folder right_folder

Sample output:

merging folderA/two.pdf and folderB/two.pdf into output/dacd/adca/two.pdf
merging folderA/one.pdf and folderB/one.pdf into output/dacd/adca/one.pdf
merging folderA/three.pdf and folderB/three.pdf into output/dacd/adca/three.pdf
Only in left folder:  set(['four.pdf'])
Only in right folder:  set(['five.pdf'])
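
Note that the script pairs files that have exactly the same name in both folders, whereas the question describes names like 1-1-NAME and 1-2-NAME. Below is a minimal sketch of how the pairing could be done on the identifier instead. It assumes the identifier is whatever follows the last hyphen in the file name and that it is unique within each folder; `identifier` and `pair_by_identifier` are hypothetical helper names, not part of pyPdf.

```python
import os


def identifier(filename):
    """Return the part of the base name after the last hyphen.

    Assumption: '1-1-ABCD.pdf' -> 'ABCD'; adjust if your naming differs.
    """
    stem = os.path.splitext(filename)[0]
    return stem.rsplit('-', 1)[-1]


def pair_by_identifier(front_names, back_names):
    """Pair front and back file names that share an identifier.

    Returns (pairs, only_front, only_back), each sorted.
    If an identifier occurs twice in one folder, the last name seen wins.
    """
    fronts = dict((identifier(name), name) for name in front_names)
    backs = dict((identifier(name), name) for name in back_names)
    common = sorted(set(fronts) & set(backs))
    pairs = [(fronts[key], backs[key]) for key in common]
    only_front = sorted(fronts[key] for key in set(fronts) - set(backs))
    only_back = sorted(backs[key] for key in set(backs) - set(fronts))
    return pairs, only_front, only_back
```

The main loop could then iterate over `pairs` built from the two `os.listdir` results and pass each front and back path to `merge_horizontal`, while the two leftover lists take the place of the "Only in ..." report.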
