Keep only the last page of each section in a PDF?

Answer 1

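I solved this myself and wrote some Python code to handle it. Reading the document's PageLabels returns each label (which may or may not be numeric) together with the 0-based page index where that label starts. I extract the start indices of the labels and assume that a section (label range) ends one page before the next label/section begins.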
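The index arithmetic described above does not depend on any PDF library, so it can be sketched as a small pure function. The function name and the example page numbers are hypothetical; the input is assumed to be the sorted start indices taken from the even positions of the /PageLabels /Nums array (the PDF specification requires the first one to be 0).

```python
def last_pages_of_sections(label_starts, num_pages):
    """Return the 0-based index of the last page of each label range.

    label_starts: sorted 0-based page indices where each label range
    begins (hypothetical input, e.g. the keys of /PageLabels /Nums).
    num_pages: total number of pages in the document.
    """
    # Each range ends one page before the next range begins ...
    ends = [start - 1 for start in label_starts[1:]]
    # ... and the final range runs to the last page of the document.
    ends.append(num_pages - 1)
    return ends


# Hypothetical example: labels start at pages 0, 3 and 4 of a 7-page PDF.
print(last_pages_of_sections([0, 3, 4], 7))  # → [2, 3, 6]
```

Dropping the first start (always 0) is what removes the meaningless "page -1" that would otherwise appear.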

#!/usr/bin/env python3

# PdfReader/PdfWriter replace PdfFileReader/PdfFileWriter, which were
# deprecated and then removed in PyPDF2 3.0. (pypdf, the maintained
# successor of PyPDF2, exposes the same class names.)
from PyPDF2 import PdfReader, PdfWriter

with open("in.pdf", "rb") as in_f:
    reader = PdfReader(in_f)
    writer = PdfWriter()

    num_pages = len(reader.pages)

    root = reader.trailer["/Root"]
    if "/PageLabels" not in root:
        raise SystemExit("in.pdf defines no /PageLabels")

    # /Nums is a flat array [start0, label0, start1, label1, ...]:
    # the page-index keys sit at even positions, the label dictionaries
    # at odd positions. (Assumes a flat tree; a /Kids tree is not handled.)
    nums = root["/PageLabels"]["/Nums"]
    starts = [int(n) for n in nums[0::2]]

    # Assume end of preceding label = start of next label - 1.
    # The first start is always page 0, so it ends no earlier section.
    page_indices = [start - 1 for start in starts[1:]]

    # The last label's range runs to the end of the document,
    # so its last page is simply the final page.
    page_indices.append(num_pages - 1)

    for i in page_indices:
        writer.add_page(reader.pages[i])

    # Write while in.pdf is still open: the reader loads pages lazily.
    with open("out.pdf", "wb") as out_f:
        writer.write(out_f)
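The script reads /Nums directly from the /PageLabels root. The PDF specification also allows a number tree whose root holds /Kids (child nodes) instead of /Nums, in which case that lookup raises a KeyError. A recursive walk like the following sketch would collect the entries in either case. It is an assumption-laden illustration: the function name is hypothetical, and PDF dictionaries are modelled as plain Python dicts and lists with indirect references already resolved, so with PyPDF2 objects each kid would still need resolving (for example via its get_object() method).

```python
def flatten_number_tree(node):
    """Collect (key, value) pairs from a PDF number tree, sorted by key.

    Sketch only: dictionaries are plain dicts, arrays are plain lists,
    and indirect references are assumed to be resolved already.
    """
    pairs = []
    if "/Nums" in node:
        nums = node["/Nums"]
        # /Nums alternates integer keys and their values.
        pairs.extend(zip(nums[0::2], nums[1::2]))
    # Intermediate nodes hold their children in /Kids instead.
    for kid in node.get("/Kids", []):
        pairs.extend(flatten_number_tree(kid))
    # Sort on the key only; the values (dicts) are not comparable.
    return sorted(pairs, key=lambda pair: pair[0])


# Hypothetical two-leaf tree: labels start at pages 0, 4 and 10.
tree = {"/Kids": [
    {"/Nums": [0, {"/S": "/r"}, 4, {"/S": "/D"}]},
    {"/Nums": [10, {"/S": "/A"}]},
]}
print([key for key, _ in flatten_number_tree(tree)])  # → [0, 4, 10]
```

The resulting keys can be fed straight into the "start of next label - 1" calculation used above.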
