创建带有图像的 PDF

Question 1

你可以这样做HexaPDF用一些脚本：

require 'hexapdf'

doc = HexaPDF::Document.new
ARGV.each do |image_file|
  image = doc.images.add(image_file)
  iw = image.info.width.to_f
  ih = image.info.height.to_f
  page = doc.pages.add(:A4, orientation: (ih > iw ? :portrait : :landscape))
  pw = page.box(:media).width.to_f
  ph = page.box(:media).height.to_f
  rw, rh = pw / iw, ph / ih
  ratio = [rw, rh].min
  iw, ih = iw * ratio, ih * ratio
  x, y = (pw - iw) / 2, (ph - ih) / 2
  page.canvas.image(image, at: [x, y], width: iw, height: ih)
end
doc.write("output.pdf")

此脚本将图像文件作为参数，并将每个文件放在单独的 A4 页面上，缩放图像以适合页面，并在必要时旋转页面。如果您不想缩放图像，只需设置ratio = 1。

Answer

你可以这样做HexaPDF用一些脚本：

require 'hexapdf'

doc = HexaPDF::Document.new
ARGV.each do |image_file|
  image = doc.images.add(image_file)
  iw = image.info.width.to_f
  ih = image.info.height.to_f
  page = doc.pages.add(:A4, orientation: (ih > iw ? :portrait : :landscape))
  pw = page.box(:media).width.to_f
  ph = page.box(:media).height.to_f
  rw, rh = pw / iw, ph / ih
  ratio = [rw, rh].min
  iw, ih = iw * ratio, ih * ratio
  x, y = (pw - iw) / 2, (ph - ih) / 2
  page.canvas.image(image, at: [x, y], width: iw, height: ih)
end
doc.write("output.pdf")

此脚本将图像文件作为参数，并将每个文件放在单独的 A4 页面上，缩放图像以适合页面，并在必要时旋转页面。如果您不想缩放图像，只需设置ratio = 1。

Question 2

因此实际上 mutool 运行脚本提供了API（参见一切（位于“JavaScript Shell”）用于处理 pdf 并且不需要了解 pdf 格式（与示例脚本暗示的相反），供参考这里有一个实现 pdf->每页一张图片->pdf 的脚本。

由于缺乏清晰的文档（尤其是关于参数类型），我决定研究一下 mutool源代码。我收到的大多数错误消息与问题本身无关。

/*
Flatten a pdf by converting each page to an image and putting them in a new pdf.

Usage: mutool run flatten.js in.pdf out.pdf

Documentation:
https://mupdf.com/docs/manual-mutool-run.html
http://git.ghostscript.com/?p=mupdf.git;a=blob;f=source/tools/murun.c;h=16582b5d83234c880c4c3b67b279a4428d3f2303;hb=refs/heads/master
*/

function array_max(a)
{
    var max = a[0];
    for(var i=1;i<a.length;i++)
    {
        if(a[i] > max)
            max = a[i];
    }
    return max;
}

// scale up render if needed to have non-blurry images up to filling up screen when zooming on a page
var minRenderWidth = 1920;
var minRenderHeight = 1080;

var input = new PDFDocument(scriptArgs[0]);
var pages = input.countPages();

var dw = new DocumentWriter(scriptArgs[1], 'pdf');
for(var i=0;i<pages;i++)
{
    var page = input.loadPage(i);
    var pageBounds = page.bound();
    var scale = array_max([1, minRenderWidth/pageBounds[2], minRenderHeight/pageBounds[3]]);
    var pixmapRender = page.toPixmap(Scale(scale, scale), DeviceRGB);
    var imageRender = new Image(pixmapRender);
    var device = dw.beginPage(pageBounds);
    // translate then scale
    // no idea about how the translate bit works: not sure where the origin for coordinates is (but it works)
    device.fillImage(imageRender, Concat(Scale(pageBounds[2] - pageBounds[0], pageBounds[3] - pageBounds[1]), Translate(2*pageBounds[0], 0)), 1);
    device.close();
    dw.endPage(device);
    print('#' + (i+1) + ' done');
}
dw.close();

Answer

因此实际上 mutool 运行脚本提供了API（参见一切（位于“JavaScript Shell”）用于处理 pdf 并且不需要了解 pdf 格式（与示例脚本暗示的相反），供参考这里有一个实现 pdf->每页一张图片->pdf 的脚本。

由于缺乏清晰的文档（尤其是关于参数类型），我决定研究一下 mutool源代码。我收到的大多数错误消息与问题本身无关。

/*
Flatten a pdf by converting each page to an image and putting them in a new pdf.

Usage: mutool run flatten.js in.pdf out.pdf

Documentation:
https://mupdf.com/docs/manual-mutool-run.html
http://git.ghostscript.com/?p=mupdf.git;a=blob;f=source/tools/murun.c;h=16582b5d83234c880c4c3b67b279a4428d3f2303;hb=refs/heads/master
*/

function array_max(a)
{
    var max = a[0];
    for(var i=1;i<a.length;i++)
    {
        if(a[i] > max)
            max = a[i];
    }
    return max;
}

// scale up render if needed to have non-blurry images up to filling up screen when zooming on a page
var minRenderWidth = 1920;
var minRenderHeight = 1080;

var input = new PDFDocument(scriptArgs[0]);
var pages = input.countPages();

var dw = new DocumentWriter(scriptArgs[1], 'pdf');
for(var i=0;i<pages;i++)
{
    var page = input.loadPage(i);
    var pageBounds = page.bound();
    var scale = array_max([1, minRenderWidth/pageBounds[2], minRenderHeight/pageBounds[3]]);
    var pixmapRender = page.toPixmap(Scale(scale, scale), DeviceRGB);
    var imageRender = new Image(pixmapRender);
    var device = dw.beginPage(pageBounds);
    // translate then scale
    // no idea about how the translate bit works: not sure where the origin for coordinates is (but it works)
    device.fillImage(imageRender, Concat(Scale(pageBounds[2] - pageBounds[0], pageBounds[3] - pageBounds[1]), Translate(2*pageBounds[0], 0)), 1);
    device.close();
    dw.endPage(device);
    print('#' + (i+1) + ' done');
}
dw.close();

Question 3

我最近了解到img2pdf，一个 Python 包。它的功能与名称所示一致：将图像包装到 PDF 文档中。它不执行 OCR，不对图像重新采样，一切都保持原样。它非常快。使用也非常简单：

img2pdf img1.png img2.jpg -o out.pdf

img2pdf至少在 Debian 上也可以作为软件包使用。

正如在img2pdf地点，立方体如果您有 JPEG 或 PNG，也可能是一个不错的选择。它还具有 OCR 的额外优势，使您可以复制文本并搜索它。这将花费更长的时间，因为 OCR 需要时间。请按如下方式使用它：

ls *.jpg | tesseract - output-file-basename pdf

（请注意，“output-file-basename”和“pdf”是单独的参数。）

如果需要/需要，你可以使用定量PDF之后对文档进行操作。

例如，使用非双面 ADF 扫描仪时，请先扫描所有正面，翻转整个堆栈，然后反向扫描所有背面，然后将两者合并：

qpdf --collate --empty --pages front-sides.pdf 1-z back-sides.pdf z-1 -- out.pdf

Answer

我最近了解到img2pdf，一个 Python 包。它的功能与名称所示一致：将图像包装到 PDF 文档中。它不执行 OCR，不对图像重新采样，一切都保持原样。它非常快。使用也非常简单：

img2pdf img1.png img2.jpg -o out.pdf

img2pdf至少在 Debian 上也可以作为软件包使用。

正如在img2pdf地点，立方体如果您有 JPEG 或 PNG，也可能是一个不错的选择。它还具有 OCR 的额外优势，使您可以复制文本并搜索它。这将花费更长的时间，因为 OCR 需要时间。请按如下方式使用它：

ls *.jpg | tesseract - output-file-basename pdf

（请注意，“output-file-basename”和“pdf”是单独的参数。）

如果需要/需要，你可以使用定量PDF之后对文档进行操作。

例如，使用非双面 ADF 扫描仪时，请先扫描所有正面，翻转整个堆栈，然后反向扫描所有背面，然后将两者合并：

qpdf --collate --empty --pages front-sides.pdf 1-z back-sides.pdf z-1 -- out.pdf

Question 4

使用 Python 和 PyMuPDF 的操作方法如下：

我必须安装“pymupdf” 包提供菲茨模块（不是“fitz”包，这是一个不同的包！）

python -m venv .
./bin/pip install pymupdf

（复制粘贴的代码来自https://pymupdf.readthedocs.io/en/latest/recipes-images.html#how-to-make-one-pdf-of-all-your-pictures-or-files）

我修改了代码以省略使用 PySimpleGUI，因为它也需要安装 TK。我还在文件列表中添加了基于正则表达式的过滤，允许仅包含与正则表达式匹配的文件。

#!./bin/python
import os, fitz, re # "fitz" comes from the "pymupdf" package
#import PySimpleGUI as psg  # for showing a progress bar
doc = fitz.open()  # PDF with the pictures
imgdir = "."  # folder where the pics are
allfiles = os.listdir(imgdir)  # list of them
imglistUNSORTED = list (filter (lambda fn: re.match (r'c-\d+.jpg', fn), allfiles))
imglist = sorted (imglistUNSORTED)
imgcount = len(imglist)  # pic count

for i, f in enumerate(imglist):
    img = fitz.open(os.path.join(imgdir, f))  # open pic as document
    rect = img[0].rect  # pic dimension
    pdfbytes = img.convert_to_pdf()  # make a PDF stream
    img.close()  # no longer needed
    imgPDF = fitz.open("pdf", pdfbytes)  # open stream as PDF
    page = doc.new_page(width = rect.width,  # new page with ...
                       height = rect.height)  # pic dimension
    page.show_pdf_page(rect, imgPDF, 0)  # image fills the page
    #psg.EasyProgressMeter("Import Images",  # show our progress
    #    i+1, imgcount)
    print ('Import Images: ' +str(i+1)+ '/' +str(imgcount))

doc.save("all-my-pics.pdf")

Answer

使用 Python 和 PyMuPDF 的操作方法如下：

我必须安装“pymupdf” 包提供菲茨模块（不是“fitz”包，这是一个不同的包！）

python -m venv .
./bin/pip install pymupdf

（复制粘贴的代码来自https://pymupdf.readthedocs.io/en/latest/recipes-images.html#how-to-make-one-pdf-of-all-your-pictures-or-files）

我修改了代码以省略使用 PySimpleGUI，因为它也需要安装 TK。我还在文件列表中添加了基于正则表达式的过滤，允许仅包含与正则表达式匹配的文件。

#!./bin/python
import os, fitz, re # "fitz" comes from the "pymupdf" package
#import PySimpleGUI as psg  # for showing a progress bar
doc = fitz.open()  # PDF with the pictures
imgdir = "."  # folder where the pics are
allfiles = os.listdir(imgdir)  # list of them
imglistUNSORTED = list (filter (lambda fn: re.match (r'c-\d+.jpg', fn), allfiles))
imglist = sorted (imglistUNSORTED)
imgcount = len(imglist)  # pic count

for i, f in enumerate(imglist):
    img = fitz.open(os.path.join(imgdir, f))  # open pic as document
    rect = img[0].rect  # pic dimension
    pdfbytes = img.convert_to_pdf()  # make a PDF stream
    img.close()  # no longer needed
    imgPDF = fitz.open("pdf", pdfbytes)  # open stream as PDF
    page = doc.new_page(width = rect.width,  # new page with ...
                       height = rect.height)  # pic dimension
    page.show_pdf_page(rect, imgPDF, 0)  # image fills the page
    #psg.EasyProgressMeter("Import Images",  # show our progress
    #    i+1, imgcount)
    print ('Import Images: ' +str(i+1)+ '/' +str(imgcount))

doc.save("all-my-pics.pdf")

创建带有图像的 PDF

答案1

答案2

答案3

答案4

相关内容