Accented characters missing when converting a PDF to an image with ImageMagick or Ghostscript

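I'm having a problem converting a PDF to an image with ImageMagick or Ghostscript: every accented character disappears from the converted image. I found several people with the same problem, and apparently updating the imagemagick and ghostscript packages fixes it, but that did not work for me.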

I used this PDF file for every test: https://www.dropbox.com/s/3gso0sw1e1n8f9r/error-with-accents.pdf?dl=0

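I have an Ubuntu 14.04.2 LTS server on Azure, and I need ImageMagick to work on it. The official repositories provide ImageMagick 6.7.7 and Ghostscript 9.10. I then tried upgrading both to fix the problem: I now also have ImageMagick 6.8.9-10 running from /opt/imagemagick-6.8, and I added the Ubuntu 15.04 repository so that I could install Ghostscript 9.15 directly with apt-get. Neither change solved the problem.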

Here are my latest attempts on the Ubuntu 14.04 server:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.2 LTS
Release:    14.04
Codename:   trusty

$ /opt/imagemagick-6.8/bin/convert -version
Version: ImageMagick 6.8.9-10 Q16 x86_64 2015-07-30 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2014 ImageMagick Studio LLC
Features: DPC OpenMP
Delegates: jng jpeg png x xml zlib

$ /opt/imagemagick-6.8/bin/convert -list configure |grep DELEGATES
DELEGATES      mpeg jng jpeg png ps x xml zlib

$ /opt/imagemagick-6.8/bin/convert error-with-accents.pdf -verbose -alpha off -resample 150 -density 150 -quality '80' im-test.jpg
   **** Warning: considering '0000000000 XXXXX n' as a free entry.

   **** This file had errors that were repaired or ignored.
   **** The file was produced by: 
   **** >>>> Mac OS X 10.10.4 Quartz PDFContext <<<<
   **** Please notify the author of the software that produced this
   **** file that it does not conform to Adobe's published PDF
   **** specification.

error-with-accents.pdf=>im-test.jpg PDF 595x794=>1240x1654 1240x1654+0+0 16-bit sRGB 172KB 0.440u 0:00.240

$ gs -v
GPL Ghostscript 9.15 (2014-09-22)
Copyright (C) 2014 Artifex Software, Inc.  All rights reserved.

$ gs -dBATCH -dNOPAUSE -sDEVICE=jpeg -sOutputFile=gs-test.jpg error-with-accents.pdf 
GPL Ghostscript 9.15 (2014-09-22)
Copyright (C) 2014 Artifex Software, Inc.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
   **** Warning: considering '0000000000 XXXXX n' as a free entry.
Processing pages 1 through 1.
Page 1

   **** This file had errors that were repaired or ignored.
   **** The file was produced by: 
   **** >>>> Mac OS X 10.10.4 Quartz PDFContext <<<<
   **** Please notify the author of the software that produced this
   **** file that it does not conform to Adobe's published PDF
   **** specification.

$ convert -version
Version: ImageMagick 6.7.7-10 2014-03-06 Q16 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2012 ImageMagick Studio LLC
Features: OpenMP    

$ convert -list configure |grep DELEGATES
DELEGATES     bzlib djvu fftw fontconfig freetype jbig jpeg jng jp2 lcms2 lqr lzma openexr pango png rsvg tiff x11 xml wmf zlib

$ convert error-with-accents.pdf -verbose -alpha off -resample 150 -density 150 -quality '80' im-test-6.7.7.jpg
   **** Warning: considering '0000000000 XXXXX n' as a free entry.

   **** This file had errors that were repaired or ignored.
   **** The file was produced by: 
   **** >>>> Mac OS X 10.10.4 Quartz PDFContext <<<<
   **** Please notify the author of the software that produced this
   **** file that it does not conform to Adobe's published PDF
   **** specification.

error-with-accents.pdf=>im-test-6.7.7.jpg PDF 595x794=>1240x1654 1240x1654+0+0 16-bit DirectClass 160KB 0.490u 0:00.279


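Both Ghostscript and ImageMagick work correctly for me on Mac OS, and according to this post, the versions I installed on Ubuntu should work too. So I suspect the problem is related to FreeType fonts, but I don't know how to fix it. Any help?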


Thanks to Kurt Pfeifle on Stack Overflow for finding the answer.

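The problem was the version of Ghostscript installed on the server. Since the latest Ghostscript version in the Ubuntu Wily repository is 9.15, I downloaded the official binary package for Linux x64 from the Ghostscript website instead.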
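As a rough sketch of how the downloaded binary can be tried without touching the packaged one, the helper below calls a Ghostscript executable by explicit path. The function name `render_pdf` and the location `/opt/ghostscript/gs` are assumptions made for illustration; use whatever path you unpacked the binary to.

```shell
# Hypothetical location of the unpacked standalone binary; adjust to your setup.
GS_NEW="${GS_NEW:-/opt/ghostscript/gs}"

# Render a PDF to JPEG with an explicitly chosen Ghostscript binary.
# Usage: render_pdf <gs-binary> <input.pdf> <output.jpg> [dpi]
render_pdf() {
    gs_bin=$1
    input=$2
    output=$3
    dpi=${4:-150}   # resolution in dots per inch, defaults to 150
    "$gs_bin" -dBATCH -dNOPAUSE -dQUIET -sDEVICE=jpeg -r"$dpi" \
        -sOutputFile="$output" "$input"
}
```

For example, `render_pdf "$GS_NEW" error-with-accents.pdf gs-test-new.jpg` and `render_pdf /usr/bin/gs error-with-accents.pdf gs-test-old.jpg` produce two images that can be compared side by side.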



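I ran into the same problem when trying to print PDFs containing accented characters. I concluded that it is a Ghostscript problem, because CUPS uses Ghostscript when it rasterizes PDFs through the gstoraster filter. I also found that a recent Ghostscript binary run in standalone mode works fine.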
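When a packaged gs and a standalone one coexist, it helps to know which binaries are actually reachable and what version each reports. The following is a minimal sketch, assuming a POSIX-style shell; `list_gs_versions` is a name made up here, and it relies only on `gs --version`, which prints the bare version number.

```shell
# Print every executable named "gs" found on PATH, with the version it reports.
# The first line printed is the one a bare "gs" command would run.
list_gs_versions() {
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        IFS=$old_ifs
        [ -n "$dir" ] || dir=.   # an empty PATH entry means the current directory
        if [ -x "$dir/gs" ]; then
            printf '%s %s\n' "$dir/gs" "$("$dir/gs" --version 2>/dev/null)"
        fi
    done
    IFS=$old_ifs
}
```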

I do not recommend replacing /usr/bin/gs, because that may break some dependencies (CUPS, for example)!
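One way to leave /usr/bin/gs alone is to put the newer binary's directory first on PATH for a single command only. This is a sketch under the assumption that your ImageMagick delegate calls `gs` by bare name rather than by absolute path (`convert -list delegate` shows the configured command); `with_gs_dir` and the `/opt/ghostscript` directory are hypothetical names.

```shell
# Run one command with a directory prepended to PATH.
# The subshell keeps the change from leaking into the calling shell.
# Usage: with_gs_dir <dir-containing-gs> <command> [args...]
with_gs_dir() {
    gs_dir=$1
    shift
    (
        PATH="$gs_dir:$PATH"
        export PATH
        "$@"
    )
}
```

For example, `with_gs_dir /opt/ghostscript convert -density 150 error-with-accents.pdf im-test.jpg` would let that one conversion pick up the newer gs, while CUPS and everything else keep using the packaged binary.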

