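I've run into a problem: including an ordinary PDF makes my document fail PDF/A validation because of the fonts it contains. My template uses pdfx for PDF/A compliance, I use pdfpages to include the PDF, and I build with LuaLaTeX.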
\documentclass[a4paper]{report}
\usepackage{pdfpages}
\begin{document}
\tableofcontents
\includepdf{test.pdf}
\end{document}
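Unfortunately, I can't provide test.pdf itself because it contains confidential information. The PDF has all of its fonts embedded, but it is not itself PDF/A compliant. When I validate the PDF (the whole document, including test.pdf) with veraPDF, I get: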
<?xml version="1.0" encoding="utf-8"?>
<report>
<buildInformation>
<releaseDetails id="core" version="1.18.11" buildDate="2021-04-19T10:21:00+02:00"></releaseDetails>
<releaseDetails id="validation-model" version="1.18.8" buildDate="2021-04-19T10:35:00+02:00"></releaseDetails>
<releaseDetails id="gui" version="1.18.6" buildDate="2021-04-27T08:53:00+02:00"></releaseDetails>
</buildInformation>
<jobs>
<job>
<item size="718642">
<name>/path/to/file.pdf</name>
</item>
<validationReport profileName="PDF/A-2B validation profile" statement="PDF file is not compliant with Validation Profile requirements." isCompliant="false">
<details passedRules="121" failedRules="1" passedChecks="58863" failedChecks="2">
<rule specification="ISO 19005-2:2011" clause="6.2.11.4" testNumber="4" status="failed" passedChecks="0" failedChecks="2">
<description>If the FontDescriptor dictionary of an embedded CID font contains a CIDSet stream, then it shall identify all CIDs which are present in the font program,
regardless of whether a CID in the font is referenced or used by the PDF or not.</description>
<object>PDCIDFont</object>
<test>fontFile_size == 0 || fontName.search(/[A-Z]{6}\+/) != 0 || CIDSet_size == 0 || cidSetListsAllGlyphs == true</test>
<check status="failed">
<context>root/document[0]/pages[1](26 0 obj PDPage)/contentStream[0](27 0 obj PDContentStream)/operators[7]/xObject[0]/contentStream[0](24 0 obj PDContentStream)/operators[72]/font[0](KYIDZT+SymbolMT)/DescendantFonts[0](KYIDZT+SymbolMT)</context>
</check>
<check status="failed">
<context>root/document[0]/pages[1](26 0 obj PDPage)/contentStream[0](27 0 obj PDContentStream)/operators[7]/xObject[0]/contentStream[0](24 0 obj PDContentStream)/operators[234]/font[0](MAYIVC+FrontPagePro-Medium)/DescendantFonts[0](MAYIVC+FrontPagePro-Medium)</context>
</check>
</rule>
</details>
</validationReport>
<duration start="1675365559348" finish="1675365560315">00:00:00.967</duration>
</job>
</jobs>
<batchSummary totalJobs="1" failedToParse="0" encrypted="0">
<validationReports compliant="0" nonCompliant="1" failedJobs="0">1</validationReports>
<featureReports failedJobs="0">0</featureReports>
<repairReports failedJobs="0">0</repairReports>
<duration start="1675365559279" finish="1675365560330">00:00:01.051</duration>
</batchSummary>
</report>
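Reports like the one above get long quickly, so it can help to reduce them to the failed rules and the fonts they mention. The following is a minimal sketch using only the Python standard library. It assumes the report layout shown above (`rule` elements with `clause`, `testNumber`, `status` and `failedChecks` attributes, and `context` elements containing `font[N](FontName)`). The function name `summarize_failed_rules` and the abbreviated sample contexts are illustrative, not part of veraPDF.

```python
import re
import xml.etree.ElementTree as ET

# Matches e.g. "font[0](KYIDZT+SymbolMT)" in a veraPDF context path.
FONT_IN_CONTEXT = re.compile(r"font\[\d+\]\(([^)]+)\)")

# Trimmed copy of the report above; context paths are abbreviated for readability.
SAMPLE_REPORT = """<report>
  <jobs><job>
    <validationReport profileName="PDF/A-2B validation profile" isCompliant="false">
      <details passedRules="121" failedRules="1" passedChecks="58863" failedChecks="2">
        <rule specification="ISO 19005-2:2011" clause="6.2.11.4" testNumber="4"
              status="failed" passedChecks="0" failedChecks="2">
          <check status="failed">
            <context>root/document[0]/pages[1]/operators[72]/font[0](KYIDZT+SymbolMT)/DescendantFonts[0](KYIDZT+SymbolMT)</context>
          </check>
          <check status="failed">
            <context>root/document[0]/pages[1]/operators[234]/font[0](MAYIVC+FrontPagePro-Medium)/DescendantFonts[0](MAYIVC+FrontPagePro-Medium)</context>
          </check>
        </rule>
      </details>
    </validationReport>
  </job></jobs>
</report>"""


def summarize_failed_rules(root):
    """Return one dict per failed rule: clause, test number, failed checks, fonts."""
    summary = []
    for rule in root.iter("rule"):
        if rule.get("status") != "failed":
            continue
        fonts = set()
        for context in rule.iter("context"):
            fonts.update(FONT_IN_CONTEXT.findall(context.text or ""))
        summary.append({
            "clause": rule.get("clause"),
            "test": rule.get("testNumber"),
            "failed_checks": int(rule.get("failedChecks")),
            "fonts": sorted(fonts),
        })
    return summary


if __name__ == "__main__":
    for entry in summarize_failed_rules(ET.fromstring(SAMPLE_REPORT)):
        print(entry)
```

For a report saved to disk, something like `summarize_failed_rules(ET.parse("report.xml").getroot())` should work, where `report.xml` is a hypothetical file name.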
Using pdffonts, I can see that all fonts are embedded in the PDF:
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
TTJGTY+Roboto-Regular                CID Type 0C       Identity-H       yes yes yes     13  0
QMJPLU+Roboto-Bold                   CID Type 0C       Identity-H       yes yes yes     15  0
MAYIVC+FrontPagePro-Medium           CID TrueType      Identity-H       yes yes yes     35  0
KYIDZT+SymbolMT                      CID TrueType      Identity-H       yes yes yes     36  0
PGCUGW+Charter-Bold                  TrueType          WinAnsi          yes yes no      37  0
DAESFN+FrontPageMedium               TrueType          WinAnsi          yes yes no      38  0
KUYJMB+FrontPagePro-Medium           TrueType          WinAnsi          yes yes no      39  0
IJGRYF+Charter                       TrueType          WinAnsi          yes yes no      40  0
BHNTXR+Stafford                      TrueType          WinAnsi          yes yes no      41  0
GUTOKG+ArialMT                       Type 1C           WinAnsi          yes yes no      42  0
KKKIUW+XCharter-Roman                CID Type 0C       Identity-H       yes yes yes     72  0
AIWZML+XCharter-Bold                 CID Type 0C       Identity-H       yes yes yes    117  0
FMADMD+XCharter-Italic               CID Type 0C       Identity-H       yes yes yes    158  0
KEZTQT+RobotoMono-Regular            CID Type 0C       Identity-H       yes yes yes    204  0
To break this down further, I ran pdffonts on the included document alone:
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
MAYIVC+FrontPagePro-Medium           CID TrueType      Identity-H       yes yes yes     25  0
KYIDZT+SymbolMT                      CID TrueType      Identity-H       yes yes yes     15  0
PGCUGW+Charter-Bold                  TrueType          WinAnsi          yes yes no      20  0
DAESFN+FrontPageMedium               TrueType          WinAnsi          yes yes no       8  0
KUYJMB+FrontPagePro-Medium           TrueType          WinAnsi          yes yes no      10  0
IJGRYF+Charter                       TrueType          WinAnsi          yes yes no      12  0
BHNTXR+Stafford                      TrueType          WinAnsi          yes yes no      22  0
GUTOKG+ArialMT                       Type 1C           WinAnsi          yes yes no      18  0
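Comparing the two listings by hand is error-prone, so here is a small sketch that parses pdffonts output and reports which fonts in the combined document also appear in the included PDF. It assumes the column layout shown above and that font names contain no spaces. The helper name `parse_pdffonts` and the shortened sample listings (a few rows copied from the output above) are illustrative.

```python
def parse_pdffonts(output):
    """Parse pdffonts text output into a list of dicts, one per font."""
    fonts = []
    for line in output.splitlines():
        parts = line.split()
        # Skip the header, the dashed separator, and anything too short to be a row.
        if len(parts) < 8 or parts[0] == "name" or line.startswith("---"):
            continue
        emb, sub, uni, obj_num, gen = parts[-5:]
        fonts.append({
            "name": parts[0],
            "type": " ".join(parts[1:-6]),  # the type may span several tokens
            "encoding": parts[-6],
            "embedded": emb == "yes",
            "subset": sub == "yes",
            "unicode": uni == "yes",
            "object": (int(obj_num), int(gen)),
        })
    return fonts


# A few rows copied from the two listings above.
FULL_DOCUMENT = """name type encoding emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
TTJGTY+Roboto-Regular CID Type 0C Identity-H yes yes yes 13 0
MAYIVC+FrontPagePro-Medium CID TrueType Identity-H yes yes yes 35 0
KYIDZT+SymbolMT CID TrueType Identity-H yes yes yes 36 0
GUTOKG+ArialMT Type 1C WinAnsi yes yes no 42 0
"""

INCLUDED_PDF = """name type encoding emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
MAYIVC+FrontPagePro-Medium CID TrueType Identity-H yes yes yes 25 0
KYIDZT+SymbolMT CID TrueType Identity-H yes yes yes 15 0
GUTOKG+ArialMT Type 1C WinAnsi yes yes no 18 0
"""

full = parse_pdffonts(FULL_DOCUMENT)
included = parse_pdffonts(INCLUDED_PDF)

# Fonts of the combined document that also occur in the included PDF.
inherited = sorted({f["name"] for f in full} & {f["name"] for f in included})

# CID-keyed TrueType fonts, the kind named in the CIDSet failures.
cid_truetype = [f["name"] for f in full if f["type"] == "CID TrueType"]

if __name__ == "__main__":
    print("inherited from included PDF:", inherited)
    print("CID TrueType fonts:", cid_truetype)
```

On the sample rows, both fonts named in the CIDSet failures (`KYIDZT+SymbolMT` and `MAYIVC+FrontPagePro-Medium`) show up as CID TrueType fonts that come from the included PDF.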
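So I can see that all fonts are embedded, but the way they are encoded or packaged means my document is no longer PDF/A compliant. To verify this, I turned off the inclusion and checked the document without the included PDF; it passes the PDF/A compliance check.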
If I run veraPDF on test.pdf (the included PDF), I get:
<?xml version="1.0" encoding="utf-8"?>
<report>
<buildInformation>
<releaseDetails id="core" version="1.18.11" buildDate="2021-04-19T10:21:00+02:00"></releaseDetails>
<releaseDetails id="validation-model" version="1.18.8" buildDate="2021-04-19T10:35:00+02:00"></releaseDetails>
<releaseDetails id="gui" version="1.18.6" buildDate="2021-04-27T08:53:00+02:00"></releaseDetails>
</buildInformation>
<jobs>
<job>
<item size="103546">
<name>/path/to/test.pdf</name>
</item>
<validationReport profileName="PDF/A-1B validation profile" statement="PDF file is not compliant with Validation Profile requirements." isCompliant="false">
<details passedRules="94" failedRules="7" passedChecks="9353" failedChecks="100">
<rule specification="ISO 19005-1:2005" clause="6.3.7" testNumber="3" status="failed" passedChecks="0" failedChecks="5">
<description>Font programs' "cmap" tables for all symbolic TrueType fonts shall contain exactly one encoding</description>
<object>TrueTypeFontProgram</object>
<test>isSymbolic == false || nrCmaps == 1</test>
<check status="failed">
<context>root/document[0]/pages[0](4 0 obj PDPage)/contentStream[0](6 0 obj PDContentStream)/operators[8]/font[0](DAESFN+FrontPageMedium)/fontFile[0]</context>
</check>
<check status="failed">
<context>root/document[0]/pages[0](4 0 obj PDPage)/contentStream[0](6 0 obj PDContentStream)/operators[12]/font[0](KUYJMB+FrontPagePro-Medium)/fontFile[0]</context>
</check>
<check status="failed">
<context>root/document[0]/pages[0](4 0 obj PDPage)/contentStream[0](6 0 obj PDContentStream)/operators[23]/font[0](IJGRYF+Charter)/fontFile[0]</context>
</check>
<check status="failed">
<context>root/document[0]/pages[0](4 0 obj PDPage)/contentStream[0](6 0 obj PDContentStream)/operators[161]/font[0](PGCUGW+Charter-Bold)/fontFile[0]</context>
</check>
<check status="failed">
<context>root/document[0]/pages[0](4 0 obj PDPage)/contentStream[0](6 0 obj PDContentStream)/operators[188]/font[0](BHNTXR+Stafford)/fontFile[0]</context>
</check>
</rule>
<rule specification="ISO 19005-1:2005" clause="6.2.3" testNumber="2" status="failed" passedChecks="0" failedChecks="88">
<description>DeviceRGB may be used only if the file has a PDF/A-1 OutputIntent that uses an RGB colour space</description>
<object>PDDeviceRGB</object>
<test>gOutputCS != null &amp;&amp; gOutputCS == "RGB "</test>
<check status="failed">
<context>root/document[0]/pages[0](4 0 obj PDPage)/contentStream[0](6 0 obj PDContentStream)/operators[2]/colorSpace[0]</context>
</check>
<check status="failed">
<context>root/document[0]/pages[0](4 0 obj PDPage)/contentStream[0](6 0 obj PDContentStream)/operators[8]/fillCS[0]</context>
</check>
<check status="failed">
<context>root/document[0]/pages[0](4 0 obj PDPage)/contentStream[0](6 0 obj PDContentStream)/operators[12]/fillCS[0]</context>
</check>
<check status="failed">
<context>root/document[0]/pages[0](4 0 obj PDPage)/contentStream[0](6 0 obj PDContentStream)/operators[14]/fillCS[0]</context>
</check>
<check status="failed">
<context>root/document[0]/pages[0](4 0 obj PDPage)/contentStream[0](6 0 obj PDContentStream)/operators[18]/fillCS[0]</context>
</check>
<check status="failed">
<context>root/document[0]/pages[0](4 0 obj PDPage)/contentStream[0](6 0 obj PDContentStream)/operators[20]/fillCS[0]</context>
</check>
<check status="failed">
<context>root/document[0]/pages[0](4 0 obj PDPage)/contentStream[0](6 0 obj PDContentStream)/operators[23]/fillCS[0]</context>
</check>
<check status="failed">
<context>root/document[0]/pages[0](4 0 obj PDPage)/contentStream[0](6 0 obj PDContentStream)/operators[25]/fillCS[0]</context>
</check>
<check status="failed">
<context>root/document[0]/pages[0](4 0 obj PDPage)/contentStream[0](6 0 obj PDContentStream)/operators[27]/fillCS[0]</context>
</check>
<check status="failed">
<context>root/document[0]/pages[0](4 0 obj PDPage)/contentStream[0](6 0 obj PDContentStream)/operators[29]/fillCS[0]</context>
</check>
<check status="failed">
<context>root/document[0]/pages[0](4 0 obj PDPage)/contentStream[0](6 0 obj PDContentStream)/operators[31]/fillCS[0]</context>
</check>
<check status="failed">
<context>root/document[0]/pages[0](4 0 obj PDPage)/contentStream[0](6 0 obj PDContentStream)/operators[33]/fillCS[0]</context>
</check>
<check status="failed">
<context>root/document[0]/pages[0](4 0 obj PDPage)/contentStream[0](6 0 obj PDContentStream)/operators[35]/fillCS[0]</context>
</check>
<check status="failed">
<context>root/document[0]/pages[0](4 0 obj PDPage)/contentStream[0](6 0 obj PDContentStream)/operators[37]/fillCS[0]</context>
</check>
<check status="failed">
<context>root/document[0]/pages[0](4 0 obj PDPage)/contentStream[0](6 0 obj PDContentStream)/operators[39]/fillCS[0]</context>
</check>
<check status="failed">
<context>root/document[0]/pages[0](4 0 obj PDPage)/contentStream[0](6 0 obj PDContentStream)/operators[41]/fillCS[0]</context>
</check>
<check status="failed">
<context>root/document[0]/pages[0](4 0 obj PDPage)/contentStream[0](6 0 obj PDContentStream)/operators[44]/fillCS[0]</context>
</check>
</rule>
<rule specification="ISO 19005-1:2005" clause="6.1.7" testNumber="2" status="failed" passedChecks="0" failedChecks="1">
<description>The stream keyword shall be followed either by a CARRIAGE RETURN (0Dh) and LINE FEED (0Ah) character sequence
or by a single LINE FEED character. The endstream keyword shall be preceded by an EOL marker</description>
<object>CosStream</object>
<test>streamKeywordCRLFCompliant == true &amp;&amp; endstreamKeywordEOLCompliant == true</test>
<check status="failed">
<context>root/indirectObjects[40](6 0)/directObject[0]</context>
</check>
</rule>
<rule specification="ISO 19005-1:2005" clause="6.1.8" testNumber="1" status="failed" passedChecks="0" failedChecks="3">
<description>The object number and generation number shall be separated by a single white-space character. The generation number and obj keyword
shall be separated by a single white-space character. The object number and endobj keyword shall each be preceded by an EOL marker. The obj and endobj
keywords shall each be followed by an EOL marker.</description>
<object>CosIndirect</object>
<test>spacingCompliesPDFA</test>
<check status="failed">
<context>root/indirectObjects[1](29 0)</context>
</check>
<check status="failed">
<context>root/indirectObjects[18](5 0)</context>
</check>
<check status="failed">
<context>root/indirectObjects[44](2 0)</context>
</check>
</rule>
<rule specification="ISO 19005-1:2005" clause="6.7.11" testNumber="1" status="failed" passedChecks="0" failedChecks="1">
<description>The PDF/A version and conformance level of a file shall be specified using the PDF/A Identification extension schema.</description>
<object>MainXMPPackage</object>
<test>Identification_size == 1</test>
<check status="failed">
<context>root/document[0]/metadata[0](46 0 obj PDMetadata)/XMPPackage[0]</context>
</check>
</rule>
<rule specification="ISO 19005-1:2005" clause="6.4" testNumber="2" status="failed" passedChecks="0" failedChecks="1">
<description>An XObject dictionary shall not contain the SMask key</description>
<object>PDXObject</object>
<test>containsSMask == false</test>
<check status="failed">
<context>root/document[0]/pages[0](4 0 obj PDPage)/contentStream[0](6 0 obj PDContentStream)/operators[291]/xObject[0](31 0 obj PDXImage)</context>
</check>
</rule>
<rule specification="ISO 19005-1:2005" clause="6.4" testNumber="3" status="failed" passedChecks="0" failedChecks="1">
<description>A Group object with an S key with a value of Transparency shall not be included in a form XObject.
A Group object with an S key with a value of Transparency shall not be included in a page dictionary</description>
<object>PDGroup</object>
<test>S != "Transparency"</test>
<check status="failed">
<context>root/document[0]/pages[0](4 0 obj PDPage)/Group[0](5 0 obj PDGroup)</context>
</check>
</rule>
</details>
</validationReport>
<duration start="1675365944670" finish="1675365945226">00:00:00.556</duration>
</job>
</jobs>
<batchSummary totalJobs="1" failedToParse="0" encrypted="0">
<validationReports compliant="0" nonCompliant="1" failedJobs="0">1</validationReports>
<featureReports failedJobs="0">0</featureReports>
<repairReports failedJobs="0">0</repairReports>
<duration start="1675365944608" finish="1675365945246">00:00:00.638</duration>
</batchSummary>
</report>
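I'm struggling to work out whether the cmap errors that veraPDF reports for test.pdf are related to the CID issue that veraPDF highlights in the main document. Basically, my question is: what do I need to do to this PDF so that pdfx can successfully make the whole document PDF/A compliant?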
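One quick way to look at this is to compare the font names listed under each failed rule. The sketch below hard-codes the names from the two reports above and checks for overlap, both by full subset name and by base name with the six-letter subset prefix removed. The variable and function names are illustrative. An empty overlap only shows that the two rules flag different font objects, not that the underlying causes are unrelated. Note also that the two reports were produced with different profiles (PDF/A-2B for the main document, PDF/A-1B for test.pdf).

```python
import re

# Fonts flagged under clause 6.2.11.4 (CIDSet) in the main document's report.
cidset_failures = [
    "KYIDZT+SymbolMT",
    "MAYIVC+FrontPagePro-Medium",
]

# Fonts flagged under clause 6.3.7 (symbolic TrueType cmap) in the test.pdf report.
cmap_failures = [
    "DAESFN+FrontPageMedium",
    "KUYJMB+FrontPagePro-Medium",
    "IJGRYF+Charter",
    "PGCUGW+Charter-Bold",
    "BHNTXR+Stafford",
]


def strip_subset_prefix(name):
    """Drop a leading six-uppercase-letter subset tag such as 'KYIDZT+'."""
    return re.sub(r"^[A-Z]{6}\+", "", name)


# Overlap by exact subset name (same embedded font object).
same_subset = set(cidset_failures) & set(cmap_failures)

# Overlap by base font name (same typeface, possibly embedded twice).
same_base = ({strip_subset_prefix(n) for n in cidset_failures}
             & {strip_subset_prefix(n) for n in cmap_failures})

if __name__ == "__main__":
    print("same subset:", same_subset)
    print("same base font:", same_base)
```

With the names from the reports above, no subset appears in both lists. Only the base name `FrontPagePro-Medium` is shared, and it is embedded once as a CID TrueType font and once as a simple TrueType font.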
Answer 1
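This is not a new problem. I don't know whether this suggestion will work, but here is a link to a PDF file (dated 2022) that contains information about your issue. See Section 4.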
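Although the fonts are all embedded, as far as PDF/A is concerned they are not embedded "in the same way". I don't think any modification to the pdfx package code can fix this. So the trick is to take the original PDF apart and then rebuild it with PDF software (not LaTeX). The second build then embeds all fonts "in the same way". That is what the linked file says.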
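Edit: The Ghostscript code previously posted here had problems. Further research shows that using Ghostscript may fix one problem but cause others (such as wiping metadata or changing font names). As the OP found, the best approach is to use external software.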