Validating a PDF with or without pdfx

The following document illustrates two ways to embed metadata in pdflatex output when using hyperref:

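  1. Using the pdfx package with option a-2u. This is what happens when the flag validate is set to true, as shown.
  2. Using the hyperxmp package instead, with \hypersetup given the options pdfapart=2, pdfaconformance=u. This is what happens when the flag validate is set to false.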

Both methods appear to embed essentially the same metadata (title, author, and so on) in the PDF.

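With method 1, the PDF output file passes PDF/A-2U validation (e.g., using the veraPDF application). With method 2, however, it fails validation. (See the excerpt from the veraPDF report at the end.)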

Question: How can validation be made to succeed with method 2?

Source:

\RequirePackage{filecontents}

\begin{filecontents}{\jobname metadata.xmp}
\hyxmp@at@end{%
  % Create XMP code and write it to macro \hyxmp@xml
  % (cf. hyperxmp.sty, \hyxmp@construct@packet (ll. 847-868))
  \gdef\hyxmp@xml{}%
  \hyxmp@add@to@xml{%
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="3.1-702">^^J%
___<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns\hyxmp@hash">^^J%
  }%
  \hyxmp@pdf@schema
  \hyxmp@xmpRights@schema
  \hyxmp@dc@schema
  \hyxmp@photoshop@schema
  \hyxmp@photometa@schema
  \hyxmp@xmp@basic@schema
  \hyxmp@mm@schema
  \hyxmp@add@to@xml{%
___</rdf:RDF>^^J%
</x:xmpmeta>^^J%
  }%
}
\end{filecontents}

\begin{filecontents}{\jobname.xmpdata}
\Title{A Book}
\Author{Anonymous}
\Language{en-US}
\Keywords{things\sep stuff}
\Subject{matters}
\Publisher{Anonymous}
\Copyright{\copyright 2020 The Company CC-BY-NC-ND}
\CopyrightURL{https://creativecommons.org/licenses/by-nc-nd/4.0/}
\PublicationType{book}
\Lastpage{3}
\Date{2020-06-26}
\CoverDisplayDate{June\ 26,\ 2020}
\CoverDate{2020-06-26}
\end{filecontents}

\documentclass{book}

\usepackage{ifthen}
\newboolean{validate}
\setboolean{validate}{true}  

\ifthenelse{\boolean{validate}}%
  {\RequirePackage[a-2u]{pdfx}%
    \pdfglyphtounicode{EM}{0058 0058 0058 0058 0058 0058 0058 0058}%
    \pdfglyphtounicode{NUL}{0060 0060 0060 0060 0060 0060 0060 0060}%
    \RequirePackage[type={CC},modifier={by-nc-nd},version={4.0},lang={english}]{doclicense}
    \usepackage{hyperref}
    \hypersetup{
       pdfa,
       bookmarksnumbered,
       pdftitle={A Book}, pdfauthor={Anonymous}, pdfcreator={somebody},
       pdfsubject={A general introducton to things}, pdfkeywords={things, stuff},
   }  %  
}%
  {\RequirePackage{hyperxmp} % to add CC info into pdf
   \RequirePackage[type={CC},modifier={by-nc-nd},version={4.0},lang={english}]{doclicense}
   \usepackage[pdfa]{hyperref}
   \hypersetup{
      pdfapart=2, pdfaconformance=u,
      bookmarksnumbered,
      pdftitle={A Book}, pdfauthor={Anonymous}, pdfcreator={somebody},
      pdfsubject={A general introducton to things}, pdfkeywords={things, stuff},
      pdflicenseurl={http://creativecommons.org/licenses/by-nc-nd/4.0/}
    }%
    \immediate\pdfobj stream attr{/N 3} file{sRGB.icc}
    \pdfcatalog{%
        /OutputIntents [
           <<
             /Type /OutputIntent
             /S /GTS_PDFA2
             /DestOutputProfile \the\pdflastobj\space 0 R
             /OutputConditionIdentifier (sRGB)
             /Info (sRGB)
          >>
      ]
    }
}

\newcommand\mytitle{A Book}
\newcommand\myauthor{Anonymous}
\newcommand\myabstract{An introduction to things in general.}
\newcommand\mydate{\today}
\title{\mytitle}
\author{\myauthor}
\date{\mydate}

\usepackage{newtxtext,newtxmath}
\usepackage[french,ngerman,russian,main=english]{babel}

\usepackage{blindtext}

\begin{document}
\maketitle
\blindmathpaper
\end{document}

Validation failure report for method 2:

<rule specification="ISO 19005-2:2011" clause="6.2.4.3" testNumber="4" status="failed" passedChecks="0" failedChecks="230">
   <description>DeviceGray shall only be used if a device independent DefaultGray colour space has been set when the DeviceGray colour space is used,
    or if a PDF/A OutputIntent is present.</description>
   <object>PDDeviceGray</object>
   <test>gOutputCS != null</test>
   <check status="failed">
     <context>root/document[0]/pages[2](23 0 obj PDPage)/contentStream[0](24 0 obj PDContentStream)/operators[170]/fillCS[0]</context>
   </check>
   <check status="failed">
     <context>root/document[0]/pages[2](23 0 obj PDPage)/contentStream[0](24 0 obj PDContentStream)/operators[168]/fillCS[0]</context>
   </check>
   .... [many more similar failures] 
</rule>
<rule specification="ISO 19005-2:2011" clause="6.2.11.7" testNumber="1" status="failed" passedChecks="0" failedChecks="27">
   <description>The Font dictionary of all fonts shall define the map of all used character codes to Unicode values, either via a ToUnicode entry,
    or other mechanisms as defined in ISO 19005-2, 6.2.11.7.2.</description>
   <object>Glyph</object>
   <test>toUnicode != null</test>
   <check status="failed">
     <context>root/document[0]/pages[2](23 0 obj PDPage)/contentStream[0](24 0 obj PDContentStream)/operators[150]/usedGlyphs[1](YKMMCJ+NewTXMI 67 0  0)</context>
   </check>
   <check status="failed">
     <context>root/document[0]/pages[2](23 0 obj PDPage)/contentStream[0](24 0 obj PDContentStream)/operators[150]/usedGlyphs[0](YKMMCJ+NewTXMI 109 0  0)</context>
   </check>
   <check status="failed">
     <context>root/document[0]/pages[2](23 0 obj PDPage)/contentStream[0](24 0 obj PDContentStream)/operators[135]/usedGlyphs[0](PGXZDZ+txmiaX 8 0  0)</context>
   </check>
   <check status="failed">
     <context>root/document[0]/pages[2](23 0 obj PDPage)/contentStream[0](24 0 obj PDContentStream)/operators[123]/usedGlyphs[0](YKMMCJ+NewTXMI 50 0  0)</context>
   </check>
   .... [more like the above] 
   <check status="failed">
     <context>root/document[0]/pages[1](12 0 obj PDPage)/contentStream[0](13 0 obj PDContentStream)/operators[601]/usedGlyphs[0](GEXQPI+txsys 0 0  0)</context>
   </check>
   <check status="failed">
     <context>root/document[0]/pages[1](12 0 obj PDPage)/contentStream[0](13 0 obj PDContentStream)/operators[595]/usedGlyphs[0](YKMMCJ+NewTXMI 63 0  0)</context>
   </check>
   <check status="failed">
     <context>root/document[0]/pages[1](12 0 obj PDPage)/contentStream[0](13 0 obj PDContentStream)/operators[580]/usedGlyphs[0](UOCZFW+txexs 112 0  0)</context>
   </check>
   .... [more like the above]
</rule>

Related:

pdfx + hyperref prevents setting PDF metadata [incompatible packages]

Is it possible to use hyperxmp and xmpincl together in LaTeX?

Added 2020-06-29:

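I have asked a follow-up question: How to find the glyphs in a pdf that need \pdfglyphtounicode to allow validation? It continues a discussion that began in the comments on the answer https://tex.stackexchange.com/a/551291/13492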

Answer 1

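The following document validates. The main error was that the subtype of the output intent should be /GTS_PDFA1; /GTS_PDFA2 does not exist according to the PDF reference.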

Besides this, some glyphs in your fonts have no Unicode representation, so I added dummy meanings. I did not try to include the additional XMP data.
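The dummy meanings rely on pdfTeX's \pdfglyphtounicode primitive, which maps a glyph name to one or more Unicode code points. The fragment below is only a sketch of that pattern: the helper \dummytounicode is a hypothetical name introduced here for illustration (it is not part of the answer's code), and the glyph names are the ones that needed a mapping in this particular document.

```latex
% Place in the preamble, after \documentclass (pdfTeX/pdfLaTeX only).
\input{glyphtounicode}   % load the standard glyph-name-to-Unicode table
\pdfgentounicode=1       % ask pdfTeX to write ToUnicode CMaps for the fonts

% Hypothetical helper (an assumption, not in the original answer):
% assign the same placeholder code point sequence to a glyph name.
\newcommand*\dummytounicode[1]{%
  \pdfglyphtounicode{#1}{0060 0060 0060 0060 0060 0060 0060 0060}}

% Glyphs that lacked a mapping in this document.
\dummytounicode{radicalbig}
\dummytounicode{radicalbigg}
\dummytounicode{radicalBigg}

% A real code point is preferable where one exists.
\pdfglyphtounicode{uni222B.dsp}{222B}% U+222B INTEGRAL
```

A placeholder mapping satisfies the validator's ToUnicode check, but text copied or extracted from the PDF will contain the placeholder characters for those glyphs.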

\documentclass{article}

\usepackage{hyperxmp}
\RequirePackage[type={CC},modifier={by-nc-nd},version={4.0},lang={english}]{doclicense}
 \usepackage[pdfa]{hyperref}
   \hypersetup{
      pdfapart=2, pdfaconformance=u,
      bookmarksnumbered,
      pdftitle={A Book}, pdfauthor={Anonymous}, pdfcreator={somebody},
      pdfsubject={A general introducton to things}, pdfkeywords={things, stuff},
      pdflicenseurl={http://creativecommons.org/licenses/by-nc-nd/4.0/}
    }%
    \input{glyphtounicode}
    \pdfgentounicode=1
    \pdfglyphtounicode{EM}{0058 0058 0058 0058 0058 0058 0058 0058}%
    \pdfglyphtounicode{NUL}{0060 0060 0060 0060 0060 0060 0060 0060}%
    \pdfglyphtounicode{uni222B.dsp}{222B}%
    \pdfglyphtounicode{summationdisplay.1}{0060 0060 0060 0060 0060 0060 0060 0060}%
    \pdfglyphtounicode{summationdisplay}{0060 0060 0060 0060 0060 0060 0060 0060}%    
    \pdfglyphtounicode{radicalBigg}{0060 0060 0060 0060 0060 0060 0060 0060}%
    \pdfglyphtounicode{radicalbig}{0060 0060 0060 0060 0060 0060 0060 0060}%
    \pdfglyphtounicode{radicalbigg}{0060 0060 0060 0060 0060 0060 0060 0060}%
    \immediate\pdfobj stream attr{/N 3} file{sRGB.icc}
    \pdfcatalog{%
        /OutputIntents [
         <<
             /Type /OutputIntent
             /S /GTS_PDFA1
             /DestOutputProfile \the\pdflastobj\space 0 R
             /OutputConditionIdentifier (sRGB)
             /Info (sRGB)
          >>
      ]
    }

\newcommand\mytitle{A Book}
\newcommand\myauthor{Anonymous}
\newcommand\myabstract{An introduction to things in general.}
\newcommand\mydate{\today}
\title{\mytitle}
\author{\myauthor}
\date{\mydate}

\usepackage{newtxtext,newtxmath}
\usepackage[french,ngerman,russian,main=english]{babel}

\usepackage{blindtext}

\begin{document}
abc abc
\maketitle
\blindmathpaper
\end{document}
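Compared with the preamble in the question, the only change to the output intent is its /S entry. Taken on its own, that fragment can be sketched as follows, assuming pdfTeX and that sRGB.icc can be found by TeX, as in the code above:

```latex
% Embed the ICC profile as a PDF stream object with three colour components.
\immediate\pdfobj stream attr{/N 3} file{sRGB.icc}
% Reference that object from the document catalog as the PDF/A output intent.
\pdfcatalog{%
  /OutputIntents [
    <<
      /Type /OutputIntent
      /S /GTS_PDFA1   % the question used /GTS_PDFA2 here
      /DestOutputProfile \the\pdflastobj\space 0 R
      /OutputConditionIdentifier (sRGB)
      /Info (sRGB)
    >>
  ]
}
```

With an output intent that the validator recognises, the DeviceGray failures reported under clause 6.2.4.3 should no longer apply, since that rule accepts DeviceGray when a PDF/A OutputIntent is present.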
