How can I find the glyphs in a PDF that need \pdfglyphtounicode to allow validation?

2024-5-25

This question follows up on https://tex.stackexchange.com/a/551291/13492.

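Problem:

Determine all glyphs in the PDF output that prevent the PDF from validating against the PDF/A-2u standard (as with the MWE below). That is, find the glyphs that cause the file to violate the rule "The Font dictionary of all fonts shall define the map of all used character codes to Unicode values, either via a ToUnicode entry, or other mechanisms as defined in ISO 19005-2, 6.2.11.7.2."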

The two commands \pdfcompresslevel=0 and \pdfobjcompresslevel=0 make the output of pdflatex a readable, pure ASCII file.

Question:

What should I look for in the ASCII PDF so that I can then include suitable \pdfglyphtounicode commands?

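If I first process the file with pdflatex as shown, and then process it again with the two lines

    \pdfglyphtounicode{summationdisplay.1}{0060 0060 0060 0060 0060 0060 0060 0060}%
    \pdfglyphtounicode{summationdisplay}{0060 0060 0060 0060 0060 0060 0060 0060}%

commented out, and then examine the resulting ASCII PDFs, the only substantive difference I find is the following: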

With those two lines included in the source, the PDF contains the line

dup 213 /summationdisplay.1 put

followed by this line:

/CharSet (/radicalBigg/radicalbig/radicalbigg/summationdisplay.1/uni222B.dsp)

With those two lines commented out, the PDF contains the line dup 213 /summationdisplay.1 put twice, but does not include the line /CharSet (/radicalBigg/radicalbig/radicalbigg/summationdisplay.1/uni222B.dsp).
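
The comparison described above can be automated. The following is only a minimal sketch, assuming both PDFs were produced with \pdfcompresslevel=0 and \pdfobjcompresslevel=0 and have already been read as text (for instance with encoding="latin-1"). The names FONT_LINE, font_lines and compare are hypothetical helpers, not part of any TeX tool.

```python
import re
from collections import Counter

# Lines of interest: Type 1 encoding entries and /CharSet strings.
FONT_LINE = re.compile(r"^(dup \d+ /\S+ put|/CharSet \(.*\))$")


def font_lines(pdf_text: str) -> Counter:
    """Count every encoding or /CharSet line in an uncompressed PDF's text."""
    stripped = (line.strip() for line in pdf_text.splitlines())
    return Counter(line for line in stripped if FONT_LINE.match(line))


def compare(with_text: str, without_text: str) -> dict:
    """Map each differing line to (count with the commands, count without)."""
    a, b = font_lines(with_text), font_lines(without_text)
    return {k: (a[k], b[k]) for k in sorted(a.keys() | b.keys()) if a[k] != b[k]}
```

Calling compare on the text of the two PDFs lists only the encoding and /CharSet lines whose number of occurrences differs between the two runs.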

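Erroneous inference:

From the above, I am tempted to infer that I need to include in the source a \pdfglyphtounicode command for every glyph name that appears in a line of the PDF file matching the grep pattern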

dup [0-9]+ /\S+ put

But that is surely wrong! After all, the PDF file contains many such lines, for example:

dup 149 /period put
dup 48 /u1D44E put
dup 49 /u1D44F put

dup 150 /comma put
dup 56 /u1D456 put
dup 58 /u1D458 put

dup 115 /radicalBigg put
dup 112 /radicalbig put
dup 114 /radicalbigg put
dup 213 /summationdisplay.1 put
dup 185 /uni222B.dsp put

dup 61 /equal put
dup 8 /uni03A6 put

dup 33 /arrowright put
dup 49 /infinity put
dup 0 /minus put
dup 184 /plus put
dup 6 /plusminus put
dup 112 /radical put

dup 33 /A put
dup 34 /B put
dup 40 /H put
dup 41 /I put
dup 42 /J put
dup 43 /K put
dup 50 /R put
dup 52 /T put
dup 65 /a put
dup 66 /b put
dup 67 /c put
dup 12 /comma put
dup 68 /d put
dup 69 /e put
dup 1 /exclam put
dup 70 /f put
dup 20 /four put
dup 71 /g put
dup 72 /h put
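
The pattern above can be applied mechanically to collect every encoded glyph name. This is a minimal sketch, assuming the PDF text has been read from an uncompressed file. The function names and the already_mapped parameter are hypothetical. The caller must supply the set of names known to have a mapping, since the script cannot tell which glyphs actually fail validation.

```python
import re

# Same shape as the grep pattern: dup <code> /<glyphname> put
DUP_LINE = re.compile(r"dup (\d+) /(\S+) put")


def encoded_glyphs(pdf_text: str) -> list:
    """Return (character code, glyph name) pairs in order of appearance."""
    return [(int(code), name) for code, name in DUP_LINE.findall(pdf_text)]


def candidates(pdf_text: str, already_mapped=frozenset()) -> list:
    """Sorted unique glyph names, minus those the caller says are mapped."""
    names = {name for _, name in encoded_glyphs(pdf_text)}
    return sorted(names - set(already_mapped))
```

Without a populated already_mapped set, the result is the full list of encoded glyph names, which is exactly the over-broad list the question objects to.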

MWE:

\documentclass{article}

% To examine pdf as pure ASCII:
\pdfcompresslevel=0
\pdfobjcompresslevel=0

\usepackage{hyperxmp}
\RequirePackage[type={CC},modifier={by-nc-nd},version={4.0},lang={english}]{doclicense}
 \usepackage[pdfa]{hyperref}
   \hypersetup{
      pdfapart=2, pdfaconformance=u,
      bookmarksnumbered,
      pdftitle={A Book}, pdfauthor={Anonymous}, pdfcreator={somebody},
      pdfsubject={A general introduction to things}, pdfkeywords={things, stuff},
      pdflicenseurl={http://creativecommons.org/licenses/by-nc-nd/4.0/}
    }%
    \input{glyphtounicode}
    \pdfgentounicode=1
    \pdfglyphtounicode{EM}{0058 0058 0058 0058 0058 0058 0058 0058}%
    \pdfglyphtounicode{NUL}{0060 0060 0060 0060 0060 0060 0060 0060}%
    \pdfglyphtounicode{uni222B.dsp}{222B}%
    \pdfglyphtounicode{summationdisplay.1}{0060 0060 0060 0060 0060 0060 0060 0060}%
    \pdfglyphtounicode{summationdisplay}{0060 0060 0060 0060 0060 0060 0060 0060}%    
    \pdfglyphtounicode{radicalBigg}{0060 0060 0060 0060 0060 0060 0060 0060}%
    \pdfglyphtounicode{radicalbig}{0060 0060 0060 0060 0060 0060 0060 0060}%
    \pdfglyphtounicode{radicalbigg}{0060 0060 0060 0060 0060 0060 0060 0060}%
    \immediate\pdfobj stream attr{/N 3} file{sRGB.icc}
    \pdfcatalog{%
        /OutputIntents [
         <<
             /Type /OutputIntent
             /S /GTS_PDFA1
             /DestOutputProfile \the\pdflastobj\space 0 R
             /OutputConditionIdentifier (sRGB)
             /Info (sRGB)
          >>
      ]
    }

\newcommand\mytitle{A Book}
\newcommand\myauthor{Anonymous}
\newcommand\myabstract{An introduction to things in general.}
\newcommand\mydate{\today}
\title{\mytitle}
\author{\myauthor}
\date{\mydate}

\usepackage{newtxtext,newtxmath}
\usepackage[french,ngerman,russian,main=english]{babel}

\usepackage{blindtext}

\begin{document}
abc abc
\maketitle
\blindmathpaper
\end{document}