Incorrect encoding in PDF outlines

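I am trying to get the encoding in PDF outlines right. The problem is that the letter in the outline must be Ъ, the same as on the page, but it shows up as Ú instead.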

dvitype shows the following:

xxx 'pdf: outline 1 << /Title (?) /Dest [ @thispage /FitH @ypos ] >>' non-ASCII character in xxx command!

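Below is the example file (I know it is a bit long, but I did my best to strip as many irrelevant parts as possible from the original). To run it: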

$ tex -ini -enc '\input plain \dump'  
$ tex -fmt plain example.tex  
$ dvipdfmx example.dvi

example.tex:

\font\tenbf=labx1000

\newtoks\gtitle % title of current major group

\newtoks\toksE \newtoks\toksF \newtoks\usersanitizer
\newif\iftokprocessed \newif\ifTnum \newif\ifinstr

\def\firstsecno#1.{\setbox0=\hbox{\toksA={#1.}\toksB={}%
    \maketoks}}
\def\addtokens#1#2{\edef\addtoks{\noexpand#1={\the#1#2}}\addtoks}
\def\poptoks#1#2|ENDTOKS|{\let\first=#1\toksD={#1}%
  \ifcat\noexpand\first0\countB=`#1\else\countB=0\fi\toksA={#2}}

\def\maketoksdone{\edef\st{\global\noexpand\toksA={\the\toksB}}\st}

\def\sanitizecommand#1#2{\addtokens\usersanitizer
       {\noexpand\dosanitizecommand\noexpand#1{#2}}}
\def\dosanitizecommand#1#2{\ifx\nxt#1\addF{#2}\fi}

\def\makeoutlinetoks{\Tnumfalse\afterassignment\makeolproctok\let\nxt= }
\def\makeolnexttok{\afterassignment\makeolproctok\let\nxt= }
\def\makeolgobbletok{\afterassignment\makeolnexttok\let\nxt= }
\def\addF#1{\addtokens\toksF{#1}\tokprocessedtrue}
% now comes a routine to "sanitize" section names, for pdf outlines
\def\makeolproctok{\tokprocessedfalse
  \let\next\makeolnexttok % default
  \ifx\nxt\outlinedone\let\next\outlinedone
  \else\ifx{\nxt \else\ifx}\nxt \Tnumfalse \instrfalse % skip braces
  \else\ifx$\nxt % or a $ sign
  \else\ifx^\nxt \addF^\else\ifx_\nxt \addF_% sanitize ^ and _
  \else\ifx\nxt\spacechar \addF\space
  \else\if\noexpand\nxt\relax % we have a control sequence; is it one we know?
    \ifx\nxt~\addF\space
    \else\ifx\nxt\onespace\addF\space
    \else\the\usersanitizer
    \iftokprocessed\else\makeolproctokctli
    \iftokprocessed\else\makeolproctokctlii
    \iftokprocessed\else\makeolproctokctliii % if not recognised, skip it
    \fi\fi\fi\fi\fi
   \else  % we don't have a control sequence, it's an ordinary char
    \ifx/\nxt \addF{\string\/}% quote chars special to PDF with backslash
    \else\ifx(\nxt \addF{\string\(}\else\ifx)\nxt \addF{\string\)}%
    \else\ifx[\nxt \addF{\string\[}\else\ifx]\nxt \addF{\string\]}%
    \else\expandafter\makeolproctokchar\meaning\nxt
   \fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi
  \next
}
\def\makeolproctokchar#1 #2 #3{\addF{#3}}
\def\makeolproctokctli{%
  \ifx\nxt\CEE\addF{C}\let\next\makeolgobbletok % \CEE/
  \else\ifx\nxt\UNIX\addF{UNIX}\let\next\makeolgobbletok % \UNIX/
  \else\ifx\nxt\TEX\addF{TeX}\let\next\makeolgobbletok % \TEX/
  \else\ifx\nxt\TeX\addF{TeX}\else\ifx\nxt\LaTeX\addF{LaTeX}%
  \else\ifx\nxt\CPLUSPLUS\addF{C++}\let\next\makeolgobbletok % \CPLUSPLUS/
  \else\ifx\nxt\Cee\addF{C}%
  \else\ifx\nxt\PB \let\next\makeolgobbletok \tokprocessedtrue % \PB{...}
  \else\ifx\nxt\.\tokprocessedtrue\instrtrue % \.{...}
      % skip \|
  \else\ifx\nxt\\\ifinstr\addF{\bschar\bschar}\else\tokprocessedtrue\fi
  \else\ifx\nxt\&\ifinstr\addF&\else\tokprocessedtrue\fi
  \else\ifx\nxt\~\ifTnum\addF{0}\else\addF\tildechar\fi % 077->\T{\~77}
  \else\ifx\nxt\_\ifTnum\addF{E}\else\addF_\fi  % 0.1E5->\T{0.1\_5}
  \else\ifx\nxt\^\ifTnum\addF{0x}\else\addF^\fi  % 0x77 -> \T{\^77}
  \else\ifx\nxt\$\ifTnum\tokprocessedtrue\else\addF$\fi % \T{77\$L}
  \else\ifx\nxt\{\addF\lbchar       \else\ifx\nxt\}\addF\rbchar
  \else\ifx\nxt\ \addF\space        \else\ifx\nxt\#\addF{\string\#}%
  \else\ifx\nxt\PP\addF{++}\else\ifx\nxt\MM\addF{--}%
  \fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi
}

\def\outlinedone{\edef\outlinest{\global\noexpand\toksE={\the\toksF}}%
  \outlinest\let\outlinedone=\relax}

\def\lapstar{\rlap{*}}
\def\stsec{\rightskip=0pt % get out of C mode (cf. \B)
  \sfcode`;=1500 \pretolerance 200 \hyphenpenalty 50 \exhyphenpenalty 50
  \noindent{\let\*=\lapstar\bf\secstar.\quad}%
  \smash{\raise\baselineskip\hbox to0pt{%
     \let\*=\empty\special{%
       pdf: dest (\romannumeral\secstar) [ @thispage /FitH @ypos ]}}}}
\let\startsection=\stsec

\def\MN#1{\par % common code for \M, \N
  {\xdef\secstar{#1}\let\*=\empty\xdef\secno{#1}}% remove \* from section name
  \ifx\secno\secstar\fi
  \mark{{{\tensy x}\secno}{1}{\the\gtitle}}}
\let\ZZ=\let % now you can \write the control sequence \ZZ

\let\page=\pagebody \raggedbottom
\def\startpdf{
    {\special{pdf: docview << /PageMode /UseOutlines >>}}}

\newwrite\cont
\output{\setbox0=\page % the first page is garbage
  \openout\cont=\jobname.toc
  \global\output{\shipout\vbox{
    \vbox to 9in{
    \hbox to 6.5in{\vbox to10pt{}}
    \vfill\page}}}}

\vbox to \vsize{} % the first \topmark won't be null

\def\makebookmarks{\let\ZZ=\writebookmarkline \readcontents\relax}
\def\expnumber#1{\expandafter\ifx\csname#1\endcsname\relax 0%
  \else \csname#1\endcsname \fi} % Petr Olsak's macros from texinfo.tex

\def\writebookmarkline#1#2#3#4#5{{%
  \let\(=\let \let\)=\let \let\[=\let \let\]=\let \let\/=\let
  \pdfoutline goto num #3 count -\expnumber{chunk#2.#3} {#5}}}

\def\main#1#2#3.{% beginning of starred section
  \toksF={}\makeoutlinetoks#3\outlinedone\outlinedone
  \gtitle={#3}\MN{#2}%
  \vfil\eject
  \def\stripprefix##1>{}\def\gtitletoks{#3}%
  \edef\gtitletoks{\expandafter\stripprefix\meaning\gtitletoks}%
  \edef\next{\write\cont{\ZZ{\gtitletoks}{#1}{\secno}% write to contents file
   {\noexpand\the\pageno}{\the\toksE}}}\next % \ZZ{title}{depth}{sec}{page}{ss}
  \special{pdf: outline #1 << /Title (\the\toksE) /Dest
    [ @thispage /FitH @ypos ] >>}
  \startsection{\bf#3.\quad}\ignorespaces}

\mubytein=1 \mubyteout=2
\mubyte ^^da  ^^d0^^aa\endmubyte

\main{1}{1}Ъ.

\bye

Edit:
To summarize the answers, this is my current setup (without an intermediate package):

\newbox\mybox
\let\oldshipout\shipout
\def\shipout{\afterassignment\myboat\setbox\mybox=}
\def\myboat{\aftergroup\myship}
\def\myship{\setbox\mybox=\vbox{\special{pdf:tounicode UTF8-UCS2}\unvbox\mybox}\oldshipout\box\mybox\global\let\shipout\oldshipout}

(The idea comes from quire.tex.)

Answer 1

Assuming the input encoding is UTF-8 and encTeX is enabled:

\font\tenbf=labx1000
\mubytein=1 \mubyteout=2 \specialout=2 \mubyte ^^da ^^d0^^aa\endmubyte
\special{pdf: tounicode UTF8-UCS2}
\special{pdf: outline 1 << /Title (Ъ) /Dest
[ @thispage /FitH @ypos ] >>}
\bf Ъ.
\bye

In your example you are redefining \output, so add the following two lines at the top of example.tex:

\input atbegshi.sty
\AtBeginShipoutFirst{\special{pdf: tounicode UTF8-UCS2}}

Here atbegshi.sty is a package developed by Heiko Oberdiek.

Edit:
Instead of using atbegshi.sty, you can add the "tounicode" special in a more direct way:

...
dviasm -o example.dump example.dvi
perl -i -pe "s/(?<=\[page 1 0 0 0 0 0 0 0 0 0\])/\nxxx: 'pdf:tounicode UTF8-UCS2'/" example.dump
dviasm -o example.dvi example.dump
dvipdfmx example.dvi

Answer 2

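The problem is that PDF strings (used for PDF outlines, PDF info, etc.) have only two documented encodings. The first is the default, PDFDocEncoding. It is a single-byte encoding, but it is unusable for languages other than English and the Western European ones (Czech, for example). The second is the PDF Unicode encoding, a special two-byte encoding derived from UTF-16 that begins with FEFF (hexadecimal). Every PDF string must carry this prefix and be encoded in UTF-16 unless it is a PDFDocEncoded string.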
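
As a concrete illustration of the second encoding: Ъ is U+042A, so its PDF Unicode form is the four bytes FE FF 04 2A. The sketch below writes them as a PDF hexadecimal string, which keeps the \special pure ASCII. It assumes that dvipdfmx accepts a hexadecimal string for /Title in a pdf: outline special; the file name and the body text are placeholders, not part of the original example.

% Hypothetical file hexoutline.tex. Run: tex hexoutline.tex, then dvipdfmx hexoutline.dvi
% <FEFF042A> = prefix FEFF followed by U+042A in UTF-16 (big-endian)
\special{pdf: outline 1 << /Title <FEFF042A> /Dest [ @thispage /FitH @ypos ] >>}
Some text.
\bye

Because the file contains only ASCII, this variant should not need encTeX or any conversion setting, but encoding every title by hand is practical only for a handful of outline entries.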

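The dvipdfmx converter is able to add the FEFF prefix mentioned above and to convert UTF-8 to UTF-16 in all PDF strings once \special{pdf: tounicode UTF8-UCS2} has been set. XeTeX uses the xdvipdfmx converter in its backend and activates this UTF8-UCS2 conversion automatically.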

pdfTeX in DVI mode can be set up with the \special mentioned above. The dvipdfmx postprocessor then does the required work.

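But pdfTeX in PDF mode writes PDF strings directly (for example with the \pdfoutline primitive), so the UTF-8 to UTF-16 conversion has to be done at the macro level. The file pdfuni.tex solves this problem for the Czech and Slovak alphabets. It defines \pdfunidef\macro{text}: text is converted to UTF-16 and saved in \macro, which can then be used as the argument of the \pdfoutline primitive. If you need pdfTeX in PDF mode, you can take inspiration from the code in pdfuni.tex and implement a similar macro for Cyrillic.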
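
To show what such a macro has to produce, here is a minimal hand-encoded sketch for the single letter Ъ. It is not the code of pdfuni.tex: the macro name \cyrtitle and the destination name sec1 are hypothetical, and the sketch assumes that pdfTeX copies the outline text into the PDF string without escaping it further, so that the octal escapes reach the PDF file unchanged.

% Hypothetical file pdfoutline.tex. Run: pdftex pdfoutline.tex
\pdfoutput=1
\pdfcatalog{/PageMode /UseOutlines}
% Bytes FE FF 04 2A (the FEFF prefix plus U+042A) as octal escapes of a PDF literal string.
% \string keeps each backslash as an ordinary character when the text is expanded.
\def\cyrtitle{\string\376\string\377\string\004\string\052}
\pdfdest name {sec1} fith
\pdfoutline goto name {sec1} {\cyrtitle}
Some text.
\bye

A general solution would build a string like \cyrtitle automatically from the UTF-8 title, one letter at a time, which is the job the answer attributes to \pdfunidef.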
