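I am trying to encode PDF outlines properly. The problem is that the letter in the outline must be the same as the letter on the page, Ъ, but it is not: it comes out as Ú.
dvitype shows the following: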
xxx 'pdf: outline 1 << /Title (?) /Dest [ @thispage /FitH @ypos ] >>' non-ASCII character in xxx command!
Here is the example file (I know it is a bit long, but I did my best to strip as much irrelevant material from the original as I could). Run:
$ tex -ini -enc '\input plain \dump'
$ tex -fmt plain example.tex
$ dvipdfmx example.dvi
example.tex:
\font\tenbf=labx1000
\newtoks\gtitle % title of current major group
\newtoks\toksE \newtoks\toksF \newtoks\usersanitizer
\newif\iftokprocessed \newif\ifTnum \newif\ifinstr
\def\firstsecno#1.{\setbox0=\hbox{\toksA={#1.}\toksB={}%
\maketoks}}
\def\addtokens#1#2{\edef\addtoks{\noexpand#1={\the#1#2}}\addtoks}
\def\poptoks#1#2|ENDTOKS|{\let\first=#1\toksD={#1}%
\ifcat\noexpand\first0\countB=`#1\else\countB=0\fi\toksA={#2}}
\def\maketoksdone{\edef\st{\global\noexpand\toksA={\the\toksB}}\st}
\def\sanitizecommand#1#2{\addtokens\usersanitizer
{\noexpand\dosanitizecommand\noexpand#1{#2}}}
\def\dosanitizecommand#1#2{\ifx\nxt#1\addF{#2}\fi}
\def\makeoutlinetoks{\Tnumfalse\afterassignment\makeolproctok\let\nxt= }
\def\makeolnexttok{\afterassignment\makeolproctok\let\nxt= }
\def\makeolgobbletok{\afterassignment\makeolnexttok\let\nxt= }
\def\addF#1{\addtokens\toksF{#1}\tokprocessedtrue}
% now comes a routine to "sanitize" section names, for pdf outlines
\def\makeolproctok{\tokprocessedfalse
\let\next\makeolnexttok % default
\ifx\nxt\outlinedone\let\next\outlinedone
\else\ifx{\nxt \else\ifx}\nxt \Tnumfalse \instrfalse % skip braces
\else\ifx$\nxt % or a $ sign
\else\ifx^\nxt \addF^\else\ifx_\nxt \addF_% sanitize ^ and _
\else\ifx\nxt\spacechar \addF\space
\else\if\noexpand\nxt\relax % we have a control sequence; is it one we know?
\ifx\nxt~\addF\space
\else\ifx\nxt\onespace\addF\space
\else\the\usersanitizer
\iftokprocessed\else\makeolproctokctli
\iftokprocessed\else\makeolproctokctlii
\iftokprocessed\else\makeolproctokctliii % if not recognised, skip it
\fi\fi\fi\fi\fi
\else % we don't have a control sequence, it's an ordinary char
\ifx/\nxt \addF{\string\/}% quote chars special to PDF with backslash
\else\ifx(\nxt \addF{\string\(}\else\ifx)\nxt \addF{\string\)}%
\else\ifx[\nxt \addF{\string\[}\else\ifx]\nxt \addF{\string\]}%
\else\expandafter\makeolproctokchar\meaning\nxt
\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi
\next
}
\def\makeolproctokchar#1 #2 #3{\addF{#3}}
\def\makeolproctokctli{%
\ifx\nxt\CEE\addF{C}\let\next\makeolgobbletok % \CEE/
\else\ifx\nxt\UNIX\addF{UNIX}\let\next\makeolgobbletok % \UNIX/
\else\ifx\nxt\TEX\addF{TeX}\let\next\makeolgobbletok % \TEX/
\else\ifx\nxt\TeX\addF{TeX}\else\ifx\nxt\LaTeX\addF{LaTeX}%
\else\ifx\nxt\CPLUSPLUS\addF{C++}\let\next\makeolgobbletok % \CPLUSPLUS/
\else\ifx\nxt\Cee\addF{C}%
\else\ifx\nxt\PB \let\next\makeolgobbletok \tokprocessedtrue % \PB{...}
\else\ifx\nxt\.\tokprocessedtrue\instrtrue % \.{...}
% skip \|
\else\ifx\nxt\\\ifinstr\addF{\bschar\bschar}\else\tokprocessedtrue\fi
\else\ifx\nxt\&\ifinstr\addF&\else\tokprocessedtrue\fi
\else\ifx\nxt\~\ifTnum\addF{0}\else\addF\tildechar\fi % 077->\T{\~77}
\else\ifx\nxt\_\ifTnum\addF{E}\else\addF_\fi % 0.1E5->\T{0.1\_5}
\else\ifx\nxt\^\ifTnum\addF{0x}\else\addF^\fi % 0x77 -> \T{\^77}
\else\ifx\nxt\$\ifTnum\tokprocessedtrue\else\addF$\fi % \T{77\$L}
\else\ifx\nxt\{\addF\lbchar \else\ifx\nxt\}\addF\rbchar
\else\ifx\nxt\ \addF\space \else\ifx\nxt\#\addF{\string\#}%
\else\ifx\nxt\PP\addF{++}\else\ifx\nxt\MM\addF{--}%
\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi
}
\def\outlinedone{\edef\outlinest{\global\noexpand\toksE={\the\toksF}}%
\outlinest\let\outlinedone=\relax}
\def\lapstar{\rlap{*}}
\def\stsec{\rightskip=0pt % get out of C mode (cf. \B)
\sfcode`;=1500 \pretolerance 200 \hyphenpenalty 50 \exhyphenpenalty 50
\noindent{\let\*=\lapstar\bf\secstar.\quad}%
\smash{\raise\baselineskip\hbox to0pt{%
\let\*=\empty\special{%
pdf: dest (\romannumeral\secstar) [ @thispage /FitH @ypos ]}}}}
\let\startsection=\stsec
\def\MN#1{\par % common code for \M, \N
{\xdef\secstar{#1}\let\*=\empty\xdef\secno{#1}}% remove \* from section name
\ifx\secno\secstar\fi
\mark{{{\tensy x}\secno}{1}{\the\gtitle}}}
\let\ZZ=\let % now you can \write the control sequence \ZZ
\let\page=\pagebody \raggedbottom
\def\startpdf{
{\special{pdf: docview << /PageMode /UseOutlines >>}}}
\newwrite\cont
\output{\setbox0=\page % the first page is garbage
\openout\cont=\jobname.toc
\global\output{\shipout\vbox{
\vbox to 9in{
\hbox to 6.5in{\vbox to10pt{}}
\vfill\page}}}}
\vbox to \vsize{} % the first \topmark won't be null
\def\makebookmarks{\let\ZZ=\writebookmarkline \readcontents\relax}
\def\expnumber#1{\expandafter\ifx\csname#1\endcsname\relax 0%
\else \csname#1\endcsname \fi} % Petr Olsak's macros from texinfo.tex
\def\writebookmarkline#1#2#3#4#5{{%
\let\(=\let \let\)=\let \let\[=\let \let\]=\let \let\/=\let
\pdfoutline goto num #3 count -\expnumber{chunk#2.#3} {#5}}}
\def\main#1#2#3.{% beginning of starred section
\toksF={}\makeoutlinetoks#3\outlinedone\outlinedone
\gtitle={#3}\MN{#2}%
\vfil\eject
\def\stripprefix##1>{}\def\gtitletoks{#3}%
\edef\gtitletoks{\expandafter\stripprefix\meaning\gtitletoks}%
\edef\next{\write\cont{\ZZ{\gtitletoks}{#1}{\secno}% write to contents file
{\noexpand\the\pageno}{\the\toksE}}}\next % \ZZ{title}{depth}{sec}{page}{ss}
\special{pdf: outline #1 << /Title (\the\toksE) /Dest
[ @thispage /FitH @ypos ] >>}
\startsection{\bf#3.\quad}\ignorespaces}
\mubytein=1 \mubyteout=2
\mubyte ^^da ^^d0^^aa\endmubyte
\main{1}{1}Ъ.
\bye
Edit:
To summarize the answers, this is my current setup (without intermediate packages):
\newbox\mybox
\let\oldshipout\shipout
\def\shipout{\afterassignment\myboat\setbox\mybox=}
\def\myboat{\aftergroup\myship}
\def\myship{\setbox\mybox=\vbox{\special{pdf:tounicode UTF8-UCS2}\unvbox\mybox}\oldshipout\box\mybox\global\let\shipout\oldshipout}
(The idea comes from quire.tex.)
Answer 1
Assuming the input encoding is UTF-8 and encTeX is in use:
\font\tenbf=labx1000
\mubytein=1 \mubyteout=2 \specialout=2 \mubyte ^^da ^^d0^^aa\endmubyte
\special{pdf: tounicode UTF8-UCS2}
\special{pdf: outline 1 << /Title (Ъ) /Dest
[ @thispage /FitH @ypos ] >>}
\bf Ъ.
\bye
As for your example, you are redefining \output, so add the following two lines at the top of example.tex:
\input atbegshi.sty
\AtBeginShipoutFirst{\special{pdf: tounicode UTF8-UCS2}}
Here atbegshi.sty is a package by Heiko Oberdiek.
Edit:
Instead of using atbegshi.sty, you can also add the "tounicode" special in a more direct way:
...
dviasm -o example.dump example.dvi
perl -i -pe "s/(?<=\[page 1 0 0 0 0 0 0 0 0 0\])/\nxxx: 'pdf:tounicode UTF8-UCS2'/" example.dump
dviasm -o example.dvi example.dump
dvipdfmx example.dvi
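Answer 2
The problem is that PDF strings (used for PDF outlines, PDF info, and so on) have only two documented encodings. The first is the default, PDFDocEncoding. It is a single-byte encoding, but it is unusable for languages other than English and the Western European ones (Czech, for example). The second is the PDF Unicode encoding, a special two-byte encoding derived from UTF-16 that begins with FEFF (hexadecimal). Every PDF string must carry this prefix and be encoded in UTF-16, unless it is a PDFDocEncoded string.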
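To make the second encoding concrete, here is a minimal sketch of a title written by hand as a PDF hex string. It assumes that dvipdfmx passes a hex string inside a pdf: special through unchanged and that no tounicode special is active; the code point U+042A for Ъ is inferred from the UTF-8 bytes ^^d0^^aa in the question's \mubyte line. The page text is only a placeholder.

```tex
% Sketch only: the title is spelled out as a UTF-16BE hex string, so the
% \special stays pure ASCII and dvitype has nothing to complain about.
%   FEFF = the prefix that marks a Unicode PDF string
%   042A = U+042A, CYRILLIC CAPITAL LETTER HARD SIGN
\special{pdf: docview << /PageMode /UseOutlines >>}
\special{pdf: outline 1 << /Title <FEFF042A> /Dest
  [ @thispage /FitH @ypos ] >>}
Some text on the page.
\bye
```

Writing titles this way by hand does not scale, which is why an automatic conversion is preferable for real documents.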
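Once \special{pdf: tounicode UTF8-UCS2} is set, the dvipdfmx converter is able to add the FEFF prefix mentioned above and to convert UTF-8 to UTF-16 in all PDF strings. XeTeX uses the xdvipdfmx converter in its backend and activates this UTF8-UCS2 conversion automatically. PdfTeX in DVI mode can be set up with the \special mentioned above; the dvipdfmx postprocessor then does the required work.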
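PdfTeX in PDF mode, however, writes PDF strings directly (with the \pdfoutline primitive, for example), so the UTF-8 to UTF-16 conversion has to be done at the macro level. The file pdfuni.tex solves this problem for the Czech and Slovak alphabets. It defines \pdfunidef\macro{text}: the text is converted to UTF-16 and saved in \macro. Then \macro can be used as the argument of the \pdfoutline primitive. If you need to use pdftex in PDF mode, you can take inspiration from the code in pdfuni.tex and implement a similar macro for Cyrillic.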
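As a rough illustration of what such a macro ultimately has to produce, the sketch below writes one outline entry by hand in pdfTeX's PDF mode. This is not the code from pdfuni.tex; it assumes that pdfTeX copies the outline text verbatim into a PDF literal string, so octal escapes can carry the UTF-16 bytes. The destination number 1 and the page text are placeholders.

```tex
% Sketch only (run with pdftex): the bytes FE FF 04 2A are written as the
% octal escapes \376\377\004\052, i.e. the FEFF prefix followed by U+042A.
% \string keeps the backslashes as ordinary characters in the output.
\pdfoutput=1
\pdfcatalog{/PageMode /UseOutlines}
\pdfdest num 1 fith
\pdfoutline goto num 1 {\string\376\string\377\string\004\string\052}
Some text on the page.
\bye
```

A Cyrillic counterpart of \pdfunidef would generate such an escape sequence from the title text instead of having it typed in by hand.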