How to compile a UTF-8 LaTeX document with INITeX


Consider the simple LaTeX file fine.tex:

\documentclass[a4paper]{amsart}

\begin{document}

``Ceci échoue'' means ``this fails'' in French.

\end{document}

Then run pdftex -fmt pdflatex fine.tex. This produces a file fine.pdf in which the UTF-8 character “é” is displayed correctly. Here is fine.log:

This is pdfTeX, Version 3.14159265-2.6-1.40.19 (TeX Live 2019/dev/Debian) (preloaded 
format=pdflatex 2021.5.17)  18 MAY 2021 17:18
entering extended mode
 restricted \write18 enabled.
 %&-line parsing enabled.
**fine.tex
(./fine.tex
LaTeX2e <2018-12-01>
(/usr/share/texlive/texmf-dist/tex/latex/amscls/amsart.cls
Document Class: amsart 2017/10/31 v2.20.4
\linespacing=\dimen102
\normalparindent=\dimen103
\normaltopskip=\skip41
(/usr/share/texlive/texmf-dist/tex/latex/amsmath/amsmath.sty
Package: amsmath 2018/12/01 v2.17b AMS math features
\@mathmargin=\skip42

For additional information on amsmath, use the `?' option.
(/usr/share/texlive/texmf-dist/tex/latex/amsmath/amstext.sty
Package: amstext 2000/06/29 v2.01 AMS text

(/usr/share/texlive/texmf-dist/tex/latex/amsmath/amsgen.sty
File: amsgen.sty 1999/11/30 v2.0 generic functions
\@emptytoks=\toks14
\ex@=\dimen104
))
(/usr/share/texlive/texmf-dist/tex/latex/amsmath/amsbsy.sty
Package: amsbsy 1999/11/29 v1.2d Bold Symbols
\pmbraise@=\dimen105
)
(/usr/share/texlive/texmf-dist/tex/latex/amsmath/amsopn.sty
Package: amsopn 2016/03/08 v2.02 operator names
)
\inf@bad=\count80
LaTeX Info: Redefining \frac on input line 223.
\uproot@=\count81
\leftroot@=\count82
LaTeX Info: Redefining \overline on input line 385.
\classnum@=\count83
\DOTSCASE@=\count84
LaTeX Info: Redefining \ldots on input line 482.
LaTeX Info: Redefining \dots on input line 485.
LaTeX Info: Redefining \cdots on input line 606.
\Mathstrutbox@=\box27
\strutbox@=\box28
\big@size=\dimen106
LaTeX Font Info:    Redeclaring font encoding OML on input line 729.
LaTeX Font Info:    Redeclaring font encoding OMS on input line 730.
\macc@depth=\count85
\c@MaxMatrixCols=\count86
\dotsspace@=\muskip10
\c@parentequation=\count87
\dspbrk@lvl=\count88
\tag@help=\toks15
\row@=\count89
\column@=\count90
\maxfields@=\count91
\andhelp@=\toks16
\eqnshift@=\dimen107
\alignsep@=\dimen108
\tagshift@=\dimen109
\tagwidth@=\dimen110
\totwidth@=\dimen111
\lineht@=\dimen112
\@envbody=\toks17
\multlinegap=\skip43
\multlinetaggap=\skip44
\mathdisplay@stack=\toks18
LaTeX Info: Redefining \[ on input line 2844.
LaTeX Info: Redefining \] on input line 2845.
)
LaTeX Font Info:    Try loading font information for U+msa on input line 398.

(/usr/share/texlive/texmf-dist/tex/latex/amsfonts/umsa.fd
File: umsa.fd 2013/01/14 v3.01 AMS symbols A
)
(/usr/share/texlive/texmf-dist/tex/latex/amsfonts/amsfonts.sty
Package: amsfonts 2013/01/14 v3.01 Basic AMSFonts support
\symAMSa=\mathgroup4
\symAMSb=\mathgroup5
LaTeX Font Info:    Overwriting math alphabet `\mathfrak' in version `bold'
(Font)                  U/euf/m/n --> U/euf/b/n on input line 106.
)
\copyins=\insert199
\abstractbox=\box29
\listisep=\skip45
\c@part=\count92
\c@section=\count93
\c@subsection=\count94
\c@subsubsection=\count95
\c@paragraph=\count96
\c@subparagraph=\count97
\c@figure=\count98
\c@table=\count99
\abovecaptionskip=\skip46
\belowcaptionskip=\skip47
\captionindent=\dimen113
\thm@style=\toks19
\thm@bodyfont=\toks20
\thm@headfont=\toks21
\thm@notefont=\toks22
\thm@headpunct=\toks23
\thm@preskip=\skip48
\thm@postskip=\skip49
\thm@headsep=\skip50
\dth@everypar=\toks24
)
No file fine.aux.
\openout1 = `fine.aux'.

LaTeX Font Info:    Checking defaults for OML/cmm/m/it on input line 3.
LaTeX Font Info:    ... okay on input line 3.
LaTeX Font Info:    Checking defaults for T1/cmr/m/n on input line 3.
LaTeX Font Info:    ... okay on input line 3.
LaTeX Font Info:    Checking defaults for OT1/cmr/m/n on input line 3.
LaTeX Font Info:    ... okay on input line 3.
LaTeX Font Info:    Checking defaults for OMS/cmsy/m/n on input line 3.
LaTeX Font Info:    ... okay on input line 3.
LaTeX Font Info:    Checking defaults for OMX/cmex/m/n on input line 3.
LaTeX Font Info:    ... okay on input line 3.
LaTeX Font Info:    Checking defaults for U/cmr/m/n on input line 3.
LaTeX Font Info:    ... okay on input line 3.
LaTeX Font Info:    Try loading font information for U+msa on input line 3.
(/usr/share/texlive/texmf-dist/tex/latex/amsfonts/umsa.fd
File: umsa.fd 2013/01/14 v3.01 AMS symbols A
)
LaTeX Font Info:    Try loading font information for U+msb on input line 3.

(/usr/share/texlive/texmf-dist/tex/latex/amsfonts/umsb.fd
File: umsb.fd 2013/01/14 v3.01 AMS symbols B
) [1{/usr/share/texlive/texmf-dist/fonts/map/pdftex/updmap/pdftex.map}] (./fine
.aux) ) 
Here is how much of TeX's memory you used:
 1370 strings out of 494700
 15198 string characters out of 6179925
 67801 words of memory out of 5000000
 4802 multiletter control sequences out of 15000+600000
 7385 words of font info for 29 fonts, out of 8000000 for 9000
 175 hyphenation exceptions out of 8191
 34i,4n,29p,241b,185s stack positions out of 5000i,500n,10000p,200000b,80000s
</usr/share/texlive/texmf-dist/fonts/type1/public/amsfonts/cm/cmr10.pfb>
</usr/share/texlive/texmf-dist/fonts/type1/public/amsfonts/cm/cmr7.pfb>
Output written on fine.pdf (1 page, 20901 bytes).
PDF statistics:
 16 PDF objects out of 1000 (max. 8388607)
 10 compressed objects within 1 object stream
 0 named destinations out of 1000 (max. 500000)
 1 words of extra memory for PDF output out of 10000 (max. 10000000)

Now consider this TeX file, fail.tex:

\let\dump\relax
\input pdflatex.ini
\input fine.tex

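My goal is to obtain the same PDF file (apart from some metadata) by running pdftex -etex -ini fail.tex. This fails: the “é” does not appear in fail.pdf, as the end of fail.log shows (the full log is omitted because of length limits):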

Missing character: There is no ^^c3 in font cmr10!
Missing character: There is no ^^a9 in font cmr10!
 [1{/usr/share/texlive/texmf-dist/fonts/map/pdftex/updmap/pdftex.map}] (./fail.
aux) ) ) 
Here is how much of TeX's memory you used:
 4593 strings out of 497925
 49090 string characters out of 6213961
 67806 words of memory out of 5000000
 4801 multiletter control sequences out of 15000+600000
 7385 words of font info for 29 fonts, out of 8000000 for 9000
 175 hyphenation exceptions out of 8191
 35i,4n,29p,257b,185s stack positions out of 5000i,500n,10000p,200000b,80000s
</usr/share/texlive/texmf-dist/fonts/type1/public/amsfonts/cm/cmr10.pfb
></usr/share/texlive/texmf-dist/fonts/type1/public/amsfonts/cm/cmr7.pfb>
Output written on fail.pdf (1 page, 20790 bytes).
PDF statistics:
 16 PDF objects out of 1000 (max. 8388607)
 10 compressed objects within 1 object stream
 0 named destinations out of 1000 (max. 500000)
 1 words of extra memory for PDF output out of 10000 (max. 10000000)

The question is simple: why is dumping an FMT file necessary for UTF-8 characters to be input correctly?

Before anyone asks why I am doing this: I want to understand INITeX better. This is related to an answer by David Carlisle.

Answer 1

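Some things cannot be saved in a format file, so LaTeX stores them in the \everyjob token register, to be executed when a job starts. This includes setting up the UTF-8 character commands. When you run LaTeX directly from INITeX, that token list is never executed automatically, so you need to execute it by hand: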

\let\dump\relax
\input pdflatex.ini
\the\everyjob % use \everyjob
\everyjob={}% clear \everyjob
\input fine.tex
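
To see what LaTeX has queued up in \everyjob before executing it, the token list can be written to the log first. The following is only a sketch: the file name inspect.tex is hypothetical, and what ends up in the log depends on the LaTeX release installed. It would be run the same way as fail.tex, with pdftex -etex -ini inspect.tex.

```latex
% inspect.tex (hypothetical file name)
\let\dump\relax
\input pdflatex.ini
% Write the pending token list to the log file only (stream -1).
% Tokens produced by \the on a token register are not expanded
% further inside \write, so they appear verbatim in the log.
\immediate\write-1{everyjob: \the\everyjob}
\the\everyjob % now execute the token list, as above
\everyjob={}% clear \everyjob
\input fine.tex
```

Searching inspect.log for the line beginning with "everyjob:" then shows the setup code that a dumped format would have run at job start.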
