How to compile a UTF-8 LaTeX document with INITeX

Consider the simple LaTeX file fine.tex:



``Ceci échoue'' means ``this fails'' in French.


Then run pdftex -fmt pdflatex fine.tex. This produces a file fine.pdf in which the UTF-8 character "é" is displayed correctly. Here is fine.log:

This is pdfTeX, Version 3.14159265-2.6-1.40.19 (TeX Live 2019/dev/Debian) (preloaded 
format=pdflatex 2021.5.17)  18 MAY 2021 17:18
entering extended mode
 restricted \write18 enabled.
 %&-line parsing enabled.
LaTeX2e <2018-12-01>
Document Class: amsart 2017/10/31 v2.20.4
Package: amsmath 2018/12/01 v2.17b AMS math features

For additional information on amsmath, use the `?' option.
Package: amstext 2000/06/29 v2.01 AMS text

File: amsgen.sty 1999/11/30 v2.0 generic functions
Package: amsbsy 1999/11/29 v1.2d Bold Symbols
Package: amsopn 2016/03/08 v2.02 operator names
LaTeX Info: Redefining \frac on input line 223.
LaTeX Info: Redefining \overline on input line 385.
LaTeX Info: Redefining \ldots on input line 482.
LaTeX Info: Redefining \dots on input line 485.
LaTeX Info: Redefining \cdots on input line 606.
LaTeX Font Info:    Redeclaring font encoding OML on input line 729.
LaTeX Font Info:    Redeclaring font encoding OMS on input line 730.
LaTeX Info: Redefining \[ on input line 2844.
LaTeX Info: Redefining \] on input line 2845.
LaTeX Font Info:    Try loading font information for U+msa on input line 398.

File: umsa.fd 2013/01/14 v3.01 AMS symbols A
Package: amsfonts 2013/01/14 v3.01 Basic AMSFonts support
LaTeX Font Info:    Overwriting math alphabet `\mathfrak' in version `bold'
(Font)                  U/euf/m/n --> U/euf/b/n on input line 106.
No file fine.aux.
\openout1 = `fine.aux'.

LaTeX Font Info:    Checking defaults for OML/cmm/m/it on input line 3.
LaTeX Font Info:    ... okay on input line 3.
LaTeX Font Info:    Checking defaults for T1/cmr/m/n on input line 3.
LaTeX Font Info:    ... okay on input line 3.
LaTeX Font Info:    Checking defaults for OT1/cmr/m/n on input line 3.
LaTeX Font Info:    ... okay on input line 3.
LaTeX Font Info:    Checking defaults for OMS/cmsy/m/n on input line 3.
LaTeX Font Info:    ... okay on input line 3.
LaTeX Font Info:    Checking defaults for OMX/cmex/m/n on input line 3.
LaTeX Font Info:    ... okay on input line 3.
LaTeX Font Info:    Checking defaults for U/cmr/m/n on input line 3.
LaTeX Font Info:    ... okay on input line 3.
LaTeX Font Info:    Try loading font information for U+msa on input line 3.
File: umsa.fd 2013/01/14 v3.01 AMS symbols A
LaTeX Font Info:    Try loading font information for U+msb on input line 3.

File: umsb.fd 2013/01/14 v3.01 AMS symbols B
) [1{/usr/share/texlive/texmf-dist/fonts/map/pdftex/updmap/pdftex.map}] (./fine
.aux) ) 
Here is how much of TeX's memory you used:
 1370 strings out of 494700
 15198 string characters out of 6179925
 67801 words of memory out of 5000000
 4802 multiletter control sequences out of 15000+600000
 7385 words of font info for 29 fonts, out of 8000000 for 9000
 175 hyphenation exceptions out of 8191
 34i,4n,29p,241b,185s stack positions out of 5000i,500n,10000p,200000b,80000s
Output written on fine.pdf (1 page, 20901 bytes).
PDF statistics:
 16 PDF objects out of 1000 (max. 8388607)
 10 compressed objects within 1 object stream
 0 named destinations out of 1000 (max. 500000)
 1 words of extra memory for PDF output out of 10000 (max. 10000000)

Now consider this TeX file, fail.tex:

\input pdflatex.ini
\input fine.tex

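My goal is to obtain the same PDF file (apart from some metadata) by running pdftex -etex -ini fail.tex. This fails: "é" does not appear in fail.pdf, as the end of fail.log shows (the full log is omitted because of length limits):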

Missing character: There is no ^^c3 in font cmr10!
Missing character: There is no ^^a9 in font cmr10!
 [1{/usr/share/texlive/texmf-dist/fonts/map/pdftex/updmap/pdftex.map}] (./fail.
aux) ) ) 
Here is how much of TeX's memory you used:
 4593 strings out of 497925
 49090 string characters out of 6213961
 67806 words of memory out of 5000000
 4801 multiletter control sequences out of 15000+600000
 7385 words of font info for 29 fonts, out of 8000000 for 9000
 175 hyphenation exceptions out of 8191
 35i,4n,29p,257b,185s stack positions out of 5000i,500n,10000p,200000b,80000s
Output written on fail.pdf (1 page, 20790 bytes).
PDF statistics:
 16 PDF objects out of 1000 (max. 8388607)
 10 compressed objects within 1 object stream
 0 named destinations out of 1000 (max. 500000)
 1 words of extra memory for PDF output out of 10000 (max. 10000000)
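
For reference, the two characters reported as missing, ^^c3 and ^^a9, are the two bytes of the UTF-8 encoding of "é" (U+00E9), so each byte is apparently being typeset as a separate 8-bit character. A short worked derivation of those bytes, written as a LaTeX display for illustration only:

```latex
% UTF-8 encoding of U+00E9: 11 payload bits split into 5 + 6
\[
\texttt{E9}_{16} = 000\,1110\,1001_2
\;\longrightarrow\;
\underbrace{110}_{\mathrm{lead}}\,00011_2 \quad
\underbrace{10}_{\mathrm{cont.}}\,101001_2
\;=\; \texttt{C3}_{16}\ \texttt{A9}_{16}
\]
```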

The question is simple: why does the FMT file have to be dumped for UTF-8 characters to be input correctly?

Before anyone asks why I am doing this: I want to understand INITeX better. This is related to David Carlisle's answer


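Some things cannot be saved in the format file, so LaTeX stores them in the \everyjob register to be executed when a job starts; this includes setting up the UTF-8 character commands. When you run LaTeX directly from INITEX, you need to execute that token list by hand: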

\input pdflatex.ini
\the\everyjob % use \everyjob
\everyjob={}% clear \everyjob
\input fine.tex
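
A minimal sketch of the mechanism, independent of LaTeX. The file name demo.tex and the message texts are made up for illustration. Run under INITEX (for example with pdftex -ini demo.tex), it shows that tokens stored in \everyjob are not executed by the run that stores them. They run only if you expand them yourself with \the\everyjob, or at the start of a later job that loads a dumped format.

```tex
% demo.tex (hypothetical file name); run with: pdftex -ini demo.tex
% INITEX starts with no grouping characters, so set up braces first.
\catcode`\{=1
\catcode`\}=2
% Store some tokens in \everyjob. Nothing is executed at this point.
\everyjob={\message{[everyjob tokens executed]}}
\message{[after assignment]}
% Expand the token list by hand, as in the fix above.
\the\everyjob
% Clear it so that it cannot be run a second time.
\everyjob={}
\message{[done]}
\end
```

On the terminal, the "[everyjob tokens executed]" message should appear only after "[after assignment]", that is, at the point where \the\everyjob is expanded rather than where the register is set.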
