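This question was directly inspired by Martin Schröder's answer to this question. That is, I would like to know how to produce Serbian Cyrillic output (which differs slightly from Russian) with LuaTeX or XeTeX using a US keyboard layout. How can the same output be produced with a Serbian keyboard layout? The proper way to produce such output with the pdfTeX engine and a US keyboard layout is: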
\documentclass{article}
\usepackage[OT2,T1]{fontenc}
\input{cyracc.def}
\newcommand\textcyr[1]{{\fontencoding{OT2}\fontfamily{wncyr}\selectfont #1}}
\begin{document}
Serbian alphabet again \dots \textcyr{\cyracc
A B V G D DJ E Zh Z I J K L LJ M N NJ O P R S T \'C U F Kh C Ch \Dzh\ Sh
}
\end{document}
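This typesets the Serbian alphabet in Cyrillic.
Of course, you can also type on a Serbian keyboard with inputenc and Babel. Unfortunately that route requires the Serbian keyboard, so it is not interesting to me personally.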
Answer 1
Here is a method for XeLaTeX.
Prepare a file ascii-to-serbian.map with the following contents:
; TECkit mapping for TeX input conventions <-> Unicode characters
LHSName "ASCII-to-Serbian"
RHSName "UNICODE"
pass(Unicode)
; ligatures from Knuth's original CMR fonts
U+002D U+002D <> U+2013 ; -- -> en dash
U+002D U+002D U+002D <> U+2014 ; --- -> em dash
U+0027 <> U+2019 ; ' -> right single quote
U+0027 U+0027 <> U+201D ; '' -> right double quote
U+0022 > U+201D ; " -> right double quote
U+0060 <> U+2018 ; ` -> left single quote
U+0060 U+0060 <> U+201C ; `` -> left double quote
U+0021 U+0060 <> U+00A1 ; !` -> inverted exclam
U+003F U+0060 <> U+00BF ; ?` -> inverted question
; additions supported in T1 encoding
U+002C U+002C <> U+201E ; ,, -> DOUBLE LOW-9 QUOTATION MARK
U+003C U+003C <> U+00AB ; << -> LEFT POINTING GUILLEMET
U+003E U+003E <> U+00BB ; >> -> RIGHT POINTING GUILLEMET
U+0041 <> U+0410 ; A
U+0042 <> U+0411 ; B
U+0043 <> U+0426 ; C
U+0043 U+0048 <> U+0427 ; CH
U+0043 U+0068 <> U+0427 ; Ch
U+0043 U+0031 <> U+040B ; C1
U+0027 U+0043 <> U+040B ; 'C
U+0044 <> U+0414 ; D
U+0044 U+004A <> U+0402 ; DJ
U+0044 U+006A <> U+0402 ; Dj
U+0044 U+005A U+0048 <> U+040F ; DZH
U+0044 U+007A U+0068 <> U+040F ; Dzh
U+0044 U+0031 <> U+040F ; D1
U+0045 <> U+0415 ; E
U+0046 <> U+0424 ; F
U+0047 <> U+0413 ; G
U+0048 <> U+0425 ; H
U+0049 <> U+0418 ; I
U+004A <> U+0408 ; J
U+004B <> U+041A ; K
U+004B U+0048 <> U+0425 ; KH
U+004B U+0068 <> U+0425 ; Kh
U+004C <> U+041B ; L
U+004C U+004A <> U+0409 ; LJ
U+004C U+006A <> U+0409 ; Lj
U+004D <> U+041C ; M
U+004E <> U+041D ; N
U+004E U+004A <> U+040A ; NJ
U+004E U+006A <> U+040A ; Nj
U+004F <> U+041E ; O
U+0050 <> U+041F ; P
;U+0051 <> ; Q
U+0052 <> U+0420 ; R
U+0053 <> U+0421 ; S
U+0053 U+0048 <> U+0428 ; SH
U+0053 U+0068 <> U+0428 ; Sh
U+0054 <> U+0422 ; T
U+0055 <> U+0423 ; U
U+0056 <> U+0412 ; V
;U+0057 <> ; W
U+0058 <> U+0425 ; X
;U+0059 ; Y
U+005A <> U+0417 ; Z
U+005A U+0048 <> U+0416 ; ZH
U+005A U+0068 <> U+0416 ; Zh
U+0061 <> U+0430 ; a
U+0062 <> U+0431 ; b
U+0063 <> U+0446 ; c
U+0063 U+0068 <> U+0447 ; ch
U+0063 U+0031 <> U+045B ; c1
U+0027 U+0063 <> U+045B ; 'c
U+0064 <> U+0434 ; d
U+0064 U+006A <> U+0452 ; dj
U+0064 U+007A U+0068 <> U+045F ; dzh
U+0064 U+0031 <> U+045F ; d1
U+0065 <> U+0435 ; e
U+0066 <> U+0444 ; f
U+0067 <> U+0433 ; g
U+0068 <> U+0445 ; h
U+0069 <> U+0438 ; i
U+006A <> U+0458 ; j
U+006B <> U+043A ; k
U+006B U+0068 <> U+0445 ; kh
U+006C <> U+043B ; l
U+006C U+006A <> U+0459 ; lj
U+006D <> U+043C ; m
U+006E <> U+043D ; n
U+006E U+006A <> U+045A ; nj
U+006F <> U+043E ; o
U+0070 <> U+043F ; p
;U+0071 <> ; q
U+0072 <> U+0440 ; r
U+0073 <> U+0441 ; s
U+0073 U+0068 <> U+0448 ; sh
U+0074 <> U+0442 ; t
U+0075 <> U+0443 ; u
U+0076 <> U+0432 ; v
;U+0077 <> ; w
U+0078 <> U+0445 ; x
;U+0079 ; y
U+007A <> U+0437 ; z
U+007A U+0068 <> U+0436 ; zh
; Additional (for official transliteration)
U+0110 <> U+0402 ; Đ
U+0111 <> U+0452 ; đ
U+017D <> U+0416 ; Ž
U+017E <> U+0436 ; ž
U+0106 <> U+040B ; Ć
U+0107 <> U+045B ; ć
U+010C <> U+0427 ; Č
U+010D <> U+0447 ; č
U+0044 U+017D <> U+040F ; DŽ
U+0044 U+017E <> U+040F ; Dž
U+0064 U+017E <> U+045F ; dž
U+0160 <> U+0428 ; Š
U+0161 <> U+0448 ; š
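Then process it with
teckit_compile ascii-to-serbian.map
This produces a file ascii-to-serbian.tec, which you can put anywhere XeTeX can find it (for example, in the working directory). Then create the following test file: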
\documentclass{article}
\usepackage{fontspec}
\setmainfont[Ligatures=TeX]{Linux Libertine O}
\newfontfamily{\serbianfont}[Mapping=ascii-to-serbian]{Linux Libertine O}
\usepackage{polyglossia}
\setmainlanguage{english}
\setotherlanguage[Script=Cyrillic]{serbian}
\begin{document}
Serbian alphabet again
\begin{serbian}
A B V G D DJ E Zh Z I J K L LJ M N NJ O P R S T C1 U F Kh C Ch D1 Sh
a b v g d dj e zh z i j k l lj m n nj o p r s t c1 u f kh c ch d1 sh
\end{serbian}
\end{document}
Running xelatex test.tex then produces the sample output.
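For short insertions, Polyglossia's inline command \textserbian can be used instead of the serbian environment. The following is only a sketch under the same assumptions as the test file (ascii-to-serbian.tec is findable by XeTeX and Linux Libertine O is installed); the sample sentence is made up for illustration.

```latex
% Sketch only: same preamble as the test file above.
% Assumes ascii-to-serbian.tec is in the working directory.
\documentclass{article}
\usepackage{fontspec}
\setmainfont[Ligatures=TeX]{Linux Libertine O}
\newfontfamily{\serbianfont}[Mapping=ascii-to-serbian]{Linux Libertine O}
\usepackage{polyglossia}
\setmainlanguage{english}
\setotherlanguage[Script=Cyrillic]{serbian}
\begin{document}
% Only the argument of \textserbian goes through the mapping;
% the surrounding English text keeps the main font.
The capital of Serbia is \textserbian{Beograd}.
\end{document}
```

According to the map, Beograd should come out as Београд, while the rest of the sentence stays in Latin script.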
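Note 1: The characters Џ and џ can also be entered as DZH (or Dzh) and dzh. If this is not wanted (it might cause incorrect ligatures), remove the corresponding lines from ascii-to-serbian.map.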
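Note 2: If you find it inconvenient to type C1 and c1 to get Ћ and ћ, you can add the lines
U+0027 U+0043 <> U+040B ; 'C
and
U+0027 U+0063 <> U+045B ; 'c
after the C1 and c1 entries (the listing above already includes them). This lets you type the characters as 'C and 'c.
If you want to type them as \'C and \'c, insert this code after loading Serbian with Polyglossia: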
\let\standardcommandquote\'
\DeclareRobustCommand{\serbiancommandquote}[1]{%
\ifnum\strcmp{#1}{c}=0 c1\else
\ifnum\strcmp{#1}{C}=0 C1\else
\standardcommandquote{#1}\fi\fi}
\makeatletter
\appto\blockextras@serbian{\let\'\serbiancommandquote}
\appto\inlineextras@serbian{\let\'\serbiancommandquote}
\appto\noextras@serbian{\let\'\standardcommandquote}
\makeatother
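To see the redefinition in context, here is a sketch of a complete test document. It assumes, as before, that ascii-to-serbian.tec is in the working directory and that Linux Libertine O is installed; the sample words are chosen only for illustration.

```latex
% Sketch only: assumes ascii-to-serbian.tec and Linux Libertine O are available.
\documentclass{article}
\usepackage{fontspec}
\setmainfont[Ligatures=TeX]{Linux Libertine O}
\newfontfamily{\serbianfont}[Mapping=ascii-to-serbian]{Linux Libertine O}
\usepackage{polyglossia}
\setmainlanguage{english}
\setotherlanguage[Script=Cyrillic]{serbian}
% The redefinition of \' from the note above, unchanged
\let\standardcommandquote\'
\DeclareRobustCommand{\serbiancommandquote}[1]{%
  \ifnum\strcmp{#1}{c}=0 c1\else
  \ifnum\strcmp{#1}{C}=0 C1\else
  \standardcommandquote{#1}\fi\fi}
\makeatletter
\appto\blockextras@serbian{\let\'\serbiancommandquote}
\appto\inlineextras@serbian{\let\'\serbiancommandquote}
\appto\noextras@serbian{\let\'\standardcommandquote}
\makeatother
\begin{document}
% Outside Serbian, \' is still the ordinary acute accent
Outside Serbian the accent is untouched: caf\'e.

\begin{serbian}
% Inside Serbian, \'C and \'c expand to C1 and c1, which the map converts
\'Cirilica, \'cirilica
\end{serbian}
\end{document}
```

If everything is set up as described, the Serbian line should read Ћирилица, ћирилица, because \'C and \'c are turned into C1 and c1 before the font mapping is applied.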
Note 3 (added February 17): If Unicode input is available, then
Đ đ Ž ž Ć ć Č č DŽ Dž dž Š š
are mapped to
Ђ ђ Ж ж Ћ ћ Ч ч Џ Џ џ Ш ш
respectively.
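As an illustration of this Unicode input, the sketch below feeds Serbian written in the Latin alphabet to the same mapping. It assumes the source file is saved as UTF-8, that ascii-to-serbian.tec is available, and that Linux Libertine O is installed; the place and personal names are just sample words.

```latex
% Sketch only: the file must be saved as UTF-8.
% Assumes ascii-to-serbian.tec is in the working directory.
\documentclass{article}
\usepackage{fontspec}
\setmainfont[Ligatures=TeX]{Linux Libertine O}
\newfontfamily{\serbianfont}[Mapping=ascii-to-serbian]{Linux Libertine O}
\usepackage{polyglossia}
\setmainlanguage{english}
\setotherlanguage[Script=Cyrillic]{serbian}
\begin{document}
\begin{serbian}
% Latin-script Serbian in, Cyrillic out (via the "Additional" map entries)
Đorđe, Čačak, Šabac
\end{serbian}
\end{document}
```

According to the map, this should give Ђорђе, Чачак, Шабац. One caveat: the map always merges lj, nj and dž into a single letter, so a word such as injekcija, where n and j are separate letters in Cyrillic, would be converted incorrectly.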
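Answer 2
As long as you use only ASCII for the Serbian text, your example also works with xelatex (and lualatex) after a few changes. (If you use non-ASCII characters outside the Serbian text, the file should be UTF-8 encoded.)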
\documentclass{article}
\usepackage[OT2]{fontenc}
\input{cyracc.def}
\usepackage{fontspec}
\setmainfont{Arial} % to see the difference
\newcommand\textcyr[1]{{\fontencoding{OT2}\fontfamily{wncyr}\selectfont #1}}
\begin{document}
Serbian alphabet again \dots \textcyr{\cyracc
A B V G D DJ E Zh Z I J K L LJ M N NJ O P R S T \'C U F Kh C Ch \Dzh\ Sh
} roman text again
\end{document}
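The definitions in cyracc.def may have unwanted side effects in longer documents. You may also run into problems if you need hyphenation.
But with this input you do not use the advantages of xetex/luatex. You are not using real Unicode input, and you are not using system fonts for the Cyrillic: you are restricted to OT2-encoded fonts.
So if you really want to stick with ASCII input, you are better off with the mapping mentioned by egreg (xelatex only). But I think that in the long run, people who want to use two or more scripts are better off learning how to switch keyboard layouts so that they can type the characters directly. For short texts, you can find virtual keyboards online.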
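For comparison, direct Unicode input needs no mapping and no special encoding at all. The following is a minimal sketch, assuming the file is saved as UTF-8 and that the chosen font (here Linux Libertine O, borrowed from the other answer as an assumption) contains the Serbian Cyrillic letters; it works with xelatex and lualatex.

```latex
% Sketch only: save the file as UTF-8 and compile with xelatex or lualatex.
% Any installed font with Cyrillic coverage can replace Linux Libertine O.
\documentclass{article}
\usepackage{fontspec}
\setmainfont{Linux Libertine O}
\begin{document}
Serbian alphabet again \dots\
А Б В Г Д Ђ Е Ж З И Ј К Л Љ М Н Њ О П Р С Т Ћ У Ф Х Ц Ч Џ Ш
\end{document}
```

This sketch only shows the input method; it does not set up Serbian hyphenation, which would need a language package on top.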
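Answer 3
With LuaLaTeX, xetex's mapping feature can be emulated with an OpenType feature file. In contrast to the xetex mapping, it is not based on replacing one Unicode character with another, but on replacing a Unicode character with a glyph name used in the font. Here is an example using Gentium: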
\documentclass{article}
\usepackage{fontspec}
\usepackage[serbian]{babel}
\def\Dzh{Dzh}
\def\Sh{Sh}
\setmainfont{Gentium Plus}
\newfontfamily\serbianfont[RawFeature=+gsub,FeatureFile=serb.fea,Script=Cyrillic]
{Gentium Plus}
\begin{document}
\noindent српска ћирилица\\
Hello world\\
\serbianfont
Hello world\\
A B V G D DJ E Zh Z I J K Kh L LJ M N NJ O P R S T Th U F Kh C Ch \Dzh\Sh\\
a b v g d dj e zh z i j k kh l lj m n nj o p r s t th u f kh c ch dzh sh
\end{document}
and the feature file serb.fea:
languagesystem cyrl SRB;
languagesystem cyrl DFLT;
feature liga {
sub C H by Checyrillic;
sub C h by Checyrillic;
sub D J by Djecyrillic;
sub D j by Djecyrillic;
sub D z h by Dzhecyrillic;
sub D Z H by Dzhecyrillic;
sub K H by Khacyrillic;
sub K h by Khacyrillic;
sub L J by Ljecyrillic;
sub L j by Ljecyrillic;
sub N j by Njecyrillic;
sub N J by Njecyrillic;
sub S H by Shacyrillic;
sub S h by Shacyrillic;
sub Z H by Zhecyrillic;
sub T h by Tshecyrillic;
sub T H by Tshecyrillic;
sub Z h by Zhecyrillic;
sub c h by checyrillic;
sub t h by tshecyrillic;
sub d j by djecyrillic;
sub d z h by dzhecyrillic;
sub k h by khacyrillic;
sub l j by ljecyrillic;
sub n j by njecyrillic;
sub s h by shacyrillic;
sub z h by zhecyrillic;
} liga;
feature gsub {
sub A by Acyrillic;
sub B by Becyrillic;
sub C by Tsecyrillic;
sub D by Decyrillic;
sub E by Iecyrillic;
sub F by Efcyrillic;
sub G by Gecyrillic;
sub H by Khacyrillic;
sub I by Iicyrillic;
sub J by Jecyrillic;
sub K by Kacyrillic;
sub L by Elcyrillic;
sub M by Emcyrillic;
sub N by Encyrillic;
sub O by Ocyrillic;
sub P by Pecyrillic;
sub R by Ercyrillic;
sub S by Escyrillic;
sub T by Tecyrillic;
sub U by Ucyrillic;
sub V by Vecyrillic;
sub X by Khacyrillic;
sub Z by Zecyrillic;
sub a by acyrillic;
sub b by becyrillic;
sub c by tsecyrillic;
sub d by decyrillic;
sub e by iecyrillic;
sub f by efcyrillic;
sub g by gecyrillic;
sub h by khacyrillic;
sub i by iicyrillic;
sub j by jecyrillic;
sub k by kacyrillic;
sub l by elcyrillic;
sub m by emcyrillic;
sub n by encyrillic;
sub o by ocyrillic;
sub p by pecyrillic;
sub r by ercyrillic;
sub s by escyrillic;
sub t by tecyrillic;
sub u by ucyrillic;
sub v by vecyrillic;
sub x by khacyrillic;
sub z by zecyrillic;
} gsub;
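Because \serbianfont switches the font for everything that follows, it can be convenient to confine it to an argument. The sketch below wraps it in a small command; \textsr is a hypothetical helper name, not part of any package, and the example assumes serb.fea is in the working directory and Gentium Plus is installed.

```latex
% Sketch only: compile with lualatex.
% Assumes serb.fea (as listed above) and the Gentium Plus font are available.
\documentclass{article}
\usepackage{fontspec}
\setmainfont{Gentium Plus}
\newfontfamily\serbianfont[RawFeature=+gsub,FeatureFile=serb.fea,Script=Cyrillic]
{Gentium Plus}
% Hypothetical helper: the extra braces keep the font switch local
\newcommand\textsr[1]{{\serbianfont #1}}
\begin{document}
Belgrade is written \textsr{Beograd} in Serbian Cyrillic.
\end{document}
```

Only the argument of \textsr is set with the substituting font, so the surrounding English text is not affected.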