Serbian Cyrillic with LuaTeX and XeTeX


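This question is directly inspired by Martin Schröder's answer to another question. That is, I would like to know how to produce Serbian Cyrillic output (Serbian Cyrillic differs slightly from Russian) with LuaTeX or XeTeX while typing on a US keyboard layout. And how can the same output be produced with a Serbian keyboard layout? With the pdfTeX engine and a US keyboard layout, the correct way to produce such output is: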

\documentclass{article}
\usepackage[OT2,T1]{fontenc}
\input{cyracc.def}
\newcommand\textcyr[1]{{\fontencoding{OT2}\fontfamily{wncyr}\selectfont #1}}
\begin{document}
Serbian alphabet again \dots \textcyr{\cyracc
A B V G D DJ E Zh Z I J K L LJ M N NJ O P R S T \'C U F Kh C Ch \Dzh\ Sh
} 
\end{document}

This gives the Serbian alphabet typeset in Cyrillic.

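Of course, you could also use inputenc together with a Serbian keyboard. But Babel unfortunately requires a Serbian keyboard, so it is not interesting for me personally.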
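For comparison, the inputenc route mentioned above might look like the following with pdfLaTeX. This is only a sketch and not part of the original question: it assumes the source file is saved as UTF-8, that the Cyrillic letters are typed directly (for example with a Serbian keyboard layout), and that fonts in the T2A encoding are installed. The \textcyr definition is adapted from the OT2 example above.

```latex
\documentclass{article}
\usepackage[T2A,T1]{fontenc}  % T2A covers Cyrillic; T1 is listed last, so it stays the default
\usepackage[utf8]{inputenc}   % the source must be UTF-8 encoded
% Assumption: switch to T2A only for the Cyrillic run, as the OT2 example does
\newcommand\textcyr[1]{{\fontencoding{T2A}\selectfont #1}}
\begin{document}
Serbian alphabet again \dots \textcyr{%
А Б В Г Д Ђ Е Ж З И Ј К Л Љ М Н Њ О П Р С Т Ћ У Ф Х Ц Ч Џ Ш
}
\end{document}
```

No \cyracc transliteration is involved here, because the letters are already Cyrillic in the source.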

Answer 1

Here is an approach for XeLaTeX.

Prepare a file ascii-to-serbian.map with the following content:

; TECkit mapping for TeX input conventions <-> Unicode characters

LHSName "ASCII-to-Serbian"
RHSName "UNICODE"

pass(Unicode)

; ligatures from Knuth's original CMR fonts
U+002D U+002D           <>  U+2013  ; -- -> en dash
U+002D U+002D U+002D    <>  U+2014  ; --- -> em dash

U+0027          <>  U+2019  ; ' -> right single quote
U+0027 U+0027   <>  U+201D  ; '' -> right double quote
U+0022           >  U+201D  ; " -> right double quote

U+0060          <>  U+2018  ; ` -> left single quote
U+0060 U+0060   <>  U+201C  ; `` -> left double quote

U+0021 U+0060   <>  U+00A1  ; !` -> inverted exclam
U+003F U+0060   <>  U+00BF  ; ?` -> inverted question

; additions supported in T1 encoding
U+002C U+002C   <>  U+201E  ; ,, -> DOUBLE LOW-9 QUOTATION MARK
U+003C U+003C   <>  U+00AB  ; << -> LEFT POINTING GUILLEMET
U+003E U+003E   <>  U+00BB  ; >> -> RIGHT POINTING GUILLEMET

U+0041 <> U+0410 ; A
U+0042 <> U+0411 ; B
U+0043 <> U+0426 ; C
U+0043 U+0048 <> U+0427 ; CH
U+0043 U+0068 <> U+0427 ; Ch
U+0043 U+0031 <> U+040B ; C1
U+0027 U+0043 <> U+040B ; 'C
U+0044 <> U+0414 ; D
U+0044 U+004A <> U+0402 ; DJ
U+0044 U+006A <> U+0402 ; Dj
U+0044 U+005A U+0048 <> U+040F ; DZH
U+0044 U+007A U+0068 <> U+040F ; Dzh
U+0044 U+0031 <> U+040F ; D1
U+0045 <> U+0415 ; E
U+0046 <> U+0424 ; F
U+0047 <> U+0413 ; G
U+0048 <> U+0425 ; H
U+0049 <> U+0418 ; I
U+004A <> U+0408 ; J
U+004B <> U+041A ; K
U+004B U+0048 <> U+0425 ; KH
U+004B U+0068 <> U+0425 ; Kh
U+004C <> U+041B ; L
U+004C U+004A <> U+0409 ; LJ
U+004C U+006A <> U+0409 ; Lj
U+004D <> U+041C ; M
U+004E <> U+041D ; N
U+004E U+004A <> U+040A ; NJ
U+004E U+006A <> U+040A ; Nj
U+004F <> U+041E ; O
U+0050 <> U+041F ; P
;U+0051 <> ; Q
U+0052 <> U+0420 ; R
U+0053 <> U+0421 ; S
U+0053 U+0048 <> U+0428 ; SH
U+0053 U+0068 <> U+0428 ; Sh
U+0054 <> U+0422 ; T
U+0055 <> U+0423 ; U
U+0056 <> U+0412 ; V
;U+0057 <> ; W
U+0058 <> U+0425 ; X
;U+0059 ; Y
U+005A <> U+0417 ; Z
U+005A U+0048 <> U+0416 ; ZH
U+005A U+0068 <> U+0416 ; Zh

U+0061 <> U+0430 ; a
U+0062 <> U+0431 ; b
U+0063 <> U+0446 ; c
U+0063 U+0068 <> U+0447 ; ch
U+0063 U+0031 <> U+045B ; c1
U+0027 U+0063 <> U+045B ; 'c
U+0064 <> U+0434 ; d
U+0064 U+006A <> U+0452 ; dj
U+0064 U+007A U+0068 <> U+045F ; dzh
U+0064 U+0031 <> U+045F ; d1
U+0065 <> U+0435 ; e
U+0066 <> U+0444 ; f
U+0067 <> U+0433 ; g
U+0068 <> U+0445 ; h
U+0069 <> U+0438 ; i
U+006A <> U+0458 ; j
U+006B <> U+043A ; k
U+006B U+0068 <> U+0445 ; kh
U+006C <> U+043B ; l
U+006C U+006A <> U+0459 ; lj
U+006D <> U+043C ; m
U+006E <> U+043D ; n
U+006E U+006A <> U+045A ; nj
U+006F <> U+043E ; o
U+0070 <> U+043F ; p
;U+0071 <> ; q
U+0072 <> U+0440 ; r
U+0073 <> U+0441 ; s
U+0073 U+0068 <> U+0448 ; sh
U+0074 <> U+0442 ; t
U+0075 <> U+0443 ; u
U+0076 <> U+0432 ; v
;U+0077 <> ; w
U+0078 <> U+0445 ; x
;U+0079 ; y
U+007A <> U+0437 ; z
U+007A U+0068 <> U+0436 ; zh

; Additional (for official transliteration)
U+0110 <> U+0402 ; Đ
U+0111 <> U+0452 ; đ
U+017D <> U+0416 ; Ž
U+017E <> U+0436 ; ž
U+0106 <> U+040B ; Ć
U+0107 <> U+045B ; ć
U+010C <> U+0427 ; Č
U+010D <> U+0447 ; č
U+0044 U+017D <> U+040F ; DŽ
U+0044 U+017E <> U+040F ; Dž
U+0064 U+017E <> U+045F ; dž
U+0160 <> U+0428 ; Š
U+0161 <> U+0448 ; š

Then compile it with

teckit_compile ascii-to-serbian.map

This produces a file ascii-to-serbian.tec, which you can put anywhere XeTeX can find it (for example, in the working directory). Then create the following test file:

\documentclass{article}
\usepackage{fontspec}
\setmainfont[Ligatures=TeX]{Linux Libertine O}
\newfontfamily{\serbianfont}[Mapping=ascii-to-serbian]{Linux Libertine O}
\usepackage{polyglossia}
\setmainlanguage{english}
\setotherlanguage[Script=Cyrillic]{serbian}

\begin{document}
Serbian alphabet again

\begin{serbian}
A B V G D DJ E Zh Z I J K L LJ M N NJ O P R S T C1 U F Kh C Ch D1 Sh

a b v g d dj e zh z i j k l lj m n nj o p r s t c1 u f kh c ch d1 sh
\end{serbian} 
\end{document}

Running xelatex test.tex on this file typesets both lines in Serbian Cyrillic.
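For a short run of Serbian inside English text, the same mapping can presumably be used inline. The fragment below is a sketch that goes in the document body of the test file above. It assumes that polyglossia's \textserbian command selects \serbianfont in the same way the serbian environment does. The sample words are illustrative and do not come from the original answer.

```latex
% Place inside the document body of test.tex (same preamble as above).
% Assumption: \textserbian switches to \serbianfont, so the TECkit mapping applies.
% Single letters are mapped one by one; Nj and sh are matched as digraphs.
\textserbian{Beograd} and \textserbian{Njegosh}
```

According to the rules in ascii-to-serbian.map, these two words should come out as Београд and Његош.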

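Note 1: The characters Џ and џ can also be entered as DZH (or Dzh) and dzh. If this is not wanted (it might cause incorrect ligatures), remove the corresponding lines from ascii-to-serbian.map.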

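Note 2: If you find it inconvenient to type C1 and c1 to obtain Ћ and ћ, you can add the following lines (the map listed above already contains them)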

U+0027 U+0043 <> U+040B ; 'C

U+0027 U+0063 <> U+045B ; 'c

after the C1 and c1 entries. This lets you type the characters as 'C and 'c.

If you would rather type them as \'C and \'c, insert this code after loading Serbian with Polyglossia:

\let\standardcommandquote\'
\DeclareRobustCommand{\serbiancommandquote}[1]{%
  \ifnum\strcmp{#1}{c}=0 c1\else
    \ifnum\strcmp{#1}{C}=0 C1\else
      \standardcommandquote{#1}\fi\fi}
\makeatletter
\appto\blockextras@serbian{\let\'\serbiancommandquote}
\appto\inlineextras@serbian{\let\'\serbiancommandquote}
\appto\noextras@serbian{\let\'\standardcommandquote}
\makeatother

Note 3 (added 17 February): If Unicode input is available, then

Đ đ Ž ž Ć ć Č č DŽ Dž dž Š š

are mapped to

Ђ ђ Ж ж Ћ ћ Ч ч Џ Џ џ Ш ш

respectively.

Answer 2

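As long as you use only ASCII for the Serbian text, your example also works with xelatex (and lualatex) after a few changes. (If you use non-ASCII characters outside the Serbian text, the file should be encoded in UTF-8.)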

\documentclass{article}
\usepackage[OT2]{fontenc}
\input{cyracc.def}
\usepackage{fontspec}
\setmainfont{Arial} % to see the difference
\newcommand\textcyr[1]{{\fontencoding{OT2}\fontfamily{wncyr}\selectfont #1}}
\begin{document}
Serbian alphabet again \dots \textcyr{\cyracc
A B V G D DJ E Zh Z I J K L LJ M N NJ O P R  S T \'C U F Kh C Ch \Dzh\ Sh
} roman text again
\end{document}

The definitions in cyracc.def may have unwanted side effects in longer documents. You may also run into problems if you need hyphenation.

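But with this input you gain none of the advantages of xetex/luatex. You are not using real Unicode input, and you are not using system fonts for the Cyrillic: the fonts available to you are limited to OT2-encoded fonts.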

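So if you really want to stick with ASCII input, you are better off with the mapping mentioned by egreg (xelatex only). But I think that, in the long run, people who want to use two or more scripts should learn how to switch keyboard layouts so that they can type the characters directly. For short texts, you can find virtual keyboards on the web.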
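To illustrate the direct-input route recommended here, a minimal sketch for xelatex or lualatex follows. It assumes the source file is saved as UTF-8 and that Linux Libertine O (the font used in the first answer) is installed; any system font with Cyrillic coverage should serve equally well. No language package is loaded, so Serbian hyphenation is not set up in this sketch.

```latex
\documentclass{article}
\usepackage{fontspec}
% Assumption: an installed OpenType font that contains Cyrillic glyphs
\setmainfont{Linux Libertine O}
\begin{document}
Serbian alphabet, typed directly as Unicode:

А Б В Г Д Ђ Е Ж З И Ј К Л Љ М Н Њ О П Р С Т Ћ У Ф Х Ц Ч Џ Ш
\end{document}
```

Because the text is already Cyrillic, neither cyracc.def nor a TECkit mapping is needed, and the font is not limited to the OT2 encoding.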

Answer 3

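With LuaLaTeX, the mapping feature of xetex can be emulated with an OpenType feature file. In contrast to the xetex mapping feature, it is not based on replacing one Unicode character with another, but on replacing Unicode characters with the glyph names used in the font. Here is an example using Gentium: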

\documentclass{article}
\usepackage{fontspec}
\usepackage[serbian]{babel}
\def\Dzh{Dzh}
\def\Sh{Sh}
\setmainfont{Gentium Plus}
\newfontfamily\serbianfont[RawFeature=+gsub,FeatureFile=serb.fea,Script=Cyrillic]
{Gentium Plus}
\begin{document}
\noindent српска ћирилица\\
Hello world\\
\serbianfont
Hello world\\
A B V G D DJ E Zh Z I J K Kh L LJ M N NJ O P R S T Th U F Kh C Ch \Dzh\Sh\\
a b v g d dj e zh z i j k kh l lj m n nj o p r s t th u f kh c ch dzh sh
\end{document}

and the feature file serb.fea:

languagesystem cyrl SRB;
languagesystem cyrl DFLT;

feature liga {
  sub C H  by Checyrillic;
  sub C h by Checyrillic;
  sub D J by Djecyrillic;
  sub D j by Djecyrillic;
  sub D z h by Dzhecyrillic;
  sub D Z H by Dzhecyrillic;
  sub K H by Khacyrillic;
  sub K h by Khacyrillic;
  sub L J by Ljecyrillic;
  sub L j by Ljecyrillic;
  sub N j by Njecyrillic;
  sub N J by Njecyrillic;
  sub S H by Shacyrillic;
  sub S h by Shacyrillic;
  sub Z H by Zhecyrillic;
  sub T h by Tshecyrillic;
  sub T H by Tshecyrillic;
  sub Z h by Zhecyrillic;
  sub c h by checyrillic;
  sub t h by tshecyrillic;
  sub d j by djecyrillic;
  sub d z h by dzhecyrillic;
  sub k h by khacyrillic;
  sub l j by ljecyrillic;
  sub n j by njecyrillic;
  sub s h by shacyrillic;
  sub z h by zhecyrillic;
} liga;

feature gsub {
 sub A by Acyrillic;  
 sub B by Becyrillic;
 sub C by Tsecyrillic;
 sub D by Decyrillic;
 sub E by Iecyrillic;
 sub F by Efcyrillic;
 sub G by Gecyrillic;
 sub H by Khacyrillic;
 sub I by Iicyrillic;
 sub J by Jecyrillic;
 sub K by Kacyrillic;
 sub L by Elcyrillic;
 sub M by Emcyrillic;
 sub N by Encyrillic;
 sub O by Ocyrillic;
 sub P by Pecyrillic;
 sub R by Ercyrillic;
 sub S by Escyrillic;
 sub T by Tecyrillic;
 sub U by Ucyrillic;
 sub V by Vecyrillic;
 sub X by Khacyrillic;
 sub Z by Zecyrillic;
 sub a by acyrillic;
 sub b by becyrillic;
 sub c by tsecyrillic;
 sub d by decyrillic;
 sub e by iecyrillic;
 sub f by efcyrillic;
 sub g by gecyrillic;
 sub h by khacyrillic;
 sub i by iicyrillic;
 sub j by jecyrillic;
 sub k by kacyrillic;
 sub l by elcyrillic;
 sub m by emcyrillic;
 sub n by encyrillic;
 sub o by ocyrillic;
 sub p by pecyrillic;
 sub r by ercyrillic;
 sub s by escyrillic;
 sub t by tecyrillic;
 sub u by ucyrillic;
 sub v by vecyrillic;
 sub x by khacyrillic;
 sub z by zecyrillic;
} gsub;

