How to typeset Old Persian script in LaTeX

Parsoomash

This is a simple word in Achaemenid (Old Persian) script, typeset with a font in MS Word:




Use XeLaTeX. The font maps its glyphs onto Latin letter slots, so you have to work out the correspondence yourself.




This is ancient Persian script {\oldpersian ABCDFG}
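For reference, a minimal complete XeLaTeX file for this approach could look like the sketch below. It assumes the legacy font is installed under the family name Parsoomash (the name at the top of the question; substitute whatever name your system reports) and only shows how the `\oldpersian` switch can be defined with fontspec's `\newfontfamily`:

```latex
% Minimal sketch, compile with xelatex.
% Assumption: the legacy font is installed as "Parsoomash" (hypothetical
% family name here; replace it with the real one).
\documentclass{article}
\usepackage{fontspec}
% Font switch for the legacy font; it stores cuneiform glyphs in the
% Latin letter slots, so plain ASCII letters print as cuneiform signs.
\newfontfamily\oldpersian{Parsoomash}
\begin{document}
This is ancient Persian script {\oldpersian ABCDFG}
\end{document}
```

Because the glyphs sit in Latin slots, no mapping file is involved here; the input letters only select glyph positions.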







This is ancient Persian script {\oldpersian ABCDFG}

\advance\count255 1
\hbox{\hbox to 1em{\symbol{\count255}\hss}\hbox{\oldpersian\symbol{\count255}}}
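To work out the correspondence, one option is to print every printable ASCII slot next to the glyph the legacy font stores there. A sketch under the same font-name assumption; it allocates its own counter (`\glyphslot`, a hypothetical name) instead of the scratch register `\count255`, and ends each line with `\endgraf`, LaTeX's alias for the primitive `\par`:

```latex
% Glyph chart sketch, compile with xelatex.
% Assumption: "Parsoomash" is the installed family name (hypothetical).
\documentclass{article}
\usepackage{fontspec}
\newfontfamily\oldpersian{Parsoomash}
\newcount\glyphslot % dedicated counter for the loop
\begin{document}
\glyphslot=33 % first printable ASCII character (!)
\loop
  % slot number, the character in the main font, then the legacy glyph
  \noindent\hbox to 3em{\the\glyphslot\hss}%
  \hbox to 2em{\symbol{\glyphslot}\hss}%
  \hbox{\oldpersian\symbol{\glyphslot}}\endgraf
  \advance\glyphslot 1
\ifnum\glyphslot<127
\repeat
\end{document}
```

The loop stops after slot 126 (`~`); widen the range if the font also uses slots above 127.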





(1) The 2005 oldprsn font package (bundled with fonts for other ancient scripts in archaic) may help. It defines a dedicated font family for TeX.


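Using XeLaTeX, fontspec, a Unicode font that contains the glyphs of the Old Persian code block, a font-mapping file so the signs can be entered with "ordinary" character combinations, and an adaptation of egreg's Hittite-transliteration solution (How to modify single (ranges of) characters in a custom environment?), we get: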




\setmainfont{Linux Libertine} 

\newcommand\operfont{Noto Sans Old Persian}
\newfontface\fmoper[Mapping=\opermapping ,Colour=\opercolour]{\operfont}



\ExplSyntaxOn
\tl_new:N \l_xander_oper_tl
\NewDocumentCommand{\eoper}{m}
 {
  \tl_set:Nn \l_xander_oper_tl { #1 }
  % change every run of lowercase letters into italic
  \regex_replace_all:nnN
   { [a-z]+ }
   { \c{textit}\cB\{\0\cE\} }
   \l_xander_oper_tl
  % change every xx into the name Xshaayathiya
  \regex_replace_all:nnN
   { xx }
   { |Xshaayathiya| }
   \l_xander_oper_tl
  % change every ch into c + U+030C
  \regex_replace_all:nnN
   { ch }
   { c \x{030c} }
   \l_xander_oper_tl
  % change every th into theta = U+03B8
  \regex_replace_all:nnN
   { th }
   { \x{03b8} }
   \l_xander_oper_tl
  % change every ss into c + U+0327
  \regex_replace_all:nnN
   { ss }
   { c \x{0327} }
   \l_xander_oper_tl
  % change every sh into s + U+030C
  \regex_replace_all:nnN
   { sh }
   { s \x{030c} }
   \l_xander_oper_tl
  % change every aur into AuraMazda
  \regex_replace_all:nnN
   { aur }
   { |AuraMazda| }
   \l_xander_oper_tl
  % change every dah into Dahya + U+0304 + ush
  \regex_replace_all:nnN
   { dah }
   { |Dahya \x{0304}ush| }
   \l_xander_oper_tl
  % change every baga into BAGA
  \regex_replace_all:nnN
   { baga }
   { |BAGA| }
   \l_xander_oper_tl
  % change every buu into Buumish
  \regex_replace_all:nnN
   { buu }
   { |Buumish| }
   \l_xander_oper_tl
  % % change every double vowel aa/ into a/ U+0301
  % \regex_replace_all:nnN
  %  { ([aeiu]){2,2} }
  %  { \1 \x{0301} }
  %  \l_xander_oper_tl
  % change |...| into raised small caps
  % (alternative: { \c{textsuperscript}\cB\{\c{textsc}\cB\{\1\cE\}\cE\} })
  \regex_replace_all:nnN
   { \|([^|]+)\| }
   { \c{raisebox}\cB\{0.5ex\cE\}\cB\{\c{textsc}\cB\{\1\cE\}\cE\} }
   \l_xander_oper_tl
  % change every Buumish into Bu + U+0304 + mis + U+030C
  \regex_replace_all:nnN
   { Buumish }
   { Bu\x{0304}mis\x{030c} }
   \l_xander_oper_tl
  % print the result
  \tl_use:N \l_xander_oper_tl
 }
\ExplSyntaxOff


\newcommand\hstackon[1]{\Longstack{ \raisebox{1.12ex}{\eoper{#1}}  \begin{toper}#1\end{toper}}}  

\section{Old Persian \textoper{la-da div pa-aur-ra-sa-i-a-na}}
For the heading, see the reference\footnote{\texttt{oldprsn} package (2005) documentation: ``Old Persian in the Old Persian script'' ``(as near as possible)''}. \textoper{mi-ta-ya-na}.





(0) Compile the mapping source (a plain-text \texttt{.map} file) with teckit\_compile to produce the binary \texttt{.tec} file:

\verb|teckit_compile foo.map foo.tec|

Then reference the compiled mapping in the font declaration:


(1) To typeset in cuneiform, use \textbackslash\texttt{textoper} and type the mapping syllables: 

\verb|\textoper{a}| $\to$ \textoper{a}.

\textoper{da-a-ra-ya-va-ha-u-sha-} (son of Darius).

(2) To typeset transliteration with diacritics, type the mapping syllables and use \textbackslash\texttt{eoper}: 

\verb|\eoper{a}| $\to$ \eoper{a}.


(3) To typeset ruby text, use \textbackslash\texttt{hstackon}:

\verb|\hstackon{a}| $\to$ \hstackon{a} (per syllable).

\hstackon{da-a-ra-ya-va-ha-u-sha-} (per word).
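The file above relies on a few definitions that are not shown (`\textoper`, the `toper` environment, and `\Longstack`, which comes from the stackengine package). A minimal sketch of how they could fit together, assuming the compiled mapping is `foo.tec` (the name used in step (0)) and that Noto Sans Old Persian is installed. The bodies of `\textoper` and `toper` and the italic-only `\eoper` stand-in are hypothetical, not the answer's definitions:

```latex
% Sketch only, compile with xelatex.
% Assumptions: foo.tec was compiled from the mapping file below and
% "Noto Sans Old Persian" is installed.
\documentclass{article}
\usepackage{fontspec}
\usepackage{stackengine}% provides \Longstack
% Cuneiform face; Mapping=foo loads foo.tec. Literal values stand in
% for the answer's \opermapping and \opercolour macros.
\newfontface\fmoper[Mapping=foo,Colour=8B0000]{Noto Sans Old Persian}
% Hypothetical definitions of the cuneiform interfaces used above
\newcommand\textoper[1]{{\fmoper #1}}
\newenvironment{toper}{\fmoper}{}
% Hypothetical stand-in for the regex-based \eoper: italics only
\newcommand\eoper[1]{\textit{#1}}
% Modelled on the answer's \hstackon: transliteration over cuneiform
\newcommand\hstackon[1]{\Longstack{\raisebox{1.12ex}{\eoper{#1}} \begin{toper}#1\end{toper}}}
\begin{document}
\textoper{da-a-ra-ya-va-ha-u-sha-}

\hstackon{da-a-ra-ya-va-ha-u-sha-}
\end{document}
```

\Longstack separates its rows at spaces by default, which is why the per-word argument of \hstackon should not contain spaces.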

The mapping file (compile it with teckit_compile):

; TECkit mapping for TeX input conventions <-> Unicode characters

LHSName "latin-to-oldpersian"
RHSName "UNICODE"

pass(Unicode)
; ligatures from Knuth's original CMR fonts
U+002D U+002D           <>  U+2013  ; -- -> en dash
U+002D U+002D U+002D    <>  U+2014  ; --- -> em dash

U+0027          <>  U+2019  ; ' -> right single quote
U+0027 U+0027   <>  U+201D  ; '' -> right double quote
U+0022           >  U+201D  ; " -> right double quote

U+0060          <>  U+2018  ; ` -> left single quote
U+0060 U+0060   <>  U+201C  ; `` -> left double quote

U+0021 U+0060   <>  U+00A1  ; !` -> inverted exclam
U+003F U+0060   <>  U+00BF  ; ?` -> inverted question

; additions supported in T1 encoding
U+002C U+002C   <>  U+201E  ; ,, -> DOUBLE LOW-9 QUOTATION MARK
U+003C U+003C   <>  U+00AB  ; << -> LEFT POINTING GUILLEMET
U+003E U+003E   <>  U+00BB  ; >> -> RIGHT POINTING GUILLEMET

;U+0020    >  U+0020 ;  space maps to space
U+002D    >  U+200D ;  hyphen as Zero Width Joiner
U+002E    >  U+200D ;  dot as Zero Width Joiner
U+007C    >  U+200C ;  pipe as Zero Width Non-Joiner

U+0061         <>  U+103A0    ;  a 
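The listing ends with the mapping for a. As a hedged illustration of how further syllables can be added in the same `U+` style (these lines are not the author's, and `oper-demo.map` is a hypothetical file name), the LaTeX file below writes a small standalone mapping that covers just the signs in da-a-ra-ya-va-ha-u-sha-. Multi-letter inputs such as da and sha work because TECkit prefers the longest matching input sequence; the code points come from the Unicode Old Persian block (U+103A0 to U+103DF):

```latex
% Hypothetical demo: writes oper-demo.map (only the signs needed for
% da-a-ra-ya-va-ha-u-sha-) and then typesets a one-line note.
\begin{filecontents*}[overwrite]{oper-demo.map}
LHSName "latin-to-oldpersian-demo"
RHSName "UNICODE"

pass(Unicode)

U+002D                >  U+200D   ; hyphen as Zero Width Joiner
U+0061                <> U+103A0  ; a
U+0075                <> U+103A2  ; u
U+0064 U+0061         <> U+103AD  ; da
U+0079 U+0061         <> U+103B9  ; ya
U+0076 U+0061         <> U+103BA  ; va
U+0072 U+0061         <> U+103BC  ; ra
U+0073 U+0068 U+0061  <> U+103C1  ; sha
U+0068 U+0061         <> U+103C3  ; ha
\end{filecontents*}
\documentclass{article}
\begin{document}
Wrote oper-demo.map; compile it with teckit\_compile before using it as a font mapping.
\end{document}
```

Compile the generated file as in step (0) and load it with `Mapping=oper-demo`; the author's remaining syllables are not reconstructed here.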
