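I am trying to transliterate Hebrew text into Latin script using the transcription conventions of the SBL Handbook. I have almost succeeded, but there are a few more advanced transliteration rules that I am struggling to implement. See the problem below.
Hebrew is written from right to left, and the Hebrew script consists of consonants, such as ידלת (translit. yḏlṯ). Vowels are added below the consonants, such as ְ ַ ָ (ā a ĕ). When typing on a Hebrew keyboard, each consonant is usually entered first, followed by its vowel (if any). The last vowel in the example above (ְ ) is called shva (translit. ĕ). In transliteration it should be left out in certain cases, namely when
- a shva is on the last letter (consonant) of a word;
- two shvas stand at the end of a word (on the last two consonants).
Example:
Hebrew text: יָדַלְתְּ
My transliteration: yāḏalĕtĕ
Wanted transliteration: yād̲alt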
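To see what a character-sequence mapping actually receives, it can help to dump the example word code point by code point. This is only a minimal sketch: the word is spelled with escapes, and it assumes the final tav stores its shva (U+05B0) before its dagesh (U+05BC), as in Unicode canonical ordering. Text typed on a keyboard may well have the two marks the other way round. The function name `describe` is made up for this illustration.

```python
import unicodedata

# The example word, written with escapes so the stored order is explicit.
# Assumption: on the final tav the shva (U+05B0) comes before the dagesh
# (U+05BC); keyboard input may store them in the opposite order.
WORD = "\u05D9\u05B8\u05D3\u05B7\u05DC\u05B0\u05EA\u05B0\u05BC"


def describe(text):
    """Return (code point, Unicode name) pairs in stored (logical) order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


for code_point, name in describe(WORD):
    print(code_point, name)
```

In this spelling the two shvas sit at positions 5 and 7 of the sequence, so the earlier one is two code points before the final one; a dagesh stored before the final shva would push them three apart.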
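My transliteration scheme uses a custom Mapping in polyglossia: in a .map file I simply map single Hebrew Unicode characters, or sequences of them, to Latin characters. The file is then compiled with SIL's teckit_compile and applied through the Mapping function in polyglossia. I am struggling to implement rule 2 above, because it depends on the presence of a shva on the preceding consonant. Since the mapping file is based on sequences of Unicode code points, in practice I would have to list every combination of shva, consonant, and shva in order to implement rule 2.
Is there a trick to avoid this and instead somehow refer to any existing shva on the previous consonant, i.e. two Unicode characters back?
MWE: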
\documentclass[12pt]{article}
\usepackage{fontspec}
\usepackage{polyglossia}
\setmainlanguage[]{english}
\setotherlanguage{hebrew}
\setmainfont{Latin Modern Roman}
\newfontfamily{\hebrewfont}{Narkisim}
\newfontface\heblatrans[Mapping=my-hebrew-to-latin,FakeSlant=0.3]{Latin Modern Roman} % font and mapping for the transliteration
\DeclareTextFontCommand{\texthlt}{\heblatrans}
\begin{document}
\section*{Hebrew to Latin transliteration}
Hebrew is written right to left, and the Hebrew script consists of consonants such as \texthebrew{ידלת} (\textit{translit.} \texthlt{ידלת}). Vowels are added below the consonant, such as \texthebrew{ָ ַ ְ } (\texthlt{ָ ַ ְ }). When written with a Hebrew keyboard each consonant is usually typed first, then the vowel (if any). The last vowel in the previous example \mbox{(\texthebrew{ְ } )} is called \emph{shva}. In transliteration it should in certain cases be left out, namely when
\begin{enumerate}
\item A \emph{shva} is on the last letter (consonant).
\item A \emph{shva} is the second of two \emph{shva}s at the end of a word.
\end{enumerate}
\begin{description}
\item[Hebrew text] \texthebrew{יָדַלְתְּ}
\item[My transliteration] \texthlt{יָדַלְתְּ}
\item[Wanted transliteration] \textit{yād̲alt}
\end{description}
My transliteration scheme uses a custom \verb*|Mapping| in \verb*|polyglossia| in which I simply translate single Hebrew Unicode characters, or sequences of them, to Latin characters in a \verb*|.map| file, which is later compiled with SIL's \verb*|teckit_compile| and applied through the \verb*|Mapping| function in \verb*|polyglossia|. I am struggling to implement rule 2 above, as it depends on the existence of a \emph{shva} on the preceding consonant. Since the mapping file is based on sequences of Unicode code points, I would in practice have to list every combination of \emph{shva}, consonant, and \emph{shva} that there is. Is there a trick to avoid this, and instead somehow refer to any existing \emph{shva} on the previous consonant, i.e. two Unicode characters back?
\end{document}
my-hebrew-to-latin.map:
; TECkit mapping for TeX input conventions <-> Unicode characters
LHSName "hebrew-to-latin"
RHSName "UNICODE"
pass(Unicode)
; ligatures from Knuth's original CMR fonts
U+002D U+002D <> U+2013 ; -- -> en dash
U+002D U+002D U+002D <> U+2014 ; --- -> em dash
U+0027 <> U+2019 ; ' -> right single quote
U+0027 U+0027 <> U+201D ; '' -> right double quote
U+0022 > U+201D ; " -> right double quote
U+0060 <> U+2018 ; ` -> left single quote
U+0060 U+0060 <> U+201C ; `` -> left double quote
U+0021 U+0060 <> U+00A1 ; !` -> inverted exclam
U+003F U+0060 <> U+00BF ; ?` -> inverted question
; additions supported in T1 encoding
U+002C U+002C <> U+201E ; ,, -> DOUBLE LOW-9 QUOTATION MARK
U+003C U+003C <> U+00AB ; << -> LEFT POINTING GUILLEMET
U+003E U+003E <> U+00BB ; >> -> RIGHT POINTING GUILLEMET
; hebrew to latin consonants
U+05D0 <> U+02BE ; alef
U+05D1 <> U+1E07 ; vet
U+05D1 U+05BC <> U+0062 ; bet
U+05D2 <> U+1E21 ; gimel
U+05D2 U+05BC <> U+0067; gimel dagesh
U+05D3 <> U+1E0F; dalet
U+05D3 U+05BC <> U+0064; dalet dagesh
U+05D4 <> U+0068; he
U+05D4 U+05BC <> U+0068; he dagesh
U+05D5 <> U+0077; vav
U+05D6 <> U+007A; zayin
U+05D6 U+05BC <> U+007A; zayin dagesh
U+05D7 <> U+1E25; chet
U+05D8 <> U+1E6D; tet
U+05D8 U+05BC <> U+1E6D; tet dagesh
U+05D9 <> U+0079; yod
U+05D9 U+05BC <> U+0079; yod dagesh
U+05DB <> U+1E35; chaf
U+05DA <> U+1E35; chaf sofit
U+05DB U+05BC <> U+006B; kaf
U+05DA U+05BC <> U+006B; kaf sofit
U+05DC <> U+006C ; lamed
U+05DC U+05BC <> U+006C ; lamed dagesh
U+05DE <> U+006D; mem
U+05DE U+05BC <> U+006D; mem dagesh
U+05DD <> U+006D; mem sofit
U+05E0 <> U+006E; nun
U+05E0 U+05BC <> U+006E; nun dagesh
U+05DF <> U+006E; nun sofit
U+05E1 <> U+0073; samek
U+05E1 U+05BC <> U+0073; samek dagesh
U+05E2 <> U+02BF; ayin
U+05E4 <> U+0070 U+0304; fei
U+05E4 U+05BC <> U+0070; pei
U+05E3 <> U+0070 U+0304; fei sofit
U+05E3 U+05BC <> U+0070; pei sofit
U+05E6 <> U+1E63; tzadi
U+05E6 U+05BC <> U+1E63; tzadi dagesh
U+05E5 <> U+1E63; tzadi sofit
U+05E7 <> U+0071; kuf
U+05E7 U+05BC <> U+0071; kuf dagesh
U+05E8 <> U+0072; reish
U+05E8 U+05BC <> U+0072; reish dagesh
U+05E9 U+05C1 <> U+0161; shin
U+FB2A <> U+0161; shin alt
U+05E9 U+05C1 U+05BC <> U+0161; shin dagesh
U+FB2C <> U+0161; shin dagesh alt
U+05E9 U+05BC U+05C1 <> U+0161; shin dagesh 2
U+05E9 U+05C2 <> U+015B; sin
U+05E9 U+05C2 U+05BC <> U+015B; sin dagesh
U+05E9 U+05BC U+05C2 <> U+015B; sin dagesh 2
U+05EA <> U+1E6F; tav
U+05EA U+05BC <> U+0074; tav dagesh
;hebrew to latin vowels
U+05B0 <> U+0115; shva
U+05B1 <> U+0115; chataf segol
U+05B2 <> U+0103; chataf patach
U+05B3 <> U+014F; chataf kamatz
U+05B4 <> U+0069; chirik
U+05B5 <> U+0113; tzeire
U+05B6 <> U+0065; segol
U+05B7 <> U+0061; patach
U+05B8 <> U+0101; kamatz gadol
U+05B9 <> U+014D; cholam
U+05BB <> U+0075; kubutz
U+05D5 U+05BC <> U+00FB; shuruk
U+05D5 U+05B9 <> U+00F4; full holem
U+05B4 U+05D9 <> U+00EE; hireq yod
U+05B4 U+05D9 U+05BC <> U+00EE U+0079; hireq yod2
U+05B8 U+05D4 <> U+00E2; final qamets he
;hebrew diphthongs
U+05B5 U+05D9 <> U+00EA; tzeire yud
U+05B6 U+05D9 <> U+00EA; segol yud
U+05B7 U+05D9 <> U+0061 U+0069; patach yud
U+05B8 U+05D9 U+05B0 <> U+0101 U+0069; kamatz gadol yud
U+05B8 U+05D9 <> U+0101 U+0069; kamatz katan yud
U+05B9 U+05D9 U+05B0 <> U+014D U+0069; cholam yud
U+05B9 U+05D9 <> U+014D U+0069; cholam yud
U+05BB U+05D9 U+05B0 <> U+0075 U+0069; kubutz yud
U+05BB U+05D9 <> U+0075 U+0069; kubutz yud
U+05D5 U+05BC U+05D9 U+05B0 <> U+00FB U+0020; shuruk yud
;hebrew qamets chatuph
;U+05B8
;hebrew misc symbols
U+05BE <> U+002D; maqaf (hyphen)
U+05BC <> U+2060; dagesh alone replaced with word joiner
U+05C3 <> U+2060; sof pasuq replaced with word joiner
;hebrew to latin accents
U+0591 <> U+2060; accent replaced with word joiner
U+0592 <> U+2060; accent replaced with word joiner
U+0593 <> U+2060; accent replaced with word joiner
U+0594 <> U+2060; accent replaced with word joiner
U+0595 <> U+2060; accent replaced with word joiner
U+0596 <> U+2060; accent replaced with word joiner
U+0597 <> U+2060; accent replaced with word joiner
U+0598 <> U+2060; accent replaced with word joiner
U+0599 <> U+2060; accent replaced with word joiner
U+059A <> U+2060; accent replaced with word joiner
U+059B <> U+2060; accent replaced with word joiner
U+059C <> U+2060; accent replaced with word joiner
U+059D <> U+2060; accent replaced with word joiner
U+059E <> U+2060; accent replaced with word joiner
U+059F <> U+2060; accent replaced with word joiner
U+05A0 <> U+2060; accent replaced with word joiner
U+05A1 <> U+2060; accent replaced with word joiner
U+05A2 <> U+2060; accent replaced with word joiner
U+05A3 <> U+2060; accent replaced with word joiner
U+05A4 <> U+2060; accent replaced with word joiner
U+05A5 <> U+2060; accent replaced with word joiner
U+05A6 <> U+2060; accent replaced with word joiner
U+05A7 <> U+2060; accent replaced with word joiner
U+05A8 <> U+2060; accent replaced with word joiner
U+05A9 <> U+2060; accent replaced with word joiner
U+05AA <> U+2060; accent replaced with word joiner
U+05AB <> U+2060; accent replaced with word joiner
U+05AC <> U+2060; accent replaced with word joiner
U+05AD <> U+2060; accent replaced with word joiner
U+05AE <> U+2060; accent replaced with word joiner
U+05AF <> U+2060; accent replaced with word joiner
my-hebrew-to-latin.tec (generated from my-hebrew-to-latin.map by teckit_compile): compiled binary file, not reproduced here.
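Answer 1
The best solution I found lies outside LaTeX. Instead, I run the .tex file through a Python script and then compile the resulting document.
See the Python script below: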
# This Python script takes a string (a word or one or more sentences) and removes
# the sheva on the last consonant of each word, as well as the sheva on the
# second-to-last consonant (if there is a sheva on the last consonant). It does
# so irrespective of any dagesh that might otherwise throw it off.
import re


def remove_sheva(text):
    # Defining dagesh and sheva
    sheva = '\u05B0'
    dagesh = '\u05BC'
    # Splitting the text into single words
    words = text.split()
    # Treating each word separately
    new_words = []
    for word in words:
        # Finding the indices of all consonants in the word
        consonant_indices = [m.start() for m in re.finditer(r'[\u05D0-\u05EA]', word)]
        # If the word has no consonants, add it unchanged to the list of new words
        if not consonant_indices:
            new_words.append(word)
            continue
        # Finding the indices of the last and second-to-last consonants
        last_consonant_index = consonant_indices[-1]
        second_last_consonant_index = consonant_indices[-2] if len(consonant_indices) > 1 else None
        # Checking for a sheva after the last consonant (directly, or after a dagesh)
        sheva_after_last_consonant = False
        if last_consonant_index + 1 < len(word) and word[last_consonant_index + 1] == sheva:
            sheva_after_last_consonant = True
            word = word[:last_consonant_index + 1] + word[last_consonant_index + 2:]
        elif last_consonant_index + 2 < len(word) and word[last_consonant_index + 1] == dagesh and word[last_consonant_index + 2] == sheva:
            sheva_after_last_consonant = True
            word = word[:last_consonant_index + 2] + word[last_consonant_index + 3:]
        # Removing the sheva between the two last consonants, but only if there was a sheva on the last consonant
        if second_last_consonant_index is not None and sheva_after_last_consonant:
            if second_last_consonant_index + 2 < len(word) and word[second_last_consonant_index + 1] == dagesh and word[second_last_consonant_index + 2] == sheva:
                word = word[:second_last_consonant_index + 2] + word[second_last_consonant_index + 3:]
            elif second_last_consonant_index + 1 < len(word) and word[second_last_consonant_index + 1] == sheva:
                word = word[:second_last_consonant_index + 1] + word[second_last_consonant_index + 2:]
        new_words.append(word)
    # Joining the words again to create the sentence. Note that split() and join()
    # together collapse all whitespace, including newlines, into single spaces.
    new_text = ' '.join(new_words)
    return new_text
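The same two rules can also be written as one regular expression, which leaves whitespace and line breaks untouched. The following is a rough sketch, not a drop-in replacement for the script above. It makes two assumptions of its own: a word ends wherever the next character is not a Hebrew accent, point, or letter, and only the arguments of \texthlt{...} (the command from the MWE) should be rewritten, with no nested braces inside them. The names `strip_final_shevas` and `preprocess_tex` are made up for this illustration.

```python
import re

SHEVA = "\u05B0"
DAGESH = "\u05BC"
CONSONANT = "[\u05D0-\u05EA]"

# A consonant carrying a sheva; a dagesh may sit on either side of the sheva.
# The two groups capture everything except the sheva itself.
CONS_SHEVA = f"({CONSONANT}{DAGESH}?){SHEVA}({DAGESH}?)"

# Assumed definition of "end of word": the next character is not a Hebrew
# accent, point, or letter (U+0591-U+05C7, U+05D0-U+05EA).
WORD_END = "(?![\u0591-\u05C7\u05D0-\u05EA])"

# Optional second-to-last consonant with sheva, then the last consonant with sheva.
FINAL_SHEVAS = re.compile(f"(?:{CONS_SHEVA})?{CONS_SHEVA}{WORD_END}")

# Assumption: only \texthlt{...} arguments are transliterated, and they
# contain no nested braces.
TEXTHLT = re.compile(r"(\\texthlt\{)([^{}]*)(\})")


def strip_final_shevas(text):
    """Drop a word-final sheva, plus a sheva on the consonant just before it."""
    return FINAL_SHEVAS.sub(lambda m: "".join(g or "" for g in m.groups()), text)


def preprocess_tex(source):
    """Apply strip_final_shevas inside \\texthlt{...} arguments only."""
    return TEXTHLT.sub(
        lambda m: m.group(1) + strip_final_shevas(m.group(2)) + m.group(3),
        source,
    )


# The example word from the question, spelled with escapes.
word = "\u05D9\u05B8\u05D3\u05B7\u05DC\u05B0\u05EA\u05B0\u05BC"
line = "\\texthebrew{" + word + "}\n\\texthlt{" + word + "}\n"
result = preprocess_tex(line)
```

Here `result` keeps the \texthebrew argument as it was, so the pointed Hebrew still prints with its shvas, while the \texthlt argument loses both final shvas. Cantillation accents and shin/sin dots next to the shva are not handled in this sketch.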