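Is there a package that converts numbered Pinyin (e.g. dian4 nao3) into UTF-8 Pinyin with tone marks (e.g. diàn nǎo)?
I found this Python script (https://stackoverflow.com/a/8200388/2421048), so it should be possible with \directlua and LuaLaTeX.
Answer 1
I have translated the Python script mentioned in the question into Lua. Compile it with LuaLaTeX and it should work: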
\documentclass{article}
\usepackage{fontspec}
\usepackage{luacode}
\begin{luacode*}
PinyinToneMark = {
  {'ā', 'á', 'ǎ', 'à'},
  {'ē', 'é', 'ě', 'è'},
  {'ī', 'í', 'ǐ', 'ì'},
  {'ō', 'ó', 'ǒ', 'ò'},
  {'ū', 'ú', 'ǔ', 'ù'},
  {'ǖ', 'ǘ', 'ǚ', 'ǜ'}
}
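function convertPinyin(str)
  if str == nil or str == '' then
    return
  end
  local s = string.lower(str)
  local r = ''  -- converted output
  local t = ''  -- letters of the syllable collected so far
  for i = 1, string.len(s) do
    local c = s:sub(i,i)
    if c >= 'a' and c <= 'z' then
      t = t .. c
    elseif c >= '0' and c <= '5' then
      local tone = tonumber(c)
      -- 0 and 5 denote the neutral tone, which carries no mark
      if tone >= 1 and tone <= 4 then
        -- the mark goes on a, else on e or o; in "iu" it goes on the u,
        -- otherwise on the only vowel left (i, u or v)
        if string.find(t, 'a') ~= nil then
          t = string.gsub(t, 'a', PinyinToneMark[1][tone])
        elseif string.find(t, 'e') ~= nil then
          t = string.gsub(t, 'e', PinyinToneMark[2][tone])
        elseif string.find(t, 'o') ~= nil then
          t = string.gsub(t, 'o', PinyinToneMark[4][tone])
        elseif string.find(t, 'iu') ~= nil then
          t = string.gsub(t, 'iu', 'i' .. PinyinToneMark[5][tone])
        elseif string.find(t, 'i') ~= nil then
          t = string.gsub(t, 'i', PinyinToneMark[3][tone])
        elseif string.find(t, 'u') ~= nil then
          t = string.gsub(t, 'u', PinyinToneMark[5][tone])
        elseif string.find(t, 'v') ~= nil then
          t = string.gsub(t, 'v', PinyinToneMark[6][tone])
        end
      end
      -- a v that did not take the mark is still printed as ü
      t = string.gsub(t, 'v', 'ü')
      r = r .. t
      t = ''
    else
      -- keep spaces and punctuation between syllables
      r = r .. t .. c
      t = ''
    end
  end
  -- flush a trailing syllable that has no tone number
  tex.print(r .. t)
end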
\end{luacode*}
\begin{document}
\directlua{convertPinyin("dian4 nao3")}
\end{document}
This version only accepts v as input for ü; it does not handle u: or ü.
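One way to lift that limitation would be to rewrite the other spellings to v before the text reaches the converter. The following is only a sketch, written in Python like the script linked in the question; the function name `normalize_u_umlaut` is made up for illustration, and it assumes that a colon directly after a u always means ü and is never ordinary punctuation.

```python
def normalize_u_umlaut(text: str) -> str:
    """Rewrite 'u:' and 'ü' (either case) as 'v' / 'V'.

    Hypothetical helper: it assumes 'u:' is never a plain 'u'
    followed by a punctuation colon.
    """
    variants = (
        ("u\u0308", "v"), ("U\u0308", "V"),  # u + combining diaeresis
        ("u:", "v"), ("U:", "V"),            # ASCII spelling
        ("\u00fc", "v"), ("\u00dc", "V"),    # precomposed ü / Ü
    )
    for variant, plain in variants:
        text = text.replace(variant, plain)
    return text


print(normalize_u_umlaut("nu:3 lüe4"))  # nv3 lve4
```

The same replacements could be done with string.gsub at the top of convertPinyin, before the string is lowercased.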
Answer 2
If you want to use the xeCJK package, you need XeLaTeX rather than LuaLaTeX.
You can use the xpinyin package.
\documentclass{article}
\usepackage{xeCJK}
\setCJKmainfont{SimSun}
\usepackage{xpinyin}
\begin{document}
电脑 \pinyin{dian4 nao3}
\end{document}
You will get
电脑 diàn nǎo
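Answer 3
In LuaTeX the conversion can be done without any additional packages. Below is some sample code that operates on the input string. Note that I know nothing about Chinese, nor about Pinyin. All I did was follow the WP entry, which looked quite straightforward. I therefore expect the conversion to need further adjustment. If it produces incorrect results, please extend your question with more examples.
This is the main TeX document (plain TeX, but it should translate to LaTeX easily):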
%% load some font that covers the diacritic marks
\input luaotfload.sty
\font\diacritics = "file:lmroman10-regular.otf:mode=node" at 10pt
\diacritics
%% --------------------------------------------------------------------
%% load conversion routines; adjust filename here
\directlua{dofile "\jobname.lua"}
%% wrap converter in a TeX macro
\protected\def\convertpinyin#1{%
%% switch to appropriate hyphenation pattern goes here
\directlua{packagedata.pinyintones.convert ([==[#1]==])}%
}
%% --------------------------------------------------------------------
%% demo
\def\showtest#1{((#1) (\convertpinyin{#1}))\par}
\def\testa{dian4 nao3}
\def\testb{ma ma1 ma2 ma3 ma4}
\showtest\testa
\showtest\testb
\bye
It loads the Lua file of the same name, but you can change the dofile() call to whatever suits your setup. Here is the code:
local utf = utf or require "unicode.utf8"
local lpeg = require "lpeg"
local unpack = unpack or table.unpack
local type = type
local iowrite = io.write
local stringformat = string.format
local tableconcat = table.concat
local utfchar = utf.char
local texsprint = tex.sprint
local C, Cg, Ct = lpeg.C, lpeg.Cg, lpeg.Ct
local P, R, S, lpegmatch = lpeg.P, lpeg.R, lpeg.S, lpeg.match
packagedata = packagedata or { }
packagedata.pinyintones = packagedata.pinyintones or { }
local pinyintones = packagedata.pinyintones
-----------------------------------------------------------------------
--- conversion
-----------------------------------------------------------------------
local toneno    = R"05"
local consonant = S"bcdfghjklmnpqrstvwxyz" + S"BCDFGHJKLMNPQRSTVWXYZ"
local vowel     = S"aeiou" + S"AEIOU" + "ü" + "Ü"
local nucleus   = Ct(C(vowel)^1)
local syllable  = Cg((consonant^1)^-1, "onset")
                * Cg(nucleus, "nucleus")
                * Cg((consonant^1)^-1, "coda")
                * Cg(toneno^-1, "tone")
local skip      = (1 - syllable)^1 --- keep this stuff
local pinyin    = Ct((Ct(syllable) + C(skip))^1)
local vowelpositions = function (nucleus)
  local tmp = { }
  for pos, vowel in next, nucleus do
    tmp[vowel] = pos
  end
  return tmp
end
local precedence = { "a", "e", "o" }
local cmacron = utfchar (0x0304)
local cacute = utfchar (0x0301)
local ccaron = utfchar (0x030c)
local cgrave = utfchar (0x0300)
local todiacritic = function (str)
  local result = { }
  local analyzed = lpegmatch (pinyin, str)
  if not analyzed then --- nothing matched, e.g. empty input
    return
  end
  --inspect (analyzed)
  for i = 1, #analyzed do
    local elm = analyzed[i]
    local t = type (elm)
    if t == "table" then --- syllable
      local nucleus = elm.nucleus
      local nvowels = #nucleus
      local tone = elm.tone
      if tone then
        tone = tonumber (tone)
      end
      if not tone then --- add unmodified
        result[#result+1] = elm.onset
        result[#result+1] = tableconcat (nucleus)
        result[#result+1] = elm.coda
      else
        local tonified
        if nvowels == 1 then --- single vowel receives tone
          local vowel = nucleus[1]
          if tone == 1 then --- inlined for performance
            tonified = vowel .. cmacron
          elseif tone == 2 then
            tonified = vowel .. cacute
          elseif tone == 3 then
            tonified = vowel .. ccaron
          elseif tone == 4 then
            tonified = vowel .. cgrave
          else
            tonified = vowel
          end
        elseif nvowels > 1 then
          local positions = vowelpositions (nucleus)
          local pos
          --- 1) locate correct vowel
          for j = 1, 3 do
            pos = positions[precedence[j]]
            if pos then
              break
            end
          end
          if not pos then -- iu or ui, thus second gets tone
            pos = 2
          end
          local vowel = nucleus[pos]
          --- 2) place tone mark
          if tone == 1 then
            nucleus[pos] = vowel .. cmacron
          elseif tone == 2 then
            nucleus[pos] = vowel .. cacute
          elseif tone == 3 then
            nucleus[pos] = vowel .. ccaron
          elseif tone == 4 then
            nucleus[pos] = vowel .. cgrave
          end
          tonified = tableconcat (nucleus)
        else --- no vowel, could be mismatch
          tonified = ""
        end
        result[#result+1] = elm.onset
        result[#result+1] = tonified
        result[#result+1] = elm.coda
      end
    elseif t == "string" then
      result[#result+1] = elm
    end
  end
  return tableconcat (result)
end
-----------------------------------------------------------------------
--- export
-----------------------------------------------------------------------
pinyintones.convert = function (str)
  local converted = todiacritic (str)
  if converted then
    texsprint (converted)
  end
end
Result: (image of the typeset output)
For convenience, I have created a Gist of the code.
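To check the placement rule outside of TeX, the same idea can be sketched in a few lines of Python, the language of the script linked in the question. This is not a port of the Lua module above, only an approximation of it: the names `mark_position` and `to_diacritics` are made up for illustration, and the sketch assumes that every syllable to be converted ends in a tone digit. Like the Lua code it prefers a, then e, then o, and otherwise marks the last vowel (which is the second one in iu and ui). It also appends combining marks, then lets NFC normalization compose them into single characters.

```python
import re
import unicodedata

# Combining marks for tones 1-4: macron, acute, caron, grave.
COMBINING = {1: "\u0304", 2: "\u0301", 3: "\u030c", 4: "\u0300"}

# A run of letters (including ü) followed by a tone digit.
SYLLABLE = re.compile("([a-zA-Z\u00fc\u00dc]+)([0-5])")


def mark_position(syllable: str) -> int:
    """Index of the vowel that takes the tone mark, or -1 if none."""
    lowered = syllable.lower()
    for vowel in "aeo":  # a, e and o take precedence, in that order
        pos = lowered.find(vowel)
        if pos != -1:
            return pos
    rest = [i for i, ch in enumerate(lowered) if ch in "iu\u00fc"]
    return rest[-1] if rest else -1  # iu / ui: the last vowel


def to_diacritics(text: str) -> str:
    """Replace numbered syllables in text by tone-marked ones."""
    def replace(match):
        letters, tone = match.group(1), int(match.group(2))
        pos = mark_position(letters)
        if tone in COMBINING and pos != -1:
            # Put the combining mark right after the chosen vowel.
            letters = letters[:pos + 1] + COMBINING[tone] + letters[pos + 1:]
        # Tones 0 and 5 (neutral) only lose their digit.
        return unicodedata.normalize("NFC", letters)
    return SYLLABLE.sub(replace, text)


print(to_diacritics("dian4 nao3"))          # diàn nǎo
print(to_diacritics("ma ma1 ma2 ma3 ma4"))  # ma mā má mǎ mà
```

Syllables without a tone digit pass through untouched, so unnumbered syllables written without spaces between them (diannao3) would be treated as one syllable and marked in the wrong place.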