Converting numbered pinyin to pinyin with tone marks

Is there any package that can convert numbered pinyin (e.g. dian4 nao3) into UTF-8 pinyin with tone marks (e.g. diàn nǎo)?

I found this Python script (https://stackoverflow.com/a/8200388/2421048), so it should be possible with \directlua and LuaLaTeX.

Answer 1

I have translated the Python script mentioned in the question into Lua. Compile it with LuaLaTeX and it should work:

\documentclass{article}
\usepackage{fontspec}

\usepackage{luacode}
\begin{luacode*}
PinyinToneMark = {
  {'ā', 'á', 'ǎ', 'à'},
  {'ē', 'é', 'ě', 'è'},
  {'ī', 'í', 'ǐ', 'ì'},
  {'ō', 'ó', 'ǒ', 'ò'},
  {'ū', 'ú', 'ǔ', 'ù'},
  {'ǖ', 'ǘ', 'ǚ', 'ǜ'}
}

-- Row of PinyinToneMark that holds the marked forms of each vowel
local toneRow = { a = 1, e = 2, i = 3, o = 4, u = 5, v = 6 }

function convertPinyin(str)
  if str == nil or str == '' then return end
  local s = string.lower(str)
  local r = ''  -- converted output
  local t = ''  -- letters of the syllable currently being read
  for i = 1, string.len(s) do
    local c = s:sub(i,i)
    if c >= 'a' and c <= 'z' then
      t = t .. c
    elseif c >= '0' and c <= '5' then
      local tone = tonumber(c)
      if tone >= 1 and tone <= 4 then  -- 0 and 5 (neutral tone) get no mark
        -- the mark goes on a, else on e or o, else on the last of i, u, v
        local pos = t:find('a') or t:find('[eo]') or t:find('[iuv][^iuv]*$')
        if pos then
          local marked = PinyinToneMark[toneRow[t:sub(pos,pos)]][tone]
          t = t:sub(1, pos-1) .. marked .. t:sub(pos+1)
        end
      end
      r = r .. t
      t = ''
    else
      -- keep spaces and other characters, flushing any unnumbered syllable
      r = r .. t .. c
      t = ''
    end
  end
  tex.print(r .. t)  -- also emit a trailing syllable that has no tone digit
end
\end{luacode*}

\begin{document}
 \directlua{convertPinyin("dian4 nao3")}
\end{document}

This version only handles v as the spelling of ü; it does not handle u: or ü.
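One possible way around that limitation is to rewrite the other spellings to v before converting. The following is only a minimal sketch: normalizeUmlaut is a made-up helper name, not part of the script above, and it assumes the source file is saved as UTF-8.

```lua
-- Hypothetical helper (not part of the answer's code): rewrite the other
-- common spellings of ü as "v" so that a converter that only knows "v"
-- can deal with them. Assumes UTF-8 input.
function normalizeUmlaut(str)
  -- "u:" and "U:" are plain ASCII, so a simple pattern is enough
  str = str:gsub("[uU]:", "v")
  -- "ü" and "Ü" are multi-byte in UTF-8; gsub matches them as byte sequences
  str = str:gsub("ü", "v"):gsub("Ü", "v")
  return str
end

print(normalizeUmlaut("nu:3 lü4"))  --> nv3 lv4
```

Inside the document one would then pass the normalized string on, for example convertPinyin(normalizeUmlaut("nu:3 lü4")).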

Answer 2

If you want to use the xeCJK package, you need XeLaTeX rather than LuaLaTeX.

You can use the xpinyin package.

\documentclass{article}
\usepackage{xeCJK}
\setCJKmainfont{SimSun}
\usepackage{xpinyin}
\begin{document}

电脑 \pinyin{dian4 nao3}

\end{document}

You will get

电脑 diàn nǎo

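Answer 3

In LuaTeX, the conversion can be done without any additional packages. Below is some example code that operates on the input string. Note that I know nothing about Chinese, nor about pinyin. All I did was follow the rules given in the WP entry, which look fairly simple. I therefore expect that the conversion will need further adjustment. If it produces incorrect results, please extend your question with more examples.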
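As a rough, standalone illustration of the placement rule that descriptions of pinyin usually give (a or e takes the mark, otherwise o, otherwise the last vowel, as in iu and ui), here is a minimal Lua sketch. The name markedVowelIndex is invented for this example and does not appear in the answer's code; the real converter also has to split the syllable first.

```lua
-- Sketch of the usual placement rule: given the vowels of one syllable,
-- return the index of the vowel that carries the tone mark.
-- markedVowelIndex is a hypothetical name, not part of the answer's code.
function markedVowelIndex(vowels)
  -- 1) "a" or "e" takes the mark if present; 2) otherwise "o" does
  for _, preferred in ipairs({ "a", "e", "o" }) do
    for i, v in ipairs(vowels) do
      if v == preferred then return i end
    end
  end
  -- 3) otherwise (e.g. "iu", "ui") the last vowel takes it
  return #vowels
end

print(markedVowelIndex({ "i", "a" }))  -- dian: 2
print(markedVowelIndex({ "a", "o" }))  -- nao:  1
print(markedVowelIndex({ "i", "u" }))  -- liu:  2
```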

This is the main TeX document (plain TeX, but it should carry over to LaTeX):

%% load some font that covers the diacritic marks
\input luaotfload.sty
\font\diacritics = "file:lmroman10-regular.otf:mode=node" at 10pt
\diacritics

%% --------------------------------------------------------------------
%% load conversion routines; adjust filename here
\directlua{dofile "\jobname.lua"}

%% wrap converter in a TeX macro
\protected\def\convertpinyin#1{%
  %% switch to appropriate hyphenation pattern goes here
  \directlua{packagedata.pinyintones.convert ([==[#1]==])}%
}

%% --------------------------------------------------------------------
%% demo

\def\showtest#1{((#1) (\convertpinyin{#1}))\par}

\def\testa{dian4 nao3}
\def\testb{ma ma1 ma2 ma3 ma4}

\showtest\testa
\showtest\testb

\bye

It loads the Lua file of the same name, but you can change the dofile() call to whatever suits your setup. Here is the code:

local utf                 = utf or require "unicode.utf8"
local lpeg                = require "lpeg"

local unpack              = unpack or table.unpack
local type                = type
local iowrite             = io.write
local stringformat        = string.format
local tableconcat         = table.concat
local utfchar             = utf.char
local texsprint           = tex.sprint

local C, Cg, Ct           = lpeg.C, lpeg.Cg, lpeg.Ct
local P, R, S, lpegmatch  = lpeg.P, lpeg.R, lpeg.S, lpeg.match

packagedata               = packagedata or { }
packagedata.pinyintones   = packagedata.pinyintones or { }
local pinyintones         = packagedata.pinyintones

-----------------------------------------------------------------------
---                           conversion
-----------------------------------------------------------------------

local toneno      = R"05"
local consonant   = S"bcdfghjklmnpqrstvwxyz" + S"BCDFGHJKLMNPQRSTVWXYZ"
local vowel       = S"aeiou" + S"AEIOU" + "ü" + "Ü"
local nucleus     = Ct(C(vowel)^1)
local syllable    = Cg((consonant^1)^-1, "onset")
                  * Cg(nucleus,          "nucleus")
                  * Cg((consonant^1)^-1, "coda")
                  * Cg(toneno^-1,        "tone")
local skip        = (1 - syllable)^1 --- keep this stuff
local pinyin      = Ct((Ct(syllable) + C(skip))^1)

local vowelpositions = function (nucleus)
  local tmp = { }
  for pos, vowel in next, nucleus do
    tmp[vowel] = pos
  end
  return tmp
end

local precedence = { "a", "e", "o" }

local cmacron = utfchar (0x0304)
local cacute  = utfchar (0x0301)
local ccaron  = utfchar (0x030c)
local cgrave  = utfchar (0x0300)

local todiacritic = function (str)
  local result   = { }
  local analyzed = lpegmatch (pinyin, str)
  --inspect (analyzed)
  for i = 1, #analyzed do
    local elm = analyzed[i]
    local t   = type (elm)

    if t == "table" then --- syllable
      local nucleus  = elm.nucleus
      local nvowels  = #nucleus
      local tone     = elm.tone
      if tone then
        tone = tonumber (tone)
      end

      if not tone then --- add unmodified
        result[#result+1] = elm.onset
        result[#result+1] = tableconcat (nucleus)
        result[#result+1] = elm.coda
      else
        local tonified
        if nvowels == 1 then --- single vowel receives tone
          local vowel = nucleus[1]
          if tone == 1 then --- inlined for performance
            tonified = vowel .. cmacron
          elseif tone == 2 then
            tonified = vowel .. cacute
          elseif tone == 3 then
            tonified = vowel .. ccaron
          elseif tone == 4 then
            tonified = vowel .. cgrave
          else
            tonified = vowel
          end

        elseif nvowels > 1 then
          local positions = vowelpositions (nucleus)
          local pos

          --- 1) locate correct vowel
          for j = 1, 3 do
            pos = positions[precedence[j]]
            if pos then
              break
            end
          end
          if not pos then -- iu or ui, thus second gets tone
            pos = 2
          end

          local vowel = nucleus[pos]

          --- 2) place tone mark
          if tone == 1 then
            nucleus[pos] = vowel .. cmacron
          elseif tone == 2 then
            nucleus[pos] = vowel .. cacute
          elseif tone == 3 then
            nucleus[pos] = vowel .. ccaron
          elseif tone == 4 then
            nucleus[pos] = vowel .. cgrave
          end

          tonified = tableconcat (nucleus)
        else --- no vowel, could be mismatch
          tonified = ""
        end
        result[#result+1] = elm.onset
        result[#result+1] = tonified
        result[#result+1] = elm.coda
      end

    elseif t == "string" then
      result[#result+1] = elm
    end

  end
  return tableconcat (result)
end

-----------------------------------------------------------------------
--- export
-----------------------------------------------------------------------

pinyintones.convert = function (str)
  local converted = todiacritic (str)
  if converted then
    texsprint (converted)
  end
end

Result:

(Image: pinyin conversion demo)

For convenience, I have created a Gist of the code.
