How to typeset Chinese from a dictionary with tone colors?


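I have installed the MDBG dictionary, which uses the CC-CEDICT database. I installed the Windows version, but we can also try the online dictionary. I use the word "sleep" as a test case. How can we typeset the search results in TeX?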

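There are two main issues:

  • The Chinese characters are colored according to their tones.

  • If a glyph is the same in Traditional and Simplified Chinese, a dash is used in its place.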
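The dash rule can be sketched in a few lines of Lua. This is a minimal sketch, not MDBG's own logic. It assumes the simplified and traditional forms have the same number of characters, which is true for every entry of the "sleep" search. `chars` and `trad_with_dashes` are hypothetical names. The function returns nil when no glyph differs, so the bracketed traditional form can be skipped entirely.

```lua
-- Sketch: build the traditional form with "-" for glyphs shared with the
-- simplified form. Assumes both strings have the same number of characters.

-- Split a UTF-8 string into a table of single characters (code points).
local function chars(s)
  local t = {}
  for ch in s:gmatch("[^\128-\191][\128-\191]*") do
    t[#t + 1] = ch
  end
  return t
end

-- Hypothetical helper; returns nil when every glyph is the same.
local function trad_with_dashes(simp, trad)
  local s, t = chars(simp), chars(trad)
  local out, differs = {}, false
  for i = 1, #t do
    if t[i] == s[i] then
      out[i] = "-"        -- shared glyph: replace it with a dash
    else
      out[i] = t[i]       -- different glyph: keep the traditional form
      differs = true
    end
  end
  if differs then return table.concat(out) end
  return nil
end

print(trad_with_dashes("睡觉", "睡覺"))  -- -覺
print(trad_with_dashes("困觉", "睏覺"))  -- 睏覺
print(trad_with_dashes("打盹", "打盹"))  -- nil
```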

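I have attached two screenshots: the first is from the Windows version and the second is from the online version.

Windows version: results for the search term

Web version: results for the search term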

Answer 1

I'm quite new to typesetting Chinese, but the idea was tempting, so I gave it a try.

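I have installed the Windows version of MDBG Chinese Reader. In it, I can select one, several or all entries and export them to a text file (Ctrl+Alt+X), with the columns separated by tabs. I would guess that results from the web version can be processed in a similar way, but I haven't tried it.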
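Because the export is tab-separated, each line can also be split directly into its fields rather than processed character by character. Here is a minimal Lua sketch. It assumes the column order from the header of sleep.tsv (simplified, traditional, pinyin, definition, hsk). `split_tsv` is a hypothetical name. Stripping a trailing carriage return is only a precaution, in case the Windows export uses CRLF line endings.

```lua
-- Sketch: split one line of the tab-separated export into its fields.
-- Empty fields (e.g. a missing HSK level) are kept as "".
local function split_tsv(line)
  line = line:gsub("\r$", "")  -- drop a Windows line ending, if present
  local fields = {}
  -- Appending a tab makes every field, including the last, end with "\t".
  for field in (line .. "\t"):gmatch("([^\t]*)\t") do
    fields[#fields + 1] = field
  end
  return fields
end

local f = split_tsv("觉\t覺\tjiào\ta nap; a sleep; CL:場|场[chang2]\tHSK1")
-- prints 5, 觉, 覺, jiào and HSK1, separated by tabs
print(#f, f[1], f[2], f[3], f[5])
```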

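I also downloaded and installed the Hanazono fonts for later use. I use HanaMinA.ttf to typeset the Chinese characters and Latin Modern for all other parts of the text.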

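I parse the input file line by line and color the Chinese characters according to their tones. I use five colors (the four tones plus the neutral tone), and I took the complete set of pinyin tone marks from a dictionary.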
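The core of the coloring is finding a syllable's tone from its marked vowel. Here is a minimal Lua sketch of that lookup. It assumes lowercase pinyin with tone marks, as in the export; capitalized syllables such as "Ān" are not handled. The color names are the ones defined in mal-chinese.tex, but the order used here (tones 1 to 4, then neutral) is an assumption. `tone_of` is a hypothetical name.

```lua
-- Sketch: a syllable's tone is the tone of its marked vowel.
-- Returns 1-4 for the four tones and 5 for the neutral tone (no mark).
local marks = {
  [1] = "āēīōūǖ",
  [2] = "áéíóúǘ",
  [3] = "ǎěǐǒǔǚ",
  [4] = "àèìòùǜ",
}

local function tone_of(syllable)
  for tone = 1, 4 do
    -- iterate over the marked vowels one UTF-8 character at a time
    for mark in marks[tone]:gmatch("[^\128-\191][\128-\191]*") do
      if syllable:find(mark, 1, true) then  -- plain substring search
        return tone
      end
    end
  end
  return 5  -- no tone mark, e.g. "xi" in "xiē xi"
end

-- Assumed color order: tones 1-4, then the neutral tone.
local colors = { "mred", "morange", "mgreen", "mblue", "mgray" }

for syl in ("xiē xi"):gmatch("%S+") do
  print(syl, tone_of(syl), colors[tone_of(syl)])
end
```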

Here is my test file (sleep.tsv):

simplified  traditional pinyin  definition  hsk
觉   覺   jiào    a nap; a sleep; CL:場|场[chang2]  HSK1
睡   睡   shuì    to sleep; to lie down   HSK1
香   香   xiāng   fragrant; sweet smelling; aromatic; savory or appetizing; (to eat) with relish; (of sleep) sound; perfume or spice; joss or incense stick; CL:根[gen1]   HSK3
眠   眠   mián    to sleep; to hibernate  HSK5
寎   寎   bìng    nightmare; start in sleep   
讇   讇   chǎn    to talk in one's sleep; old variant of 諂|谄[chan3]   
痵   痵   jì  nervous start in sleep  
寐   寐   mèi to sleep soundly    
铺   鋪   pù  plank bed; place to sleep; shop; store; (old) relay station 
寤   寤   wù  to awake from sleep 
瞓   瞓   xùn to sleep (Cantonese); Mandarin equivalent: 睡[shui4] 
呓   囈   yì  to talk in one's sleep  
睡觉  睡覺  shuì jiào   to go to bed; to sleep  HSK1
安眠  安眠  ān mián sleep peacefully    
安息  安息  ān xī   to rest; to go to sleep; to rest in peace; (history) Parthia    
打盹  打盹  dǎ dǔn  to doze off; to drop to sleep momentarily; (see also 打瞌睡)   
诶诒  誒詒  ēi yí   to rave; to babble in one's sleep   
分房  分房  fēn fáng    to sleep in separate rooms; distribution of social housing  
酣眠  酣眠  hān mián    to sleep soundly; fast asleep   
酣睡  酣睡  hān shuì    to sleep soundly; to fall into a deep sleep 
合眼  合眼  hé yǎn  to close one's eyes; to get to sleep    
昏睡  昏睡  hūn shuì    sleep; drowse when unconscious; lethargic sleep; lethargy   
假寐  假寐  jiǎ mèi to doze; to take a nap; nodding off to sleep    
交睫  交睫  jiāo jié    to sleep; lit. one's eyelids join   
解酲  解酲  jiě chéng   to sober up; to sleep off the effect of drink   
惊醒  驚醒  jīng xǐng   to rouse; to be woken by sth; to wake with a start; to sleep lightly    
就寝  就寢  jiù qǐn to go to sleep; to go to bed (literary) 
困觉  睏覺  kùn jiào    (dialect) to sleep  
露宿  露宿  lù sù   to sleep outdoors; to spend the night in the open   
乱搞  亂搞  luàn gǎo    to make a mess; to mess with; to be wild; to sleep around; to jump into bed 
梦话  夢話  mèng huà    talking in one's sleep; words spoken during sleep; fig. speech bearing no relation to reality; delusions    
梦寐  夢寐  mèng mèi    to dream; to sleep  
梦魔  夢魔  mèng mó night demon (malign spirit believed to plague people during sleep)  
梦呓  夢囈  mèng yì talking in one's sleep; delirious ravings; nonsense; sheer fantasy  
梦游  夢遊  mèng yóu    sleep walking; fig. dream voyage    
磨牙  磨牙  mó yá   to grind one's teeth (during sleep); pointless arguing  
睡眠  睡眠  shuì mián   sleep   
睡乡  睡鄉  shuì xiāng  sleep; the land of Nod; dreamland   
统铺  統鋪  tǒng pù a common bed (to sleep many)    
香甜  香甜  xiāng tián  fragrant and sweet; sound (sleep)   
歇息  歇息  xiē xi  to have a rest; to stay for the night; to go to bed; to sleep   
休憩  休憩  xiū qì  to rest; to sleep   
瞓觉  瞓覺  xùn jiào    to sleep (Cantonese); Mandarin equivalent: 睡覺|睡觉[shui4 jiao4]   
呓语  囈語  yì yǔ   to talk in one's sleep; crazy talk  
废寝食 廢寢食 fèi qǐn shí to neglect sleep and food   
鬼压床 鬼壓床 guǐ yā chuáng   (coll.) sleep paralysis 
鸡毛店 雞毛店 jī máo diàn a simple inn with only chicken feathers to sleep on 
美容觉 美容覺 měi róng jiào   beauty sleep (before midnight)  
梦行症 夢行症 mèng xíng zhèng somnambulism; sleep walking 
梦游症 夢遊症 mèng yóu zhèng  somnambulism; sleep walking 
撒呓挣 撒囈掙 sā yì zhēng somniloquy; to talk or act in one's sleep; sleep-walking    
睡懒觉 睡懶覺 shuì lǎn jiào   to sleep in 
做厅长 做廳長 zuò tīng zhǎng  (jocularly) to sleep on the couch; to sleep in the living room  
碧草如茵    碧草如茵    bì cǎo rú yīn   green grass like cushion (idiom); green meadow so inviting to sleep on  
不眠不休    不眠不休    bù mián bù xiū  without stopping to sleep or have a rest (idiom)    
抵足而眠    抵足而眠    dǐ zú ér mián   lit. to live and sleep together (idiom); fig. a close friendship    
抵足而卧    抵足而臥    dǐ zú ér wò lit. to live and sleep together (idiom); fig. a close friendship    
废寝忘餐    廢寢忘餐    fèi qǐn wàng cān    to neglect sleep and food (idiom); to skip one's sleep and meals; to be completely wrapped up in one's work 
废寝忘食    廢寢忘食    fèi qǐn wàng shí    to neglect sleep and forget about food (idiom); to skip one's sleep and meals; to be completely wrapped up in one's work    
绿草如茵    綠草如茵    lǜ cǎo rú yīn   green grass like cushion (idiom); green meadow so inviting to sleep on  
目不交睫    目不交睫    mù bù jiāo jié  lit. the eyelashes do not come together (idiom); fig. to not sleep a wink   
起早贪黑    起早貪黑    qǐ zǎo tān hēi  to rise early and sleep late    
食肉寝皮    食肉寢皮    shí ròu qǐn pí  to eat sb's flesh and sleep on their hide (idiom); to swear revenge on sb; implacable hatred; to have sb's guts for garters 
睡回笼觉    睡回籠覺    shuì huí lóng jiào  to go back to sleep (instead of rising up in the morning); to sleep in  
睡眠不足    睡眠不足    shuì mián bù zú lack of sleep; sleep deficit    
睡眠失调    睡眠失調    shuì mián shī tiáo  sleep disorder  
夙兴夜寐    夙興夜寐    sù xīng yè mèi  to rise early and sleep late (idiom); to work hard; to study diligently; to burn the candle at both ends    
我醉欲眠    我醉欲眠    wǒ zuì yù mián  lit. I'm drunk and would like to sleep (idiom); (used to indicate one's sincere and straightforward nature) 
夜不成眠    夜不成眠    yè bù chéng mián    to be unable to sleep at night  
一觉醒来    一覺醒來    yī jiào xǐng lái    to wake up from a sleep 
枕戈寝甲    枕戈寢甲    zhěn gē qǐn jiǎ to sleep on one's armor with spear by the pillow (idiom); ready for battle; determined to kill the enemy; Be prepared!  
吃喝拉撒睡   吃喝拉撒睡   chī hē lā sā shuì   to eat, drink, shit, piss, and sleep; (fig.) the ordinary daily routine 
快速动眼期   快速動眼期   kuài sù dòng yǎn qī REM sleep   
乱搞男女关系  亂搞男女關係  luàn gǎo nán nǚ guān xì to be promiscuous; to sleep around  

I prepared a standalone Lua script that generates the file mal-result.tex. The advantage is that we can then process it with either xelatex or lualatex. Here is the main file (mal-chinese.lua):

-- I am mal-chinese.lua file...
-- I take data from MDBG Chinese Reader's export (CEDict) and process them to have tone colors.
-- http://www.mdbg.net/chindict/chindict.php (I used Windows version)
-- sleep, selecting all, Ctrl+Alt+X, sleep.tsv

-- Input data...
pinyincolor={"mgray","mred","morange","mgreen","mblue","mcyan"}
pinyindata={"aeiouü","āēīōūǖ","áéíóúǘ","ǎěǐǒǔǚ","àèìòùǜ"}
--print(pinyindata[1])

-- An inspiration to use UTF-8 coded characters...
--http://stackoverflow.com/questions/13235091/extract-the-first-letter-of-a-utf-8-string-with-lua
--http://stackoverflow.com/questions/15979519/detect-if-last-character-is-not-multibyte-in-lua?lq=1
function tostr(tstring)
  return tstring:gmatch("[%z\1-\127\192-\255][\128-\191]*")
end -- of function tostr 

-- A generated TeX file to be load later...
whereto=io.open("mal-result.tex","w")

-- The main function to process one line...
function processme(mstring)

-- Initializing tables and counters...
local simp={}
local trad={}
local pinyin={}
local colors={}
local rest=""
local c=0 -- a character counter
local w=0 -- a word counter
local t=0 -- a tabular counter
local temp=0 -- a color counter
local diff=0 -- a glyph switcher

--print() -- one empty line
for code in tostr(mstring) do

  if t<2 then w=w+1 end -- every Chinese glyph is a new word
  if t>1 and code==" " then w=w+1 end -- word is after space
  if w==0 then w=1 end -- initializing a word counter

  if code=="\t" then -- \t, \9
    --io.write(" tab\n")
    t=t+1
    w=0
  else

  if t==0 then simp[w]=code end -- Simplified Chinese
  if t==1 then trad[w]=code end -- Traditional Chinese
  if t==2 then -- Pinyin
    --io.write(code)
    if code~=" " then
      pinyin[w]=(pinyin[w] or "")..code -- adding a character
    end -- of if code
    for data=1,#pinyindata do
      for char in tostr(pinyindata[data]) do
          if code==char then
            if data>c then c=data end -- print(code)
          end -- of if code
      end -- of for char
    end -- of for data
  end -- of if t==2
  end -- of if code=="\t" ... else

  if t==3 then -- >2 -- Translation, I don't use t==4
    if code~="\t" then
      rest=(rest or "")..code
    end -- of if code
  end -- of t>2

  -- Save the colors of a specific glyph
  if (t==2 and code==" ") or (t==3 and code=="\t") then
    -- print(" "..c.." "..pinyincolor[c])
    temp=temp+1
    colors[temp]=math.max(c,1) -- no vowel found (e.g. "r"): fall back to the neutral-tone color
    c=0
  end -- of if

end -- for code

-- Start writing TeX file: simplified Chinese
diff=0
whereto:write("{\\mchinese ")
for i=1,#simp do
  whereto:write("{\\color{"..pinyincolor[colors[i]].."}")
  whereto:write(simp[i])
  whereto:write("}")
  if simp[i]~=trad[i] then
    diff=1 -- typeset traditional Chinese
  end -- of if
end -- of for i
whereto:write(" ")

-- If there is a difference among glyphs, write traditional Chinese
if diff==1 then 
whereto:write("{\\color{"..pinyincolor[6].."}[}")
for i=1,#trad do
  whereto:write("{\\color{"..pinyincolor[colors[i]].."}")
  if simp[i]~=trad[i] then
    whereto:write(trad[i]) -- if glyph is different, write it
  else
    whereto:write("-") -- if the glyph is the same, write a hyphen instead
  end -- of if
  whereto:write("}")
end -- of for
whereto:write("{\\color{"..pinyincolor[6].."}]}")
end -- of if
whereto:write("} ")

-- Write pinyin after glyphs...
whereto:write("{\\color{"..pinyincolor[6].."}")
for i=1,#pinyin do
  whereto:write(pinyin[i].." ")
end
whereto:write("}\\par\\nopagebreak ")
--for i=1,#simp do
--  print(simp[i],trad[i],simp[i]==trad[i],pinyin[i],colors[i],pinyincolor[colors[i]])
--end
--whereto:write("\n"..rest) -- plain typeset

-- Write translation after glyphs+pinyin...
for code in tostr(rest) do
  --print(code)
  if #code>2 then
    whereto:write("{\\mchinese "..code.."}")
  else
    whereto:write(code)
  end -- of if
end -- of for code
whereto:write("\\par\\medskip\n")

end -- of function processme

-- If we would like to test one or just a couple of lines...
-- processme("酣眠    酣眠  hān mián    to sleep soundly; fast asleep")
-- processme("歇息    歇息  xiē xi  to have a rest; to stay for the night; to go to bed; to sleep   ")
-- processme("碧草如茵  碧草如茵    bì cǎo rú yīn   green grass like cushion (idiom); green meadow so inviting to sleep on  ")
-- processme("废寝忘餐  廢寢忘餐    fèi qǐn wàng cān    to neglect sleep and food (idiom); to skip one's sleep and meals; to be completely wrapped up in one's work ")
-- processme("瞓觉    瞓覺  xùn jiào    to sleep (Cantonese); Mandarin equivalent: 睡覺|睡觉[shui4 jiao4]   ")

-- Process whole file which we got as an export after searching for "sleep"...
local linec=-1
io.write("Processing line ")
for line in io.lines("sleep.tsv") do
  linec=linec+1
  io.write(linec.." ")
  if linec>0 then
    processme(line)
  end -- of if linec
end -- of for line
whereto:close() -- flush and close mal-result.tex

Run the script with:

texlua mal-chinese.lua

This produces the file mal-result.tex, which we then typeset with:

lualatex mal-chinese.tex

To use xelatex instead, comment out line 13 and uncomment line 14 in the following code (mal-chinese.tex):

% run: xelatex or lualatex mal-chinese.tex
% comment out line 13 or 14, respectively
\documentclass[a4paper]{article}
\pagestyle{empty}
\parindent=0pt
\usepackage{xcolor}
\definecolor{mred}{RGB}{255,0,0}
\definecolor{morange}{RGB}{255,140,0}
\definecolor{mgreen}{RGB}{0,128,0}
\definecolor{mblue}{RGB}{0,0,255}
\definecolor{mgray}{RGB}{68,68,68}
\definecolor{mcyan}{RGB}{57,106,146}
\usepackage{luatextra} % for lualatex
%\usepackage{xltxtra} % for xelatex
\newfontfamily\mchinese{HanaMinA.ttf}
\begin{document}
\input{mal-result.tex}
\end{document}
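One caveat about the generated file: the script writes the definition text verbatim. A definition containing a character that is special to TeX (such as %, &, # or _) would therefore break mal-result.tex. None of the "sleep" entries contain one, but other searches might. Below is a minimal escaping sketch that could be applied to each non-Chinese character, as `whereto:write(tex_escape(code))` in place of `whereto:write(code)`. `tex_escape` is a hypothetical name, and the replacement table is a common minimal choice, not an exhaustive one.

```lua
-- Sketch: escape characters that are special to TeX before they are
-- written to mal-result.tex. Not exhaustive; a common minimal set.
local replacements = {
  ["\\"] = "\\textbackslash{}",
  ["{"]  = "\\{",
  ["}"]  = "\\}",
  ["%"]  = "\\%",
  ["&"]  = "\\&",
  ["#"]  = "\\#",
  ["$"]  = "\\$",
  ["_"]  = "\\_",
  ["~"]  = "\\textasciitilde{}",
  ["^"]  = "\\textasciicircum{}",
}

local function tex_escape(s)
  -- A single gsub pass, so inserted backslashes are never escaped again.
  -- The parentheses keep only the string and drop gsub's match count.
  return (s:gsub("[\\{}%%&#%$_~%^]", replacements))
end

print(tex_escape("100% & more"))  -- 100\% \& more
```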

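I attach a close-up of the last two entries and a preview of all four pages (74 entries in total). We could also use tone colors to mark the pinyin tones, as the web version does; the steps would be the same, but I tried to mimic the output of the Windows version.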

A snippet of the result

Preview of the result: pages 1+2

Preview of the result: pages 3+4
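For the web-version look, the pinyin syllables themselves can be colored by tone. Here is a minimal Lua sketch. It assumes space-separated, lowercase pinyin as in the pinyin column and reuses the color names from mal-chinese.tex; the tone-to-color order is an assumption. The function names are hypothetical. This variant builds a vowel-to-tone table once instead of scanning the vowel lists for every syllable.

```lua
-- Sketch: wrap each pinyin syllable in \color{...} according to its tone.
local colors = { "mred", "morange", "mgreen", "mblue", "mgray" } -- tones 1-4, neutral

-- Build a lookup table from each tone-marked vowel to its tone number.
local tone_of_vowel = {}
for tone, vowels in ipairs({ "āēīōūǖ", "áéíóúǘ", "ǎěǐǒǔǚ", "àèìòùǜ" }) do
  for v in vowels:gmatch("[^\128-\191][\128-\191]*") do
    tone_of_vowel[v] = tone
  end
end

-- Tone of a syllable: the first marked vowel wins; no mark means neutral (5).
local function tone_of(syllable)
  for ch in syllable:gmatch("[^\128-\191][\128-\191]*") do
    local tone = tone_of_vowel[ch]
    if tone then return tone end
  end
  return 5
end

-- Color every space-separated syllable and rejoin them with single spaces.
local function color_pinyin(pinyin)
  local out = {}
  for syl in pinyin:gmatch("%S+") do
    out[#out + 1] = "{\\color{" .. colors[tone_of(syl)] .. "}" .. syl .. "}"
  end
  return table.concat(out, " ")
end

print(color_pinyin("xiē xi"))  -- {\color{mred}xiē} {\color{mgray}xi}
```

In the script, this string could replace the single-colored pinyin group that is currently written with `pinyincolor[6]`.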
