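I have installed the MDBG Word dictionary, which uses the CC-CEDICT database. I have the Windows version installed, but we could also try the online version of the dictionary. I use the word sleep as a test case. How can we typeset the search results in TeX?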
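There are two main issues:
They use tone colors when displaying Chinese characters.
If a glyph is the same in Traditional and Simplified Chinese, a dash is shown in its place.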
I have attached two screenshots: the first is from the Windows version, the second from the Internet version.
Answer 1
I am quite new to Chinese typesetting, but the idea was tempting, so I gave it a try anyway.
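I have the Windows version of MDBG Chinese Reader installed. There I can select one, several, or all entries and export them to a text file (Ctrl+Alt+X); the columns are separated by tabs. I guess processing the results of the web version would be similar, but I have not tried that.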
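The tab-separated export described above can be split into columns before any further processing. This is only a minimal sketch: `split_tabs` is a hypothetical helper name of mine (it is not part of the script shown later), and the column order is assumed from the header line of sleep.tsv.

```lua
-- Sketch: split one exported line into its tab-separated columns.
-- Assumed column order (from the header of sleep.tsv):
-- simplified, traditional, pinyin, definition, hsk
function split_tabs(line)
  local fields = {}
  -- Appending a tab lets the pattern capture every field,
  -- including an empty last one (entries without an HSK level).
  for field in (line .. "\t"):gmatch("([^\t]*)\t") do
    fields[#fields + 1] = field
  end
  return fields
end

local f = split_tabs("睡觉\t睡覺\tshuì jiào\tto go to bed; to sleep\tHSK1")
print(f[1], f[2], f[3], f[4], f[5])
```

The script further down does not split the line up front; it walks through it character by character and counts tabs, which has the same effect.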
I also downloaded and installed the Hanazono fonts for later use (I use HanaMinA.ttf to typeset the Chinese characters and the Latin Modern font for all other text).
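I parse the input file line by line and wrap each Chinese character in a color according to its tone. I use five tone colors (neutral tone plus tones 1 to 4), and I took the complete set of pinyin tone marks from a dictionary.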
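The tone lookup can be sketched in isolation as follows. This is an illustration under my own assumptions, not the exact code of the script: `tone_class` and `utf8chars` are hypothetical names, and the table uses the same layout as the script's `pinyindata` (index 1 is the neutral tone, indices 2 to 5 are tones 1 to 4).

```lua
-- Sketch: find the tone class of one pinyin syllable from its tone mark.
-- Index 1 = neutral tone, 2..5 = tones 1..4.
local tonemarks = {"aeiouü", "āēīōūǖ", "áéíóúǘ", "ǎěǐǒǔǚ", "àèìòùǜ"}

-- Iterate over the UTF-8 characters of a string (byte-level pattern).
local function utf8chars(s)
  return s:gmatch("[%z\1-\127\192-\255][\128-\191]*")
end

-- Return the highest class whose marks contain a character of the
-- syllable, or 0 if the syllable contains no known vowel.
function tone_class(syllable)
  local class = 0
  for ch in utf8chars(syllable) do
    for i, marks in ipairs(tonemarks) do
      -- plain (non-pattern) search for this character in the mark list
      if i > class and marks:find(ch, 1, true) then
        class = i
      end
    end
  end
  return class
end

local colors = {"mgray", "mred", "morange", "mgreen", "mblue"}
print(colors[tone_class("jiào")], colors[tone_class("xi")])
```

Taking the highest class matters because a syllable such as jiào contains unmarked vowels as well as the marked one. A result of 0 has no entry in the color table, so a caller would need a fallback for such a syllable.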
This is my test case file (sleep.tsv):
simplified traditional pinyin definition hsk
觉 覺 jiào a nap; a sleep; CL:場|场[chang2] HSK1
睡 睡 shuì to sleep; to lie down HSK1
香 香 xiāng fragrant; sweet smelling; aromatic; savory or appetizing; (to eat) with relish; (of sleep) sound; perfume or spice; joss or incense stick; CL:根[gen1] HSK3
眠 眠 mián to sleep; to hibernate HSK5
寎 寎 bìng nightmare; start in sleep
讇 讇 chǎn to talk in one's sleep; old variant of 諂|谄[chan3]
痵 痵 jì nervous start in sleep
寐 寐 mèi to sleep soundly
铺 鋪 pù plank bed; place to sleep; shop; store; (old) relay station
寤 寤 wù to awake from sleep
瞓 瞓 xùn to sleep (Cantonese); Mandarin equivalent: 睡[shui4]
呓 囈 yì to talk in one's sleep
睡觉 睡覺 shuì jiào to go to bed; to sleep HSK1
安眠 安眠 ān mián sleep peacefully
安息 安息 ān xī to rest; to go to sleep; to rest in peace; (history) Parthia
打盹 打盹 dǎ dǔn to doze off; to drop to sleep momentarily; (see also 打瞌睡)
诶诒 誒詒 ēi yí to rave; to babble in one's sleep
分房 分房 fēn fáng to sleep in separate rooms; distribution of social housing
酣眠 酣眠 hān mián to sleep soundly; fast asleep
酣睡 酣睡 hān shuì to sleep soundly; to fall into a deep sleep
合眼 合眼 hé yǎn to close one's eyes; to get to sleep
昏睡 昏睡 hūn shuì sleep; drowse when unconscious; lethargic sleep; lethargy
假寐 假寐 jiǎ mèi to doze; to take a nap; nodding off to sleep
交睫 交睫 jiāo jié to sleep; lit. one's eyelids join
解酲 解酲 jiě chéng to sober up; to sleep off the effect of drink
惊醒 驚醒 jīng xǐng to rouse; to be woken by sth; to wake with a start; to sleep lightly
就寝 就寢 jiù qǐn to go to sleep; to go to bed (literary)
困觉 睏覺 kùn jiào (dialect) to sleep
露宿 露宿 lù sù to sleep outdoors; to spend the night in the open
乱搞 亂搞 luàn gǎo to make a mess; to mess with; to be wild; to sleep around; to jump into bed
梦话 夢話 mèng huà talking in one's sleep; words spoken during sleep; fig. speech bearing no relation to reality; delusions
梦寐 夢寐 mèng mèi to dream; to sleep
梦魔 夢魔 mèng mó night demon (malign spirit believed to plague people during sleep)
梦呓 夢囈 mèng yì talking in one's sleep; delirious ravings; nonsense; sheer fantasy
梦游 夢遊 mèng yóu sleep walking; fig. dream voyage
磨牙 磨牙 mó yá to grind one's teeth (during sleep); pointless arguing
睡眠 睡眠 shuì mián sleep
睡乡 睡鄉 shuì xiāng sleep; the land of Nod; dreamland
统铺 統鋪 tǒng pù a common bed (to sleep many)
香甜 香甜 xiāng tián fragrant and sweet; sound (sleep)
歇息 歇息 xiē xi to have a rest; to stay for the night; to go to bed; to sleep
休憩 休憩 xiū qì to rest; to sleep
瞓觉 瞓覺 xùn jiào to sleep (Cantonese); Mandarin equivalent: 睡覺|睡觉[shui4 jiao4]
呓语 囈語 yì yǔ to talk in one's sleep; crazy talk
废寝食 廢寢食 fèi qǐn shí to neglect sleep and food
鬼压床 鬼壓床 guǐ yā chuáng (coll.) sleep paralysis
鸡毛店 雞毛店 jī máo diàn a simple inn with only chicken feathers to sleep on
美容觉 美容覺 měi róng jiào beauty sleep (before midnight)
梦行症 夢行症 mèng xíng zhèng somnambulism; sleep walking
梦游症 夢遊症 mèng yóu zhèng somnambulism; sleep walking
撒呓挣 撒囈掙 sā yì zhēng somniloquy; to talk or act in one's sleep; sleep-walking
睡懒觉 睡懶覺 shuì lǎn jiào to sleep in
做厅长 做廳長 zuò tīng zhǎng (jocularly) to sleep on the couch; to sleep in the living room
碧草如茵 碧草如茵 bì cǎo rú yīn green grass like cushion (idiom); green meadow so inviting to sleep on
不眠不休 不眠不休 bù mián bù xiū without stopping to sleep or have a rest (idiom)
抵足而眠 抵足而眠 dǐ zú ér mián lit. to live and sleep together (idiom); fig. a close friendship
抵足而卧 抵足而臥 dǐ zú ér wò lit. to live and sleep together (idiom); fig. a close friendship
废寝忘餐 廢寢忘餐 fèi qǐn wàng cān to neglect sleep and food (idiom); to skip one's sleep and meals; to be completely wrapped up in one's work
废寝忘食 廢寢忘食 fèi qǐn wàng shí to neglect sleep and forget about food (idiom); to skip one's sleep and meals; to be completely wrapped up in one's work
绿草如茵 綠草如茵 lǜ cǎo rú yīn green grass like cushion (idiom); green meadow so inviting to sleep on
目不交睫 目不交睫 mù bù jiāo jié lit. the eyelashes do not come together (idiom); fig. to not sleep a wink
起早贪黑 起早貪黑 qǐ zǎo tān hēi to rise early and sleep late
食肉寝皮 食肉寢皮 shí ròu qǐn pí to eat sb's flesh and sleep on their hide (idiom); to swear revenge on sb; implacable hatred; to have sb's guts for garters
睡回笼觉 睡回籠覺 shuì huí lóng jiào to go back to sleep (instead of rising up in the morning); to sleep in
睡眠不足 睡眠不足 shuì mián bù zú lack of sleep; sleep deficit
睡眠失调 睡眠失調 shuì mián shī tiáo sleep disorder
夙兴夜寐 夙興夜寐 sù xīng yè mèi to rise early and sleep late (idiom); to work hard; to study diligently; to burn the candle at both ends
我醉欲眠 我醉欲眠 wǒ zuì yù mián lit. I'm drunk and would like to sleep (idiom); (used to indicate one's sincere and straightforward nature)
夜不成眠 夜不成眠 yè bù chéng mián to be unable to sleep at night
一觉醒来 一覺醒來 yī jiào xǐng lái to wake up from a sleep
枕戈寝甲 枕戈寢甲 zhěn gē qǐn jiǎ to sleep on one's armor with spear by the pillow (idiom); ready for battle; determined to kill the enemy; Be prepared!
吃喝拉撒睡 吃喝拉撒睡 chī hē lā sā shuì to eat, drink, shit, piss, and sleep; (fig.) the ordinary daily routine
快速动眼期 快速動眼期 kuài sù dòng yǎn qī REM sleep
乱搞男女关系 亂搞男女關係 luàn gǎo nán nǚ guān xì to be promiscuous; to sleep around
I prepared a standalone Lua script that generates the mal-result.tex file. The advantage is that we can later process it with either xelatex or lualatex. This is the main file (mal-chinese.lua):
-- I am mal-chinese.lua file...
-- I take data from MDBG Chinese Reader's export (CEDict) and process them to have tone colors.
-- http://www.mdbg.net/chindict/chindict.php (I used Windows version)
-- sleep, selecting all, Ctrl+Alt+X, sleep.tsv
-- Input data...
pinyincolor={"mgray","mred","morange","mgreen","mblue","mcyan"}
pinyindata={"aeiouü","āēīōūǖ","áéíóúǘ","ǎěǐǒǔǚ","àèìòùǜ"}
--print(pinyindata[1])
-- An inspiration to use UTF-8 coded characters...
--http://stackoverflow.com/questions/13235091/extract-the-first-letter-of-a-utf-8-string-with-lua
--http://stackoverflow.com/questions/15979519/detect-if-last-character-is-not-multibyte-in-lua?lq=1
function tostr(tstring)
return tstring:gmatch("[%z\1-\127\192-\255][\128-\191]*")
end -- of function tostr
-- The generated TeX file, to be loaded later...
whereto=io.open("mal-result.tex","w")
-- The main function to process one line...
function processme(mstring)
-- Initializing tables and counters...
local simp={}
local trad={}
local pinyin={}
local colors={}
local rest=""
local c=0 -- a character counter
local w=0 -- a word counter
local t=0 -- a tabular counter
local temp=0 -- a color counter
local diff=0 -- a glyph switcher
--print() -- one empty line
for code in tostr(mstring) do
if t<2 then w=w+1 end -- every Chinese glyph is a new word
if t>1 and code==" " then w=w+1 end -- word is after space
if w==0 then w=1 end -- initializing a word counter
if code=="\t" then -- \t, \9
--io.write(" tab\n")
t=t+1
w=0
else
if t==0 then simp[w]=code end -- Simplified Chinese
if t==1 then trad[w]=code end -- Traditional Chinese
if t==2 then -- Pinyin
--io.write(code)
if code~=" " then
pinyin[w]=(pinyin[w] or "")..code -- adding a character
end -- of if code
for data=1,#pinyindata do
for char in tostr(pinyindata[data]) do
if code==char then
if data>c then c=data end -- print(code)
end -- of if code
end -- of for char
end -- of for data
end -- of if t==2
end -- of if code=="\t" ... else
if t==3 then -- >2 -- Translation, I don't use t==4
if code~="\t" then
rest=(rest or "")..code
end -- of if code
end -- of t>2
-- Save the colors of a specific glyph
if (t==2 and code==" ") or (t==3 and code=="\t") then
-- print(" "..c.." "..pinyincolor[c])
temp=temp+1
colors[temp]=c
c=0
end -- of if
end -- for code
-- Start writing TeX file: simplified Chinese
diff=0
whereto:write("{\\mchinese ")
for i=1,#simp do
whereto:write("{\\color{"..pinyincolor[colors[i]].."}")
whereto:write(simp[i])
whereto:write("}")
if simp[i]~=trad[i] then
diff=1 -- typeset traditional Chinese
end -- of if
end -- of for i
whereto:write(" ")
-- If there is a difference among glyphs, write traditional Chinese
if diff==1 then
whereto:write("{\\color{"..pinyincolor[6].."}[}")
for i=1,#trad do
whereto:write("{\\color{"..pinyincolor[colors[i]].."}")
if simp[i]~=trad[i] then
whereto:write(trad[i]) -- if glyph is different, write it
else
whereto:write("-") -- if glyph is equal, write a hyphen as the dash placeholder
end -- of if
whereto:write("}")
end -- of for
whereto:write("{\\color{"..pinyincolor[6].."}]}")
end -- of if
whereto:write("} ")
-- Write pinyin after glyphs...
whereto:write("{\\color{"..pinyincolor[6].."}")
for i=1,#pinyin do
whereto:write(pinyin[i].." ")
end
whereto:write("}\\par\\nopagebreak ")
--for i=1,#simp do
-- print(simp[i],trad[i],simp[i]==trad[i],pinyin[i],colors[i],pinyincolor[colors[i]])
--end
--whereto:write("\n"..rest) -- plain typeset
-- Write translation after glyphs+pinyin...
for code in tostr(rest) do
--print(code)
if #code>2 then
whereto:write("{\\mchinese "..code.."}")
else
whereto:write(code)
end -- of if
end -- of for code
whereto:write("\\par\\medskip\n")
end -- of function processme
-- If we would like to test one or just a couple of lines...
-- (fields must be separated by tabs, written here as \t)
-- processme("酣眠\t酣眠\thān mián\tto sleep soundly; fast asleep\t")
-- processme("歇息\t歇息\txiē xi\tto have a rest; to stay for the night; to go to bed; to sleep\t")
-- processme("碧草如茵\t碧草如茵\tbì cǎo rú yīn\tgreen grass like cushion (idiom); green meadow so inviting to sleep on\t")
-- processme("废寝忘餐\t廢寢忘餐\tfèi qǐn wàng cān\tto neglect sleep and food (idiom); to skip one's sleep and meals; to be completely wrapped up in one's work\t")
-- processme("瞓觉\t瞓覺\txùn jiào\tto sleep (Cantonese); Mandarin equivalent: 睡覺|睡觉[shui4 jiao4]\t")
-- Process whole file which we got as an export after searching for "sleep"...
local linec=-1
io.write("Processing line ")
for line in io.lines("sleep.tsv") do
linec=linec+1
io.write(linec.." ")
if linec>0 then
processme(line)
end -- of if linec
end -- of for line
whereto:close() -- flush and close the generated TeX file
After running
texlua mal-chinese.lua
we get the mal-result.tex file, which is then processed by
lualatex mal-chinese.tex
To use xelatex instead, comment out line 13 and uncomment line 14 in the following code (mal-chinese.tex):
% run: xelatex or lualatex mal-chinese.tex
% comment out line 13 or 14, respectively
\documentclass[a4paper]{article}
\pagestyle{empty}
\parindent=0pt
\usepackage{xcolor}
\definecolor{mred}{RGB}{255,0,0}
\definecolor{morange}{RGB}{255,140,0}
\definecolor{mgreen}{RGB}{0,128,0}
\definecolor{mblue}{RGB}{0,0,255}
\definecolor{mgray}{RGB}{68,68,68}
\definecolor{mcyan}{RGB}{57,106,146}
\usepackage{luatextra} % for lualatex
%\usepackage{xltxtra} % for xelatex
\newfontfamily\mchinese{HanaMinA.ttf}
\begin{document}
\input{mal-result.tex}
\end{document}
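I enclose a close-up of the last two entries and a preview of all four pages (74 entries in total). We could also use the tone colors to mark the pinyin syllables themselves, as the web version does; the steps would be the same. Here I tried to mimic the output of the Windows version.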
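For the web-style variant, the pinyin syllables themselves would be wrapped in the tone colors. The following is a rough sketch of how that could look, not something I have run through the full workflow: `color_pinyin` is a hypothetical function name, and falling back to mgray for a syllable without a recognized vowel is my own assumption. The color names are the ones defined in mal-chinese.tex.

```lua
-- Sketch: wrap every pinyin syllable in the xcolor color of its tone.
-- Index 1 = neutral tone, 2..5 = tones 1..4.
local pinyincolor = {"mgray", "mred", "morange", "mgreen", "mblue"}
local tonemarks = {"aeiouü", "āēīōūǖ", "áéíóúǘ", "ǎěǐǒǔǚ", "àèìòùǜ"}

function color_pinyin(pinyin)
  local out = {}
  for syllable in pinyin:gmatch("[^ ]+") do
    local class = 0
    for i, marks in ipairs(tonemarks) do
      -- test every UTF-8 character of the mark list against the syllable
      for ch in marks:gmatch("[%z\1-\127\192-\255][\128-\191]*") do
        if i > class and syllable:find(ch, 1, true) then
          class = i
        end
      end
    end
    -- assumption: syllables without a known vowel get the neutral color
    local color = pinyincolor[class] or "mgray"
    out[#out + 1] = "{\\color{" .. color .. "}" .. syllable .. "}"
  end
  return table.concat(out, " ")
end

print(color_pinyin("shuì jiào"))
-- prints: {\color{mblue}shuì} {\color{mblue}jiào}
```

In the script, the returned string would replace the part that currently writes the whole pinyin in mcyan.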