How to extract definitions of Chinese words from the CC-EDICT dictionary?

Question

I've created a simple package for LuaLaTeX, ccedict.sty:

\ProvidesPackage{ccedict}
\RequirePackage{luacode}
\RequirePackage{luatexja-fontspec}
\setmainjfont{FandolSong}

\begin{luacode*}
local dict = {}

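function load_ccedict(filename)
  for line in io.lines(filename) do
    -- CC-EDICT line format: Traditional Simplified [pin1 yin1] /definition 1/definition 2/
    -- the headwords contain no spaces, so %S+ is safer than a greedy .+, which
    -- would mis-split any line whose definition part also contains " ["
    -- comment lines starting with # do not match and are skipped
    local traditional, simplified, spelling, description = line:match("^(%S+)%s+(%S+)%s+%[(.-)%]%s*(.+)")
    if traditional then
      -- insert new record for the current headword
      local rec = dict[simplified] or {}
      table.insert(rec, {spelling = spelling, description = description, simplified = simplified, traditional = traditional})
      dict[simplified] = rec
    end
  end
end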

function ccedict_get_term(term)
  local rec = dict[term] or {{spelling = "cannot find term", description = "", simplified = term, traditional = ""}}
  local new = {}
  local traditional, simplified
  for _, v in ipairs(rec) do
    -- this will be printed in the footnote
    new[#new + 1] = " [" .. v.spelling .. "] " .. v.description
    -- save the simplified and traditional terms
    traditional, simplified = v.traditional, v.simplified
  end
  -- you may want to add some separator between the traditional and simplified terms
  -- multiple readings are separated by semicolons
  return traditional .. ": " .. simplified .. " " .. table.concat(new, "; ")
end

\end{luacode*}

\newcommand\loadccedict[1]{\directlua{load_ccedict("\detokenize{#1}")}}

\loadccedict{cedict_ts.u8}

\newcommand\chinese[1]{#1\footnote{\directlua{tex.sprint(ccedict_get_term("\detokenize{#1}"))}}}


\endinput
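A possible refinement, sketched under assumptions: the \chinese command passes the string returned by ccedict_get_term to tex.sprint, which tokenizes it with the current catcodes. If a definition ever contains a TeX special character such as %, # or &, the footnote would be truncated or raise an error. The helper below, ccedict_escape, is a hypothetical name that is not part of the original package; it replaces each special character with a LaTeX command that prints it.

```lua
-- Hypothetical helper (not part of ccedict.sty): escape characters that
-- have a special meaning in TeX before the text is sent back with tex.sprint.
local tex_escapes = {
  ["\\"] = "\\textbackslash{}",
  ["{"]  = "\\{",
  ["}"]  = "\\}",
  ["$"]  = "\\$",
  ["&"]  = "\\&",
  ["#"]  = "\\#",
  ["%"]  = "\\%",
  ["_"]  = "\\_",
  ["^"]  = "\\textasciicircum{}",
  ["~"]  = "\\textasciitilde{}",
}

function ccedict_escape(s)
  -- each matched character is looked up in tex_escapes; with a table as the
  -- replacement, gsub uses the value literally (no % processing).
  -- The extra parentheses drop gsub's second return value (the match count).
  return (s:gsub("[\\{}%$&#%%_%^~]", tex_escapes))
end

print(ccedict_escape("100% & #1"))  --> 100\% \& \#1
```

Inside ccedict_get_term, the footnote line would then read `new[#new + 1] = " [" .. v.spelling .. "] " .. ccedict_escape(v.description)`. UTF-8 Chinese text passes through unchanged, since none of its bytes are in the escaped set.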

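There are two Lua functions: load_ccedict loads the dictionary and builds a lookup table keyed by the simplified form (each key holds a list of all entries for that form), and ccedict_get_term returns the information for a term. The \chinese command prints the term and uses this information to print a footnote.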
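The parsing step of load_ccedict can be tried outside TeX. The following is a minimal sketch, assuming plain Lua 5.3 or later; parse_ccedict_lines is a hypothetical name, and it takes a list of lines instead of a file name so that it runs without cedict_ts.u8. It matches the two headwords with %S+ (CC-EDICT headwords contain no spaces), so a definition that itself contains " [" cannot confuse the split.

```lua
-- Hypothetical standalone version of the parsing loop in load_ccedict:
-- it takes a list of lines instead of reading a file with io.lines.
function parse_ccedict_lines(lines)
  local dict = {}
  for _, line in ipairs(lines) do
    -- Traditional Simplified [pin1 yin1] /definition 1/definition 2/
    -- lines that do not fit (e.g. "#" comments) return nil and are skipped
    local traditional, simplified, spelling, description =
      line:match("^(%S+)%s+(%S+)%s+%[(.-)%]%s*(.+)")
    if traditional then
      -- one key per simplified form, holding a list of all its entries
      local rec = dict[simplified] or {}
      table.insert(rec, {spelling = spelling, description = description,
                         simplified = simplified, traditional = traditional})
      dict[simplified] = rec
    end
  end
  return dict
end

-- Shortened, illustrative entries (not copied from the real dictionary)
local sample = {
  "# CC-EDICT style comment line",
  "炒麵 炒面 [chao3 mian4] /fried noodles/",
  "得 得 [de2] /to obtain/",
  "得 得 [dei3] /to have to/",
}

local dict = parse_ccedict_lines(sample)
print(#dict["得"])                  --> 2
print(dict["炒面"][1].traditional)  --> 炒麵
```

With the real dictionary, the same loop body would run over io.lines(filename), as in the package.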

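It uses the luatexja-fontspec package with the FandolSong font to get out-of-the-box support for Chinese. You may want to use a different font, because I found that FandolSong doesn't support all traditional Chinese characters.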

Here is an example document:

\documentclass{article}
\usepackage{ccedict}
\begin{document}

Hello \chinese{炒面}, \chinese{得}
\end{document}

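The result is:

(You can see the missing characters in the first footnote. I don't know anything about Chinese fonts, so I can't tell which one has better support for traditional Chinese.)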
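The text that \chinese passes to the footnote can also be checked in plain Lua. The sketch below repeats the formatting logic of ccedict_get_term in a hypothetical function ccedict_format, which takes the lookup table as a parameter instead of using the package's local dict; the entries are shortened, illustrative examples. It shows that a character with several readings, such as 得, gets all of them joined with semicolons, and that a missing term produces the "cannot find term" placeholder.

```lua
-- Hypothetical variant of ccedict_get_term that receives the lookup
-- table explicitly, so it can run outside LuaLaTeX.
function ccedict_format(dict, term)
  -- fall back to a placeholder record when the term is not in the table
  local rec = dict[term] or {{spelling = "cannot find term", description = "",
                              simplified = term, traditional = ""}}
  local new = {}
  local traditional, simplified
  for _, v in ipairs(rec) do
    -- one " [pinyin] /definitions/" chunk per reading
    new[#new + 1] = " [" .. v.spelling .. "] " .. v.description
    traditional, simplified = v.traditional, v.simplified
  end
  return traditional .. ": " .. simplified .. " " .. table.concat(new, "; ")
end

-- Shortened, illustrative entries (not copied from the real dictionary)
local dict = {
  ["得"] = {
    {traditional = "得", simplified = "得", spelling = "de2",  description = "/to obtain/"},
    {traditional = "得", simplified = "得", spelling = "dei3", description = "/to have to/"},
  },
}

print(ccedict_format(dict, "得"))
--> 得: 得  [de2] /to obtain/;  [dei3] /to have to/
print(ccedict_format(dict, "炒饭"))
```

The doubled spaces are harmless in the footnote, since TeX collapses consecutive spaces into one when it reads the printed text.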

Answer 1