如何使用粗体单词作为键并将列表附加到页面、部分或章节的末尾来创建紧凑的词汇表/词典?

如何使用粗体单词作为键并将列表附加到页面、部分或章节的末尾来创建紧凑的词汇表/词典?

以下是带有单词含义的压缩词汇表的示例。首先是文本,然后是词汇表:

的组合光伏技术、太阳能热技术以及反射或折射太阳能聚光器自 20 世纪 70 年代末和 80 年代初以来,这种方法一直是开发人员和研究人员非常青睐的选择。其结果是所谓的浓缩光伏热能系统是聚光光伏系统和光伏热能系统的混合组合。

光伏:此处对该词的定义,太阳能聚光器:这里有另一个定义,光伏热能:还有另一个

这种格式有助于作者给语言学习者写作,以便文本具有正常大小,定义紧凑,并在需要时可供参考。

该特征可以描述如下:

  1. 粗体单词表示您可以在词汇表中找到其含义。
  2. 词汇表附在章节/节的末尾,紧凑,没有太多空格,字体较小。另一种选择是将其作为脚注,但根据文本格式和有多少单词有定义,脚注可能会占用太多空间。例如,语言学习者的书籍通常很小,就像口袋书一样,这样学习者就可以“结束”一本书并保持阅读下一本书的动力。我想无论如何,将其附加在章节末尾的代码会更容易。
  3. 额外(不是那么重要):如果之前已经写过单词定义,它就不会在接下来的页面或章节中再次出现。

是否可以使用内联命令(如\wd{word}{definition}文本中的命令)来实现这一点?它会一直被使用,例如,平均每 10 个单词就会有一个附加的定义。

\wd{ 的组合光伏}{def of this word} 技术、太阳能热技术以及反射或折射 \ws{太阳能聚光器自 20 世纪 70 年代末和 80 年代初以来, }{def of this word} 一直是开发人员和研究人员非常青睐的选择。其结果就是所谓的浓缩 \wd{光伏热能{def of this word} 系统是聚光光伏系统和光伏热能系统的混合组合。

答案1

此解决方案的许多方面都具有高度实验性。但我认为这应该为您提供了一个实现所需目标的框架。请确保完成以下步骤以准备文档编译。此解决方案仅适用于 LuaTeX。

  1. 准备好你的字典并将其保存为 JSON 格式。为了演示,我找到了一堆免费提供的字典这里我选择下载福多克因为它的文本看起来更容易解析。我使用下面的 Python 脚本将其转换为 JSON 格式。
import json

_latex_special_chars = {
    '&': r'\&',
    '%': r'\%',
    '$': r'\$',
    '#': r'\#',
    '_': r'\_',
    '{': r'\{',
    '}': r'\}',
    '~': r'\textasciitilde{}',
    '^': r'\^{}',
    '\\': r'\textbackslash{}',
    '\n': '\\newline ',
    '-': r'{-}',
    '\xA0': '~',  # Non-breaking space
    '[': r'{[}',
    ']': r'{]}',
}

def escape_latex(s):
    return ''.join(_latex_special_chars.get(c, c) for c in str(s))

stop_words = ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours', 'yourself',
              'yourselves', 'he', 'him', 'his', 'himself', 'she', 'her', 'hers', 'herself', 'it', 'its', 'itself',
              'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', 'these',
              'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do',
              'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while',
              'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before',
              'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again',
              'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each',
              'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than',
              'too', 'very', 's', 't', 'can', 'will', 'just', 'don', 'should', 'now']

the_dict = dict()
cur_item = ''
cur_item_def = []


def flush_cur_item():
    global cur_item
    if len(cur_item) > 0:
        if cur_item.lower() not in stop_words:
            defn_text = '\n'.join(cur_item_def)
            the_dict[cur_item] = str(escape_latex(defn_text))
        cur_item = ''
        cur_item_def.clear()


with open('foldoc.txt', 'r')  as infile:
    for line in infile:
        line = line.rstrip()
        if line.startswith('\t'):
            cur_item_def.append(line.strip())
        elif len(line.strip()) == 0:
            cur_item_def.append(line.strip())
        else:
            flush_cur_item()
            cur_item = line

flush_cur_item()

with open('fodol.json', 'w') as outfile:
    json.dump(the_dict, outfile, indent=2)

  1. 下载json.luahttps://github.com/rxi/json.lua) 并将其放入您的工作目录中。这允许 Lua 解析 JSON 文件。
  2. 编译以下源文件
\documentclass{article}
\usepackage{luacode}
\usepackage{fontspec}
\usepackage{expl3}

\setmainfont{DejaVu Serif}

% get the font id of bfseries font
\ExplSyntaxOn
\group_begin:
\bfseries
\directlua{
    bf_font_id = font.current()
}
\group_end:
\ExplSyntaxOff

\begin{luacode*}
-- load the dictionary
-- this step requires JSON.lua library
-- https://github.com/rxi/json.lua
local json = require"json"
local infile = io.open("fodol.json", "r")
dictionary = json.decode(infile:read("a+"))
infile:close()

inspect = require"inspect"

do_glossary = true
glossary_word = {}
glossary_defn = {}

local glyph_id = node.id("glyph")
local glue_id = node.id("glue")
processed_words = {}

function is_letter(glyph)
    local chr = glyph.char
    return (chr >= 65 and chr <= 90) or (chr >= 97 and chr <= 122)
end

function glyph_table_to_str(tbl)
    local res = ""
    for _, item in ipairs(tbl) do
        res = res .. utf8.char(item.char)
    end
    return res
end

function process_glyphs(glyphs)
    if #glyphs == 0 then
        return
    end
    local word = glyph_table_to_str(glyphs)
    if processed_words[word] ~= nil then
        return
    end
    processed_words[word] = 1
    
    -- try original case and lowercase
    local defn = dictionary[word] or dictionary[string.lower(word)]
    if defn ~= nil then
        table.insert(glossary_word, word)
        table.insert(glossary_defn, defn)
        for _ ,item in ipairs(glyphs) do
            item.font = bf_font_id
        end
    end
    texio.write_nl(tostring(glossary_word))
end

function show_glossary()
    texio.write_nl(inspect(glossary_defn))
    if #glossary_word > 0 then
        tex.print([[\begin{itemize}]])
        for ind, item in ipairs(glossary_word) do
            tex.print(string.format([[\item \textbf{%s}: %s]], item, glossary_defn[ind]))
        end
        tex.print([[\end{itemize}]])
        glossary_word = {}
        glossary_defn = {}
    end
end

function pre_callback(n)
    if not do_glossary then
        return n
    end
    local prev_glyph = {}
    local word = ""
    for n1 in node.traverse(n) do
        if n1.id == glyph_id and is_letter(n1) then
            table.insert(prev_glyph, n1)
        elseif n1.id == glue_id then
            process_glyphs(prev_glyph)
            prev_glyph = {}
        end
    end
    process_glyphs(prev_glyph)
    return n
end

luatexbase.add_to_callback("pre_linebreak_filter", pre_callback, "pre_callback")
\end{luacode*}

\newcommand{\EnableGlossary}{
    \directlua{do_glossary=true}
}
\newcommand{\DisableGlossary}{
    \directlua{do_glossary=false}
}
\newcommand{\PrintGlossary}{
    \subsection{Glossary}
    \begingroup
    \small
    \DisableGlossary
    \directlua{show_glossary()}
    \EnableGlossary
    \endgroup
}

\begin{document}
% https://arstechnica.com/science/2020/12/google-develops-an-ai-that-can-learn-both-chess-and-pac-man/
\section{First section}
The first major conquest of artificial intelligence was chess. The game has a dizzying number of possible combinations, but it was relatively tractable because it was structured by a set of clear rules. An algorithm could always have perfect knowledge of the state of the game and know every possible move that both it and its opponent could make. The state of the game could be evaluated just by looking at the board.

% always call this command on a new paragraph
\PrintGlossary

\section{Second section}
But many other games aren't that simple. If you take something like Pac-Man, then figuring out the ideal move would involve considering the shape of the maze, the location of the ghosts, the location of any additional areas to clear, the availability of power-ups, etc., and the best plan can end up in disaster if Blinky or Clyde makes an unexpected move. We've developed AIs that can tackle these games, too, but they have had to take a very different approach to the ones that conquered chess and Go.


% always call this command on a new paragraph
\PrintGlossary
\end{document}

结果如下所示。

在此处输入图片描述

存在的问题:

  • 很难控制在何处应用词汇表以及在何处不应用。目前,我只提供\EnableGlossary\DisableGlossary命令,但为了避免在标题、说明等中使用词汇表而不断切换它们似乎很繁琐。
  • 现在,词汇表必须用\PrintGlossary命令手动打印。在 LaTeX 中检测某个部分的结尾似乎很困难。请参阅知道它位于某个部分末尾的宏?了解更多想法。
  • 我忘了实现内联词汇表功能。但有了这个基础架构,它应该非常简单:
    \newcommand{\inlg}[2]{%
        \directlua{
            table.insert(glossary_word, "\luaescapestring{#1}")
            table.insert(glossary_defn, "\luaescapestring{#2}")
        }%
        \textbf{#1}%
    }
    
    但由于LuaTeX的机制,这些内联词汇表项会出现无序的情况。

答案2

简而言之......你描述了确切地包的功能喜欢glossary(点击这里了解更多信息)。但我知道直接进入这个领域会很困难,所以请先查看下面的代码,尝试添加更多你想自己包含的功能(因为边做边学^_-)并享受乐趣。

\documentclass{article}
\usepackage[acronyms]{glossaries}

% code to define your entries
\newacronym[description={definition of this word here}]{PV}{PV}{photovoltaic}
\newacronym[description={definition of this word here}]{SC}{SC}{solar concentrators}
\newacronym[description={and another here}]{CPVT}{CPVT}{photovoltaic thermal}

% code to format the entires in a bold fashion
\renewcommand{\glstextformat}[1]{\textbf{#1}}
\renewcommand*{\glsentryfmt}{%
    \glsgenentryfmt
    \ifglsused{\glslabel}{}{\space(\glsentrysymbol{\glslabel})}%
}

% code to control the size of entries within the glossary
\renewcommand{\glossarypreamble}{\small}
    
% code to see the acronym description as footnote instead of inline
% \setacronymstyle{footnote-sc-desc}

\makeglossaries
\begin{document}

\section{First}
Use of \gls{PV} and \gls{SC} 

\section{Second}
Use of \gls{CPVT} and again \gls{SC} 


\printacronyms
\end{document}

在此处输入图片描述

相关内容