Automatically build an index from a file of keywords

Question

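As others have said, this will not produce a good index. Not every occurrence of a term is significant, the approach cannot find concepts, it produces no cross-references, and it cannot handle synonyms, homonyms, and so on.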

But if you really want it, here is a simple Lua script, autoindex.lua:

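#!/usr/bin/env texlua
-- autoindex.lua: read one index term per line from the file named in arg[1],
-- read pdftotext output from stdin, and print makeindex entries to stdout.
local indexterms = arg[1]
local file = indexterms and io.open(indexterms, "r")
if not file then
    -- report on stderr so the message does not end up in the .idx file
    io.stderr:write("Cannot load index terms: ", tostring(indexterms), "\n")
    os.exit(1)
end
local terms = {}
for line in file:lines() do
    local term = line:match("^%s*(.-)%s*$") -- trim spaces and a trailing \r
    if term ~= "" then
        terms[#terms + 1] = term
    end
end
file:close()
local text = io.read("*a")
local page = 1
local words = {}
-- Process pages: pdftotext ends each page with a form feed (\f); the extra
-- "\f" makes sure the last page is terminated as well
for t in (text .. "\f"):gmatch("([^\f]*)\f") do
    -- tokenize words; add more characters which can't be part of words
    for x in t:gmatch("[^%s%.,!%?%(%)%-@%$]+") do
        local key = x:lower() -- normalize case; note that this doesn't handle Unicode
        local w = words[key] or {}
        w[page] = true
        words[key] = w
    end
    page = page + 1
end
for _, term in ipairs(terms) do
    -- words are stored lower-cased, so look terms up the same way
    local match = words[term:lower()] or {}
    local pages = {}
    for p in pairs(match) do
        pages[#pages + 1] = p
    end
    table.sort(pages) -- pairs() has no defined order
    for _, p in ipairs(pages) do
        print("\\indexentry{" .. term .. "}{" .. p .. "}")
    end
end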
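To see how page numbers are derived, here is a small self-contained sketch of the script's core loop, pulled out into two hypothetical helper functions (`build_word_pages` and `pages_for` are illustrative names, not part of autoindex.lua). It assumes the input looks like pdftotext output, with every page terminated by a form feed, and it shows that only single words can match, since the text is split on whitespace and punctuation.

```lua
-- Hypothetical helper mirroring autoindex.lua: split the text on form feeds
-- (one chunk per page) and record, for every lower-cased word, the set of
-- pages it occurs on.
local function build_word_pages(text)
  local words, page = {}, 1
  for chunk in (text .. "\f"):gmatch("([^\f]*)\f") do
    for word in chunk:gmatch("[^%s%.,!%?%(%)%-@%$]+") do
      local key = word:lower()
      words[key] = words[key] or {}
      words[key][page] = true
    end
    page = page + 1
  end
  return words
end

-- Hypothetical helper: sorted page numbers on which a single-word term occurs.
local function pages_for(words, term)
  local pages = {}
  for p in pairs(words[term:lower()] or {}) do
    pages[#pages + 1] = p
  end
  table.sort(pages)
  return pages
end

-- Three pages, as pdftotext would emit them (each ended by \f).
local sample = "Hello, world!\fNothing here.\fHELLO again (world).\f"
local words = build_word_pages(sample)
print(table.concat(pages_for(words, "hello"), ","))  -- 1,3
print(table.concat(pages_for(words, "world"), ","))  -- 1,3
print(#pages_for(words, "index"))                     -- 0
```

Matching is case-insensitive because both the words and the looked-up term are lower-cased, while the term is still printed in the spelling used in the terms file.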

First convert the PDF file to text with the pdftotext utility:

pdftotext filename.pdf outputfile.txt

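pdftotext keeps the page breaks as form-feed characters, which the script uses to count pages. Then call the script like this, where filewithterms lists one term per line: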

texlua autoindex.lua filewithterms < outputfile.txt > indexfile.idx

This writes entries in the standard makeindex format to indexfile.idx, for example:

\indexentry{hello}{7}
\indexentry{hello}{9}
\indexentry{hello}{13}
\indexentry{world}{3}
\indexentry{world}{5}
\indexentry{world}{7}
\indexentry{world}{9}
\indexentry{world}{13}
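One caveat about this format: the script copies each term verbatim into `\indexentry{...}`. With makeindex's default style, `@`, `!` and `|` have special meanings and `"` is the quote character, so a term containing one of them would be misread. The sketch below is a hypothetical addition (the names `makeindex_escape` and `indexentry` are illustrative, not part of the original script) that prefixes each such character with `"` so makeindex takes it literally; it assumes the default makeindex settings and does nothing about unbalanced braces.

```lua
-- Hypothetical helper: quote makeindex's default special characters
-- (@ ! | and the quote character " itself) by prefixing them with ".
local function makeindex_escape(term)
  -- the extra parentheses drop gsub's second return value (the count)
  return (term:gsub('[@!|"]', '"%0'))
end

-- Hypothetical helper: format one makeindex entry for a term and page.
local function indexentry(term, page)
  return "\\indexentry{" .. makeindex_escape(term) .. "}{" .. page .. "}"
end

print(indexentry("hello", 7))   -- \indexentry{hello}{7}
print(indexentry("C|C++", 3))   -- \indexentry{C"|C++}{3}
```

In autoindex.lua this would replace the string concatenation inside the final `print` call.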

You can then build the index from this file with makeindex or xindy.

Answer 1