xindex - 排序本地字符（ÆØÅæøå）

Question 1

我刚刚创建了一个新包，添加了对Unicode 排序算法对于 LuaTeX -卢阿。我已经添加了对几种语言的支持，例如捷克语、德语或挪威语。我们可以使用它来代替Xindex内置排序机制。

尝试以下版本xindex-norsk.lua：

-----------------------------------------------------------------------
--         FILE:  xindex-norsk.lua
--  DESCRIPTION:  configuration file for xindex.lua
-- REQUIREMENTS:  
--       AUTHOR:  Herbert Voß
--     MODIFIED:  Sveinung Heggen (2020-01-02)
--      LICENSE:  LPPL1.3
-----------------------------------------------------------------------

if not modules then modules = { } end modules ['xindex-cfg'] = {
      version = 0.20,
      comment = "configuration to xindex.lua",
       author = "Herbert Voss",
    copyright = "Herbert Voss",
      license = "LPPL 1.3"
}

local ducet = require "lua-uca.lua-uca-ducet"
local collator = require "lua-uca.lua-uca-collator"
local languages = require "lua-uca.lua-uca-languages"
local collator_obj = collator.new(ducet)

local language = "en" -- default language
-- language specified on the command line doesn't seem to be available
-- in the config file, so we just try to find it ourselves
for i, a in ipairs(arg) do
  if a == "-l" or a=="--language" then
    language = arg[i+1]
    break
  end
end

if languages[language] then
  print("[Lua-UCA] Loading language: " .. language)
  collator_obj = languages[language](collator_obj)
end

local upper = unicode.utf8.upper


escape_chars = { -- by default " is the escape char
  {'""', "\\escapedquote",      '\"{}' },
  {'"@', "\\escapedat",         "@"    },
  {'"|', "\\escapedvert",       "|"    },
  {'"!', "\\escapedexcl",       "!"    },
  {'"(', "\\escapedparenleft",  "("   },
  {'")', "\\escapedparenright", ")"  }
}

itemPageDelimiter = ","     -- Hello, 14
compressPages     = true    -- something like 12--15, instead of 12,13,14,15. the |( ... |) syntax is still valid
fCompress         = true    -- 3f -> page 3, 4 and 3ff -> page 3, 4, 5
minCompress       = 3       -- 14--17 or 
numericPage       = true    -- for non-numerical page numbers, like "VI-17"
sublabels         = {"", "---\\,", "--\\,", "-\\,"} -- for the (sub(sub(sub-items  first one is for item
pageNoPrefixDel   = ""     -- a delimiter for page numbers like "VI-17"
indexOpening      = ""     -- commands after \begin{theindex}
rangeSymbol       = "--"
idxnewletter      = "\\textbf"  -- Only valid if -n is not set

folium = { 
  de = {"f.", "ff."},
  en = {"f.", "ff."},
  fr = {"\\,sq","\\,sqq"},
  no = {"\\,f.","\\,ff."},
} 

function UTFCompare(a,b)
  local A = a["SortKey"]
  local B = b["SortKey"]
  return collator_obj:compare_strings(A,B)
end

function SORTendhook(list)
  -- get the headers for letter groups
  for k,v in ipairs(list) do 
    -- the collator:get_lowest_char will return character on the given
    -- position. It will be lowercase and without accents.
    local codepoints = collator_obj:string_to_codepoints(v.Entry)
    local codes = collator_obj:get_lowest_char(codepoints, 1)
    local sort_char = utf8.char(table.unpack(codes))
    v.sortChar = upper(sort_char) -- use unicode.utf8.upper to make the char uppercase
  end
  return list
end

--[[
    Each character's position in this array-like table determines its 'priority'.
    Several characters in the same slot have the same 'priority'.
]]

alphabet_lower = { --   for sorting
    { ' ' },  -- only for internal tests
    { 'a', 'á', 'à', },
    { 'b' },
    { 'c', 'ç' },
    { 'd' },
    { 'e', 'é', 'è', 'ë', 'ê' },
    { 'f' },
    { 'g' },
    { 'h' },
    { 'i', 'í', 'ì', 'î', 'ï' },
    { 'j' },
    { 'k' },
    { 'l' },
    { 'm' },
    { 'n', 'ñ' },
    { 'o', 'ó', 'ò', 'ô' },
    { 'p' },
    { 'q' },
    { 'r' },
    { 's', 'š', 'ß' },
    { 't' },
    { 'u', 'ú', 'ù', 'û' },
    { 'v' },
    { 'w' },
    { 'x' },
    { 'y', 'ý', 'ÿ', 'ü' },
    { 'z', 'ž' },
    { 'æ', 'œ', 'ä' },
    { 'ø', 'ö' },
    { 'å' }
}
alphabet_upper = { -- for sorting
    { ' ' },
    { 'A', 'Á', 'À', 'Â'},
    { 'B' },
    { 'C', 'Ç' },
    { 'D' },
    { 'E', 'È', 'É', 'Ë', 'Ê' },
    { 'F' },
    { 'G' },
    { 'H' },
    { 'I', 'Í', 'Ì', 'Ï', 'Î' },
    { 'J' },
    { 'K' },
    { 'L' },
    { 'M' },
    { 'N', 'Ñ' },
    { 'O', 'Ó', 'Ò', 'Ô' },
    { 'P' },
    { 'Q' },
    { 'R' },
    { 'S', 'Š' },
    { 'T' },
    { 'U', 'Ú', 'Ù', 'Û' },
    { 'V' },
    { 'W' },
    { 'X' },
    { 'Y', 'Ý', 'Ÿ', 'Ü' },
    { 'Z', 'Ž' },
    { 'Æ', 'Œ', 'Ä' },
    { 'Ø', 'Ö' },
    { 'Å' }
}

相关代码如下：

local ducet = require "lua-uca.lua-uca-ducet"
local collator = require "lua-uca.lua-uca-collator"
local languages = require "lua-uca.lua-uca-languages"
local collator_obj = collator.new(ducet)
local language = "en" -- default language
-- language specified on the command line doesn't seem to be available
-- in the config file, so we just try to find it ourselves
for i, a in ipairs(arg) do
  if a == "-l" or a=="--language" then
    language = arg[i+1]
    break
  end
end

if languages[language] then
  print("[Lua-UCA] Loading language: " .. language)
  collator_obj = languages[language](collator_obj)
end

local upper = unicode.utf8.upper

function UTFCompare(a,b)
  local A = a["SortKey"]
  local B = b["SortKey"]
  return collator_obj:compare_strings(A,B)
end

function SORTendhook(list)
  -- get the headers for letter groups
  for k,v in ipairs(list) do 
    -- the collator:get_lowest_char will return character on the given
    -- position. It will be lowercase and without accents.
    local codepoints = collator_obj:string_to_codepoints(v.Entry)
    local codes = collator_obj:get_lowest_char(codepoints, 1)
    local sort_char = utf8.char(table.unpack(codes))
    v.sortChar = upper(sort_char) -- use unicode.utf8.upper to make the char uppercase
  end
  return list
end

它加载所需的库，创建排序对象并应用挪威规则。该UTFSort函数由使用Xindex。我们重新定义它以使用我们的排序函数。我发现排序有效，但有一个问题 - 首字母处理不正确，因此Xindex为大写字母和小写字母生成了单独的标题。这在函数中处理SORTendhook。

结果如下：

Answer

我刚刚创建了一个新包，添加了对Unicode 排序算法对于 LuaTeX -卢阿。我已经添加了对几种语言的支持，例如捷克语、德语或挪威语。我们可以使用它来代替Xindex内置排序机制。

尝试以下版本xindex-norsk.lua：

-----------------------------------------------------------------------
--         FILE:  xindex-norsk.lua
--  DESCRIPTION:  configuration file for xindex.lua
-- REQUIREMENTS:  
--       AUTHOR:  Herbert Voß
--     MODIFIED:  Sveinung Heggen (2020-01-02)
--      LICENSE:  LPPL1.3
-----------------------------------------------------------------------

if not modules then modules = { } end modules ['xindex-cfg'] = {
      version = 0.20,
      comment = "configuration to xindex.lua",
       author = "Herbert Voss",
    copyright = "Herbert Voss",
      license = "LPPL 1.3"
}

local ducet = require "lua-uca.lua-uca-ducet"
local collator = require "lua-uca.lua-uca-collator"
local languages = require "lua-uca.lua-uca-languages"
local collator_obj = collator.new(ducet)

local language = "en" -- default language
-- language specified on the command line doesn't seem to be available
-- in the config file, so we just try to find it ourselves
for i, a in ipairs(arg) do
  if a == "-l" or a=="--language" then
    language = arg[i+1]
    break
  end
end

if languages[language] then
  print("[Lua-UCA] Loading language: " .. language)
  collator_obj = languages[language](collator_obj)
end

local upper = unicode.utf8.upper


escape_chars = { -- by default " is the escape char
  {'""', "\\escapedquote",      '\"{}' },
  {'"@', "\\escapedat",         "@"    },
  {'"|', "\\escapedvert",       "|"    },
  {'"!', "\\escapedexcl",       "!"    },
  {'"(', "\\escapedparenleft",  "("   },
  {'")', "\\escapedparenright", ")"  }
}

itemPageDelimiter = ","     -- Hello, 14
compressPages     = true    -- something like 12--15, instead of 12,13,14,15. the |( ... |) syntax is still valid
fCompress         = true    -- 3f -> page 3, 4 and 3ff -> page 3, 4, 5
minCompress       = 3       -- 14--17 or 
numericPage       = true    -- for non-numerical page numbers, like "VI-17"
sublabels         = {"", "---\\,", "--\\,", "-\\,"} -- for the (sub(sub(sub-items  first one is for item
pageNoPrefixDel   = ""     -- a delimiter for page numbers like "VI-17"
indexOpening      = ""     -- commands after \begin{theindex}
rangeSymbol       = "--"
idxnewletter      = "\\textbf"  -- Only valid if -n is not set

folium = { 
  de = {"f.", "ff."},
  en = {"f.", "ff."},
  fr = {"\\,sq","\\,sqq"},
  no = {"\\,f.","\\,ff."},
} 

function UTFCompare(a,b)
  local A = a["SortKey"]
  local B = b["SortKey"]
  return collator_obj:compare_strings(A,B)
end

function SORTendhook(list)
  -- get the headers for letter groups
  for k,v in ipairs(list) do 
    -- the collator:get_lowest_char will return character on the given
    -- position. It will be lowercase and without accents.
    local codepoints = collator_obj:string_to_codepoints(v.Entry)
    local codes = collator_obj:get_lowest_char(codepoints, 1)
    local sort_char = utf8.char(table.unpack(codes))
    v.sortChar = upper(sort_char) -- use unicode.utf8.upper to make the char uppercase
  end
  return list
end

--[[
    Each character's position in this array-like table determines its 'priority'.
    Several characters in the same slot have the same 'priority'.
]]

alphabet_lower = { --   for sorting
    { ' ' },  -- only for internal tests
    { 'a', 'á', 'à', },
    { 'b' },
    { 'c', 'ç' },
    { 'd' },
    { 'e', 'é', 'è', 'ë', 'ê' },
    { 'f' },
    { 'g' },
    { 'h' },
    { 'i', 'í', 'ì', 'î', 'ï' },
    { 'j' },
    { 'k' },
    { 'l' },
    { 'm' },
    { 'n', 'ñ' },
    { 'o', 'ó', 'ò', 'ô' },
    { 'p' },
    { 'q' },
    { 'r' },
    { 's', 'š', 'ß' },
    { 't' },
    { 'u', 'ú', 'ù', 'û' },
    { 'v' },
    { 'w' },
    { 'x' },
    { 'y', 'ý', 'ÿ', 'ü' },
    { 'z', 'ž' },
    { 'æ', 'œ', 'ä' },
    { 'ø', 'ö' },
    { 'å' }
}
alphabet_upper = { -- for sorting
    { ' ' },
    { 'A', 'Á', 'À', 'Â'},
    { 'B' },
    { 'C', 'Ç' },
    { 'D' },
    { 'E', 'È', 'É', 'Ë', 'Ê' },
    { 'F' },
    { 'G' },
    { 'H' },
    { 'I', 'Í', 'Ì', 'Ï', 'Î' },
    { 'J' },
    { 'K' },
    { 'L' },
    { 'M' },
    { 'N', 'Ñ' },
    { 'O', 'Ó', 'Ò', 'Ô' },
    { 'P' },
    { 'Q' },
    { 'R' },
    { 'S', 'Š' },
    { 'T' },
    { 'U', 'Ú', 'Ù', 'Û' },
    { 'V' },
    { 'W' },
    { 'X' },
    { 'Y', 'Ý', 'Ÿ', 'Ü' },
    { 'Z', 'Ž' },
    { 'Æ', 'Œ', 'Ä' },
    { 'Ø', 'Ö' },
    { 'Å' }
}

相关代码如下：

local ducet = require "lua-uca.lua-uca-ducet"
local collator = require "lua-uca.lua-uca-collator"
local languages = require "lua-uca.lua-uca-languages"
local collator_obj = collator.new(ducet)
local language = "en" -- default language
-- language specified on the command line doesn't seem to be available
-- in the config file, so we just try to find it ourselves
for i, a in ipairs(arg) do
  if a == "-l" or a=="--language" then
    language = arg[i+1]
    break
  end
end

if languages[language] then
  print("[Lua-UCA] Loading language: " .. language)
  collator_obj = languages[language](collator_obj)
end

local upper = unicode.utf8.upper

function UTFCompare(a,b)
  local A = a["SortKey"]
  local B = b["SortKey"]
  return collator_obj:compare_strings(A,B)
end

function SORTendhook(list)
  -- get the headers for letter groups
  for k,v in ipairs(list) do 
    -- the collator:get_lowest_char will return character on the given
    -- position. It will be lowercase and without accents.
    local codepoints = collator_obj:string_to_codepoints(v.Entry)
    local codes = collator_obj:get_lowest_char(codepoints, 1)
    local sort_char = utf8.char(table.unpack(codes))
    v.sortChar = upper(sort_char) -- use unicode.utf8.upper to make the char uppercase
  end
  return list
end

它加载所需的库，创建排序对象并应用挪威规则。该UTFSort函数由使用Xindex。我们重新定义它以使用我们的排序函数。我发现排序有效，但有一个问题 - 首字母处理不正确，因此Xindex为大写字母和小写字母生成了单独的标题。这在函数中处理SORTendhook。

结果如下：

Question 2

使用当前xindex版本（0.23）和

xindex -u -l no -c norsk <file>

你会得到

由 Sveinung 4.6.2020 插入

北欧字符按挪威语规则排序表（含萨米语）：

A   Á   B   C   Č   D   Ð   E   F   G   H   I   J   K   L   M   N   Ŋ   O   P   Q   R   S   Š   T   Ŧ   U   V   W   X   Y   Z   Ž   Æ   Ä   Ø   Ö   Å   Aa  
1   3   5   7   9   11  13  15  17  19  21  23  25  27  29  31  33  35  37  39  41  43  45  47  49  51  53  55  57  59  61  63  65  67  69  71  73  75  75  
a   á   b   c   č   d   đ   e   f   g   h   i   j   k   l   m   n   ŋ   o   p   q   r   s   š   t   ŧ   u   v   w   x   y   z   ž   æ   ä   ø   ö   å   aa  
2   4   6   8   10  12  14  16  18  20  22  24  26  28  30  32  34  36  38  40  42  44  46  48  50  52  54  56  58  60  62  64  66  68  70  72  74  76  76  

A   1
a   2
Á   3
á   4
B   5
b   6
C   7
c   8
Č   9
č   10
D   11
d   12
Ð   13
đ   14
E   15
e   16
F   17
f   18
G   19
g   20
H   21
h   22
I   23
i   24
J   25
j   26
K   27
k   28
L   29
l   30
M   31
m   32
N   33
n   34
Ŋ   35
ŋ   36
O   37
o   38
P   39
p   40
Q   41
q   42
R   43
r   44
S   45
s   46
Š   47
š   48
T   49
t   50
Ŧ   51
ŧ   52
U   53
u   54
V   55
v   56
W   57
w   58
X   59
x   60
Y   61
y   62
Z   63
z   64
Ž   65
ž   66
Æ   67
æ   68
Ä   69
ä   70
Ø   71
ø   72
Ö   73
ö   74
Å   75
Aa  75
å   76
aa  76

Answer

使用当前xindex版本（0.23）和

xindex -u -l no -c norsk <file>

你会得到

由 Sveinung 4.6.2020 插入

北欧字符按挪威语规则排序表（含萨米语）：

A   Á   B   C   Č   D   Ð   E   F   G   H   I   J   K   L   M   N   Ŋ   O   P   Q   R   S   Š   T   Ŧ   U   V   W   X   Y   Z   Ž   Æ   Ä   Ø   Ö   Å   Aa  
1   3   5   7   9   11  13  15  17  19  21  23  25  27  29  31  33  35  37  39  41  43  45  47  49  51  53  55  57  59  61  63  65  67  69  71  73  75  75  
a   á   b   c   č   d   đ   e   f   g   h   i   j   k   l   m   n   ŋ   o   p   q   r   s   š   t   ŧ   u   v   w   x   y   z   ž   æ   ä   ø   ö   å   aa  
2   4   6   8   10  12  14  16  18  20  22  24  26  28  30  32  34  36  38  40  42  44  46  48  50  52  54  56  58  60  62  64  66  68  70  72  74  76  76  

A   1
a   2
Á   3
á   4
B   5
b   6
C   7
c   8
Č   9
č   10
D   11
d   12
Ð   13
đ   14
E   15
e   16
F   17
f   18
G   19
g   20
H   21
h   22
I   23
i   24
J   25
j   26
K   27
k   28
L   29
l   30
M   31
m   32
N   33
n   34
Ŋ   35
ŋ   36
O   37
o   38
P   39
p   40
Q   41
q   42
R   43
r   44
S   45
s   46
Š   47
š   48
T   49
t   50
Ŧ   51
ŧ   52
U   53
u   54
V   55
v   56
W   57
w   58
X   59
x   60
Y   61
y   62
Z   63
z   64
Ž   65
ž   66
Æ   67
æ   68
Ä   69
ä   70
Ø   71
ø   72
Ö   73
ö   74
Å   75
Aa  75
å   76
aa  76

xindex - 排序本地字符（ÆØÅæøå）

答案1

答案2

相关内容