Texindy 对冰岛语进行排序

Texindy 对冰岛语进行排序

我使用命令texindy -L icelandic -M lang/icelandic/utf8 dict_main.idx创建了照片名称、作者和许可证的列表。但排序不正确(例如冰岛字母表以字母顺序 þ æ ö 结尾)

梅威瑟:

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[]{makeidx}
\usepackage[icelandic, czech]{babel}
\makeindex
\begin{document}
Hello
\index{Þari - Franz Eugen Kohler, Public Domain}
\index{Þistill - ŠARŽÍK František, COPYRIGHT/PD}
\index{Önd - Karney, Lee, PD}
\index{Æðarkóngur - Whitehouse, Laura L., PD}
\index{Avókadó - Forest \& Kim [[p:2684;Starr]], CC-BY}
\index{Auðnutittlingur - Arnstein Rønning, CC BY-SA 3.0}
\index{Asni - Zicha Ondřej, COPYRIGHT/CC-BY-NC}
\index{Á - hvalur.org, CC Unported Licence}
\index{Álft - Bukovský Jiří, COPYRIGHT/CC-BY-NC}
\index{Álka - Jack Spellingbacon from Scotland, CC BY-SA 3.0}

\printindex
\end{document}

运行此命令:

  pdflatex test.tex
  texlua utftexindy.lua -L icelandic test.idx
  pdflatex test.tex

代码细节在这个答案中

答案1

我尝试使用 创建索引lualatex。我关闭了几个包,特别是babel因为有一些字符的硬编码声明为简写,并且它无法很好地处理 utf-8 编码字符。您可能需要这些包进行排版,但在某些情况下,我们可能会省略它们只是为了生成索引。

如果我们检查制定规则字母表(它是用 Perl 编写的),我们在文件中发现以下行/alphabets/icelandic/utf8.pl.in

['A', ['a','A'],['á','Á']@u{,['ǫ́','Ǫ́']}],

据我了解,该@u{}部分使某些排序阶段的字母相等。它包含在其他几行中:

['E', ['e','E']@u{,['ę','Ę']},['ë','Ë'],['é','É']],
['Æ', ['æ','Æ']@u{,['ǽ','Ǽ'],['ę́','Ę́'],['ǿ','Ǿ']},['œ','Œ'],['ä','Ä']],
['Ö', ['ö','Ö'],['ø','Ø']@u{,['ǫ','Ǫ']}],

我们可能期望相同的行为。因此AÁ以及ÆǼ在某个时候是相等的。如果这不是理想的排序,xindy 社区可能正在修复它。我不确定,但它是代码中经常alphabets/general/utf8.pl.in忽略用于排序的变音符号的部分。

不过,我相信这里有一个小错误/拼写错误。索引中的单词组通常使用大写字母,但是存在:

我认为这是错误的:['ð', ['ð','Ð']],
这应该是正确的形式:['Ð', ['ð','Ð']],

我们也可以在下面的例子中发现,有一个小写字母 eth,而不是大写字母。我附上了 TeX 代码和第 2 页的预览。我们运行以下几行:

lualatex mal-icelandic.tex
xindy -M texindy -L icelandic -C utf8 mal-icelandic.idx
lualatex mal-icelandic.tex

%! lualatex mal-icelandic.tex
%! xindy -M texindy -L icelandic -C utf8 mal-icelandic.idx
%! lualatex mal-icelandic.tex
% or with two changes: +xltxtra and -luatextra, we run xelatex
\documentclass{article}
%\usepackage[T1]{fontenc}
%\usepackage[utf8]{inputenc}
%\usepackage[icelandic,czech]{babel}
\usepackage{luatextra} % for lualatex run
%\usepackage{xltxtra} % for xelatex run
\usepackage[colorlinks]{hyperref}%hyperindex=false
\usepackage{makeidx}
\makeindex
\begin{document}
The first paragraph of text\ldots
\index{Þari - Franz Eugen Kohler, Public Domain}
\index{Þistill - ŠARŽÍK František, COPYRIGHT/PD}
\index{Önd - Karney, Lee, PD}
\index{Æðarkóngur - Whitehouse, Laura L., PD}
\index{Avókadó - Forest \& Kim [[p:2684;Starr]], CC-BY}
\index{Auðnutittlingur - Arnstein Rønning, CC BY-SA 3.0}
\index{Asni - Zicha Ondřej, COPYRIGHT/CC-BY-NC}
\index{Á - hvalur.org, CC Unported Licence}
\index{Álft - Bukovský Jiří, COPYRIGHT/CC-BY-NC}
\index{Álka - Jack Spellingbacon from Scotland, CC BY-SA 3.0}
\index{Å - a fake index entry}
% a bug? ['ð',  ['ð','Ð']],
\index{Ð - another fake index entry}
\index{E - a testing index entry}
\index{Ǽ a fake}
\begingroup
\pagestyle{empty}
\def\thispagestyle#1{}
\printindex
\endgroup
\end{document}

姆韦


编辑 1:改进的通用版本(冰岛语排序 + 欧洲西式风格,带有许多带变音符号的字母)

请将这两个文件下载到您的工作目录(我无法在这里直接发布第一个文件,因为它包含一些 TeX.SX 无法显示的特殊字符):

获得http://striz7.fame.utb.cz/tex-sx/is/icelandicmal.xdy
获得http://striz7.fame.utb.cz/tex-sx/is/icelandicmal-test.xdy

我为冰岛语创建了一套新的排序规则,其中混合了西欧的通用排序规则。因此,您可以找到字母组,即使C, Q, W, Z它们Å不在冰岛字母表中。许多字母都添加了变音符号,因此可以考虑对捷克语、斯洛伐克语、波兰语、德语以及可能更多语言中的单词进行排序(请参阅generalXindy 中的排序)。

为了获取字母列表(字母组、字母顺序),我使用:

lualatex typesetme.tex

我运行以下几行来获取索引:

lualatex mal-icelandicmal.tex
xindy -M texindy -M icelandicmal-test -M mal-style mal-icelandicmal.idx
lualatex mal-icelandicmal.tex

这是字母列表(代码,预览)和索引示例(代码,预览)。请测试它是否符合您的需求。

%! lualatex typesetme.tex
\documentclass[a4paper]{article}
\pagestyle{empty}
\usepackage{luatextra}
\newenvironment{alphabet}{\begin{tabular}{*{16}{l}}%
   }{\end{tabular}}
\addtolength{\voffset}{-1in}
\addtolength{\textheight}{1in}

\begin{document}
\section{Icelandicmal}
\subsection{Alphabet}
\begin{alphabet}
a\,A\\
á\,Á & à\,À & ă\,Ă & â\, & ã\,à & ä\,Ä & ą\,Ą\\
b\,B\\
c\,C & č\,Č & ć\,Ć & ĉ\,Ĉ & ç\,Ç\\
d\,D & ď\,Ď\\
ð\,Ð & đ\,Đ\\
e\,E\\
é\,É & è\,È & ě\,Ě & ê\,Ê & ë\,Ë & ę\,Ę\\
f\,F\\
g\,G & ĝ\,Ĝ & ğ\,Ğ\\
h\,H & ĥ\,Ĥ & ı\,I\\
i\,I\\
í\,Í & ì\,Ì & î\,Î & ï\,Ï\\
j\,J & ĵ\,Ĵ\\
k\,K\\
l\,L & ĺ\,Ĺ & ľ\,Ľ & ł\,Ł\\
m\,M\\
n\,N & ń\,Ń & ň\,Ň & ñ\,Ñ\\
o\,O\\
ó\,Ó & ő\,Ő & ò\,Ò\\
p\,P\\
q\,Q\\
r\,R & ŕ\,Ŕ & ř\,Ř\\
s\,S & ś\,Ś & š\,Š & ŝ\,Ŝ & ş\,Ş\\
t\,T & ť\,Ť\\
u\,U\\
ú\,Ú & ù\,Ù & ŭ\,Ŭ & ů\,Ů & û\,Û & ü\,Ü & ű\,Ű\\
v\,V\\
w\,W\\
x\,X\\
y\,Y\\
ý\,Ý & ÿ\,Ÿ\\
z\,Z & ź\,Ź & ż\,Ż & ž\,Ž\\
þ\,Þ\\
æ\,Æ & ǽ\,Ǽ & œ\,Œ\\
ö\,Ö & ø\,Ø & ǿ\,Ǿ & ô\,Ô & õ\,Õ\\
å\,Å
\end{alphabet}
\subsection{Ligatures}
\begin{flushleft}
`ß' is sorted like `s\,s', but \emph{after} it in otherwise equal words.
\end{flushleft}
\subsection{Upper-/lowercase words}
Capitalized or uppercase words are sorted \emph{before} otherwise equal lowercase words.
\subsection{Special characters}
The order of special characters and letters is:
\begin{flushleft}
?\hspace{4mm}!\hspace{4mm}.\hspace{4mm}letters\hspace{4mm}-\hspace{4mm}'
\end{flushleft}
\end{document}

带字母的集合预览

%! lualatex mal-icelandicmal.tex
%! xindy -M texindy -M icelandicmal-test -M mal-style mal-icelandicmal.idx 
%! lualatex mal-icelandicmal.tex
% or with two changes: +xltxtra and -luatextra, we run xelatex
\documentclass{article}
%\usepackage[T1]{fontenc}
%\usepackage[utf8]{inputenc}
%\usepackage[icelandic,czech]{babel}
\usepackage{luatextra} % for lualatex run
%\usepackage{xltxtra} % for xelatex run
\usepackage[colorlinks]{hyperref}%hyperindex=false
\usepackage{makeidx}
\makeindex
\usepackage{filecontents}
\def\mygroup#1{\textbf{#1}}
\begin{filecontents*}{mal-style.xdy}
;; mal-style.xdy
(markup-letter-group :open-head "~n\mygroup{" :close-head "}")
\end{filecontents*}

\begin{document}
The first paragraph of text\ldots
\index{Þari -- Franz Eugen Kohler, Public Domain}
\index{Þistill -- Šaržík František, COPYRIGHT/PD}
\index{Önd -- Karney, Lee, PD}
\index{Æðarkóngur -- Whitehouse, Laura L., PD}
\index{Avókadó -- Forest \& Kim [[p:2684;Starr]], CC-BY}
\index{Auðnutittlingur -- Arnstein Rønning, CC BY-SA 3.0}
\index{Asni -- Zicha Ondřej, COPYRIGHT/CC-BY-NC}
\index{Á -- hvalur.org, CC Unported Licence}
\index{Álft -- Bukovský Jiří, COPYRIGHT/CC-BY-NC}
\index{Álka -- Jack Spellingbacon from Scotland, CC BY-SA 3.0}
\index{Å -- a fake index entry}
\index{Ð -- another fake index entry}
\index{E -- a testing index entry}
\index{Ǽ a fake}
\begingroup
\pagestyle{empty}
\def\thispagestyle#1{}
\printindex
\endgroup
\end{document}

姆韦


编辑 2:简约版本(仅冰岛语排序及其 32 个字母)

请下载两个新文件:

wget http://striz7.fame.utb.cz/tex-sx/is-min/icelandicmalmin.xdy  
wget http://striz7.fame.utb.cz/tex-sx/is-min/icelandicmalmin-test.xdy  

我们运行以下四行:

pdflatex mal-icelandicmalmin.tex  
texlua iec2utf.lua <mal-icelandicmalmin.idx >mal-temp.idx  
xindy -M texindy -M icelandicmalmin-test -M mal-style -o mal-icelandicmalmin.ind mal-temp.idx  
pdflatex mal-icelandicmalmin.tex  

iec2utf.lua由 michal.h21 编写的库运行良好。如果您想查看包含的字母,请运行:

pdflatex typesetme.tex

我附上了两个经过测试的新文件pdflatex、一份 Xindy 风格的字母列表和一份冰岛语索引样本预览。

文件typesetme.tex

%! pdflatex typesetme.tex
\documentclass[a4paper]{article}
\pagestyle{empty}
%\usepackage{luatextra}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\newenvironment{alphabet}{\begin{tabular}{*{16}{l}}%
   }{\end{tabular}}
\addtolength{\voffset}{-1in}
\addtolength{\textheight}{1in}

\begin{document}
\section{Icelandicmalmin}
\subsection{Alphabet}
\begin{alphabet}
a\,A\\
á\,Á\\
b\,B\\
d\,D\\
ð\,Ð\\
e\,E\\
é\,É\\
f\,F\\
g\,G\\
h\,H\\
i\,I\\
í\,Í\\
j\,J\\
k\,K\\
l\,L\\
m\,M\\
n\,N\\
o\,O\\
ó\,Ó\\
p\,P\\
r\,R\\
s\,S\\
t\,T\\
u\,U\\
ú\,Ú\\
v\,V\\
x\,X\\
y\,Y\\
ý\,Ý\\
þ\,Þ\\
æ\,Æ\\
ö\,Ö
\end{alphabet}
%\subsection{Ligatures}
%\begin{flushleft}
%`ß' is sorted like `s\,s', but \emph{after} it in otherwise equal words.
%\end{flushleft}
\subsection{Upper-/lowercase words}
Capitalized or uppercase words are sorted \emph{before} otherwise equal lowercase words.
\subsection{Special characters}
The order of special characters and letters is:
\begin{flushleft}
?\hspace{4mm}!\hspace{4mm}.\hspace{4mm}letters\hspace{4mm}-\hspace{4mm}'
\end{flushleft}
\end{document}

文件mal-icelandicmalmin.tex

%! pdflatex mal-icelandicmalmin.tex
%! texlua iec2utf.lua <mal-icelandicmalmin.idx >mal-temp.idx
%! xindy -M texindy -M icelandicmalmin-test -M mal-style -o mal-icelandicmalmin.ind mal-temp.idx
%! pdflatex mal-icelandicmalmin.tex
%
% iec2utf.lua <--- https://github.com/michal-h21/iec2utf
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[icelandic,czech,english]{babel}
%\usepackage{luatextra} % for lualatex run
%\usepackage{xltxtra} % for xelatex run
\usepackage[colorlinks]{hyperref}%hyperindex=false
\usepackage{makeidx}
\makeindex
\usepackage{filecontents}
\def\mygroup#1{\textbf{#1}}
\begin{filecontents*}{mal-style.xdy}
;; mal-style.xdy
(markup-letter-group :open-head "~n\mygroup{" :close-head "}")
\end{filecontents*}

\begin{document}
The first paragraph of text\ldots
\index{Þari -- Franz Eugen Kohler, Public Domain}
\index{Þistill -- Šaržík František, COPYRIGHT/PD}
\index{Önd -- Karney, Lee, PD}
\index{Æðarkóngur -- Whitehouse, Laura L., PD}
\index{Avókadó -- Forest \& Kim [[p:2684;Starr]], CC-BY}
\index{Auðnutittlingur -- Arnstein Rønning, CC BY-SA 3.0}
\index{Asni -- Zicha Ondřej, COPYRIGHT/CC-BY-NC}
\index{Á -- hvalur.org, CC Unported Licence}
\index{Álft -- Bukovský Jiří, COPYRIGHT/CC-BY-NC}
\index{Álka -- Jack Spellingbacon from Scotland, CC BY-SA 3.0}
\index{Ð -- another fake index entry}
\index{E -- a testing index entry}
\index{É -- a testing index entry}
\begingroup
\pagestyle{empty}
\def\thispagestyle#1{}
\printindex
\endgroup
\end{document}

姆韦

相关内容