使用 xindy 包中的 make-rules 对越南语 utf8 索引进行排序

使用 xindy 包中的 make-rules 对越南语 utf8 索引进行排序

我正在使用该xindy包编译包含越南语的索引。除了单词内的顺序外,它运行良好。接受的“标准”顺序如下,这与当前xindy默认顺序不同:

我去了 bcd 的 e 频道,发现它是一个很棒的频道,而且它非常漂亮,我很喜欢它,虽然它不是我想要的,但它确实让我想起了 pqrstu,它是我最喜欢的 vwxy 频道。

我编辑了越南语/utf8.pl.in(附在下面)并使用此版本进行测试。但是,有些单词的排序方式出乎意料。例如,预期的顺序应该是,

  1. 希恩

但它不是按该顺序排序的,即最后两个数字 5 和 6 现在位于 3 和 4 的位置。单词也是如此:Lan Làn Lản Lãn Lán Lạn(注意声调变音符号)。

这里的问题是:为什么岚,岚排序于兰, 兰,鉴于脚本中提供的规则?

任何人?

以下是针对越南语修改的 make-rules perl 脚本:

#!/usr/bin/perl

$language = "Vietnamese";
$prefix = "vi";
$script = "latin";

$alphabet = [
['A',  ['a','A'],['à','À'],['ả','Ả'],['ã','Ã'],['á','Á'],['ạ','Ạ']],
['Ă',  ['ă','Ă'],['ằ','Ằ'],['ẳ','Ẳ'],['ẵ','Ẵ'],['ắ','Ắ'],['ặ','Ặ']],
['Â',  ['â','Â'],['ầ','Ầ'],['ẩ','Ẩ'],['ẫ','Ẫ'],['ấ','Ấ'],['ậ','Ậ']],
               [], # a with ogonek (polish)
['B',  ['b','B']],
               [], # b with hook (hausa)
['C',  ['c','C']],
               [], # ch (spanish/traditonal)
               [], # cs (hungarian)
               [], # c with caron (many)
               [], # c with acute (croatian, lower sorbian, polish)
               [], # c with circumflex (esperanto)
               [], # c with cedilla (albanian, kurdish, turkish)
['D',  ['d','D']],
               [], # dh (albanian)
               [], # dz (hungarian)
               [], # dzs (hungarian)
               [], # d+z with caron (croatian)
               [], # d+z with acute (upper sorbian)
               [], # d with caron (slovak/large)
['Đ',  ['đ','Đ']],
               [], # d with hook (hausa)
               [], # eth (icelandic)
['E',  ['e','E'],['è','È'],['ẻ','Ẻ'],['ẽ','Ẽ'],['é','É'],['ẹ','Ẹ']],
               [], # e with caron (lower/upper sorbian)
['Ê',  ['ê','Ê'],['ề','Ề'],['ể','Ể'],['ễ','Ễ'],['ế','Ế'],['ệ','Ệ']],
               [], # e with diaeresis (albanian)
               [], # e with ogonek (polish)
['F',  ['f','F']],
['G',  ['g','G']],
               [], # gj (albanian)
               [], # gy (hungarian)
               [], # g with circumflex (esperanto)
               [], # g with breve (turkish)
               [], # g with cedilla/comma (latvian)
               [], # postpalatal fricative (gypsy/northrussian)
['H',  ['h','H']],
               [], # h with circumflex (esperanto)
               [], # ch (many)
               [], # dotless i (turkish)
['I',  ['i','I'],['ì','Ì'],['ỉ','Ỉ'],['ĩ','Ĩ'],['í','Í'],['ị','Ị']],
               [], # i with inverted breve below (gypsy/northrussian)
               [], # i with circumflex (kurdish, romanian)
               [], # i with diaeresis (gypsy/northrussian)
['J',  ['j','J']],
               [], # j with circumflex (esperanto)
['K',  ['k','K']],
               [], # kh (gypsy/northrussian)
               [], # k with cedilla/comma (latvian)
               [], # k with hook (hausa)
               [], # x (gypsy/northrussian)
               [], # l with stroke (lower/upper sorbian)
['L',  ['l','L']],
               [], # lj (croatian)
               [], # ll (albanian, spanish/traditonal)
               [], # ly (hungarian)
               [], # l with cedilla/comma (latvian)
               [], # l with stroke (polish)
               [], # l with caron (slovak/large)
['M',  ['m','M']],
['N',  ['n','N']],
               [], # nj (albanian, croatian)
               [], # ny (hungarian)
               [], # n with caron (slovak/large)
               [], # n with acute (lower/upper sorbian, polish)
               [], # n with tilde (spanish/modern, spanish/traditional)
               [], # n with cedilla/comma (latvian)
['O',  ['o','O'],['ò','Ò'],['ỏ','Ỏ'],['õ','Õ'],['ó','Ó'],['ọ','Ọ']],
               [], # o with acute (polish, upper sorbian)
['Ô',  ['ô','Ô'],['ồ','Ồ'],['ổ','Ổ'],['ỗ','Ỗ'],['ố','Ố'],['ộ','Ộ']],
['Ơ',  ['ơ','Ơ'],['ờ','Ờ'],['ở','Ở'],['ỡ','Ỡ'],['ớ','Ớ'],['ợ','Ợ']],
               [], # o with diaeresis (hungarian, turkish)
['P',  ['p','P']],
               [], # ph (gypsy/northrussian)
['Q',  ['q','Q']],
['R',  ['r','R']],
               [], # rr (albanian)
               [], # r with caron (czech, slovak/large, upper sorbian)
               [], # r with acute (lower sorbian)
               [], # r with cedilla/comma (latvian)
['S',  ['s','S']],
               [], # sh (albanian)
               [], # sz (hungarian)
               [], # s with caron (many)
               [], # s with acute (lower sorbian, polish)
               [], # s with circumflex (esperanto)
               [], # s with comma below (romanian)
               [], # s with cedilla (kurdish, turkish)
               [], # z (estonian)
               [], # z with caron (estonian)
['T',  ['t','T']],
               [], # th (albanian)
               [], # ty (hungarian)
               [], # t with caron (slovak/large)
               [], # t with comma below (romanian)
               [], # c with acute (upper sorbian) @@
['U',  ['u','U'],['ù','Ù'],['ủ','Ủ'],['ũ','Ũ'],['ú','Ú'],['ụ','Ụ']],
               [], # u with breve (esperanto)
               [], # u with circumflex (kurdish)
['Ư',  ['ư','Ư'],['ừ','Ừ'],['ử','Ử'],['ữ','Ữ'],['ứ','Ứ'],['ự','Ự']],
               [], # u with diaeresis (hungarian, turkish)
['V',  ['v','V']],
['W',  ['w','W']],
               [], # o with tilde (estonian)
               [], # a with diaeresis (estonian)
               [], # o with diaeresis (estonian)
               [], # u with diaeresis (estonian)
['X',  ['x','X']],
               [], # xh (albanian)
['Y',  ['y','Y'],['ỳ','Ỳ'],['ỷ','Ỷ'],['ỹ','Ỹ'],['ý','Ý'],['ỵ','Ỵ']],
               [], # y preceded by apostrophe (hausa)
               [], # yogh (english)
['Z',  ['z','Z']],
               [], # zh (albanian)
               [], # zs (hungarian)
               [], # z with caron (many)
               [], # z with acute (lower sorbian, polish)
               [], # z with dot above (polish)
               [], # thorn (icelandic)
               [], # wynn (english)
               [], # ligature ae (danish, icelandic, norwegian)
               [], # o with stroke (danish, norwegian)
               [], # a with ring above (danish, norwegian, swedish)
               [], # a with diaeresis (finnish, swedish)
               [], # o with diaeresis (finnish, swedish)
               [], # a with ring above (icelandic)
];

$sortcase = 'Aa';
#$sortcase = 'aA';

$ligatures = [
];

@special = ('?', '!', '.', 'letters', '-', '\'');

do 'make-rules.pl';

答案1

较长的文章

我可以确认,例如我们可以比较http://vietunicode.sourceforge.net/charset/quytacABC_en.html由越南本地人撰写,并附有来自 xindy 的字母表,http://xindy.sourceforge.net/doc/make-rules-alphabets-doc.pdf,第41页,有区别,正如问题所说。

我认为造成这种错误的原因是西欧语言(generalstyle in xindy)不能识别所有的声调变音符号。我们知道的那些已经排序,而这两个无法识别的标记被放在排序顺序的末尾。这些事情发生了。

如何修复?仅仅编辑vietnamese/utf8.pl.in是不够的,它是一个 Perl 脚本,用于生成必要的文件,尤其是文件。我强烈不建议直接编辑此文件,因为它是官方版本,并且会在 TeXLive/MikTeX 更新时xdy自动用新版本覆盖。xindy

这是我在工作目录中本地修复它的方法:

  • xindy-make-rules我从下载了最新版本http://sourceforge.net/projects/xindy/files/xindy-make-rules/0.2/到我的工作目录。
  • 我解压了该文件。
  • alphabets/vietnamese我通过将目录复制到名为 的新文件夹来进行备份alphabets/vietnamesemal。我进入了该新文件夹。
  • 只有一个文件。我复制了一份并命名,utf8.pl-mal.in以便将原始版本保存在同一文件夹中,以备日后需要。
  • 我编辑了这个文件gEdithttp://wiki.gnome.org/Apps/Gedit。它是众多能够处理全套 UTF-8 字符的编辑器之一。
  • 我按照越南语的排序规则重新排列了字符。这是新文件。
#!/usr/bin/perl

$language = "Vietnamese";
$prefix = "vi";
$script = "latin";

$alphabet = [
['A',  ['a','A'],['à','À'],['ả','Ả'],['ã','Ã'],['á','Á'],['ạ','Ạ']],
['Ă',  ['ă','Ă'],['ằ','Ằ'],['ẳ','Ẳ'],['ẵ','Ẵ'],['ắ','Ắ'],['ặ','Ặ']],
['Â',  ['â','Â'],['ầ','Ầ'],['ẩ','Ẩ'],['ẫ','Ẫ'],['ấ','Ấ'],['ậ','Ậ']],
                   [], # a with ogonek (polish)
['B',  ['b','B']],
                   [], # b with hook (hausa)
['C',  ['c','C']],
                   [], # ch (spanish/traditonal)
                   [], # cs (hungarian)
                   [], # c with caron (many)
                   [], # c with acute (croatian, lower sorbian, polish)
                   [], # c with circumflex (esperanto)
                   [], # c with cedilla (albanian, kurdish, turkish)
['D',  ['d','D']],
                   [], # dh (albanian)
                   [], # dz (hungarian)
                   [], # dzs (hungarian)
                   [], # d+z with caron (croatian)
                   [], # d+z with acute (upper sorbian)
                   [], # d with caron (slovak/large)
['Đ',  ['đ','Đ']],
                   [], # d with hook (hausa)
                   [], # eth (icelandic)
['E',  ['e','E'],['è','È'],['ẻ','Ẻ'],['ẽ','Ẽ'],['é','É'],['ẹ','Ẹ']],
                   [], # e with caron (lower/upper sorbian)
['Ê',  ['ê','Ê'],['ề','Ề'],['ể','Ể'],['ễ','Ễ'],['ế','Ế'],['ệ','Ệ']],
                   [], # e with diaeresis (albanian)
                   [], # e with ogonek (polish)
['F',  ['f','F']],
['G',  ['g','G']],
                   [], # gj (albanian)
                   [], # gy (hungarian)
                   [], # g with circumflex (esperanto)
                   [], # g with breve (turkish)
                   [], # g with cedilla/comma (latvian)
                   [], # postpalatal fricative (gypsy/northrussian)
['H',  ['h','H']],
                   [], # h with circumflex (esperanto)
                   [], # ch (many)
                   [], # dotless i (turkish)
['I',  ['i','I'],['ì','Ì'],['ỉ','Ỉ'],['ĩ','Ĩ'],['í','Í'],['ị','Ị']],
                   [], # i with inverted breve below (gypsy/northrussian)
                   [], # i with circumflex (kurdish, romanian)
                   [], # i with diaeresis (gypsy/northrussian)
['J',  ['j','J']],
                   [], # j with circumflex (esperanto)
['K',  ['k','K']],
                   [], # kh (gypsy/northrussian)
                   [], # k with cedilla/comma (latvian)
                   [], # k with hook (hausa)
                   [], # x (gypsy/northrussian)
                   [], # l with stroke (lower/upper sorbian)
['L',  ['l','L']],
                   [], # lj (croatian)
                   [], # ll (albanian, spanish/traditonal)
                   [], # ly (hungarian)
                   [], # l with cedilla/comma (latvian)
                   [], # l with stroke (polish)
                   [], # l with caron (slovak/large)
['M',  ['m','M']],
['N',  ['n','N']],
                   [], # nj (albanian, croatian)
                   [], # ny (hungarian)
                   [], # n with caron (slovak/large)
                   [], # n with acute (lower/upper sorbian, polish)
                   [], # n with tilde (spanish/modern, spanish/traditional)
                   [], # n with cedilla/comma (latvian)
['O',  ['o','O'],['ò','Ò'],['ỏ','Ỏ'],['õ','Õ'],['ó','Ó'],['ọ','Ọ']],
                   [], # o with acute (polish, upper sorbian)
['Ô',  ['ô','Ô'],['ồ','Ồ'],['ổ','Ổ'],['ỗ','Ỗ'],['ố','Ố'],['ộ','Ộ']],
['Ơ',  ['ơ','Ơ'],['ờ','Ờ'],['ở','Ở'],['ỡ','Ỡ'],['ớ','Ớ'],['ợ','Ợ']],
                   [], # o with diaeresis (hungarian, turkish)
['P',  ['p','P']],
                   [], # ph (gypsy/northrussian)
['Q',  ['q','Q']],
['R',  ['r','R']],
                   [], # rr (albanian)
                   [], # r with caron (czech, slovak/large, upper sorbian)
                   [], # r with acute (lower sorbian)
                   [], # r with cedilla/comma (latvian)
['S',  ['s','S']],
                   [], # sh (albanian)
                   [], # sz (hungarian)
                   [], # s with caron (many)
                   [], # s with acute (lower sorbian, polish)
                   [], # s with circumflex (esperanto)
                   [], # s with comma below (romanian)
                   [], # s with cedilla (kurdish, turkish)
                   [], # z (estonian)
                   [], # z with caron (estonian)
['T',  ['t','T']],
                   [], # th (albanian)
                   [], # ty (hungarian)
                   [], # t with caron (slovak/large)
                   [], # t with comma below (romanian)
                   [], # c with acute (upper sorbian) @@
['U',  ['u','U'],['ù','Ù'],['ủ','Ủ'],['ũ','Ũ'],['ú','Ú'],['ụ','Ụ']],
                   [], # u with breve (esperanto)
                   [], # u with circumflex (kurdish)
['Ư',  ['ư','Ư'],['ừ','Ừ'],['ử','Ử'],['ữ','Ữ'],['ứ','Ứ'],['ự','Ự']],
                   [], # u with diaeresis (hungarian, turkish)
['V',  ['v','V']],
['W',  ['w','W']],
                   [], # o with tilde (estonian)
                   [], # a with diaeresis (estonian)
                   [], # o with diaeresis (estonian)
                   [], # u with diaeresis (estonian)
['X',  ['x','X']],
                   [], # xh (albanian)
['Y',  ['y','Y'],['ỳ','Ỳ'],['ỷ','Ỷ'],['ỹ','Ỹ'],['ý','Ý'],['ỵ','Ỵ']],
                   [], # y preceded by apostrophe (hausa)
                   [], # yogh (english)
['Z',  ['z','Z']],
                   [], # zh (albanian)
                   [], # zs (hungarian)
                   [], # z with caron (many)
                   [], # z with acute (lower sorbian, polish)
                   [], # z with dot above (polish)
                   [], # thorn (icelandic)
                   [], # wynn (english)
                   [], # ligature ae (danish, icelandic, norwegian)
                   [], # o with stroke (danish, norwegian)
                   [], # a with ring above (danish, norwegian, swedish)
                   [], # a with diaeresis (finnish, swedish)
                   [], # o with diaeresis (finnish, swedish)
                   [], # a with ring above (icelandic)
];

$sortcase = 'Aa';
#$sortcase = 'aA';

$ligatures = [
];

@special = ('?', '!', '.', 'letters', '-', '\'');

do 'make-rules.pl';
  • 我进入alphabets/目录cd ..
  • 然后我运行perl vietnamesemal/utf8.pl-mal.in vietnamesemal/vietnamesemal。我收到一条消息,说Alphabet has 126 elements.我在那里收到了几个新文件。
  • 我再次输入。我将vietnamesemal第一行改为。这意味着我们稍后的工作不需要子目录。vietnamesemal-test.xdy(require "vietnamesemal.xdy")

  • 我做了vietnamesemal-doc.tex一点修改:\subsection-> \section;\subsubsection改为\subsection(四次);我注释掉了\idef\fdef(两次)和\newpage(一次)。这是该文件的修改版本。

\section{Vietnamese}

\subsection{Alphabet}
%\icod\fcod
\begin{alphabet}
a\,A & à\,À & ả\,Ả & ã\,Ã & á\,Á & ạ\,Ạ\\
ă\,Ă & ằ\,Ằ & ẳ\,Ẳ & ẵ\,Ẵ & ắ\,Ắ & ặ\,Ặ\\
â\, & ầ\,Ầ & ẩ\,Ẩ & ẫ\,Ẫ & ấ\,Ấ & ậ\,Ậ\\
b\,B\\
c\,C\\
d\,D\\
đ\,Đ\\
e\,E & è\,È & ẻ\,Ẻ & ẽ\,Ẽ & é\,É & ẹ\,Ẹ\\
ê\,Ê & ề\,Ề & ể\,Ể & ễ\,Ễ & ế\,Ế & ệ\,Ệ\\
f\,F\\
g\,G\\
h\,H\\
i\,I & ì\,Ì & ỉ\,Ỉ & ĩ\,Ĩ & í\,Í & ị\,Ị\\
j\,J\\
k\,K\\
l\,L\\
m\,M\\
n\,N\\
o\,O & ò\,Ò & ỏ\,Ỏ & õ\,Õ & ó\,Ó & ọ\,Ọ\\
ô\,Ô & ồ\,Ồ & ổ\,Ổ & ỗ\,Ỗ & ố\,Ố & ộ\,Ộ\\
ơ\,Ơ & ờ\,Ờ & ở\,Ở & ỡ\,Ỡ & ớ\,Ớ & ợ\,Ợ\\
p\,P\\
q\,Q\\
r\,R\\
s\,S\\
t\,T\\
u\,U & ù\,Ù & ủ\,Ủ & ũ\,Ũ & ú\,Ú & ụ\,Ụ\\
ư\,Ư & ừ\,Ừ & ử\,Ử & ữ\,Ữ & ứ\,Ứ & ự\,Ự\\
v\,V\\
w\,W\\
x\,X\\
y\,Y & ỳ\,Ỳ & ỷ\,Ỷ & ỹ\,Ỹ & ý\,Ý & ỵ\,Ỵ\\
z\,Z
\end{alphabet}
%\idef\fdef

\subsection{Ligatures}
\begin{flushleft}
None.

\end{flushleft}

\subsection{Upper-/lowercase words}
Capitalized or uppercase words are sorted \emph{before} otherwise equal lowercase words.

\subsection{Special characters}
The order of special characters and letters is:
\begin{flushleft}
?\hspace{4mm}!\hspace{4mm}.\hspace{4mm}letters\hspace{4mm}-\hspace{4mm}'
\end{flushleft}
%\newpage
  • 我上了一个层次,对alphabets-doc.tex文件进行了大量修改,使之成为这种形式。我使用拉丁现代字体(其默认字体系列)运行了它,并确保所有音调符号都排版正确。处理该文件lualatex会很容易。xelatex

我附上了新文件和预览。我们可以检查一切是否正常,经过这些更改后应该就正常了。

%! lualatex alphabets-doc.tex

\documentclass[a4paper]{article}
\pagestyle{empty}
%\usepackage{a4wide}
\usepackage{luatextra}
%\usepackage[TS1,LGR,T2A,T1]{fontenc}
%\usepackage[colorlinks]{hyperref}
%\usepackage[latin2,cp1251,cp1252]{inputenc}
%\newcommand{\idef}{\inputencoding{cp1252}}
%\newcommand{\fdef}{\fontencoding{T1}\selectfont}
%\newcommand{\icod}{\inputencoding{cp1252}}
%\newcommand{\fcod}{\fontencoding{T1}\selectfont}
%\newcommand{\ienc}[1]{\renewcommand{\icod}{\inputencoding{#1}}}
%\newcommand{\fenc}[1]{\renewcommand{\fcod}{\fontencoding{#1}\selectfont}}
\newenvironment{alphabet}{\begin{tabular}{*{16}{l}}%
% &
% \small (\v{}) & \small (\'{}) & \small (\`{}) & \small (\u{}) &
% \small (\^{}) & \small (\~{}) & \small (\r{}) & \small (\"{}) &
% \small (,) & \small (\c{}) & \small (k{}) & \small (\.{}) &
% \small (-) & \small (\={}) & \small (?)\\
}{\end{tabular}}
\setlength{\topskip}{0mm}
\setlength{\topmargin}{-15mm}
\setlength{\textheight}{260mm}
\setcounter{tocdepth}{2}
\begin{document}
\title{Alphabets}
\author{Generated by \tt make-rules.pl}
%\maketitle
\begin{center}
  {\LARGE Alphabets} --
  \texttt{\Large Generated by \tt make-rules.pl} \par\medskip
  \large \today
\end{center}
%\tableofcontents
%\newpage
%\input{alphabets-inc.tex}
\input{vietnamesemal/vietnamesemal-doc.tex}
\end{document}

mwe1

  • vietnemasemal文件夹中,我创建了一个名为的新 TeX 文件mal-vietnamese.tex来测试它。我使用了几个资源来获得更长的索引。链接在 TeX 文件中提到。

  • 我还通过创建文件添加了词组malstyle.xdy。它是文件的一部分mal-vietnamese.tex。它使用一个名为的命令\malgroup,我们可以在 TeX 文件中直接控制其样式,而无需额外运行xindy

这是我提到的文件:

%! lualatex mal-vietnamese.tex

\documentclass[a4paper]{article}
\pagestyle{empty}
\usepackage{multicol}
\usepackage{luatextra}
\usepackage{makeidx}
\makeindex
% Rather new tool... %  makeindex, xindy
%\usepackage[xindy]{imakeidx} 
%\makeindex[columns=3, options={-M texindy -M vietnamesemal -M malstyle}]

% The sources of data...
% http://tex.stackexchange.com/questions/142345/sorting-vietnamese-utf8-index-with-make-rules-in-xindy-package
% http://vietunicode.sourceforge.net/charset/quytacABC_en.html
% http://tex.stackexchange.com/questions/131334/how-to-sort-alphabet-index-in-vietnamese-correctly
\usepackage{filecontents}

\begin{document}

\begin{filecontents*}{malstyle.xdy}
;; malstyle.xdy
(markup-letter-group :open-head "~n\malgroup{ " :close-head " }" :capitalize)
\end{filecontents*}

\def\malgroup#1{\par\medskip\textbf{\large$\sim$#1$\sim$}\par\nopagebreak}
\begingroup
\def\thispagestyle#1{}
\printindex
\endgroup

\index{Hiên}
\index{Hiền}
\index{Hiển}
\index{Hiễn}
\index{Hiến}
\index{Hiện}

\index{Lản}
\index{Lãn}
\index{Lán}
\index{Lạn}

\index{Apple}\index{Apricot}\index{Avocado}\index{Banana}
\index{Bilberry}\index{Blackberry}\index{Blackcurrant}\index{Blueberry}
\index{Currant}\index{Cherry}\index{Cherimoya}\index{Clementine}
\index{Date}\index{Damson}\index{Dragonfruit}\index{Durian}
\index{Eggplant}\index{Elderberry}\index{Feijoa}\index{Gooseberry}
\index{Grape}\index{Grapefruit}\index{Guava}\index{Huckleberry}
\index{Jackfruit}\index{Jambul}\index{Kiwi fruit}\index{Kumquat}
\index{Legume}\index{Lemon}\index{Lime}\index{Lychee}\index{Mandarine}
\index{Mango}\index{Melon}\index{Nectarine}\index{Orange}\index{Peach}
\index{Pear}\index{Pitaya}\index{Physalis}\index{Plum}\index{Pineapple}
\index{Pomegranate}\index{Purple Mangosteen}\index{Raisin}\index{Raspberry}
\index{Rambutan}\index{Redcurrant}\index{Salal berry}\index{Satsuma}
\index{Star fruit}\index{Strawberry}\index{Tangerine}\index{Tomato}
\index{Ugli fruit}\index{Watermelon}\index{Ziziphus mauritiana}
\index{Đồng biến}
\index{Nghịch biến}
\index{Dao động điều hòa}
\index{Ếch}

Plain text\ldots
\end{document}

我运行了这三个命令,我们得到了一个符合越南语排序规则的索引。我正在将文件提交utf8.pl-mal.in给的作者xindy,也许他们会在的某个新版本中使用它xindy。我们拭目以待!好好享受,再见!\index{tạm biệt}

lualatex mal-vietnamese.tex
xindy -M texindy -M vietnamesemal-test -M malstyle mal-vietnamese.idx
lualatex mal-vietnamese.tex

在此处输入图片描述


一个从头开始的工作示例,无需进行任何更改

我无法为您提供如何全局设置新版本的通用解决方案,我宁愿向您展示它在我的工作目录中的工作原理,如果需要,您可以管理其余部分。

我已收到社区的第一批回复xindy,并获得了一个测试文件(感谢 Hien Pham)。让我向您展示我所做的工作:

  • 我创建了一个明亮的新文件夹,hien-example我将在其中工作。
  • 我在那里创建了一个新的 TeX 文件,IndexVietnamese.tex。因为我无法使用texindy(我的 Windows 主目录包含变音字母,我无法更改它,否则会有风险),因此我无法以imakeidx最佳方式使用:参数xindytexindy运行texindy,我将从命令行运行此示例。
  • vietnamesemal-test.xdy从我的服务器进入该目录。
  • vietnamesemal.xdy也请从我的服务器获取。
  • 运行这四行。如果您希望获得没有组字母的版本,请删除-M malstyle(两次):

lualatex --shell-escape IndexVietnamese.tex
xindy -M vietnamesemal-test -M malstyle authors.idx
xindy -M vietnamesemal-test -M malstyle IndexVietnamese.idx
lualatex --shell-escape IndexVietnamese.tex

我附上了这三个文件(一个文件在帖子中,两个文件可以从我的服务器下载)以及第 3 页(作者索引)和第 5 页和第 6 页(主题索引)的预览。

这是IndexVietnamese.tex文件:

%! lualatex --shell-escape IndexVietnamese.tex
%! xindy -M vietnamesemal-test -M malstyle authors.idx
%! xindy -M vietnamesemal-test -M malstyle IndexVietnamese.idx
%! lualatex --shell-escape IndexVietnamese.tex

\documentclass{book}
%\usepackage[utf8]{vietnam}
\usepackage{luatextra}
\usepackage[noautomatic]{imakeidx} % [noautomatic]
\makeindex[title=Topic index]
% columns=3, options={-M vietnamesemal-test},
% from lang/vietnamese/utf8-lang to a testing version
\makeindex[name=authors, title=Author index]
% columns=3, options={-M vietnamesemal-test},
% from lang/vietnamese/utf8-lang to a testing version
\makeindex
\usepackage{natbib}  

% \usepackage[noautomatic]{imakeidx}
% \makeindex[columns=3,options={-M lang/vietnamese/utf8-lang}]   
%\renewcommand\indexname{Topic index}
\usepackage{filecontents}

% For purpose of the post...
\usepackage{fancyhdr}
\pagestyle{fancy}
\fancyhf{}
\renewcommand{\headrulewidth}{0pt}


\begin{document}

\begin{filecontents*}{malstyle.xdy}
;; malstyle.xdy
(markup-letter-group :open-head "~n\malgroup{ " :close-head " }" :capitalize)
\end{filecontents*}
\def\malgroup#1{\par\medskip\textbf{\large$\sim$#1$\sim$}\par\nopagebreak}

\citeindextrue  

\chapter{Hello}

And it has fancy authors. \index[authors]{Doreian, Patrick} \index[authors]{Batagelj, Vladimir} \index[authors]{Ferligoj, Anuška}
key words: network\index{network}, blockmodeling\index{blockmodeling}, and so on.  

% I am making this index shorter to fit onto two pages...
Some text.\index{Apple}\index{Apricot}\index{Avocado}%\index{Banana}
\index{Bilberry}\index{Blackberry}\index{Blackcurrant}%\index{Blueberry}
\index{Currant}\index{Cherry}\index{Cherimoya}%\index{Clementine}
\index{Date}\index{Damson}%\index{Dragonfruit}\index{Durian}
\index{Eggplant}\index{Elderberry}\index{Feijoa}%\index{Gooseberry}
\index{Grape}\index{Grapefruit}\index{Guava}\index{Huckleberry}
\index{Jackfruit}\index{Jambul}\index{Kiwi fruit}\index{Kumquat}
\index{Legume}\index{Lemon}\index{Lime}\index{Lychee}\index{Mandarine}
\index{Mango}\index{Melon}\index{Nectarine}\index{Orange}\index{Peach}
\index{Pear}\index{Pitaya}\index{Physalis}\index{Plum}\index{Pineapple}
\index{Pomegranate}\index{Purple Mangosteen}\index{Raisin}\index{Raspberry}
\index{Rambutan}\index{Redcurrant}\index{Salal berry}\index{Satsuma}
\index{Star fruit}\index{Strawberry}\index{Tangerine}\index{Tomato}
\index{Ugli fruit}\index{Watermelon}\index{Ziziphus mauritiana}
\index{Đồng biến}
\index{Ông bà}
\index{Ăn ở}
\index{Ương ngạnh}
\index{Nghịch biến}
\index{Dao động điều hòa}
\index{Ếch}
\index{Ẩm kế}
\index{Ấm áp}
\index{Ầm ĩ}
\index{Ậm ừ}
\index{Âm hưởng}
\index{Ẫm ờ}
\index{Ơn huệ}
\index{Lan}
\index{Làn}
\index{Lãn}
\index{Lạn}
\index{Lản}
\index{Lán}
\index{Ãn}\index{An}\index{Án}\index{Ạn}\index{Àn}\index{Ản}
\index{Hiện}\index{Hiển}\index{Hiến}\index{Hiên}\index{Hiễn}\index{Hiền}
\index{choảng}\index{choạng}\index{choang}\index{choáng}\index{choãng}\index{choàng}
\index{provinces!Ontario}
\index{provinces!Saskatchewan}
\index{provinces!British Columbia}
\index{provinces!Alberta!Edmonton}
\index{territories@\textbf{territories}|see{vùng đất}}
\index{vùng đất} % a correction

% For purpose of this post...
\begingroup
\def\thispagestyle#1{}
\printindex[authors]

\def\indexname{Topic index} % The imakeidx is not working for me at its best...
\printindex
\endgroup

\end{document}

这是vietnamesemal-test.xdy文件,最后这是文件vietnamesemal.xdy(均由 生成xindy-make-rules)。我尽力将它们发布到这里,但第二个生成的文件包含一些无法在此处正确显示的字符,因此无法正常工作xindy,请下载它们,例如通过:

获得http://striz7.fame.utb.cz/tex-sx/vietnamesemal-test.xdy
获得http://striz7.fame.utb.cz/tex-sx/vietnamesemal.xdy

我还发布了 TeX 文件(以防万一)和 PDF 文件(如果您想放大页面),您可以通过以下方式获取文件:

获得http://striz7.fame.utb.cz/tex-sx/IndexVietnamese.tex
获得http://striz7.fame.utb.cz/tex-sx/IndexVietnamese.pdf

作者索引,一个工作示例

主题索引,一个工作示例

相关内容