xindy:如何向字母表添加新字母和字母组?

xindy:如何向字母表添加新字母和字母组?

如何定义一个样式文件,在拉丁字母表中添加几个额外的字母以及相应的字母组?

我正在使用该index包并编译.idx文件,texindy如下所示:

texindy -o book.main.ind -L english -M lang/english/utf8 -M style1 book.main.idx

我正在编制一个包含巴利语词汇的索引,例如ānāpānasati、saṃsāra、ñāṇavimutti。

这些\index命令使用 TeX 重音符号给出这些单词,因此在文本中上述单词将被索引:

\index[general]{\=an\=ap\=anasati}
\index[general]{sa\d{m}s\=ara}
\index[general]{\~n\=a\d{n}avimutti}

我尝试添加字母组,但 ā、ñ 等字母没有开始新的组,它们只是包含在拉丁字母 a、n 等下。

(define-letter-group "ā" :after "a" :before "b")
(define-letter-group "ñ" :after "n" :before "o")

(merge-rule "\=a" "ā" :string)
(merge-rule "\={a}" "ā" :string)
(merge-rule "\d{m}" "ṃ" :string)
(merge-rule "\d {m}" "ṃ" :string)
(merge-rule "\~n" "ñ" :string)
(merge-rule "\~ n" "ñ" :string)

答案1

较长的文章

我已经准备了两个巴利语版本,供xindy:通用排序规则(英文排序规则加上几个新字母和一部分巴利文排序规则)和最小排序规则(仅适用于巴利文)。

我正在使用 运行示例lualatex,这些示例xelatex也适用于 。引擎使用拉丁现代排版。几个语音字母无法正确呈现,所以我使用了代码2000字体。如果您想运行pdflatex,请考虑使用iec2utf,这是由 michal-h21 编写的 Lua 脚本。

我使用了几种资源(如下所列)来获取巴利语单词作为测试用例。如果我拼错了一些单词,我深感抱歉。


1. 通用版本(部分西方语言加上巴利语的排序规则)

我想确保不仅字母会被添加,而且巴利语单词也会被正确排序。我从http://www.omniglot.com/writing/pali.htmhttp://en.wikipedia.org/wiki/Pali。此外,我还记住了这个文件http://pratyeka.org/narada/pali_alphabets.pdf并且我在字母表中添加了几个音标符号,作为实验看看 Xindy 是否能够处理(它可以)。

我已经下载并安装代码2000字体以确保所有字母,特别是那些不常见的字母,都能正确显示。

我附上了一份包含 36 个字母组的列表:

% run: lualatex or xelatex typeset-paligeneral.tex
% Alphabet has 36 elements.
\documentclass[a4paper]{article}
\pagestyle{empty}
%\usepackage{luatextra}
%\usepackage{xltxtra}
\newenvironment{alphabet}{\begin{tabular}{*{16}{l}}}{\end{tabular}}
\addtolength{\voffset}{-0.5in}
\addtolength{\textheight}{2in}
\usepackage{fontspec}
% http://web.archive.org/web/20101122142710/http://code2000.net/code2000_page.htm
\setmainfont{Code2000}

\begin{document}
\section{Paligeneral}
\subsection{Alphabet}
\begin{alphabet}
a\,A & á\,Á & à\,À & ă\,Ă & â\, & ã\,à & ä\,Ä & ą\,Ą & å\,Å & æ\,Æ & ǽ\,Ǽ\\
ā\,Ā\\
b\,B\\
c\,C & ć\,Ć & ĉ\,Ĉ & ç\,Ç & č\,Č\\
d\,D & ð\,ð & đ\,đ & ď\,Ď\\
ḍ\,Ḍ & ɖ\,Ɖ\\
e\,E & é\,É & è\,È & ě\,Ě & ê\,Ê & ë\,Ë & ę\,Ę & þ\,Þ\\
f\,F\\
g\,G & ĝ\,Ĝ & ğ\,Ğ\\
h\,H & ĥ\,Ĥ & ı\,I\\
i\,I & í\,Í & ì\,Ì & î\,Î & ï\,Ï\\
ī\,Ī\\
j\,J & ĵ\,Ĵ\\
k\,K\\
l\,L & ĺ\,Ĺ & ł\,Ł & ľ\,Ľ\\
ḷ\,Ḷ & ɭ\,ɭ\\
m\,M\\
ṃ\,Ṃ & ŋ\,Ŋ & ɱ\,ɱ\\
n\,N & ń\,Ń & ň\,Ň\\
ṅ\,Ṅ & ɲ\,Ɲ\\
ñ\,Ñ\\
ṇ\,Ṇ & ɳ\,ɳ\\
o\,O & ó\,Ó & ő\,Ő & ò\,Ò & ö\,Ö & ø\,Ø & ǿ\,Ǿ & ô\,Ô & õ\,Õ & œ\,Œ\\
p\,P\\
q\,Q\\
r\,R & ŕ\,Ŕ & ř\,Ř\\
s\,S & ś\,Ś & ŝ\,Ŝ & ş\,Ş & š\,Š\\
t\,T & ť\,Ť\\
ṭ\,Ṭ & ʈ\,Ʈ\\
u\,U & ú\,Ú & ù\,Ù & ŭ\,Ŭ & ů\,Ů & û\,Û & ü\,Ü & ű\,Ű\\
ū\,Ū\\
v\,V\\
w\,W\\
x\,X\\
y\,Y & ý\,Ý & ÿ\,Ÿ\\
z\,Z & ź\,Ź & ż\,Ż & ž\,Ž
\end{alphabet}

\subsection{Ligatures}
\begin{flushleft}
`ß' is sorted like `s\,s', but \emph{after} it in otherwise equal words.
\end{flushleft}

\subsection{Upper-/lowercase words}
Capitalized or uppercase words are sorted \emph{before} otherwise equal lowercase words.

\subsection{Special characters}
The order of special characters and letters is:
\begin{flushleft}
?\hspace{4mm}!\hspace{4mm}.\hspace{4mm}letters\hspace{4mm}-\hspace{4mm}'
\end{flushleft}

\end{document}

mwe,通用版本,字母列表

用法

请将这两个文件下载到您的工作目录:

wget http://striz7.fame.utb.cz/tex-sx/pi/utf8.pl-paligeneral.in.xdy  
wget http://striz7.fame.utb.cz/tex-sx/pi/utf8.pl-paligeneral.in-test.xdy  

我们在第一个示例上运行以下三行(mal-paligeneral.tex):

lualatex mal-paligeneral.tex
xindy -M texindy -M utf8.pl-paligeneral.in-test -M bonus mal-paligeneral.idx
lualatex mal-paligeneral.tex

如果您想查看矢量版本,请下载:

wget http://striz7.fame.utb.cz/tex-sx/pi/typeset-paligeneral.pdf
wget http://striz7.fame.utb.cz/tex-sx/pi/mal-paligeneral.pdf

这是 TeX 文件的内容,我将在本文末尾附上它的预览,以便我们可以比较这两个版本。

% run: lualatex or xelatex mal-paligeneral.tex
% uncomment line 9 or 10, respectively
%
%lualatex mal-paligeneral.tex
%xindy -M texindy -M utf8.pl-paligeneral.in-test -M bonus mal-paligeneral.idx
%lualatex mal-paligeneral.tex
%
\documentclass[a4paper]{article}
\usepackage{luatextra} % for lualatex engine
%\usepackage{xltxtra} % for xelatex engine
\pagestyle{empty}
\usepackage{makeidx}
\makeindex
\usepackage[colorlinks]{hyperref}
\usepackage{filecontents}
\begin{filecontents*}{bonus.xdy}
(markup-letter-group :open-head "~n  \textbf{" :close-head "}")
\end{filecontents*}

\begin{document}
Regular text.
\index{pāḷi}
\index{ānāpānasati}
\index{saṃsāra}
\index{ñāṇavimutti}
% Pali Buddhist Dictionary
\index{insight}
\index{paññā}
\index{vipassanā}
\index{ñāṇa}
\index{nirvāṇa}
\index{nibbāna}
\index{permanency}
%\index{vipallāsa}
%\index{personality}
%\index{sakkāya}
%\index{diṭṭhi}
% sortedCanonList.txt
\index{aādiparokkhāyañca}
\index{sammāyojitajālavāta}
\index{ṃsati}
\index{ḷukhappa}
\index{homaidaṃ}
\index{sumanosmīti}
\index{sāmākanīvāre}
\index{vīriyārambhassa}
\index{jeṭṭhasissa}
\index{ñattikammavācāpi}
\index{ñattikammaṃ}
\index{ārammaṇamariyādā}
\index{aāgamo}
\begingroup
\def\thispagestyle#1{}
\printindex
\endgroup
\end{document}

2. 精简版本 (仅限巴利语)

这是字母表中使用的 41 个字母组的列表。

% run: lualatex or xelatex typeset-paliminimal.tex
% Alphabet has 41 elements.
\documentclass[a4paper]{article}
\pagestyle{empty}
%\usepackage{luatextra}
%\usepackage{xltxtra}
\newenvironment{alphabet}{\begin{tabular}{*{16}{l}}}{\end{tabular}}
\addtolength{\voffset}{-0.75in}
\addtolength{\textheight}{2in}
\usepackage{fontspec}
% http://web.archive.org/web/20101122142710/http://code2000.net/code2000_page.htm
\setmainfont{Code2000}

\begin{document}
\section{Paliminimal}
\subsection{Alphabet}
\begin{alphabet}
a\,A\\
ā\,Ā\\
i\,I\\
ī\,Ī\\
u\,U\\
ū\,Ū\\
e\,E\\
o\,O\\
k\,K\\
kh\,Kh\,KH\\
g\,G\\
gh\,Gh\,GH\\
ṅ\,Ṅ & ɲ\,Ɲ\\
c\,C\\
ch\,Ch\,CH\\
j\,J\\
jh\,Jh\,JH\\
ñ\,Ñ\\
ṭ\,Ṭ & ʈ\,Ʈ\\
ṭh\,Ṭh\,ṬH\\
ḍ\,Ḍ & ɖ\,Ɖ\\
ḍh\,Ḍh\,ḌH\\
ṇ\,Ṇ & ɳ\,ɳ\\
t\,T\\
th\,Th\,TH\\
d\,D\\
dh\,Dh\,DH\\
n\,N\\
p\,P\\
ph\,Ph\,PH\\
b\,B\\
bh\,Bh\,BH\\
m\,M\\
y\,Y\\
r\,R\\
l\,L\\
v\,V\\
s\,S\\
h\,H\\
ḷ\,Ḷ & ɭ\,ɭ\\
ṃ\,Ṃ & ŋ\,Ŋ & ɱ\,ɱ
\end{alphabet}
\subsection{Ligatures}
\begin{flushleft}
`ß' is sorted like `s\,s', but \emph{after} it in otherwise equal words.
\end{flushleft}
\subsection{Upper-/lowercase words}
Capitalized or uppercase words are sorted \emph{before} otherwise equal lowercase words.
\subsection{Special characters}
The order of special characters and letters is:
\begin{flushleft}
?\hspace{4mm}!\hspace{4mm}.\hspace{4mm}letters\hspace{4mm}-\hspace{4mm}'
\end{flushleft}
\end{document}

mwe,第 2 部分,最小版本

用法

请下载这两个文件:

wget http://striz7.fame.utb.cz/tex-sx/pi/utf8.pl-paliminimal.in.xdy
wget http://striz7.fame.utb.cz/tex-sx/pi/utf8.pl-paliminimal.in-test.xdy

我们运行这三行(为了进行比较,这些术语与第一个示例中使用的术语相同):

lualatex mal-paliminimal.tex
xindy -M texindy -M utf8.pl-paliminimal.in-test -M bonus mal-paliminimal.idx
lualatex mal-paliminimal.tex

如果您想查看矢量版本,可以按如下方式下载 PDF 文件:

wget http://striz7.fame.utb.cz/tex-sx/pi/typeset-paliminimal.pdf
wget http://striz7.fame.utb.cz/tex-sx/pi/mal-paliminimal.pdf

这是第二个 TeX 文件的内容,并附上两个版本的预览。左侧是常规版本,右侧是最小版本,即单词列。

% run: lualatex or xelatex mal-paliminimal.tex
% uncomment line 9 or 10, respectively
%
%lualatex mal-paliminimal.tex
%xindy -M texindy -M utf8.pl-paliminimal.in-test -M bonus mal-paliminimal.idx
%lualatex mal-paliminimal.tex
%
\documentclass[a4paper]{article}
\usepackage{luatextra} % for lualatex engine
%\usepackage{xltxtra} % for xelatex engine
\pagestyle{empty}
\usepackage{makeidx}
\makeindex
\usepackage[colorlinks]{hyperref}
\usepackage{filecontents}
\begin{filecontents*}{bonus.xdy}
(markup-letter-group :open-head "~n  \textbf{" :close-head "}")
\end{filecontents*}

\begin{document}
Regular text.
\index{pāḷi}
\index{ānāpānasati}
\index{saṃsāra}
\index{ñāṇavimutti}
% Pali Buddhist Dictionary
\index{insight}
\index{paññā}
\index{vipassanā}
\index{ñāṇa}
\index{nirvāṇa}
\index{nibbāna}
\index{permanency}
%\index{vipallāsa}
%\index{personality}
%\index{sakkāya}
%\index{diṭṭhi}
% sortedCanonList.txt
\index{aādiparokkhāyañca}
\index{sammāyojitajālavāta}
\index{ṃsati}
\index{ḷukhappa}
\index{homaidaṃ}
\index{sumanosmīti}
\index{sāmākanīvāre}
\index{vīriyārambhassa}
\index{jeṭṭhasissa}
\index{ñattikammavācāpi}
\index{ñattikammaṃ}
\index{ārammaṇamariyādā}
\index{aāgamo}
\begingroup
\def\thispagestyle#1{}
\printindex
\endgroup
\end{document}

这是索引的预览,两个版本(左侧为通用版本,右侧为最小版本)

答案2

明白了!

因此,在我们的资料来源的序言中.tex,我们说

\usepackage{index}
\newindex{general}{general-idx}{general-ind}{General Index}

在文中,我们以这种方式定义索引条目:

\index[general]{\=an\=ap\=anasati}
\index[general]{sa\d{m}s\=ara}
\index[general]{\~n\=a\d{n}avimutti}

当我们编译.tex文档时,我们将得到book.main.general-idx我们必须处理的文件xindy

主要问题是在到达我们的自定义之前texindy加载许多模块。.xdystyle1.xdy

我们必须xindy直接调用扩展的style1.xdy,它只定义我们需要定义的内容。

因此我打开了模块(/usr/share/xindy/...),它texindy按照输出结果进行操作,

Opening logfile "/dev/null" (done)
Reading indexstyle...
Loading module "/tmp/1DxWUsIHQG"...
Loading module "lang/english/latin9-lang.xdy"...
Loading module "lang/english/latin9.xdy"...
Finished loading module "lang/english/latin9.xdy".
Finished loading module "lang/english/latin9-lang.xdy".
Loading module "tex/inputenc/latin.xdy"...
Finished loading module "tex/inputenc/latin.xdy".
Loading module "texindy.xdy"...
Loading module "numeric-sort.xdy"...
Finished loading module "numeric-sort.xdy".
Loading module "latex.xdy"...
Loading module "tex.xdy"...
Finished loading module "tex.xdy".
Finished loading module "latex.xdy".
Loading module "latex-loc-fmts.xdy"...
Finished loading module "latex-loc-fmts.xdy".
Loading module "makeindex.xdy"...
Finished loading module "makeindex.xdy".
Loading module "latin-lettergroups.xdy"...
Finished loading module "latin-lettergroups.xdy".
Finished loading module "texindy.xdy".
Loading module "page-ranges.xdy"...
Finished loading module "page-ranges.xdy".
Loading module "word-order.xdy"...
Finished loading module "word-order.xdy".
Loading module "lang/english/utf8.xdy"...
WARNING: define-letter-group: prefix "�" now maps to letter group "Þ"
Finished loading module "lang/english/utf8.xdy".
Loading module "book-order.xdy"...
Finished loading module "book-order.xdy".
Finished loading module "/tmp/1DxWUsIHQG".
Finished reading indexstyle.
Finalizing indexstyle... (done)

Reading raw-index "/tmp/4o3huWTPhU"...
Finished reading raw-index.

Processing index... [10%] [20%] [30%] [40%] [50%] [60%] [70%] [80%] [90%] [100%]
Finished processing index.

Writing markup... [10%] [20%] [30%] [40%] [50%] [60%] [70%] [80%] [90%] [100%]
Markup written into file "book.main.general-ind".

并将必要的部分复制到style1.xdy,在必要时扩展巴利语字母的定义。在下面的文件中,拉丁字母表扩展了字母 ā、ī、ṃ、ñ。巴利语中还有更多,但现在就够了。

style1.xdy

;; xindy style file for an index with extended alphabet

(define-attributes (("default" "textbf" "textit" "hyperpage")))

;; "see" and "see also"

(define-crossref-class "see")
(markup-crossref-list :class "see" :open "\see{" :sep "; " :close "}{}")
(define-crossref-class "seealso")
(markup-crossref-list :class "seealso" :open "\seealso{" :sep "; " :close "}{}")

(markup-crossref-layer-list :sep ", ")

(require "base/numeric-sort.xdy")

(define-location-class "arabic-page-numbers" ("arabic-numbers"))
(define-location-class "roman-page-numbers"  ("roman-numbers-lowercase"))
(define-location-class "Roman-page-numbers"  ("roman-numbers-uppercase"))
(define-location-class "alpha-page-numbers"  ("alpha"))
(define-location-class "Alpha-page-numbers"  ("ALPHA"))

(define-location-class-order ("roman-page-numbers"
                  "Roman-page-numbers"
                  "arabic-page-numbers"
                  "alpha-page-numbers"
                  "Alpha-page-numbers"
                  "see"
                  "seealso"))

(require "lang/english/utf8.xdy")

(define-alphabet "latin-pali"
("a" "ā" "b" "c" "d" "e" "f" "g" "h" "i" "ī" "j" "k" "l" "m" "ṃ" "n" "ñ" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"))
(define-letter-groups
("a" "ā" "b" "c" "d" "e" "f" "g" "h" "i" "ī" "j" "k" "l" "m" "ṃ" "n" "ñ" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"))

;; =======================
;; merge rules
;; =======================

;; for the pali words

;; TODO: some variations of the TeX accent markups could be combined using an :eregexp instead of :string replacements

(merge-rule "\=a" "ā" :string)
(merge-rule "\={a}" "ā" :string)
(merge-rule "\=A" "Ā" :string)
(merge-rule "\={A}" "Ā" :string)
(merge-rule "\={\i}" "ī" :string)
(merge-rule "\={\i }" "ī" :string)
(merge-rule "\d{m}" "ṃ" :string)
(merge-rule "\d {m}" "ṃ" :string)
(merge-rule "\~~n" "ñ" :string)
(merge-rule "\~~ n" "ñ" :string)

;; LaTeX and TeX conventions

(merge-rule "\\LaTeXe *" "LaTeX2e" :eregexp :again)
(merge-rule "\\BibTeX *" "BibTeX" :eregexp :again)
(merge-rule "\\AmSTeX *" "AmSTeX" :eregexp :again)
(merge-rule "\\AmSLaTeX *" "AmSLaTeX" :eregexp :again)
(merge-rule "\\XeT *" "XeT" :eregexp :again)

(require "base/tex.xdy")

(markup-locref :open "\textbf{" :close "}" :attr "textbf")
(markup-locref :open "\textit{" :close "}" :attr "textit")
(markup-locref :open "\hyperpage{" :close "}" :attr "hyperpage")

;; =======================
;; sort rules
;; =======================

;; list pali capitals under pali lowercase

(sort-rule "Ā" "ā")
(sort-rule "Ṃ" "ṃ")
(sort-rule "Ī" "ī")
(sort-rule "Ñ" "ñ")

;; list latin capitals under latin lowercase

(sort-rule "A" "a")
(sort-rule "B" "b")
(sort-rule "C" "c")
(sort-rule "D" "d")
(sort-rule "E" "e")
(sort-rule "F" "f")
(sort-rule "G" "g")
(sort-rule "H" "h")
(sort-rule "I" "i")
(sort-rule "J" "j")
(sort-rule "K" "k")
(sort-rule "L" "l")
(sort-rule "M" "m")
(sort-rule "N" "n")
(sort-rule "O" "o")
(sort-rule "P" "p")
(sort-rule "Q" "q")
(sort-rule "R" "r")
(sort-rule "S" "s")
(sort-rule "T" "t")
(sort-rule "U" "u")
(sort-rule "V" "v")
(sort-rule "W" "w")
(sort-rule "X" "x")
(sort-rule "Y" "y")
(sort-rule "Z" "z")

;; ======================
;; markup rules
;; ======================

(require "base/page-ranges.xdy")

(markup-index :open
"\begin{theindex}
  \providecommand*\lettergroupDefault[1]{}
  \providecommand*\lettergroup[1]{%
      \par\textbf{#1}\par
      \nopagebreak
  }
"
          :close "~n~n\end{theindex}~n"
          :tree)

(markup-indexentry :open "~n  \item "           :depth 0)
(markup-indexentry :open "~n    \subitem "      :depth 1)
(markup-indexentry :open "~n      \subsubitem " :depth 2)

(markup-locclass-list :open ", " :sep ", ")
(markup-locref-list   :sep ", ")

;; letter group markup

(markup-letter-group-list :sep "~n~n  \indexspace~n")

(markup-letter-group :open-head "~n  \lettergroupDefault{" :close-head "}" :group "default")
(markup-letter-group :open-head "~n  \lettergroup{" :close-head "}")

如果我们现在xindy调用

xindy -I latex -o book.main.general-ind -M style1 book.main.general-idx

我们将获得如下所示的文件,其中包含额外的字母组,我们可以在文档中的适当位置ind使用这些字母组。\printindex[general].tex

  \lettergroup{a}
  \item abiding, \hyperpage{33}
  \item absorption
    \subitem in daily activities, \hyperpage{132}
  \item accomplishments
    \subitem worldly, \hyperpage{66}

  \indexspace

  \lettergroup{ā}
  \item \=acariya, \hyperpage{96}
  \item \=Al\=ara K\=al\=ama, \hyperpage{51}

  \indexspace

...

  \lettergroup{n}
  \item names, \hyperpage{21}, \hyperpage{23}, \hyperpage{183}
  \item nature
    \subitem existing according to, \hyperpage{69}
  \item nibb\=ana, \hyperpage{32}, \hyperpage{33}
    \subitem causes of, \hyperpage{76}
    \subitem meaning of, \hyperpage{54}
  \item nimitta, \hyperpage{41}

  \indexspace

  \lettergroup{ñ}
  \item \~n\=ayapa\d {t}ipanno
    \subitem those who practice for realisation of the path, \hyperpage{58}

...

愿这会给众多技术人员带来幸福。

相关内容