使用 xindy 和 LaTeX 按照瑞典语排序的索引（...YZÅÄÖ...）

Question 1

问题在于，使用inputenc和utf8选项时，索引文件的编写方式xindy无法处理。此示例：

\documentclass[11pt]{article} 
\usepackage[T1]{fontenc}
\usepackage[swedish]{babel}
\usepackage[utf8]{inputenc} 

\usepackage{makeidx}
\makeindex

\begin{document}
This kind of index in text:

\index{Säker plats|textbf}Säker plats
foo\index{Aäö}\index{Aäö}
bar\index{äö}\index{aø}\index{ohne}\index{øæ}
\printindex
\end{document}

生成此.idx文件：

\indexentry{S\IeC {\"a}ker plats|textbf}{1}
\indexentry{A\IeC {\"a}\IeC {\"o}}{1}
\indexentry{A\IeC {\"a}\IeC {\"o}}{1}
\indexentry{\IeC {\"a}\IeC {\"o}}{1}
\indexentry{a\IeC {\o }}{1}
\indexentry{ohne}{1}
\indexentry{\IeC {\o }\IeC {\ae }}{1}

您需要将此代码转换为utf8。可以使用我的脚本执行此操作iec2utf. 将文件下载iec2utf.lua到您的文档所在目录并制作批处理脚本swedxindy：

#!/usr/bin/env sh
texlua iec2utf.lua T1 < `basename $1 .tex`.idx | texindy -i -M lang/swedish/utf8-lang -o `basename $1 .tex`.ind

我没有使用texshop，所以我不知道如何将这个脚本添加到菜单，但您可以从命令行调用它（您必须使它可执行，我认为它可以chmod -x swedxindy像在 Linux 中一样使用命令来制作）。

编辑

我简化了流程 - 现在存储库utftexindy中有一个调用脚本iec2utf。上述流程现在可以简化为：

texlua utftexindy.lua -L swedish sample.idx

结果：

在此处输入图片描述

Answer

问题在于，使用inputenc和utf8选项时，索引文件的编写方式xindy无法处理。此示例：

\documentclass[11pt]{article} 
\usepackage[T1]{fontenc}
\usepackage[swedish]{babel}
\usepackage[utf8]{inputenc} 

\usepackage{makeidx}
\makeindex

\begin{document}
This kind of index in text:

\index{Säker plats|textbf}Säker plats
foo\index{Aäö}\index{Aäö}
bar\index{äö}\index{aø}\index{ohne}\index{øæ}
\printindex
\end{document}

生成此.idx文件：

\indexentry{S\IeC {\"a}ker plats|textbf}{1}
\indexentry{A\IeC {\"a}\IeC {\"o}}{1}
\indexentry{A\IeC {\"a}\IeC {\"o}}{1}
\indexentry{\IeC {\"a}\IeC {\"o}}{1}
\indexentry{a\IeC {\o }}{1}
\indexentry{ohne}{1}
\indexentry{\IeC {\o }\IeC {\ae }}{1}

您需要将此代码转换为utf8。可以使用我的脚本执行此操作iec2utf. 将文件下载iec2utf.lua到您的文档所在目录并制作批处理脚本swedxindy：

#!/usr/bin/env sh
texlua iec2utf.lua T1 < `basename $1 .tex`.idx | texindy -i -M lang/swedish/utf8-lang -o `basename $1 .tex`.ind

我没有使用texshop，所以我不知道如何将这个脚本添加到菜单，但您可以从命令行调用它（您必须使它可执行，我认为它可以chmod -x swedxindy像在 Linux 中一样使用命令来制作）。

编辑

我简化了流程 - 现在存储库utftexindy中有一个调用脚本iec2utf。上述流程现在可以简化为：

texlua utftexindy.lua -L swedish sample.idx

结果：

在此处输入图片描述

Question 2

在他的回答michal.h21 已识别出问题，并发现问题在于写入 UTF-8 字符的方式，如果\usepackage[utf8]{inputenc}使用：

\IeC{<LICR>}% LICR = LaTeX Internal Character Representation

由于包inputenc使 8 位字节处于活动状态，因此可以重新定义它们以打印自身而不是打印\IeC内容。

还\index使用逐字类别代码作为其参数。LaTeX 不在其逐字类别代码中包含 8 位字符，因为它必须将 UTF-8 字节序列映射到字体编码的字符槽才能获得正确的字符。

下面的示例基于 michal.h21 的示例，修补了\index它不写入扩展的 UTF-8 字符的问题：

\documentclass[11pt]{article}
\usepackage[T1]{fontenc}
\usepackage[swedish]{babel}
\usepackage[utf8]{inputenc}

\usepackage{makeidx}
\makeindex

\usepackage{etoolbox}
\makeatletter
\patchcmd\index{\@sanitize}{\@sanitize\index@sanitize}{}{%
  \errmessage{Patching \noexpand\index failed}%
}
\let\index@sanitize\@empty
\begingroup
  \count@=127
  \@whilenum\count@<255 \do{%
    \advance\count@\@ne
    \lccode`\*=\count@
    \lccode`\~=\count@
    \lowercase{%
      \expandafter
      \g@addto@macro\expandafter\index@sanitize\expandafter{%
        % verbatim catcode
        \expandafter\@makeother\csname *\endcsname
        % active character expands to non-expandable itself
        \def~{*}%
      }%

    }%  
  }
\endgroup
\makeatother

\begin{document}
This kind of index in text:

\index{Säker plats|textbf}Säker plats
foo\index{Aäö}\index{Aäö}
bar\index{äö}\index{aø}\index{ohne}\index{øæ}

\textbf{foo\index{Aäö}\index{Aäö}}
\printindex
\end{document}

.idx写入以下文件：

\indexentry{Säker plats|textbf}{1}
\indexentry{Aäö}{1}
\indexentry{Aäö}{1}
\indexentry{äö}{1}
\indexentry{aø}{1}
\indexentry{ohne}{1}
\indexentry{øæ}{1}
\indexentry{Aäö}{1}
\indexentry{Aäö}{1}

Answer

在他的回答michal.h21 已识别出问题，并发现问题在于写入 UTF-8 字符的方式，如果\usepackage[utf8]{inputenc}使用：

\IeC{<LICR>}% LICR = LaTeX Internal Character Representation

由于包inputenc使 8 位字节处于活动状态，因此可以重新定义它们以打印自身而不是打印\IeC内容。

还\index使用逐字类别代码作为其参数。LaTeX 不在其逐字类别代码中包含 8 位字符，因为它必须将 UTF-8 字节序列映射到字体编码的字符槽才能获得正确的字符。

下面的示例基于 michal.h21 的示例，修补了\index它不写入扩展的 UTF-8 字符的问题：

\documentclass[11pt]{article}
\usepackage[T1]{fontenc}
\usepackage[swedish]{babel}
\usepackage[utf8]{inputenc}

\usepackage{makeidx}
\makeindex

\usepackage{etoolbox}
\makeatletter
\patchcmd\index{\@sanitize}{\@sanitize\index@sanitize}{}{%
  \errmessage{Patching \noexpand\index failed}%
}
\let\index@sanitize\@empty
\begingroup
  \count@=127
  \@whilenum\count@<255 \do{%
    \advance\count@\@ne
    \lccode`\*=\count@
    \lccode`\~=\count@
    \lowercase{%
      \expandafter
      \g@addto@macro\expandafter\index@sanitize\expandafter{%
        % verbatim catcode
        \expandafter\@makeother\csname *\endcsname
        % active character expands to non-expandable itself
        \def~{*}%
      }%

    }%  
  }
\endgroup
\makeatother

\begin{document}
This kind of index in text:

\index{Säker plats|textbf}Säker plats
foo\index{Aäö}\index{Aäö}
bar\index{äö}\index{aø}\index{ohne}\index{øæ}

\textbf{foo\index{Aäö}\index{Aäö}}
\printindex
\end{document}

.idx写入以下文件：

\indexentry{Säker plats|textbf}{1}
\indexentry{Aäö}{1}
\indexentry{Aäö}{1}
\indexentry{äö}{1}
\indexentry{aø}{1}
\indexentry{ohne}{1}
\indexentry{øæ}{1}
\indexentry{Aäö}{1}
\indexentry{Aäö}{1}

Question 3

这是另一种方法（至少需要 4.02 版）glossaries)：

% arara: pdflatex
% arara: makeglossaries
% arara: pdflatex
\documentclass[11pt]{article}
\usepackage[T1]{fontenc}
\usepackage[swedish]{babel}
\usepackage[utf8]{inputenc}

\usepackage[index,xindy]{glossaries}

\makeglossaries

\newterm[name={Säker plats}]{Saker plats}
\newterm[name={Aäö}]{Aao}
\newterm[name={äö}]{ao}
\newterm[name={aø}]{aoslash}
\newterm{ohne}
\newterm[name={øæ}]{oae}

\begin{document}
This kind of index in text:

\gls[format=textbf]{Saker plats}
foo\glsadd{Aao}\glsadd{Aao}
bar\glsadd{ao}\glsadd{aoslash}\glsadd{ohne}\glsadd{oae}

\printindex[style=indexgroup]
\end{document}

得出的结果为：

生成的文档的图像

这是有效的，因为的默认行为glossaries是在将排序键写入外部索引文件之前对其进行清理，因此.idx上述文件如下所示：

(indexentry :tkey (("Säker plats" "\\glossentry{Saker plats}") ) :locref "{}{1}" :attr "pagetextbf" )
(indexentry :tkey (("Aäö" "\\glossentry{Aao}") ) :locref "{}{1}" :attr "pageglsnumberformat" )
(indexentry :tkey (("Aäö" "\\glossentry{Aao}") ) :locref "{}{1}" :attr "pageglsnumberformat" )
(indexentry :tkey (("äö" "\\glossentry{ao}") ) :locref "{}{1}" :attr "pageglsnumberformat" )
(indexentry :tkey (("aø" "\\glossentry{aoslash}") ) :locref "{}{1}" :attr "pageglsnumberformat" )
(indexentry :tkey (("ohne" "\\glossentry{ohne}") ) :locref "{}{1}" :attr "pageglsnumberformat" )
(indexentry :tkey (("øæ" "\\glossentry{oae}") ) :locref "{}{1}" :attr "pageglsnumberformat" )

因此 xindy 能够正确地对条目进行排序。

缺点是每个索引条目必须先使用进行定义\newterm，并且标签不能包含 ø 等 Unicode 字符。（这是因为标签构成了存储条目信息的控制序列的名称。）

Answer