Guidelines for setting hyphenation-point preferences in German compound words

2024-5-26

Suppose we want to give the hyphenation points of long German compound words different priorities. Here is what we have tried so far:

\documentclass[ngerman]{article}
\usepackage{iftex}
\ifTUTeX
\else
\usepackage[T1]{fontenc}
\fi
\usepackage[ngerman]{babel}
\babelprovide[hyphenrules=ngerman-x-latest]{ngerman}
\ifluatex
  %\hyphenpenalty is 50 by default
  \exceptionpenalty=51%%% ad-hoc value greater than \hyphenpenalty.
  \exhyphenpenalty=46%%% an ad-hoc value less than \hyphenpenalty (though usually they coincide). We choose the value such that it's one less than the minimum over the hyphenation penalties of the explicitly specified compounds.
  % \babelhyphenation[ngerman]{
  %   In{-}{}{}[3]ter{-}{}{}[1]ak{-}{}{}[2]ti{-}{}{}[4]ons-dia{-}{}{}[2]gram{-}{}{}[3]me
  %   In{-}{}{}[3]ter{-}{}{}[1]ak{-}{}{}[2]ti{-}{}{}[4]ons-ver{-}{}{}[1]fei{-}{}{}[3]ne{-}{}{}[3]rung
  %   In{-}{}{}[4]ter{-}{}{}[2]ak{-}{}{}[3]ti{-}{}{}[5]ons{-}{}{}[1]ver{-}{}{}[2]fei{-}{}{}[4]ne{-}{}{}[4]rungs-dia{-}{}{}[3]gramm
  % }%%% this way we get too large penalties.  Breaking the words specified this way is discouraged (and thus, breaking the unspecified words is implicitly encouraged).
  \babelposthyphenation{ngerman}{In|ter|ak|ti|ons|dia|gram|me}%% set potentially different preferences of the hyphenation points in “Interaktionsdiagramme”
  {
    {}, {},         % In
    {pre=-, penalty=51, data=2},
    {}, {}, {},     % ter
    {pre=-, penalty=49, data=6},
    {}, {},         % ak
    {pre=-, penalty=50, data=9},
    {}, {},         % ti
    {pre=-, penalty=52, data=12}, %%% higher penalty because of the pronunciation t͡si̯oː
    {}, {}, {},     % ons
    {pre=-, penalty=48, data=16},
    {}, {}, {},     % dia
    {pre=-, penalty=50, data=20},
    {}, {}, {}, {}, % gram
    {pre=-, penalty=51, data=25},
    {}, {}          % me
  }% the arithmetic mean of the penalties is ≈ 50.14.
  \babelposthyphenation{ngerman}{In|ter|ak|ti|ons|ver|fei|ne|rung}%% set potentially different preferences of the hyphenation points in “Interaktionsverfeinerung”
  {
    {}, {},        % In
    {pre=-, penalty=51, data=2},
    {}, {}, {},    % ter
    {pre=-, penalty=49, data=6},
    {}, {},        % ak
    {pre=-, penalty=50, data=9},
    {}, {},        % ti
    {pre=-, penalty=52, data=12}, %%% higher penalty because of the pronunciation t͡si̯oː
    {}, {}, {},    % ons
    {pre=-, penalty=48, data=16},
    {}, {}, {},    % ver
    {pre=-, penalty=49, data=20},
    {}, {}, {},    % fei
    {pre=-, penalty=51, data=24},
    {}, {},        % ne
    {pre=-, penalty=51, data=27},
    {}, {}, {}, {} % rung
  }% the arithmetic mean of the penalties is 50.125.
  \babelposthyphenation{ngerman}{In|ter|ak|ti|ons|ver|fei|ne|rungs|dia|gramm}%% set potentially different preferences of the hyphenation points in “Interaktionsverfeinerungsdiagramm”
  {
    {}, {},             % In
    {pre=-, penalty=51, data=2},
    {}, {}, {},         % ter
    {pre=-, penalty=49, data=6},
    {}, {},             % ak
    {pre=-, penalty=50, data=9},
    {}, {},             % ti
    {pre=-, penalty=52, data=12}, %%% higher penalty because of the pronunciation t͡si̯oː
    {}, {}, {},         % ons
    {pre=-, penalty=48, data=16},
    {}, {}, {},         % ver
    {pre=-, penalty=49, data=20},
    {}, {}, {},         % fei
    {pre=-, penalty=51, data=24},
    {}, {},             % ne
    {pre=-, penalty=51, data=27},
    {}, {}, {}, {}, {}, % rungs
    {pre=-, penalty=47, data=33},
    {}, {}, {},         % dia
    {pre=-, penalty=50, data=37},
    {}, {}, {}, {}, {}  % gramm
  }%the arithmetic mean of the penalties is 49.8.
\fi
\showoutput
\begin{document}
Interaktionsdiagramme
Interaktionsverfeinerung
Interaktionsverfeinerungsdiagramm
\end{document}

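This example is for demonstration only, so it is small. In our full real-world document we currently have 77 \babelposthyphenation[…]{…}{…} entries, one per word. The results look very good: in the console output these words receive hyphenation penalties that are, on average, close to the standard \hyphenpenalty, and we have expressed meaningful preferences for where each compound should be split.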

We made two general choices that we are not sure about:

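We set \exhyphenpenalty to the minimum of all explicitly specified penalties, minus 1 (in the example above the minimum is 47, hence 46). The rationale is that the preferred break point should always be after an explicit hyphen inside a compound, but there is no reason to encourage a break after such a hyphen more than necessary.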
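For each word, the penalties are on average as close as possible to \hyphenpenalty (50 by default). To measure "as close as possible on average", we arbitrarily chose the arithmetic mean, simply because it is easy to compute and for no other reason. Other options would be the geometric mean, the logarithmic mean, the quadratic mean, the harmonic mean, the median, or some other kind of average. Our goal is this: for every word explicitly specified via \babelposthyphenation, LuaLaTeX should only shift the position of the break inside that word. It should neither encourage nor discourage breaking the specified word as a whole, and it should neither encourage nor discourage breaking other words in a paragraph that contains the specified word (compared with the situation in which nothing is specified and every penalty is the standard \hyphenpenalty). Of course, we are aware that this goal is imprecise, probably idealistic, and perhaps unattainable.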
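The numbers quoted in the comments of the example above can be reproduced with a small helper document. This is only a sketch: it assumes that \fpeval is available (it is part of the LaTeX kernel in recent releases; the xfp package provides it on older ones), and the penalty lists are copied by hand from the three \babelposthyphenation entries rather than extracted automatically.

```latex
% Sketch only: recomputes the means and the \exhyphenpenalty value
% from the penalties listed by hand in the example above.
\documentclass{article}
\usepackage{xfp}% provides \fpeval on older kernels; harmless on newer ones
\begin{document}
% Arithmetic mean of the explicit penalties, one line per word.
% Each mean should be close to \hyphenpenalty (50 by default).
Interaktionsdiagramme:
\fpeval{(51+49+50+52+48+50+51)/7}

Interaktionsverfeinerung:
\fpeval{(51+49+50+52+48+49+51+51)/8}

Interaktionsverfeinerungsdiagramm:
\fpeval{(51+49+50+52+48+49+51+51+47+50)/10}

% Candidate for \exhyphenpenalty: smallest explicit penalty minus one.
% The longest word contains every distinct penalty value used above.
exhyphenpenalty:
\fpeval{min(51,49,50,52,48,49,51,51,47,50)-1}
\end{document}
```

Replacing the sums with another kind of average inside \fpeval would be one way to compare the alternatives mentioned above, but we have not done so.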

Do we need to adjust either of these two choices in general? If so:

Why?
How?

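Of course, these choices are only a guideline, a rule of thumb that we intend to follow in the general case. (No doubt we may find ourselves deviating from such a guideline in the future for particular words we do not yet know about, for reasons we do not yet know.)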
