Automatically merge Unicode double subscripts aᵢⱼ = a_{i}_{j} into a_{ij}

I would like to use Unicode subscripts. How can I make consecutive subscripts merge automatically into a single subscript?

$a_{ijkl}$ \\$aᵢⱼₖₗ$  % Error: Double subscript.


$a_{ijkl}$ \\$aᵢⱼₖₗ$


More tests using Jinwen's suggestion


% spacing test subscript
   $a_{ijkl}$ \\ $aᵢⱼₖₗ$

% spacing test superscript
   $a^{ijkl}$ \\ $aⁱʲᵏˡ$

% combined test
   $a^{i}_{j}$ \\ $aⁱⱼ$

% reverse combined test
   $a_{j}^{i}$ \\ $aⱼⁱ$

% long sub+superscript
   $a^{ijkl}_{ijkl}$ \\ $aⁱʲᵏˡᵢⱼₖₗ$

% multiple sub+superscripts
   $a^{ij}_{kl}$ \\ $aⁱₗʲₗ$   % Error: Double subscript. (fair enough!)




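  • \@unisupA, which inserts \sp\bgroup at the start;
  • \@unisupB, which checks whether the next macro is \@unisupA. If it is, another superscript follows and nothing needs to be done; if it is not, we have reached the end and \egroup should be inserted;
  • a conditional \if@unisup, which is needed to make this logic work.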
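
The three pieces in the list above can be sketched as follows. This is a reconstruction from the description rather than verbatim code, and it assumes a Unicode engine (LuaLaTeX or XeLaTeX), where each superscript character is a single active macro so that one \expandafter is enough to expose the following \@unisupA. Only two characters are set up here to keep the sketch short.

```latex
\documentclass{article}
\usepackage{newunicodechar}
\makeatletter
% True while a superscript group opened by \@unisupA is still open.
\newif\if@unisup
% Open the superscript group only if it is not open already.
% \@unisuptrue is set inside the group, so it is undone by \egroup.
\def\@unisupA{\if@unisup\else\sp\bgroup\@unisuptrue\fi}
% Close the group unless another \@unisupA follows immediately.
\def\@unisupB{\@ifnextchar\@unisupA{}{\egroup}}
% The \expandafter expands the following character once before \@unisupB
% looks at it, so a following superscript character shows up as \@unisupA.
\newunicodechar{ⁱ}{\@unisupA i \expandafter\@unisupB}
\newunicodechar{ʲ}{\@unisupA j \expandafter\@unisupB}
\makeatother
\begin{document}
$a^{ij}$ \\ $aⁱʲ$
\end{document}
```

The subscript macros \@unisubA and \@unisubB would follow the same pattern with \sb in place of \sp and a separate conditional (for example a hypothetical \if@unisub).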


\newunicodechar{ⁱ}{\@unisupA i \expandafter\@unisupB}
\newunicodechar{ʲ}{\@unisupA j \expandafter\@unisupB}
\newunicodechar{ᵏ}{\@unisupA k \expandafter\@unisupB}
\newunicodechar{ˡ}{\@unisupA l \expandafter\@unisupB}
\newunicodechar{ᵢ}{\@unisubA i \expandafter\@unisubB}
\newunicodechar{ⱼ}{\@unisubA j \expandafter\@unisubB}
\newunicodechar{ₖ}{\@unisubA k \expandafter\@unisubB}
\newunicodechar{ₗ}{\@unisubA l \expandafter\@unisubB}

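After a lot of trial and error, and after digging into how Unicode works in 8-bit engines, I found a solution that works in both LuaTeX and pdfTeX. The key is to combine \expandafter with \futurelet so that the successor token is stored after being expanded once. With pdfTeX, if the successor is a Unicode character, the stored token is then one of \UTFviii@four@octets, \UTFviii@three@octets, or \UTFviii@two@octets. We can then dispatch to a function that grabs the next 1+n tokens and assembles them into the Unicode character. After that, we expand this character once and compare the result with \subscript.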
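
To make the peeking step concrete, here is a minimal sketch of the \expandafter + \futurelet pattern in isolation. The names \demo@peek, \demo@next, \demo@show and \peek are hypothetical and exist only for this illustration; they are not part of the package below.

```latex
\documentclass{article}
\makeatletter
% Hypothetical helper names, used only in this sketch.
\def\demo@peek{%
  % The \expandafter chain expands the token that follows \demo@show once.
  % \futurelet then stores the first token of that expansion in \demo@next
  % and hands control to \demo@show, leaving the input stream untouched.
  \expandafter\futurelet\expandafter\demo@next\expandafter\demo@show}
% Write the meaning of the stored successor token to the log.
\def\demo@show{\typeout{Successor: \meaning\demo@next}}
\let\peek\demo@peek
\makeatother
\begin{document}
\peek é
\end{document}
```

Under pdfLaTeX the logged meaning should be that of \UTFviii@two@octets, because the first byte of é is an active character that expands to it. Under LuaLaTeX the character is a single unexpandable token, so the log shows the character itself.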


% region preamble --------------------------------------------------------------
% IMPLEMENTATION BASED ON \expandafter + \futurelet
% Provides public command: `\subscript{arg}`
% Internally uses the namespace `\usubscript@`
\ProvidesPackage{unicode-subscript}[2024/02/21 Combining Subscripts]
% Usage: \newunicodechar{ᵢ}{\subscript{i}}
% This allows using multiple Unicode subscripts in succession:
% - `xᵢⱼₖ` ⇝ `x\textsubscript{ijk}`
% - `$xᵢⱼₖ$` ⇝ `$x_{ijk}$`
% The package is designed to work with both pdftex and luatex.
% Note: Usage of the form `x\subscript{i}\subscript{j}` is not supported.
% endregion preamble -----------------------------------------------------------

% region Package Options -------------------------------------------------------
\newif\ifusubscript@debug\usubscript@debugfalse%  Debug flag
\newif\ifusubscript@testing\usubscript@testingfalse%  Testing flag
% endregion Package Options ----------------------------------------------------

% region globals and helper functions ------------------------------------------
% global subscript list variable
\newcommand{\usubscript@start}{\relax}%  marker for the start of a subscript
\usubscript@list@reset% initialize the list

% Prints the given message if the debug flag is set.
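% A minimal sketch of such a helper. This is an assumption: the original
% definition is not shown here, and \typeout is only one possible output channel.
\newcommand{\usubscript@log}[1]{%
    \ifusubscript@debug\typeout{[unicode-subscript] #1}\fi%
}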

% stores the first token of #2 in #1

% select the correct dispatch function
% endregion globals and helper functions ---------------------------------------

% region public interface ------------------------------------------------------
% 1. If we are already in a subscript, \subscript appends the given tokens to \usubscript@list.
%    Otherwise, it resets \usubscript@list.
% 2. Executes \usubscript@check@successor which determines if the next character is also a subscript.
%    In this case, we go back to 1, else we stop the process.
    % Initialize the list with the first token.
    \usubscript@log{Initializing list with '\meaning#1'}%
    % Append token to existing list.
    \usubscript@log{Appending '\meaning#1' to '\usubscript@list'}%
% Check the next token to determine whether to continue the subscript or to terminate it
% Expands successor first before \futurelet, this is important to handle unicode in pdftex
% endregion public interface ---------------------------------------------------

% region private implementation ------------------------------------------------
% Test whether to terminate the subscript
\usubscript@log{Testing against '\meaning#1'}%
    \usubscript@log{ >>> Successor is another subscript!}%
    \usubscript@log{ >>> Successor is not a subscript!}%

% Terminate the subscript and insert the result
\usubscript@log{Terminating with current list '\meaning\usubscript@list'}%
    \usubscript@log{ >>> Inserting '_{\meaning\usubscript@list}{}'}%
    \usubscript@log{ >>> Inserting '\textsubscript{\meaning\usubscript@list}'}%

% There are 2 cases we consider:
% 1. The next token is a subscript, in which case we continue the process.
% 2. The next token is some unicode character, in which case:
%   2.1. We grab the necessary number of tokens if using an 8-bit engine
%   2.2. We expand the unicode character once to get the replacement tokens.
%   2.3. We compare the first token of the replacement tokens to the subscript token.
\usubscript@log{>>> Dispatching on '\meaning\usubscript@successor'}%
    \usubscript@log{ >>> Detected Unicode 4 octets}%
    \usubscript@log{ >>> Detected Unicode 3 octets}%
    \usubscript@log{ >>> Detected Unicode 2 octets}%
    \usubscript@log{ >>> Detected non-Unicode}%
% dispatch the selected command.

\newcommand{\usubscript@check@unicode@four}[5]{% grabs 1+4 tokens
\usubscript@log{>>> Expand Unicode Quadruplet}%
\unless\ifcsname u8:#1#2#3#4#5\endcsname%
    \PackageError{subscript}{Detected undefined unicode.}%
\expandafter\let\expandafter\usubscript@token\csname u8:#1#2#3#4#5\endcsname%
\usubscript@log{Detected unicode '\meaning\usubscript@token'}%
\usubscript@log{Reinserting '\meaning#1#2#3#4#5'}%

\newcommand{\usubscript@check@unicode@three}[4]{% grabs 1+3 tokens
\usubscript@log{>>> Expand Unicode Triplet}%
\unless\ifcsname u8:#1#2#3#4\endcsname%
    \PackageError{subscript}{Detected undefined unicode.}%
\expandafter\let\expandafter\usubscript@token\csname u8:#1#2#3#4\endcsname%
\usubscript@log{Detected unicode '\meaning\usubscript@token'}%
\usubscript@log{Reinserting '\meaning#1#2#3#4'}%

\newcommand{\usubscript@check@unicode@two}[3]{% grabs 1+2 tokens
\usubscript@log{>>> Expand Unicode Duplet}%
\unless\ifcsname u8:#1#2#3\endcsname%
    \PackageError{subscript}{Detected undefined unicode.}%
\expandafter\let\expandafter\usubscript@token\csname u8:#1#2#3\endcsname%
\usubscript@log{Detected unicode '\meaning\usubscript@token'}%
\usubscript@log{Reinserting '\meaning#1#2#3'}%
% endregion private implementation ---------------------------------------------










% test mathmode
\\ $aᵢⱼₖₗₘₙ$
\\ $a\subscript{ijklmn}$

% test textmode
\\ aᵢⱼₖₗₘₙ
\\ a\subscript{ijklmn}





