How can I check for a space after a letter inside \newcommand?

Question 1

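If you give ə category code 12 (other) instead of 11 (letter), then \ə is not a control word token but a control symbol token. TeX does not discard spaces after a control symbol token unless the name of the control symbol consists of a character of category code 10 (space). The same can be done with Ə.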
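The difference between the two kinds of token can be seen with macros that every LaTeX format already provides. The following is only a minimal sketch: \TeX and \& stand in for "some control word" and "some control symbol" and are not part of the setup discussed here.

```latex
\documentclass{article}
\begin{document}

% \TeX is a control word: after reading it, TeX is in state S
% (skipping blanks), so the following space never becomes a token.
\TeX is fun. % typesets "TeXis fun."

% \& is a control symbol: after reading it, TeX is in state M
% (middle of line), so the following space is kept.
\& is fun.   % typesets "& is fun."

\end{document}
```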

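Drawback: if ə/Ə has category code 12, then, when tokenizing lines of .tex input files, TeX no longer treats ə/Ə as a character that can be part of the multi-letter name of a control word token. The exceptions are control word tokens that are constructed via \csname..\endcsname, that come from the expansion of a macro defined before the category code of ə/Ə was switched to 12, or that come from the \the-expansion of a token register whose content was assigned before the category code of ə/Ə was switched to 12.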
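A small sketch of this drawback, assuming compilation with lualatex (where ə has category code 11 by default). The macro name \səs is hypothetical and chosen only because it contains ə.

```latex
% Compile with lualatex
\documentclass{article}

% Defined while ə still has category code 11, so \səs is one control word.
\newcommand\səs{sound}
\catcode`\ə=12

\begin{document}

% Typing \səs here would now be read as \s followed by the characters
% ə and s, so the macro can no longer be called by writing its name.
% \csname builds the name from characters regardless of category codes:
\csname səs\endcsname

\end{document}
```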

% Compile with lualatex
\documentclass[paper=7in:10in,DIV=calc,12pt]{scrbook}
\usepackage[inner=0.653in, outer=0.4in]{geometry}

\usepackage{pdfpages}

\newcommand\Ə{\char"04D8 }%<-space terminates the number of the char and gets discarded
\newcommand\ə{\char"04D9 }%<-space terminates the number of the char and gets discarded
\catcode`\Ə=12
\catcode`\ə=12

\usepackage[none]{hyphenat}

\usepackage{fontspec}
\setmainfont[Ligatures={TeX,NoCommon}]{CMU Serif}
\setsansfont[Ligatures={TeX,NoCommon}]{CMU Sans Serif}

\begin{document}

\ə h\ə rfinin sonundakı boşluğu görmür.

\əh\ərfinin sonundakı boşluğu görmür.

\end{document}

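Alternatively, define \Ə/\ə as macros delimited by an explicit character token that is neither of category 11 (letter) nor of category 10 (space), so that

- the delimiting non-category-11 (letter) character is not taken for part of the name of a control word token beginning with \Ə.../\ə...;
- a space behind the delimiting non-category-10 (space) character token is not discarded, because after tokenizing the delimiter TeX's reading apparatus is in state M (middle of line), not in state S (skipping blanks).

For example, you can use ! as the delimiter: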

% Compile with lualatex
\documentclass[paper=7in:10in,DIV=calc,12pt]{scrbook}
\usepackage[inner=0.653in, outer=0.4in]{geometry}

\usepackage{pdfpages}

\makeatletter
\@ifdefinable\Ə{\def\Ə!{\char"04D8 }}%<-space terminates the number of the char and gets discarded
\@ifdefinable\ə{\def\ə!{\char"04D9 }}%<-space terminates the number of the char and gets discarded
\makeatother

\usepackage[none]{hyphenat}

\usepackage{fontspec}
\setmainfont[Ligatures={TeX,NoCommon}]{CMU Serif}
\setsansfont[Ligatures={TeX,NoCommon}]{CMU Sans Serif}

\begin{document}

\ə! h\ə! rfinin sonundakı boşluğu görmür.

\ə!h\ə!rfinin sonundakı boşluğu görmür.

\end{document}

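\@ifnextchar is not suitable for checking whether the next token in the token stream is a space token.

Reasons:

- The implementation of the \@ifnextchar mechanism contains a lot of trickery to make sure that TeX "looks" at the next non-space token.
- The implementation of \@ifnextchar fails if its first argument is a space token.

\@ifnextchar is defined as follows: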
```
\long\def\@ifnextchar#1#2#3{%
  \let\reserved@d=#1\def\reserved@a{#2}\def\reserved@b{#3}\futurelet\@let@token\@ifnch
}%
```
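If #1 is a space token, this expands to

\let\reserved@d=⟨space-token⟩\def\reserved@a{#2}...

According to the syntax rules of \let, one space token following the = is discarded, so this is the same as

\let\reserved@d=\def\reserved@a{#2}...

That is, \reserved@d receives the meaning of the \def primitive, and \reserved@a{#2} is carried out instead of \reserved@a being defined to yield #2. The subsequent erroneous behavior depends on the current definition/meaning of \reserved@a.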
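For completeness, a test for a following space token can be built directly on \futurelet. What follows is a minimal sketch, not part of the original answer: the names \IfNextSpace, \my@yes, \my@no, \my@next, \my@checkspace and \spacetest are hypothetical, while \@sptoken is the implicit space token defined by the LaTeX kernel. The test can only see spaces that survived tokenization, which is why the demonstration macro is delimited by !.

```latex
\documentclass{article}

\makeatletter
% \IfNextSpace{<if space follows>}{<otherwise>}
\newcommand*\IfNextSpace[2]{%
  \def\my@yes{#1}\def\my@no{#2}%
  % Peek at the next token without removing it from the input.
  \futurelet\my@next\my@checkspace}
\newcommand*\my@checkspace{%
  \ifx\my@next\@sptoken
    \expandafter\my@yes
  \else
    \expandafter\my@no
  \fi}
\makeatother

% Delimited by ! so that a space after the call is not skipped.
\def\spacetest!{\IfNextSpace{[space]}{[no space]}}

\begin{document}

\spacetest! x % typesets "[space] x"

\spacetest!x  % typesets "[no space]x"

\end{document}
```

Unlike \@ifnextchar, this sketch does not skip over spaces before comparing, and it never passes the space token through \let...=, so the problem described above does not arise.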

Answer

Question 2

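Unicode has two characters of similar shape:

U+0259 LATIN SMALL LETTER SCHWA ə
U+04D9 CYRILLIC SMALL LETTER SCHWA ә

(and their uppercase counterparts). From your code, it seems you want to type the former and get the latter.

I don't know why, because according to the Wikipedia page on the Azerbaijani alphabet, the Latin version is used, not the Cyrillic one.

If you want to get the Cyrillic letter however the character is input, you can use newunicodechar: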

\documentclass{article}
\usepackage{fontspec}
\usepackage{newunicodechar}

\setmainfont[Ligatures={TeX,NoCommon}]{CMU Serif}
\setsansfont[Ligatures={TeX,NoCommon}]{CMU Sans Serif}

\newunicodechar{Ə}{Ә} % U+018F -> U+04D8
\newunicodechar{ə}{ә} % U+0259 -> U+04D9

\begin{document}

ə hə rfinin sonundakı boşluğu görmür.

Ə hə rfinin

\end{document}

A different case is when you want ə (U+0259) to produce itself, but \ə to produce the Cyrillic lookalike. In that case, change ə to category code 12.

\documentclass{article}
\usepackage{fontspec}

\setmainfont[Ligatures={TeX,NoCommon}]{CMU Serif}
\setsansfont[Ligatures={TeX,NoCommon}]{CMU Sans Serif}

\catcode`Ə=12
\newcommand{\Ə}{\symbol{"04D8}} % U+018F -> U+04D8
\catcode`ə=12
\newcommand{\ə}{\symbol{"04D9}} % U+0259 -> U+04D9

\begin{document}

ə hə rfinin sonundakı boşluğu görmür.

\ə h\ə rfinin sonundakı boşluğu görmür.

Ə hə rfinin

\Ə hə rfinin

\end{document}

Try copying and pasting from the resulting PDF and you will see that the expected characters are used.

Answer