如何修改自定义环境中的单个（范围）字符？

Question 1

您可以进行正则表达式搜索和替换。具体来说，s和S被更改为字符后跟 U+030C（组合卡罗），XeTeX 会将其分别规范化为和š；Š对于h和，也是同样H，其后跟 U+032E（组合下面的短音符），并被规范化为ḫ和Ḫ。

|...|我借此机会建议\textsuperscript{...}添加替代品来代替。

\documentclass{article}
\usepackage{fontspec}
\usepackage{xparse}

\setmainfont{Libertinus Serif}

\ExplSyntaxOn
\NewDocumentCommand{\hitt}{m}
 {
  \tl_set:Nn \l_xander_hitt_tl { #1 }
  % change every run of lowercase letters into italic
  \regex_replace_all:nnN
   { [a-z]+ }
   { \c{textit}\cB\{\0\cE\} }
   \l_xander_hitt_tl
  % change every h/H into h/H U+032E
  \regex_replace_all:nnN
   { [hH] }
   { \0 \x{032e} }
   \l_xander_hitt_tl
  % change every s/S into s/S U+032E
  \regex_replace_all:nnN
   { [sS] }
   { \0 \x{030c} }
   \l_xander_hitt_tl
  % change |...| into \textsuperscript{...}
  \regex_replace_all:nnN
   { \|([^|]+)\| }
   { \c{textsuperscript}\cB\{\1\cE\} }
   \l_xander_hitt_tl
  % print the result
  \tl_use:N \l_xander_hitt_tl
 }
\ExplSyntaxOff

\begin{document}

\hitt{|DUG|ha-ne-es-n[a-as]}

\end{document}

笔记

[a-z]+代表一个或多个小写字母的连续
\c{textit}表示“控制序列\textit”
\cB\{并\cE\}代表小组制作 TeX 括号
\x{<hex digits>}代表相应代码的字符
\0在替换字符串中表示匹配；\1是第一个捕获组
\|([^|]*)\|搜索|（必须在搜索字符串中进行转义），然后搜索任何不同于|直到找到匹配的标记|

Answer

您可以进行正则表达式搜索和替换。具体来说，s和S被更改为字符后跟 U+030C（组合卡罗），XeTeX 会将其分别规范化为和š；Š对于h和，也是同样H，其后跟 U+032E（组合下面的短音符），并被规范化为ḫ和Ḫ。

|...|我借此机会建议\textsuperscript{...}添加替代品来代替。

\documentclass{article}
\usepackage{fontspec}
\usepackage{xparse}

\setmainfont{Libertinus Serif}

\ExplSyntaxOn
\NewDocumentCommand{\hitt}{m}
 {
  \tl_set:Nn \l_xander_hitt_tl { #1 }
  % change every run of lowercase letters into italic
  \regex_replace_all:nnN
   { [a-z]+ }
   { \c{textit}\cB\{\0\cE\} }
   \l_xander_hitt_tl
  % change every h/H into h/H U+032E
  \regex_replace_all:nnN
   { [hH] }
   { \0 \x{032e} }
   \l_xander_hitt_tl
  % change every s/S into s/S U+032E
  \regex_replace_all:nnN
   { [sS] }
   { \0 \x{030c} }
   \l_xander_hitt_tl
  % change |...| into \textsuperscript{...}
  \regex_replace_all:nnN
   { \|([^|]+)\| }
   { \c{textsuperscript}\cB\{\1\cE\} }
   \l_xander_hitt_tl
  % print the result
  \tl_use:N \l_xander_hitt_tl
 }
\ExplSyntaxOff

\begin{document}

\hitt{|DUG|ha-ne-es-n[a-as]}

\end{document}

笔记

[a-z]+代表一个或多个小写字母的连续
\c{textit}表示“控制序列\textit”
\cB\{并\cE\}代表小组制作 TeX 括号
\x{<hex digits>}代表相应代码的字符
\0在替换字符串中表示匹配；\1是第一个捕获组
\|([^|]*)\|搜索|（必须在搜索字符串中进行转义），然后搜索任何不同于|直到找到匹配的标记|

Question 2

各种可能性不断涌现。

这不是一个答案，而是一个延伸的潜力：赫梯文本的红字化（还有古波斯语、乌加里特语等等）。

使用具有“纯”文本输入方法的映射文件，以便 egreg 的环境处理“s”和“h”，并且另外双元音变成单字母尖音，然后使用 xelatex 进行编译：

梅威瑟：

\documentclass[12pt]{article}
\usepackage{fontspec}
\usepackage{graphicx}
\usepackage{xcolor}
\usepackage{stackengine}


\setmainfont{Linux Libertine} 
\newfontface\hitt{Noto Sans Cuneiform}


\newfontface\fmhitt[Mapping=latin-to-hittite2,Colour=blue]{Noto Sans Cuneiform}



\DeclareTextFontCommand{\texthitt}{\Large\fmhitt}



\newenvironment{thitt}
  {\fmhitt\Large\ignorespaces}
  {\ignorespacesafterend}


\ExplSyntaxOn
\NewDocumentCommand{\ehitt}{m}
 {
  \tl_set:Nn \l_xander_hitt_tl { #1 }
  % change every run of lowercase letters into italic
  \regex_replace_all:nnN
   { [a-z]+ }
   { \c{textit}\cB\{\0\cE\} }
   \l_xander_hitt_tl
  % change every h/H into h/H U+032E
  \regex_replace_all:nnN
   { [hH] }
   { \0 \x{032e} }
   \l_xander_hitt_tl
  % change every s/S into s/S U+032E
  \regex_replace_all:nnN
   { [sS] }
   { \0 \x{030c} }
   \l_xander_hitt_tl
  % change every double vowel aa/ into a/ U+0301
  \regex_replace_all:nnN
   { ([a,e,i,u]){2,2} }
   { \1 \x{0301} }
   \l_xander_hitt_tl
  % change |...| into \textsuperscript{...}
  \regex_replace_all:nnN
   { \|([^|]+)\| }
   { \c{textsuperscript}\cB\{\c{textsc}\cB\{\1\cE\}\cE\} 
    }
   \l_xander_hitt_tl
  % print the result
  \tl_use:N \l_xander_hitt_tl
 }
\ExplSyntaxOff

  
  
\newcommand\hstackon[2]{\stackon{\begin{thitt}#1\end{thitt}}{\ehitt{#2}}}  

\begin{document}
\section{Hittite}
\hstackon{|dug|}{|dug|}
\hstackon{ha}{ha}
\hstackon{ne}{ne}
\hstackon{es}{es}
\hstackon{na}{na}
\hstackon{as}{as}



Random text from the UChicago \textit{Hittite Dictionary}, volume \textbf{P}, p 144, sv \ehitt{parh-} (chase): "and may the oath deities pursue you continuously"


\begin{center}
\hstackon{niis}{|niis|}
\hstackon{dingir}{|dingir|}
\hstackon{mes}{|mes|}
\hstackon{paar}{paar}
\hstackon{he}{he}
\hstackon{es}{es}
\hstackon{kaan}{kaan}
\hstackon{du}{du}
\end{center}


\begin{center}
\hstackon{|niis.dingir.mes|paar-he-es-kaan-du}{|niis.dingir.mes|paar-he-es-kaan-du}
\end{center}

Mapping file (based on Wikipedia article) is incomplete, having the syllables (V, VC, VC, CVC) and a few determiners. Glyph for \textsc{\ehitt{niis}} needs to be tracked down; keyed in as \texttt{niis}, it is obviously incorrect as it is being parsed as the two syllables \texttt{ni} + \texttt{is}:  \hstackon{ni}{ni}\hstackon{is}{is} by the font map.

Otherwise, some general formatting required (e.g., struts or similar to line up the ruby text).



\end{document}

映射文件，使用 teckit_compile.exe 进行编译：

; TECkit mapping for TeX input conventions <-> Unicode characters

LHSName "latin-to-hittite"
RHSName "UNICODE"

pass(Unicode)

; ligatures from Knuth's original CMR fonts
U+002D U+002D           <>  U+2013  ; -- -> en dash
U+002D U+002D U+002D    <>  U+2014  ; --- -> em dash

U+0027          <>  U+2019  ; ' -> right single quote
U+0027 U+0027   <>  U+201D  ; '' -> right double quote
U+0022           >  U+201D  ; " -> right double quote

U+0060          <>  U+2018  ; ` -> left single quote
U+0060 U+0060   <>  U+201C  ; `` -> left double quote

U+0021 U+0060   <>  U+00A1  ; !` -> inverted exclam
U+003F U+0060   <>  U+00BF  ; ?` -> inverted question

; additions supported in T1 encoding
U+002C U+002C   <>  U+201E  ; ,, -> DOUBLE LOW-9 QUOTATION MARK
U+003C U+003C   <>  U+00AB  ; << -> LEFT POINTING GUILLEMET
U+003E U+003E   <>  U+00BB  ; >> -> RIGHT POINTING GUILLEMET


;U+0020    >  U+0020 ;  space maps to space
U+002D    >  U+200D ;  hyphen as Zero Width Joiner
U+002E    >  U+200D ;  dot as Zero Width Joiner
U+007C    >  U+200C ;  pipe as Zero Width Non-Joiner


U+0061    <>  U+12000 ;  a

Answer