How to fix missing or incorrect mappings in glyphtounicode.tex

Question 1

You can add your own definitions. For example, here is how to make an "a" copy as "A":

\documentclass[a4paper,12pt]{article}

\usepackage[ansinew]{inputenc}
\usepackage[T1]{fontenc}
\input{glyphtounicode}

\pdfglyphtounicode{a}{0041} %0041=A
\pdfgentounicode=1
\begin{document}
aaaaa 
\end{document}

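The main problem is naturally finding the names of the glyphs you are using. If you know the font, you can find the names in its afm or pfb file. You can also add \pdfcompresslevel=0 to your document and then inspect the pdf. Look for lines starting with /CharSet (if you use more than one font, there will be more than one such line). For example, if I add \int to the example, I find /CharSet (/integraltext), and integraltext is the name of the glyph.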
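
The inspection step described above can be sketched as a small variant of the first example. This is only an illustration: the extra line \pdfobjcompresslevel=0 is an addition not mentioned in the answer, included on the assumption that it keeps the font objects readable as plain text, and the exact /CharSet contents depend on the fonts actually in use.

```latex
\documentclass[a4paper,12pt]{article}

\usepackage[T1]{fontenc}
\input{glyphtounicode}
\pdfgentounicode=1

% Write PDF streams uncompressed so the file can be searched as text.
\pdfcompresslevel=0
% Assumption (not in the original answer): also keep PDF objects
% out of compressed object streams.
\pdfobjcompresslevel=0

\begin{document}
aaaaa $\int$ % \int is a math symbol, so it is typeset in math mode
\end{document}
```

After compiling with pdflatex, open the resulting pdf in a text editor and search for /CharSet; each embedded font should contribute one such line listing the glyph names it uses.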

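If the symbol isn't a single glyph, or if its name isn't unique or changes from one font family to the next, it may be necessary to use the accsupp package. See the question "Is it possible to provide alternative text to use when copying text from the PDF?".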

Answer

Question 2

The following concrete solution is based on Ulrike Fischer's answer:

Solution, part 1 (using \pdfglyphtounicode): The following lines help for the first batch of symbols:

\pdfglyphtounicode{notsubsetdbl}{22D0 0338}
\pdfglyphtounicode{simequal}{2245}
\pdfglyphtounicode{notsimequal}{2247}
\pdfglyphtounicode{uniontext}{22C3}
\pdfglyphtounicode{nelement}{2209}
\pdfglyphtounicode{nequal}{2260}
\pdfglyphtounicode{llbracket}{27E6}
\pdfglyphtounicode{rrbracket}{27E7}
\pdfglyphtounicode{llparenthesis}{0028 007C}
\pdfglyphtounicode{rrparenthesis}{007C 0029}
\pdfglyphtounicode{colonequal}{2254}

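The macros \models, \Rsh, \textlengthmark, \blackdiamond, \sqbullet, and \square seem to require the accsupp package. In the pdf file they are handled using the following glyph names, respectively: bar + equal, eacute, colon, ogonek, quotesinglbase, hungarumlaut. This explains their pasting behavior: these names usually have a different meaning, namely the one shown by the pasted text.

Solution, part 2 (using the accsupp package): The following code creates new "Unicode-compatible" commands. The user will of course need to replace the old commands with these new ones (\models by \Umodels, etc.). The math character classes used here (mathord etc.) are based on my particular needs.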

\RequirePackage{accsupp} % Unicode-pastable versions of symbols
  \newcommand*{\Umodels}{\BeginAccSupp{method=hex,unicode,ActualText=22A7}\mathrel{\models}\EndAccSupp{}}
  \newcommand*{\URsh}{\BeginAccSupp{method=hex,unicode,ActualText=21B1}\mathord{\Rsh}\EndAccSupp{}}
  \newcommand*{\Utextlengthmark}{\BeginAccSupp{method=hex,unicode,ActualText=02D0}\textlengthmark\EndAccSupp{}}
  \newcommand*{\Ublackdiamond}{\BeginAccSupp{method=hex,unicode,ActualText=2B29}\mathord{\blackdiamond}\EndAccSupp{}}
  \newcommand*{\Usqbullet}{\BeginAccSupp{method=hex,unicode,ActualText=25AA}\mathord{\sqbullet}\EndAccSupp{}}
  \newcommand*{\Usquare}{\BeginAccSupp{method=hex,unicode,ActualText=25AB}\mathord{\square}\EndAccSupp{}}

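(For those wondering, the value of ActualText can also be a space-separated list of hexadecimal UTF-16 values. Note that these are not Unicode code points but their UTF-16 representations; the two differ for characters outside the Unicode Basic Multilingual Plane, BMP. For more information on how to paste Unicode characters outside the BMP, see the question/answer on that topic.)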
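
To illustrate the UTF-16 point, here is a minimal sketch for a character outside the BMP. The command name \UbbA and the choice of U+1D538 (MATHEMATICAL DOUBLE-STRUCK CAPITAL A) are hypothetical examples, not part of the original solution. The surrogate pair is computed as follows: 1D538 - 10000 = D538; the high surrogate is D800 plus the upper ten bits (35), giving D835; the low surrogate is DC00 plus the lower ten bits (138), giving DD38.

```latex
\documentclass{article}
\usepackage{amssymb} % provides \mathbb
\usepackage{accsupp}

% Hypothetical command: typesets a blackboard-bold A that should paste
% as U+1D538. ActualText holds the UTF-16 surrogate pair D835 DD38,
% not the code point 1D538 itself.
\newcommand*{\UbbA}{%
  \BeginAccSupp{method=hex,unicode,ActualText=D835DD38}%
  \mathbb{A}%
  \EndAccSupp{}%
}

\begin{document}
$\UbbA$
\end{document}
```

Whether the pasted result actually shows the intended character also depends on the PDF viewer's support for ActualText.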

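Bonus: how to fix existing \pdfglyphtounicode assignments: If you want to change an existing assignment, such as U+25C1 for \lhd (glyphtounicode.tex contains the line \pdfglyphtounicode{triangleleft}{25C1}), simply call the \pdfglyphtounicode macro again after the line \input glyphtounicode; for example, you can write \pdfglyphtounicode{triangleleft}{22B2}, which overrides the original definition.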
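
As a minimal sketch of this override, the following document loads the default table and then reassigns triangleleft. It assumes that \lhd is taken from amssymb and that the glyph used for it is named triangleleft in the font in use; with other font setups the glyph name may differ, so it is worth checking the /CharSet line first.

```latex
\documentclass{article}
\usepackage{amssymb} % assumed source of \lhd in this sketch

\input{glyphtounicode}                  % default table: triangleleft -> 25C1
\pdfglyphtounicode{triangleleft}{22B2}  % later call overrides the default
\pdfgentounicode=1

\begin{document}
$a \lhd b$
\end{document}
```

The order matters: the redefinition has to come after \input{glyphtounicode}, otherwise the default table would overwrite it again.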

Answer