具体来说,我希望能够完成这张表:
symbol html entity LaTeX command
\deg ° \begin{alltt}\deg{alltt}
这样我就可以为官方规格。
我知道可能没有针对所有事物的映射,但越多越好。
我需要它来检查将要以 HTML 和 LaTeX 格式发布的源文档的一致性;源文档可能已经包含 HTML 命名实体。因此我需要一个映射图表来提供转换,或者对未映射的映射发出一致性警告。
答案1
在 ConTeXt 中,char-ent.lua
文件包含所有 HTML 实体的列表。您可以使用表格访问它们characters.entities
。例如,以下代码将所有实体及其值打印到屏幕上。
\starttext
\startluacode
local entities = characters.entities
for name, value in next, entities do
print(name,value)
end
\stopluacode
\stoptext
这些实体不会被翻译成相应的 TeX 命令,而是会被翻译成相应的 unicode 符号。如果你想要符号对应的 TeX 名称,你可以搜索表格characters.data
(定义在char-def.lua
)
\starttext
\startluacode
local data = characters.data
local function context_name(value)
value = data[value]
if value then
if value.contextname then
return value.contextname
elseif value.mathname then
return value.mathname
elseif value.mathspec then
return value.mathspec[1].name
else
return "not defined"
end
else
return "not defined"
end
end
local entities = characters.entities
for name, value in next, entities do
print(name,value, context_name(value))
end
\stopluacode
\stoptext
结果列表如下这里。我不知道 LaTeX 是否对所有命令都遵循相同的命名约定。
答案2
只是为了结束这个问题,我最初的问题是将 HTML 4.0 实体映射到 LaTeX。Aditya 的回复提供了映射范围更广的命名字符实体的能力。
我使用 Aditya 的数据集生成了仅针对 HTML 4.0 的映射,如下所示。如果它不在列表中,则没有映射(或者我搞乱了数据集缩减)。
请参阅评论——该表仅在 ConTeXt 中才有很大价值。
HTML 4.0 / LaTeX
Aacute / Aacute
aacute / aacute
Acirc / Acircumflex
acirc / acircumflex
acute / textacute
AElig / AEligature
aelig / aeligature
Agrave / Agrave
agrave / agrave
alefsym / aleph
Alpha / greekAlpha
alpha / greekalpha
and / wedge
ang / angle
Aring / Aring
aring / aring
asymp / approx
Atilde / Atilde
atilde / atilde
Auml / Adiaeresis
auml / adiaeresis
bdquo / quotedblbase
Beta / greekBeta
beta / greekbeta
brvbar / textbrokenbar
bull / textbullet
cap / cap
Ccedil / Ccedilla
ccedil / ccedilla
cedil / textcedilla
cent / textcent
Chi / greekChi
chi / greekchi
circ / textcircumflex
clubs / clubsuit
cong / approxEq
copy / copyright
crarr / carriagereturn
cup / cup
curren / textcurrency
dagger / textdag
Dagger / textddag
darr / downarrow
dArr / Downarrow
deg / textdegree
Delta / greekDelta
delta / greekdelta
diams / blacklozenge
divide / textdiv
Eacute / Eacute
eacute / eacute
Ecirc / Ecircumflex
ecirc / ecircumflex
Egrave / Egrave
egrave / egrave
empty / emptyset
emsp / emspace
ensp / enspace
Epsilon / greekEpsilon
epsilon / greekepsilon
equiv / equiv
Eta / greekEta
eta / greeketa
ETH / Eth
eth / eth
Euml / Ediaeresis
euml / ediaeresis
exist / exists
fnof / fhook
forall / forall
frac12 / onehalf
frac14 / onequarter
frac34 / threequarter
frasl / textfraction
Gamma / greekGamma
gamma / greekgamma
ge / geq
gt / gt
harr / leftrightarrow
hArr / Leftrightarrow
hellip / textellipsis
Iacute / Iacute
iacute / iacute
Icirc / Icircumflex
icirc / icircumflex
iexcl / exclamdown
Igrave / Igrave
igrave / igrave
image / Im
infin / infty
int / intop
Iota / greekIota
iota / greekiota
iquest / questiondown
isin / in
Iuml / Idiaeresis
iuml / idiaeresis
Kappa / greekKappa
kappa / greekkappa
Lambda / greekLambda
lambda / greeklambda
lang / langle
laquo / leftguillemot
larr / leftarrow
lArr / Leftarrow
lceil / lceiling
ldquo / quotedblleft
le / leq
lfloor / lfloor
lowast / ast
loz / lozenge
lsaquo / guilsingleleft
lsquo / quoteleft
lt / lt
macr / textmacron
mdash / emdash
micro / textmu
middot / periodcentered
Mu / greekMu
mu / greekmu
nbsp / nobreakspace
ndash / endash
ne / neq
ni / ni
not / textlognot
notin / nin
nsub / nsubset
Ntilde / Ntilde
ntilde / ntilde
Nu / greekNu
nu / greeknu
Oacute / Oacute
oacute / oacute
Ocirc / Ocircumflex
ocirc / ocircumflex
OElig / OEligature
oelig / oeligature
Ograve / Ograve
ograve / ograve
Omega / greekOmega
omega / greekomega
Omicron / greekOmicron
omicron / greekomicron
oplus / oplus
or / vee
ordf / ordfeminine
ordm / ordmasculine
Oslash / Ostroke
oslash / ostroke
Otilde / Otilde
otilde / otilde
otimes / otimes
Ouml / Odiaeresis
ouml / odiaeresis
para / paragraphmark
part / partial
permil / perthousand
perp / bot
Phi / greekPhi
phi / greekphi
Pi / greekPi
pi / greekpi
piv / greekpialt
plusmn / textpm
pound / textsterling
prime / prime
Prime / doubleprime
prod / prod
prop / propto
Psi / greekPsi
psi / greekpsi
radic / surd
rang / rangle
raquo / rightguillemot
rarr / rightarrow
rArr / Rightarrow
rceil / rceiling
rdquo / quotedblright
real / Re
reg / registered
rfloor / rfloor
Rho / greekRho
rho / greekrho
rsaquo / guilsingleright
rsquo / quoteright
sbquo / quotesinglebase
Scaron / Scaron
scaron / scaron
sdot / cdot
sect / sectionmark
shy / softhyphen
Sigma / greekSigma
sigma / greeksigma
sigmaf / greekfinalsigma
sim / sim
spades / spadesuit
sub / subset
sube / subseteq
sum / sum
sup / supset
sup1 / onesuperior
sup2 / twosuperior
sup3 / threesuperior
supe / supseteq
szlig / ssharp
Tau / greekTau
tau / greektau
there4 / therefore
Theta / greekTheta
theta / greektheta
thetasym / greekthetaalt
thinsp / breakablethinspace
THORN / Thorn
thorn / thorn
tilde / texttilde
times / textmultiply
trade / trademark
Uacute / Uacute
uacute / uacute
uarr / uparrow
uArr / Uparrow
Ucirc / Ucircumflex
ucirc / ucircumflex
Ugrave / Ugrave
ugrave / ugrave
uml / textdiaeresis
Upsilon / greekUpsilon
upsilon / greekupsilon
Uuml / Udiaeresis
uuml / udiaeresis
weierp / wp
Xi / greekXi
xi / greekxi
Yacute / Yacute
yacute / yacute
yen / textyen
yuml / ydiaeresis
Yuml / Ydiaeresis
Zeta / greekZeta
zeta / greekzeta
zwj / zwj
zwnj / zwnj
答案3
这里有一张图表:ISO 字符实体及其 LATEX 等效项作者:Vidar Bronken Gundersen 和 Rune Mathisen。源材料在这里:http://www.bitjungle.com/isoent/- 包括一个具有各种格式之间映射的大型 XML 文件。
同样的人们把它变成了一个 Perl 程序,可以从这里获取:http://llg.cubic.org/docs/ent2latex.html(从这个答案到如何查找符号或识别数学符号或字符?。
这里还有另一个列表:http://www.w3.org/Math/characters/unicode.xml,并将同样的数据编译成python:https://gist.github.com/piquadrat/798549
答案4
类似这样的内容应该可以涵盖最常见的情况。
\documentclass[a4paper]{article}
\usepackage[T1]{fontenc}
\usepackage{array,booktabs}
\newcommand{\entity}[2]{#1 & \ & \verb}
\begin{document}
\begin{tabular}{l>{\ttfamily}ll}
\toprule
Symbol &\multicolumn{1}{l}{HTML entity} & \LaTeX\ command \\
\midrule
\entity{\'e}{eacute}|\'e| \\
\entity{\TH}{THORN}|\TH| \\
\bottomrule
\end{tabular}
\end{document}