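Background
I'd like to replace strings in a document with equivalent strings that are built from macros. For example, I want to replace "McAnulty" with "\Mac Anulty".
Problem
When an XML document is the input source, replacing the string this way means the inner XML elements are no longer flushed, so nested elements such as <em> do not get their own setups applied.
Code
The following code creates an XHTML document and replaces "McAnulty" with "\Mac Anulty", but fails to flush the nested XML.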
\startbuffer[main]
<html>
<p>“Mr. McAnulty, I presume?”</p>
<p>Regular text. <em>Irregular text.</em></p>
</html>
\stopbuffer
\startxmlsetups xml:xhtml
\xmlsetsetup{\xmldocument}{*}{-}
\xmlsetsetup{\xmldocument}{html|p|em}{xml:*}
\stopxmlsetups
\startxmlsetups xml:html
\startdocument
\xmlflush{#1}
\stopdocument
\stopxmlsetups
\startxmlsetups xml:p
\xmlfunction{#1}{p}
\par
\stopxmlsetups
\startxmlsetups xml:em
\dontleavehmode{\em\xmlflush{#1}}
\stopxmlsetups
\startluacode
function xml.functions.p( t )
-- Keep the replacement table and the result local to avoid leaking globals.
local rep = { [1] = { "McAnulty", "\\Mac Anulty" } }
local x = lpeg.replacer( rep ):match( tostring( xml.text( t ) ) )
buffers.assign( "p", context( x ) )
context.getbuffer{ "p" }
end
\stopluacode
\xmlregistersetup{xml:xhtml}
\def\Mac{%
% Determine the sizes of 'M' and 'c'.
\newbox\MacMBox%
\setbox\MacMBox\hbox{M}%
\newbox\MacCBox%
\setbox\MacCBox\hbox{c}%
%
% Cheat to dynamically derive the kerning size by putting Mc in a box.
%
\newbox\MacKernBox%
\setbox\MacKernBox\hbox{\inframed[offset=\zeropoint, width=fit]{Mc}}%
\def\MacDelta{\dimexpr\wd\MacKernBox-\wd\MacMBox-\wd\MacCBox\relax}%
\def\MacUWidth{\dimexpr\wd\MacCBox-.75\MacDelta\relax}%
\def\MacRule{\vrule width \MacUWidth height .04em depth \zeropoint \relax}%
\def\MacKern{\dimexpr\wd\MacKernBox-\wd\MacMBox-\wd\MacCBox\relax}%
\def\MacHeight{\dimexpr\ht\MacMBox-\ht\MacCBox\relax}%
%
% Write Mc, where c has a macron, to the document.
%
M{%
\dontleavehmode{\raisebox{\MacHeight}\hbox{c}}%
\kern-1.04\MacUWidth
\MacRule
\kern.08\MacUWidth
}%
}%
\xmlprocessbuffer{main}{main}{}
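Question
How can the text be preprocessed (for example, with string replacement) while the other XML setups are still applied to the XHTML elements?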
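Answer 1
Using \cldcontext allows a malicious user to execute arbitrary commands by abusing string escapes. Instead of expanding TeX arguments inside a Lua string, define a new TeX command that takes a string as its argument, for example: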
\startluacode
local function processmac( t )
local rep = { [1] = { "McAnulty", "\\Mac Anulty" } }
context(lpeg.replacer( rep ):match( t ))
end
interfaces.implement {
name = "processmac",
arguments = { "string" },
public = true,
actions = processmac,
}
\stopluacode
\startxmlsetups xml:p
\expandafter\processmac{\xmlflush{#1}}
\par
\stopxmlsetups
\startxmlsetups xml:em
\dontleavehmode{\em\xmlflush{#1}}
\stopxmlsetups
You can test this with the following HTML input:
<html>
<p>“Mr. McAnulty, I presume?”</p>
<p>]] .. (os.execute("touch /tmp/filename.txt") and '') .. [[</p>
<p>Regular text. <em>Irregular text.</em></p>
</html>
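Then check whether the malicious content was executed: if /tmp/filename.txt exists after the run, the injected os.execute call ran.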
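To see why the second paragraph of that input is dangerous, here is a minimal sketch of the mechanism. It assumes the vulnerable approach pastes the element text between [[ and ]] and compiles the result as Lua, which is what the shape of the payload suggests. The names unsafe_eval, harmless_marker, and injected_call_ran are hypothetical and exist only for this illustration; a harmless function stands in for os.execute.

```tex
\startluacode
-- Hypothetical stand-in for the vulnerable pattern: the element text is
-- pasted between [[ and ]] and the result is compiled as Lua source.
local function unsafe_eval(text)
    local chunk = assert(load("return [[" .. text .. "]]"))
    return chunk()
end

-- Harmless replacement for os.execute. It is global on purpose, because
-- chunks compiled by load() resolve names in the global environment.
injected_call_ran = false
function harmless_marker()
    injected_call_ran = true
    return ""
end

-- Same shape as the malicious <p> content in the test input:
-- the leading ]] closes the long string, the trailing [[ reopens it.
unsafe_eval("]] .. harmless_marker() .. [[")

-- Reports whether the "text" managed to run code.
print("injected call ran: " .. tostring(injected_call_ran))
\stopluacode
```

If the flag is reported as true, the text was treated as code rather than data. A command registered through interfaces.implement with a "string" argument receives the text as a Lua value, so it is never compiled.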
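A more general solution is to define the replacement values in userdata, so that they can be overridden for different documents. Here is the framework: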
\startluacode
userdata = userdata or {}
userdata.TextReplacements = {}
local function TextReplacement( text )
context( lpeg.replacer( userdata.TextReplacements ):match( text ) )
end
interfaces.implement {
name = "TextReplacement",
arguments = { "string" },
public = true,
actions = TextReplacement,
}
\stopluacode
Then fill in the replacements elsewhere, for example:
\startluacode
userdata = userdata or {}
userdata.TextReplacements = {
[1] = { "McGenius", "\\Mac Genius" },
[2] = { "a.m.", "\\cap{am}" },
[3] = { "p.m.", "\\cap{pm}" },
}
\stopluacode
With this approach, the replacement text can vary according to the needs of each document.
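The framework only defines \TextReplacement, so a setup still has to call it. As a sketch, assuming the same call form as the \processmac setup shown earlier carries over unchanged, the paragraph setup could look like this (untested here):

```tex
% Paragraphs go through the userdata-driven replacement command.
% Only the command name differs from the earlier \processmac setup.
\startxmlsetups xml:p
  \expandafter\TextReplacement{\xmlflush{#1}}
  \par
\stopxmlsetups
```

Because userdata.TextReplacements is read each time the command runs, the block that fills in the table can live in a separate, document-specific file, provided it is loaded after the framework block, which resets the table to an empty one.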