Background

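I would like to replace strings in a document with equivalent strings that are produced by macros. For example, I want to replace "McAnulty" with "\Mac Anulty".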

Problem

When an XML document is the input source, replacing strings means that the inner XML elements can no longer be flushed. The result:

[Image: flushed output]

Code

The following code processes an XHTML document and replaces "McAnulty" with "\Mac Anulty", but fails to flush the XML.

\startbuffer[main]
<html>
  <p>“Mr. McAnulty, I presume?”</p>
  <p>Regular text. <em>Irregular text.</em></p>
</html>
\stopbuffer

\startxmlsetups xml:xhtml
  \xmlsetsetup{\xmldocument}{*}{-}
  \xmlsetsetup{\xmldocument}{html|p|em}{xml:*}
\stopxmlsetups

\startxmlsetups xml:html
  \startdocument
    \xmlflush{#1}
  \stopdocument
\stopxmlsetups

\startxmlsetups xml:p
  \xmlfunction{#1}{p}
  \par
\stopxmlsetups

\startxmlsetups xml:em
  \dontleavehmode{\em\xmlflush{#1}}
\stopxmlsetups

\startluacode
function xml.functions.p( t )
  rep = { [1] = { "McAnulty", "\\Mac Anulty" } }
  x = lpeg.replacer( rep ):match( tostring( xml.text( t ) ) )

  buffers.assign( "p", context( x ) )
  context.getbuffer{ "p" }
end
\stopluacode

\xmlregistersetup{xml:xhtml}

\def\Mac{%
  % Determine the sizes of 'M' and 'c'.
  \newbox\MacMBox%
  \setbox\MacMBox\hbox{M}%
  \newbox\MacCBox%
  \setbox\MacCBox\hbox{c}%
  %
  % Cheat to dynamically derive the kerning size by putting Mc in a box.
  %
  \newbox\MacKernBox%
  \setbox\MacKernBox\hbox{\inframed[offset=\zeropoint, width=fit]{Mc}}%
  \def\MacDelta{\dimexpr\wd\MacKernBox-\wd\MacMBox-\wd\MacCBox\relax}%
  \def\MacUWidth{\dimexpr\wd\MacCBox-.75\MacDelta\relax}%
  \def\MacRule{\vrule width \MacUWidth height .04em depth \zeropoint \relax}%
  \def\MacKern{\dimexpr\wd\MacKernBox-\wd\MacMBox-\wd\MacCBox\relax}%
  \def\MacHeight{\dimexpr\ht\MacMBox-\ht\MacCBox\relax}%
  %
  % Write Mc, where c has a macron, to the document.
  %
  M{%
    \dontleavehmode{\raisebox{\MacHeight}\hbox{c}}%
    \kern-1.04\MacUWidth
    \MacRule
    \kern.08\MacUWidth
  }%
}%

\xmlprocessbuffer{main}{main}{}

Question

How can preprocessing (such as string replacement) be performed while still applying the other XML setups to the XHTML elements?

Answer 1

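Using \cldcontext lets a malicious user execute arbitrary commands by abusing string escapes. Rather than expanding TeX arguments inside a Lua string, define a new TeX command that takes the string as its argument, for example: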

\startluacode
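local function processmac( t )
  -- Ordered list of { pattern, replacement } pairs; kept local to avoid a global.
  local rep = { [1] = { "McAnulty", "\\Mac Anulty" } }
  -- Apply the replacements and hand the result back to TeX.
  context( lpeg.replacer( rep ):match( t ) )
end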

interfaces.implement {
  name      = "processmac",
  arguments = { "string" },
  public    = true,
  actions   = processmac,
}
\stopluacode

\startxmlsetups xml:p
  \expandafter\processmac{\xmlflush{#1}}
  \par
\stopxmlsetups

\startxmlsetups xml:em
  \dontleavehmode{\em\xmlflush{#1}}
\stopxmlsetups

You can test it with this HTML input:

<html>
  <p>“Mr. McAnulty, I presume?”</p>
  <p>]] .. (os.execute("touch /tmp/filename.txt") and '') .. [[</p>
  <p>Regular text. <em>Irregular text.</em></p>
</html>

and check whether the malicious content gets executed.
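To see why the second paragraph of that test input is dangerous, here is a minimal sketch of the failure mode. It assumes the document text ends up pasted between [[ and ]] inside a Lua chunk, which is what the shape of the payload suggests. The helper naive_eval and the harmless string.upper payload are hypothetical stand-ins of mine, not part of ConTeXt.

```tex
\startluacode
-- Hypothetical stand-in for a macro that pastes document text into Lua source.
local function naive_eval( text )
  -- The text is spliced into the chunk before Lua compiles it.
  local chunk = load( "return [[" .. text .. "]]" )
  return chunk()
end

-- Harmless payload with the same shape as the os.execute line in the test input.
local payload = "]] .. string.upper( 'injected' ) .. [["

print( naive_eval( "McAnulty" ) )  -- prints: McAnulty
print( naive_eval( payload ) )     -- prints: INJECTED (the payload ran as code)
\stopluacode
```

The payload closes the long string early, so everything between its ]] and [[ is compiled as Lua instead of being treated as text. A command registered through interfaces.implement receives the text as a string value, so nothing in it is ever compiled.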


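A more general solution is to define the replacement values in userdata, where they can then be overridden for different documents. Here is the skeleton: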

\startluacode
userdata = userdata or {}

userdata.TextReplacements = {}

local function TextReplacement( text )
  context( lpeg.replacer( userdata.TextReplacements ):match( text ) )
end

interfaces.implement {
  name      = "TextReplacement",
  arguments = { "string" },
  public    = true,
  actions   = TextReplacement,
}
\stopluacode

Then fill in the replacements elsewhere, for example:

\startluacode
userdata = userdata or {}

userdata.TextReplacements = { 
  [1] = { "McGenius", "\\Mac Genius" },
  [2] = { "a.m.", "\\cap{am}" },
  [3] = { "p.m.", "\\cap{pm}" },
}
\stopluacode

With this approach, the replacement text can vary according to the needs of a particular document.
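To tie the skeleton back to the XML setups, the paragraph setup can call the generic command instead of \processmac. This is only a sketch: it assumes \TextReplacement has been registered as in the skeleton, and that it receives the flushed paragraph the same way \processmac does in the earlier xml:p setup.

```tex
\startxmlsetups xml:p
  % Same pattern as the \processmac setup, with the generic command swapped in.
  \expandafter\TextReplacement{\xmlflush{#1}}
  \par
\stopxmlsetups
```

Because the replacement table is looked up in userdata.TextReplacements each time the command runs, redefining that table later in a document changes the replacements without touching this setup.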
