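My previous question was "Global regular expression in Lua script".
Based on the answer by michal-h21 (https://tex.stackexchange.com/users/2891/michal-h21) I have created this simple Lua script, but I am unable to define a new function.
I have the following XML file: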
<p>The investigations of cylindrically symmetric spacetimes can be traced back as far as to 1919 when Levi-Civita (LC) discovered a class of solutions of Einstein’s vacuum field equations, corresponding to static cylindrical spacetimes [1]. The extension of the LC spacetimes to stationary ones was obtained independently by Lanczos in 1924 [3] and Lewis in 1932 [9]. In 1925, Beck studied a class of exact solutions and interpreted them as representing the propagation of cylindrical gravitational waves (GWs) [4].</p>
<statement content-type="theorem" id="stat1"><label>Theorem 1.</label><p>Let <inline-formula><mml:math display="inline" overflow="scroll"><mml:mfenced open="(" close=")"><mml:mrow><mml:mi mathvariant="script">M</mml:mi><mml:mo>,</mml:mo><mml:mi>g</mml:mi></mml:mrow></mml:mfenced></mml:math><inline-graphic xlink:href="cqgab7bbaieqn7.gif"/></inline-formula> be a four-dimensional Riemannian spacetime obeying Einstein’s field equations, <italic>R</italic><sub><italic>μν</italic></sub> − (<italic>R</italic>/2)<italic>g</italic><sub><italic>μν</italic></sub> − Λ<italic>g</italic><sub><italic>μν</italic></sub> = ϰ<italic>T</italic><sub><italic>μν</italic></sub>. There [a,b] and [z;y] (<italic>c</italic>,<italic>d</italic>) are 134 rose in this [41] garden with <math><mi>x</mi><mo>=</mo><mn>2</mn></math> and some more text with number 1,2,3, etc. and some [45] etc.</p></statement>
<p>He is supported in part by the National Natural Science Foundation of China (NNSCF) with the Grants Nos. 11675145 and 11975203.</p>
My Lua script is:
local xml = "XML INPUT TEXT SHOULD BE HERE" --<p>The investigations of ... and 11975203.</p>
local rgx = ""
local reg = "([^%(%[%)%]0-9,:;]*)([%(%[%)%]0-9,:;]+)"
-- repeat the pattern once for every match found in the input
for w in string.gmatch(xml, "([^%(%[%)%]0-9,]*)([%(%[%)%]0-9,]+)") do
  rgx = rgx .. reg
end
local m = {string.match(xml, rgx)}
local n = {}
for i,v in ipairs(m) do
  local j = i%2
  if j==0 then
    -- even captures are the special characters
    table.insert(n,"<rom>"..v.."</rom>")
  else
    table.insert(n,v)
  end
end
print(table.concat(n,""))
This script works fine when local xml holds a fixed value. How can I read the whole content from the XML file? I only need the <statement content-type="theorem"> element, not the <p> tags.
Answer 1
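It is not that simple. You cannot process an XML file with string patterns alone. You need to process it with an XML library (such as luaxml-domobject) and apply patterns only to the text content of the <statement> element.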
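To illustrate why patterns on the raw XML string are risky, here is a minimal sketch in plain Lua. The one-line sample is made up for the illustration; the pattern is the one used in the library in this answer. It shows the pattern also hitting a digit inside an attribute value.

```lua
-- Hypothetical one-line sample; only the pattern comes from this answer.
local special_pattern = "[%(%[%)%]0-9%,%:%;.]+"
local xml = '<statement id="stat1"><p>See [41].</p></statement>'

-- Naive approach: run the pattern over the raw XML string.
-- gsub returns two values; only the resulting string is kept here.
local naive = xml:gsub(special_pattern, "<rom>%0</rom>")
print(naive)
-- prints: <statement id="stat<rom>1</rom>"><p>See <rom>[41].</rom></p></statement>
```

The id attribute now contains markup, so the result is no longer well-formed XML. Working on text nodes of a parsed document avoids this, because attribute values and tag names are never passed to the pattern.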
This is what the <statement> element from your example looks like when reformatted:
<statement content-type="theorem" id="stat1">
<label>Theorem 1.</label>
<p>Let
<inline-formula><mml:math display="inline" overflow="scroll"><mml:mfenced open="(" close=")"><mml:mrow><mml:mi mathvariant="script">M</mml:mi><mml:mo>,</mml:mo><mml:mi>g</mml:mi></mml:mrow></mml:mfenced></mml:math>
<inline-graphic xlink:href="cqgab7bbaieqn7.gif"/>
</inline-formula>
be a four-dimensional Riemannian spacetime obeying Einstein’s field equations,
<italic>R</italic><sub><italic>μν</italic></sub> − (<italic>R</italic>/2)<italic>g</italic><sub><italic>μν</italic></sub> − Λ<italic>g</italic><sub><italic>μν</italic></sub> = ϰ<italic>T</italic><sub><italic>μν</italic></sub>.
There [a,b] and [z;y] (<italic>c</italic>,<italic>d</italic>) are 134 rose in this [41] garden with
<math><mi>x</mi><mo>=</mo><mn>2</mn></math>
and some more text with number 1,2,3, etc. and some [45] etc.</p>
</statement>
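You can see that it is actually quite complex.
Now, if I understand correctly, you want to add <rom> elements around each run of [](),;: characters and digits (the pattern used here also matches periods). So you need to process all child elements recursively, find the text, and add the <rom> elements.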
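The splitting step on its own can be tried on a plain string, without any DOM library. This is only a sketch: split_segments and render are illustrative names, not part of any library, and the output is built as a string purely for inspection.

```lua
local special_pattern = "[%(%[%)%]0-9%,%:%;.]+"

-- Split a plain string into pieces and record for each piece
-- whether it matched the pattern. Illustrative helper only.
local function split_segments(text)
  local segments = {}
  local pos = 1
  while true do
    local start, stop = text:find(special_pattern, pos)
    if not start then break end
    if start > pos then
      -- ordinary text before the special run
      segments[#segments + 1] = { special = false, text = text:sub(pos, start - 1) }
    end
    segments[#segments + 1] = { special = true, text = text:sub(start, stop) }
    pos = stop + 1
  end
  if pos <= #text then
    -- ordinary text after the last special run
    segments[#segments + 1] = { special = false, text = text:sub(pos) }
  end
  return segments
end

-- Render the pieces as markup, for inspection only
local function render(text)
  local out = {}
  for _, seg in ipairs(split_segments(text)) do
    out[#out + 1] = seg.special and ("<rom>" .. seg.text .. "</rom>") or seg.text
  end
  return table.concat(out)
end

print(render("There [a,b] are 134 rose"))
-- prints: There <rom>[</rom>a<rom>,</rom>b<rom>]</rom> are <rom>134</rom> rose
```

In a real document each piece has to become a text node or a <rom> element rather than a string, which is why a DOM library is needed.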
Here is a library, statement-theorem.lua. It exports a function that takes a DOM object and processes the statement elements:
-- runs of digits, brackets, parentheses and punctuation to wrap in <rom>
local special_pattern = "[%(%[%)%]0-9%,%:%;.]+"

local function split_text(child, newchildren)
  local text = child:get_text()
  local parent = child:get_parent()

  local function make_text_node(text)
    -- skip empty strings
    if text ~= "" then
      table.insert(newchildren, parent:create_text_node(text))
    end
  end

  local function make_rom(text)
    -- make <rom> element
    local rom = parent:create_element("rom")
    rom:add_child_node(rom:create_text_node(text))
    table.insert(newchildren, rom)
  end

  -- position where the next search starts
  local prev = 1

  local function read_next()
    -- find the next run of special characters
    local start, stop = text:find(special_pattern, prev)
    if start then
      -- part of text between special characters
      local normal = text:sub(prev, start - 1)
      local special = text:sub(start, stop)
      make_text_node(normal)
      make_rom(special)
      prev = stop + 1
      return true
    else
      -- process text after the last special character
      make_text_node(text:sub(prev, text:len()))
      return false
    end
  end

  while read_next() do
  end
end

local function add_roman(element)
  -- process all child elements of statement, find text content and add <rom>
  -- elements to numbers and braces
  local newchildren = {}
  for _, child in ipairs(element:get_children()) do
    if child:is_text() then
      local text = child:get_text()
      -- detect if text contains special characters
      if text:match(special_pattern) then
        -- process only text that contains special characters
        split_text(child, newchildren)
      else
        table.insert(newchildren, child)
      end
    else
      if child:is_element() then
        -- recursively process child elements, but ignore MathML
        if not child:get_element_name():match(":?math$") then
          add_roman(child)
        end
      end
      table.insert(newchildren, child)
    end
  end
  -- replace the original children with the processed list
  element._children = newchildren
end

local function process_theorems(dom)
  -- we want to process all <statement> elements
  for _, statement in ipairs(dom:query_selector "statement[content-type='theorem']") do
    add_roman(statement)
  end
end

-- return the processing function
return process_theorems
I assume you don't want to process MathML, so it does not process <math> elements.
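The name test that skips MathML can be checked in isolation. This sketch assumes that get_element_name returns the name including its namespace prefix (for example mml:math), which is what the pattern in the library suggests; is_math is an illustrative helper, not a library function.

```lua
-- Same test as in add_roman: does the element name end in "math",
-- optionally preceded by a namespace colon?
local function is_math(name)
  return name:match(":?math$") ~= nil
end

print(is_math("math"))      -- true
print(is_math("mml:math"))  -- true
print(is_math("italic"))    -- false
print(is_math("mml:mi"))    -- false
```

Elements such as mml:mi do not match by name, but they are never visited, because the recursion stops at the enclosing math element. Note that the pattern is not anchored at the start, so a hypothetical element named aftermath would be skipped as well.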
It can be used from a script like this:
kpse.set_program_name "luatex"
-- require LuaXML DOM library and load XML file from the standard input
local domobject = require "luaxml-domobject"
local process_theorems = require "statement-theorem"
local input = io.read("*all")
local dom = domobject.parse(input)
process_theorems(dom)
print(dom:serialize())
Run it like this:
texlua addrom.lua < sample.xml
Note that the XML file must have a root element, so I added a dummy <root> element to make it work. Here is the resulting XML:
<root>
<p>The investigations of cylindrically symmetric spacetimes can be traced back as far as to 1919 when Levi-Civita (LC) discovered a class of solutions of Einstein’s vacuum field equations, corresponding to static cylindrical spacetimes [1]. The extension of the LC spacetimes to stationary ones was obtained independently by Lanczos in 1924 [3] and Lewis in 1932 [9]. In 1925, Beck studied a class of exact solutions and interpreted them as representing the propagation of cylindrical gravitational waves (GWs) [4].</p>
<statement id='stat1' content-type='theorem'>
<label>Theorem <rom>1.</rom></label>
<p>Let <inline-formula><mml:math display='inline' overflow='scroll'><mml:mfenced close=')' open='('><mml:mrow><mml:mi mathvariant='script'>M</mml:mi><mml:mo>,</mml:mo><mml:mi>g</mml:mi></mml:mrow></mml:mfenced></mml:math><inline-graphic xlink:href='cqgab7bbaieqn7.gif'></inline-graphic></inline-formula>
be a four-dimensional Riemannian spacetime obeying Einstein’s field equations<rom>,</rom>
<italic>R</italic><sub><italic>μν</italic></sub> −
<rom>(</rom><italic>R</italic>/<rom>2)</rom><italic>g</italic><sub><italic>μν</italic></sub> − Λ<italic>g</italic><sub><italic>μν</italic></sub> = ϰ<italic>T</italic><sub><italic>μν</italic></sub><rom>.</rom>
There <rom>[</rom>a<rom>,</rom>b<rom>]</rom> and <rom>[</rom>z<rom>;</rom>y<rom>]</rom> <rom>(</rom><italic>c</italic><rom>,</rom><italic>d</italic><rom>)</rom> are <rom>134</rom> rose in this <rom>[41]</rom> garden with <math><mi>x</mi><mo>=</mo><mn>2</mn></math> and some more text with number <rom>1,2,3,</rom> etc<rom>.</rom> and some <rom>[45]</rom> etc<rom>.</rom></p></statement>
<p>He is supported in part by the National Natural Science Foundation of China (NNSCF) with the Grants Nos. 11675145 and 11975203.</p>
</root>
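If editing the input file is not convenient, the dummy root could instead be added in the script before parsing. This is a minimal sketch: wrap_in_root is a hypothetical helper, and it assumes the input is a bare fragment with no XML declaration or DOCTYPE of its own.

```lua
-- Hypothetical helper: wrap a fragment in a dummy <root> element so that
-- the parser sees a single top-level element.
-- Assumption: the fragment has no XML declaration or DOCTYPE.
local function wrap_in_root(fragment)
  return "<root>\n" .. fragment .. "\n</root>"
end

local fragment = '<p>first</p>\n<statement content-type="theorem"><p>second</p></statement>'
print(wrap_in_root(fragment))
```

In the driver script above, the wrapped string would then be passed to the parser, as in domobject.parse(wrap_in_root(input)).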