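I have a large XML file, and I need to delete certain child elements when the content of one of their nested child elements does not begin with the right string.
My XML file looks like this: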
<product>
<catalogEntry>
<idPath><![CDATA[K212/G425638/G425649/G426239/G426265/G601769]]></idPath>
<namePath><![CDATA[Web Katalog DK/Solar Plus/Solar Plus EL/Afsnit 12 - Kommunikations- & sikringsmateriel/Racks/Vægracks]]></namePath>
<ImagePath><![CDATA[K212-{\pics\_catalogmanager\sz2\ikon_solarplus.jpg}{\pics\_catmandk_kampagner\sz2\ikon solar plus_el.jpg}{\pics\_catmandk_solar plus\sz2\solarplusel_afs.13.jpg}{\pics\cubic cabinet\sz2\5709832021591p.jpg}{\pics\mass creation\sz2\0000101760-6he2060020med20plade20a.jpg}]]></ImagePath>
</catalogEntry>
<catalogEntry>
<idPath><![CDATA[K352/G600248/G600247]]></idPath>
<namePath><![CDATA[Solar plus mini guide/Rack og tilbehør/Vægrack]]></namePath>
<ImagePath><![CDATA[K352-{}{}]]></ImagePath>
</catalogEntry>
<catalogEntry>
<idPath><![CDATA[K212/G425642/G444580/G444590/G444598]]></idPath>
<namePath><![CDATA[Web Katalog DK/Kommunikation/Rack, tilbehør, kabel management/Vægrack/Solar Plus Vægrack]]></namePath>
<ImagePath><![CDATA[K212-{\pics\_catalogmanager\sz2\ikon_kommunikation.jpg}{\pics\_catalogmanager\sz2\kommunikation_rack-skabe_.jpg}{\pics\lk dataconnect\sz2\5703302138918p.jpg}{\pics\mass creation\sz2\0000101760-6he2060020med20plade20a.jpg}]]></ImagePath>
</catalogEntry>
<catalogEntry>
<idPath><![CDATA[K193/G389888/G395066/G585958/G586999/G600567]]></idPath>
<namePath><![CDATA[PRODUCTS NOT VISIBLE IN WEB KATALOG DK/Grp7 - Kabel § Føringsveje § Data/157R - Rune Agersnap/Kampagnemails/Afsluttede kampagner/Nye Solar plus vægrack - Gældende til op med d. 05.05.19]]></namePath>
<ImagePath><![CDATA[K193-{}{}{}{}{\pics\mass creation\sz2\0000101760-10he2050020med20plade20fri.jpg}]]></ImagePath>
</catalogEntry>
<catalogEntry>
<idPath><![CDATA[K212/G425639/G426577/G426699/G426927/G426940/G600572]]></idPath>
<namePath><![CDATA[Web Katalog DK/EL/(10.00 - 29.99) Stærkstrømsmateriel/12.00 Kapslings- og tavlemateriel/12.30 Rack-skabe inkl. tilbehør/Vægrack/Solar plus vægracks]]></namePath>
<ImagePath><![CDATA[K212-{\pics\_catalogmanager\sz2\ikon_el.jpg}{\pics\_catalogmanager\sz2\10.00_29.99.jpg}{\pics\_catalogmanager\sz2\12.00.jpg}{\pics\cubic cabinet\sz2\5709832045535p.jpg}{\pics\cubic cabinet\sz2\5709832045399p.jpg}{\pics\mass creation\sz2\0000101760-6he2060020med20plade20a.jpg}]]></ImagePath>
</catalogEntry>
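I only need to keep the <catalogEntry> elements that contain <![CDATA[K212.
All other <catalogEntry> elements need to be deleted.
I tried adapting the following expression in Find and Replace: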
<catalogEntry>(?:(?!</catalogEntry>.)+[^K212](?:(?!<catalogEntry>).)+</catalogEntry>\R
but I get an "invalid expression" error.
Answer 1
- Ctrl+H
- Find what: <(catalogEntry)>(?:(?!\1)(?!\[K212).)+</\1>\R?
- Replace with: LEAVE EMPTY
- Check Match case
- Check Wrap around
- Check Regular expression
- Check . matches newline
- Replace all
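For a file too large to edit comfortably by hand, the same replacement can be scripted. The following is a minimal sketch in Python, not part of the original answer. Python's `re` module does not support `\R`, so the optional line break is spelled out as `(?:\r\n|\n|\r)?`, and `re.DOTALL` stands in for the ". matches newline" option. The function name `drop_non_k212` and the shortened sample data are made up for illustration.

```python
import re

# Same expression as in the steps above, with \R? written out explicitly
# because Python's re module has no \R escape.
PATTERN = re.compile(
    r"<(catalogEntry)>(?:(?!\1)(?!\[K212).)+</\1>(?:\r\n|\n|\r)?",
    re.DOTALL,  # plays the role of ". matches newline"
)


def drop_non_k212(xml_text: str) -> str:
    """Remove every <catalogEntry> block that does not contain '[K212'."""
    return PATTERN.sub("", xml_text)


# Shortened, hypothetical sample with one K212 entry and one K352 entry.
sample = (
    "<product>\n"
    "<catalogEntry>\n<idPath><![CDATA[K212/G425638]]></idPath>\n</catalogEntry>\n"
    "<catalogEntry>\n<idPath><![CDATA[K352/G600248]]></idPath>\n</catalogEntry>\n"
    "</product>\n"
)

print(drop_non_k212(sample))
```

Only the entry whose CDATA starts with K212 should remain in the printed result.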
Explanation:
<(catalogEntry)> # opening tag; capture the tag name in group 1
# tempered greedy token:
(?: # non-capturing group
(?!\1) # negative lookahead: make sure "catalogEntry" does not start here
(?!\[K212) # negative lookahead: make sure "[K212" does not start here
. # any character
)+ # end of group, must appear 1 or more times
</\1> # closing tag
\R? # optional line break
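The regex keeps an entry if "[K212" appears anywhere inside it. As a cross-check, here is a parser-based sketch (an alternative to the answer's method, not the method itself) that looks only at the text of <idPath>. It assumes the file is well-formed XML with a closing </product> tag, and the function name `keep_entries_with_prefix` is hypothetical. Note that `xml.etree.ElementTree` does not preserve CDATA sections: on output the text is written as ordinary escaped character data.

```python
import xml.etree.ElementTree as ET


def keep_entries_with_prefix(xml_text: str, prefix: str = "K212") -> str:
    """Drop each <catalogEntry> whose <idPath> text does not start with prefix."""
    root = ET.fromstring(xml_text)
    # Snapshot the elements first so removals do not disturb the traversal.
    for parent in list(root.iter()):
        for entry in list(parent.findall("catalogEntry")):
            id_path = entry.findtext("idPath", default="")
            if not id_path.strip().startswith(prefix):
                parent.remove(entry)
    return ET.tostring(root, encoding="unicode")


# Shortened, hypothetical sample with one K212 entry and one K352 entry.
sample = (
    "<product>"
    "<catalogEntry><idPath><![CDATA[K212/G425638]]></idPath></catalogEntry>"
    "<catalogEntry><idPath><![CDATA[K352/G600248]]></idPath></catalogEntry>"
    "</product>"
)

print(keep_entries_with_prefix(sample))
```

If the CDATA wrappers must survive byte for byte, the regex replacement is the safer route, since it never re-serializes the kept entries.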