How to delete the line containing a pattern that is first found after a specific pattern

Answer 1

If the format does not differ from the sample you provided, I would use awk:

awk -F'[<>="[:blank:]]+' '
  $2 == "domain" {group = $(NF-1)}
  !(group == "group1" && $2 == "node" && $(NF-1) == "PQR")
  ' < dest.xml > new-dest.xml

to delete the "PQR" node in the "group1" domain.

$ diff -u dest.xml new-dest.xml
--- dest.xml    2013-02-22 07:01:48.732227421 +0000
+++ new-dest.xml        2013-02-22 07:02:16.111512820 +0000
@@ -1,6 +1,5 @@
 <domain id="1" group_name="group1">
     <node id="ABC">
-    <node id="PQR">
     <node id="XYZ">
 </domain>
 <domain id="2" group_name="group2">
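
To see why the awk command above tests $2 and $(NF-1), it can help to look at how that field separator splits the sample lines. The following is only an illustrative sketch in Python; the names FIELD_SEP and awk_fields are hypothetical and not part of the original answer. It assumes awk's [:blank:] class means space or tab.

```python
import re

# Rough Python equivalent of awk's -F'[<>="[:blank:]]+'
# (assumption: [:blank:] is space or tab).
FIELD_SEP = re.compile(r'[<>=" \t]+')


def awk_fields(line):
    """Split a line roughly the way awk does with a regex field separator."""
    return FIELD_SEP.split(line.rstrip("\n"))


domain = awk_fields('<domain id="1" group_name="group1">')
node = awk_fields('    <node id="PQR">')

# A leading separator yields an empty first field and a trailing one an
# empty last field, so index 1 plays the role of awk's $2 and index -2
# plays the role of $(NF-1).
print(domain[1], domain[-2])  # domain group1
print(node[1], node[-2])      # node PQR
```

Because the closing "> is itself a separator, the last field is empty, which is why the value of interest sits in $(NF-1) rather than $NF.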

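If you mean that you want the node removed from the XML file in place, that is not possible as such. You would need to rewrite at least the part of the file after that node, to shift the data backward by as many bytes as the node occupied.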

Alternatively, you could replace the node with blanks, which means you only have to overwrite those bytes.

perl -ne '
  if (/<domain.*group_name="(.*?)"/) {
    $in = $1 eq "group1"
  } elsif ($in && /<node id="PQR"/) {
    s/./ /g;
    seek STDOUT,tell(STDIN)-length$_,0;
    print
  }' < dest.xml 1<> dest.xml

If there is only one such node and you want to stop processing as soon as it has been found, add ;exit after the print statement above.
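
The same overwrite-with-blanks idea can be sketched in Python. This is a hedged illustration rather than a port of the perl one-liner: the function name blank_node_in_place and its parameters are hypothetical, and it assumes the same one-tag-per-line layout as the sample file.

```python
import re


def blank_node_in_place(path, group="group1", node_id="PQR"):
    """Overwrite matching node lines with spaces; return how many were blanked."""
    domain_re = re.compile(rb'<domain.*group_name="(.*?)"')
    node_marker = b'<node id="' + node_id.encode() + b'"'
    in_group = False
    blanked = 0
    # "r+b" opens for reading and writing without truncating the file.
    with open(path, "r+b") as f:
        while True:
            start = f.tell()          # byte offset where this line begins
            line = f.readline()
            if not line:
                break
            match = domain_re.search(line)
            if match:
                in_group = match.group(1) == group.encode()
            elif in_group and node_marker in line:
                body = line.rstrip(b"\r\n")
                f.seek(start)                  # go back to the line start
                f.write(b" " * len(body))      # same byte count, newline kept
                f.seek(start + len(line))      # resume reading after the line
                blanked += 1
    return blanked
```

Calling blank_node_in_place("dest.xml") would leave the file size unchanged, since every overwritten byte is replaced by exactly one space. To stop after the first match, a break could be added after the counter is incremented.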

Answer 2

I wrote a quick Python script; I am not sure whether it is simple enough.

Put this script in the same directory as your dest.xml.

#!/usr/bin/env python3
FILENAME = 'dest.xml'
GROUPNAME = 'group1'
NODEID = 'PQR'

# Literal substrings to look for; plain "in" tests avoid regex metacharacter issues.
group_pattern = 'group_name="{0}">'.format(GROUPNAME)
end_group_pattern = '</domain>'
node_pattern = '<node id="{0}">'.format(NODEID)

with open(FILENAME) as f:
    in_group = False
    for line in f:
        # Drop only the newline so the original indentation is preserved.
        line = line.rstrip('\n')
        if group_pattern in line:
            in_group = True
        if end_group_pattern in line:
            in_group = False
        # Skip the target node only while inside the target group.
        if not (in_group and node_pattern in line):
            print(line)

Now the awk version.

#!/usr/bin/awk -f
BEGIN {
    GROUPNAME = "group1"
    NODEID = "PQR"
    in_group = 0
    group_pattern =  ".*group_name=\"" GROUPNAME "\""
    end_group_pattern = "</domain>"
    node_pattern = "<node id=\"" NODEID "\">"
}
$0 ~ group_pattern {
    in_group = 1
}
$0 !~ node_pattern || in_group == 0 {
    print $0
}
$0 ~ end_group_pattern {
    in_group = 0
}

Run this awk script with your file name, dest.xml, as its argument. It looks simpler than the Python version.
