I have this XML text in a file named test2.txt:
<This is a line of text with a year=2020 month=12 in it
This line of text does not have a year or month in it
This year=2021 is the current year the current month=1
This is the year=2021 the month=2/>
<This is a line of text with a year=33020 month=12 in it
This line of text does not have a year or month in it
This year=33020 is the current year the current month=1
This is the year=33020 the month=2/>
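I run the following regular expression on the file. I would like to comment out the first paragraph but leave the rest of the file as it is: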
sed -i -En '/./{H;$!d} ; x ; s/<(This.*2020.*)\/>/<!--\1-->/p' test2.txt
But the sed command deletes all the remaining text in the file and leaves only the result of the regexp, so test2.txt now looks like this:
<!--This is a line of text with a year=2020 month=12 in it
This line of text does not have a year or month in it
This year=2021 is the current year the current month=1
This is the year=2021 the month=2-->
How can I run the regexp but keep the other text in the file?
Answer 1
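You explicitly told sed not to print anything unless the pattern matches: with -n, sed prints nothing by default, and the p flag after the s/// operator prints the pattern space only when the substitution succeeds. So just remove the -n and the p, and it will work as you expect: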
$ sed -E '/./{H;$!d} ; x ; s/<(This.*2020.*)\/>/<!--\1-->/' file
<!--This is a line of text with a year=2020 month=12 in it
This line of text does not have a year or month in it
This year=2021 is the current year the current month=1
This is the year=2021 the month=2-->
<This is a line of text with a year=33020 month=12 in it
This line of text does not have a year or month in it
This year=33020 is the current year the current month=1
This is the year=33020 the month=2/>
However, this still adds an extra newline at the beginning. Fortunately, @Philippos told me how to fix that, so use this instead:
$ sed -E '/./{H;1h;$!d} ; x ; s/<(This.*2020.*)\/>/<!--\1-->/' file
<!--This is a line of text with a year=2020 month=12 in it
This line of text does not have a year or month in it
This year=2021 is the current year the current month=1
This is the year=2021 the month=2-->
<This is a line of text with a year=33020 month=12 in it
This line of text does not have a year or month in it
This year=33020 is the current year the current month=1
This is the year=33020 the month=2/>
Or, to edit the original file in place (keeping a backup with the .bak suffix):
sed -i.bak -E '/./{H;1h;$!d} ; x ; s/<(This.*2020.*)\/>/<!--\1-->/' file
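For readers who want to see what each part of the script above does, here is a minimal annotated sketch. It assumes GNU sed and assumes that the blocks in the file are separated by a blank line (the /./ test only makes sense in that case); the shortened two-line blocks and the variable names tmp and result are made up for the illustration.

```shell
# Sample input: two shortened blocks separated by one blank line (assumption).
tmp=$(mktemp)
printf '%s\n' \
  '<This is a line of text with a year=2020 month=12 in it' \
  'This is the year=2021 the month=2/>' \
  '' \
  '<This is a line of text with a year=33020 month=12 in it' \
  'This is the year=33020 the month=2/>' > "$tmp"

# /./{H;1h;$!d}  non-empty line: append it to the hold space (on line 1,
#                overwrite the hold space instead, which avoids the leading
#                newline); unless it is the last line, delete it and move on
# x              blank line or last line reached: swap the collected block
#                into the pattern space
# s/.../.../     wrap the block in <!-- --> if it contains 2020; without -n
#                the block is printed whether or not the substitution matched
result=$(sed -E '/./{H;1h;$!d} ; x ; s/<(This.*2020.*)\/>/<!--\1-->/' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

Only the first block ends up inside a comment here, because 33020 does not contain the substring 2020.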
Answer 2
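Assuming your data represents some generic XML document: a generic XML node cannot be commented out the way you suggest, because attributes may contain the substring --, which would end the comment prematurely and break the document's structure. It would be safer to delete the node outright, which is trivial with an XML parser.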
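As a rough illustration of the problem, the sketch below checks whether a node's text contains -- before it would be wrapped in a comment (the XML specification does not allow that string inside a comment). The node and verdict variables are made up for the example, and a plain string test like this is not a substitute for a real XML parser.

```shell
# Hypothetical node text; the alt attribute holds the problematic "--".
node='<thing alt="--" year="2020" month="5" day="13"/>'

# A comment body must not contain "--", so such a node cannot simply be
# wrapped in <!-- ... -->.
case $node in
  *--*) verdict='unsafe to comment out' ;;
  *)    verdict='safe to comment out' ;;
esac
printf '%s\n' "$verdict"
```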
Suppose you have the document
<?xml version="1.0"?>
<root>
<thing alt="--" year="2019" month="1" day="1"/>
<thing alt="--" year="2020" month="5" day="13"/>
<thing year="2021" month="7" day="3"/>
</root>
...and you want to delete the thing nodes whose year attribute has the value 2020, using xmlstarlet:
$ xmlstarlet ed -d '//thing[@year = "2020"]' file.xml
<?xml version="1.0"?>
<root>
<thing alt="--" year="2019" month="1" day="1"/>
<thing year="2021" month="7" day="3"/>
</root>
xmlstarlet supports in-place editing through its -L (--inplace) option.
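A minimal sketch of the in-place variant, assuming xmlstarlet is installed and using the sample document from above recreated in a temporary directory. The workdir and remaining variables are made up for the example, and the backup copy is made by hand on the assumption that -L itself does not keep one.

```shell
# Recreate the sample document in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/file.xml" <<'EOF'
<?xml version="1.0"?>
<root>
<thing alt="--" year="2019" month="1" day="1"/>
<thing alt="--" year="2020" month="5" day="13"/>
<thing year="2021" month="7" day="3"/>
</root>
EOF

# Keep a manual backup before modifying the file in place.
cp "$workdir/file.xml" "$workdir/file.xml.bak"

if command -v xmlstarlet >/dev/null 2>&1; then
  # Same deletion as above, written back to file.xml via -L.
  xmlstarlet ed -L -d '//thing[@year = "2020"]' "$workdir/file.xml"
  # Count the lines that still carry year="2020" (grep -c prints 0 if none).
  remaining=$(grep -c 'year="2020"' "$workdir/file.xml" || true)
else
  remaining='xmlstarlet not installed'
fi
printf 'nodes with year=2020 left: %s\n' "$remaining"
rm -rf "$workdir"
```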