sed command successfully finds and replaces in a file, but erases everything else in the new file

I have this XML text in a file named test2.txt:

<This is a line of text with a year=2020 month=12 in it
This line of text does not have a year or month in it
This year=2021 is the current year the current month=1
This is the year=2021 the month=2/>


<This is a line of text with a year=33020 month=12 in it
This line of text does not have a year or month in it
This year=33020 is the current year the current month=1
This is the year=33020 the month=2/>

I run the following sed command on the file. I want to comment out the first paragraph but leave the rest of the file as it is:

sed -i -En '/./{H;$!d} ; x ; s/<(This.*2020.*)\/>/<!--\1-->/p' test2.txt

But the result is that sed removes all the rest of the text from the file and writes back only the result of the substitution, so test2.txt now looks like this:

<!--This is a line of text with a year=2020 month=12 in it
This line of text does not have a year or month in it
This year=2021 is the current year the current month=1
This is the year=2021 the month=2-->

How can I run the regex but keep the other text in the file?

Answer 1

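You are explicitly telling sed not to print anything unless the substitution matches: -n suppresses the default output, and the p flag after s/// prints only the pattern space in which a replacement was made. So just remove the -n and the p after the s/// operator, and it will work as you expect: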

$ sed  -E '/./{H;$!d} ; x ; s/<(This.*2020.*)\/>/<!--\1-->/'  file

<!--This is a line of text with a year=2020 month=12 in it
This line of text does not have a year or month in it
This year=2021 is the current year the current month=1
This is the year=2021 the month=2-->


<This is a line of text with a year=33020 month=12 in it
This line of text does not have a year or month in it
This year=33020 is the current year the current month=1
This is the year=33020 the month=2/>

However, the corrected command still prints an extra blank line at the start of the output. Luckily, @Philippos told me how to fix that, so use this:

$ sed -E '/./{H;1h;$!d} ; x ; s/<(This.*2020.*)\/>/<!--\1-->/'  file
<!--This is a line of text with a year=2020 month=12 in it
This line of text does not have a year or month in it
This year=2021 is the current year the current month=1
This is the year=2021 the month=2-->


<This is a line of text with a year=33020 month=12 in it
This line of text does not have a year or month in it
This year=33020 is the current year the current month=1
This is the year=33020 the month=2/>
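
The leading blank line comes from H, which always appends a newline followed by the pattern space to the hold space, even when the hold space is still empty. Adding 1h overwrites the hold space with the first line, which drops that newline. A minimal sketch on a two-line scratch file (the name demo2.txt is an assumption, not from the question):

```shell
# Hypothetical scratch file with two non-empty lines.
printf 'a\nb\n' > demo2.txt

# H alone: the hold space starts empty, so the collected text begins with a newline.
with_H=$(sed '/./{H;$!d;}; x' demo2.txt)

# H followed by 1h: on line 1 the hold space is replaced by the line itself.
with_1h=$(sed '/./{H;1h;$!d;}; x' demo2.txt)

printf 'H only:\n[%s]\n' "$with_H"
printf 'H and 1h:\n[%s]\n' "$with_1h"
```

The brackets only make the leading newline visible; the first result starts with an empty line and the second does not.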

Or, to edit the original file in place with the same command, keeping a backup copy with the .bak suffix:

sed -i.bak -E '/./{H;1h;$!d} ; x ; s/<(This.*2020.*)\/>/<!--\1-->/'  file
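
To see why the command in the question emptied the rest of the file, here is a small sketch of how -n and the p flag interact. The file name demo.txt and its contents are made up for illustration:

```shell
# Hypothetical scratch file; only the middle line contains 2020.
printf 'keep me\nyear=2020\nkeep me too\n' > demo.txt

# With -n, nothing is printed by default; the p flag prints only where s/// matched.
only_matched=$(sed -n 's/2020/XXXX/p' demo.txt)

# Without -n and p, every line is printed, whether it was changed or not.
all_lines=$(sed 's/2020/XXXX/' demo.txt)

printf '%s\n' "--- with -n and p:" "$only_matched" "--- without:" "$all_lines"
```

With -i, whatever sed prints becomes the new file content, so the -n and p combination writes back only the matched paragraph.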

Answer 2

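Assuming your data represents some generic XML document: a generic XML node cannot be commented out the way you suggest, because attributes may contain the substring --, which would end the comment prematurely and break the structure of the document. It would be safer to delete the node outright, which is trivial with an XML parser.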
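
As a rough illustration of the problem, the sketch below applies a substitution shaped like the one in the question to a single node whose attribute value contains --. The node and the variable names are assumptions made for this example:

```shell
# Hypothetical one-line input: a self-closing node with "--" in an attribute value.
node='<thing alt="--" year="2020"/>'

# Same kind of substitution as in the question, applied to this node.
commented=$(printf '%s\n' "$node" | sed -E 's/<(thing.*2020.*)\/>/<!--\1-->/')

# Prints: <!--thing alt="--" year="2020"-->
printf '%s\n' "$commented"
```

The comment body now contains --, which XML does not allow inside a comment, so a conforming parser would reject the document.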

Suppose you have the document

<?xml version="1.0"?>
<root>
  <thing alt="--" year="2019" month="1" day="1"/>
  <thing alt="--" year="2020" month="5" day="13"/>
  <thing year="2021" month="7" day="3"/>
</root>

...and you want to delete the thing nodes that have the value 2020 in their year attribute. Using xmlstarlet:

$ xmlstarlet ed -d '//thing[@year = "2020"]' file.xml
<?xml version="1.0"?>
<root>
  <thing alt="--" year="2019" month="1" day="1"/>
  <thing year="2021" month="7" day="3"/>
</root>

xmlstarlet supports in-place editing through its -L (--inplace) option.
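
A possible in-place run, sketched on a scratch copy of the sample document. It assumes the tool is installed under the name xmlstarlet (some systems may name the binary differently) and skips the edit otherwise:

```shell
# Scratch copy of the sample document from this answer.
cat > file.xml <<'EOF'
<?xml version="1.0"?>
<root>
  <thing alt="--" year="2019" month="1" day="1"/>
  <thing alt="--" year="2020" month="5" day="13"/>
  <thing year="2021" month="7" day="3"/>
</root>
EOF

if command -v xmlstarlet >/dev/null 2>&1; then
  # -L (--inplace) writes the result back to file.xml instead of standard output.
  xmlstarlet ed -L -d '//thing[@year = "2020"]' file.xml
  cat file.xml
else
  echo "xmlstarlet is not installed; skipping the in-place edit" >&2
fi
```

Since -L overwrites the file, it may be worth copying it first if the original has to be kept.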
