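I'm working with a set of reports for a project, all of them txt files. Each file contains a few lines of unwanted text before and after the actual report. The part I need always starts with the string "Start of Report on" and ends with "End of report on". I need to delete everything before "Start of Report on" and everything after "End of report on" in all the txt files at once. I tried .*(?=Start of Report on) to remove the text before "Start of Report on", but it only removed text on the same line. I'm not a technical person and I'm not good with regular expressions. Can anyone guide me?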
Sample text file:
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
In eget semper eros. Fusce efficitur elit quis vestibulum pretium.
Curabitur tristique commodo dui sed molestie.
***Start of Report on -------***
Vivamus porttitor dolor felis, at varius dolor placerat vehicula. Donec non dictum nulla. Maecenas vitae dolor quis ligula scelerisque accumsan. Vestibulum vehicula dolor dolor, id porta orci maximus a.
Aenean finibus enim in magna tristique bibendum. Suspendisse eleifend purus nibh, eget tincidunt est venenatis vitae. Morbi venenatis massa at lectus tincidunt, eget faucibus neque sollicitudin.
Morbi feugiat erat eros, fringilla convallis nulla euismod in. Fusce consectetur dapibus libero, nec vestibulum est feugiat a. Vivamus nec commodo purus, sit amet egestas nunc. Nulla ac ipsum nec risus facilisis sollicitudin.
***End of report on ---------***
Sed euismod tristique nunc non suscipit. Nullam blandit justo sed erat placerat fringilla. Etiam felis nunc, aliquam sit amet fermentum quis, pellentesque ac nisi.
Expected result:
***Start of Report on -------***
Vivamus porttitor dolor felis, at varius dolor placerat vehicula. Donec non dictum nulla. Maecenas vitae dolor quis ligula scelerisque accumsan. Vestibulum vehicula dolor dolor, id porta orci maximus a.
Aenean finibus enim in magna tristique bibendum. Suspendisse eleifend purus nibh, eget tincidunt est venenatis vitae. Morbi venenatis massa at lectus tincidunt, eget faucibus neque sollicitudin.
Morbi feugiat erat eros, fringilla convallis nulla euismod in. Fusce consectetur dapibus libero, nec vestibulum est feugiat a. Vivamus nec commodo purus, sit amet egestas nunc. Nulla ac ipsum nec risus facilisis sollicitudin.
***End of report on ---------***
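The attempted pattern only affects one line because "." does not match a line break by default, so ".*" cannot reach back past the start of the line that holds the marker. The sketch below illustrates this with Python's re module rather than Notepad++, on a shortened, made-up copy of the sample file; it assumes both engines treat "." the same way for this simple pattern.

```python
import re

# Made-up, shortened copy of the sample file (not the asker's real data).
sample = (
    "Curabitur tristique commodo dui sed molestie.\n"
    "***Start of Report on -------***\n"
    "Vivamus porttitor dolor felis.\n"
    "***End of report on ---------***\n"
    "Sed euismod tristique nunc non suscipit.\n"
)

# The attempted pattern: '.' stops at '\n', so the match can only cover
# text on the same line as the marker.
same_line = re.search(r".*(?=Start of Report on)", sample)
print(repr(same_line.group()))  # '***'

# With re.DOTALL, '.' also matches '\n', so '.*' reaches back to the start
# of the text (roughly what ticking ". matches newline" does in Notepad++).
across_lines = re.search(r".*(?=Start of Report on)", sample, re.DOTALL)
print(repr(across_lines.group()))  # 'Curabitur tristique commodo dui sed molestie.\n***'
```

Note that even the DOTALL version would delete the *** in front of the start marker, which the expected result keeps, so the text to delete has to stop at the beginning of that line rather than at the marker itself.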
Answer 1
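- Ctrl+H
- Find what:
[\s\S]*?(^.*Start of report[\s\S]+?End of report on.*$)[\s\S]*
- Replace with:
$1
- UNCHECK Match case
- CHECK Wrap around
- CHECK Regular expression
- UNCHECK . matches newline
- Replace all (to process every report at once, open them all and click Replace All in All Opened Documents)
Explanation:
[\s\S]*? # 0 or more of any character, as few as possible (the unwanted text before the report)
( # start group 1
^ # beginning of a line
.* # 0 or more of any character but newline (keeps the *** in front of the start marker)
Start of report # literally
[\s\S]+? # 1 or more of any character, as few as possible
End of report on # literally
.* # 0 or more of any character but newline
$ # end of line
) # end group 1
[\s\S]* # 0 or more of any character (the unwanted text after the report)
Replacement:
$1 # content of group 1 (i.e. the text to keep)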
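If Notepad++ is not an option, the same extraction can be scripted. The following is a minimal Python sketch, not part of the original answer: it assumes the reports are UTF-8 text files in a single folder, and keep_report and trim_folder are hypothetical helper names. Instead of replacing the surrounding text with the group, it simply keeps the matched block, which has the same effect.

```python
import re
from pathlib import Path

# Report block: from the start of the line containing "Start of report"
# to the end of the line containing "End of report on".
# re.MULTILINE lets ^ and $ match at line boundaries (as in Notepad++);
# re.IGNORECASE plays the role of unchecking "Match case".
REPORT = re.compile(
    r"^.*Start of report[\s\S]+?End of report on.*$",
    re.MULTILINE | re.IGNORECASE,
)


def keep_report(text: str) -> str:
    """Return only the report block, or the text unchanged if no block is found."""
    match = REPORT.search(text)
    return match.group(0) if match else text


def trim_folder(folder: str) -> None:
    """Rewrite every .txt file in folder in place, keeping only the report block."""
    # Assumption: the files are UTF-8; change the encoding if yours differ.
    for path in Path(folder).glob("*.txt"):
        text = path.read_text(encoding="utf-8")
        path.write_text(keep_report(text), encoding="utf-8")
```

For example, trim_folder(r"C:\reports") would rewrite every .txt file in that (placeholder) folder in place, so run it on a copy of the reports first.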