Is something wrong with my sed regex? It does not find the pattern and replace it

I have a huge text file on an Ubuntu system that contains many lines consisting of the word "Document" followed by 25 random characters, for example:

cussion. But we cancelled. That's correct," Fasel said.
The 2021 IIHF Women's World Championships is scheduled for the Russian city of Ufa.
Document TASS0000202asd07eg370012y
Fasel said that the IIHF had cancelled all women's international tournaments this year, including the IIHF Ice Hockey Women's World Championship Division I Group A in Angers, France on April 12-18.
Document TaSS0asfd0200307eg370012y
Nevertheless, the IIHF president pointed out that there was no decision yet about the men's world championships set to open in Switzerland in May.
Document aASS000020200307eg370012y
"We are working normally with the Swiss association and everybody is thinking and hoping that we can organize the world championship in May," Fasel said when asked about new information on that tournament.
Canada reported the first coronavirus case on January 26. Up to now, 54 cases have been confirmed in the country. In late December 2019, a pneumonia outbreak caused by the COVID-19 virus (previously known as 2019-nCoV) was reported in China's city of Wuhan, an economic and industrial megacity with a population of 12 million. The World Health Organization declared the new coronavirus outbreak a public health emergency of international concern, characterizing it as an epidemic with multiple locations. Outside China, the worst affected countries are Iran, Italy and South Korea. Overall, more than 90 other countries, including Russia, have reported confirmed coronavirus cases. WHO says that new coronavirus cases outside China have passed 21,000, and there are over 400 deaths.
Document TASS0fgs20200307eg370012y

I want to find every matching line and replace the pattern with a specified string, like this:

sed -i 's/^Document\s{1}\w{25}\n$/MYLINEBREAK/' textfile.txt

However, it does not work at all.

Answer 1

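By default, sed uses POSIX basic regular expressions (BRE), which do not define \s or \w and treat unescaped { } as literal characters; in BRE a repetition count has to be written as \{25\}. The \n cannot match either, because sed strips the trailing newline from each line before applying the expression, so the end of the line is matched by $ alone. A portable way to do this is: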

sed 's/^Document [a-zA-Z0-9-]\{25\}$/MYLINEBREAK/' file
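To check this on something smaller than the real file, here is a minimal sketch that runs both the original expression and the portable one against a throwaway sample. The temporary file and the variable names (sample, original_attempt, result) are made up for illustration, and the behaviour of the original expression is assumed to be that of GNU sed, as shipped on Ubuntu.

```shell
# Build a small throwaway sample (hypothetical content modelled on the question).
sample=$(mktemp)
printf '%s\n' \
  'Some ordinary text line.' \
  'Document TASS0000202asd07eg370012y' \
  'Document too-short' \
  'Another ordinary line.' > "$sample"

# The expression from the question: without -E the braces are literal
# characters, and \n never matches because sed has already removed the
# trailing newline, so no line is changed.
original_attempt=$(sed 's/^Document\s{1}\w{25}\n$/MYLINEBREAK/' "$sample")

# The portable BRE form: escaped braces, no \n, $ anchors the line end.
result=$(sed 's/^Document [a-zA-Z0-9-]\{25\}$/MYLINEBREAK/' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Only the line with exactly 25 characters after "Document " should be replaced; the "Document too-short" line is left alone because the pattern is anchored at both ends.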

Using extended regular expressions with -E is almost as portable:

sed -E 's/^Document\s[a-zA-Z0-9-]{25}$/MYLINEBREAK/' file

At least with GNU sed (the sed found on Linux), this lets you simplify further, to almost what you started with:

sed -E 's/^Document\s\w{25}$/MYLINEBREAK/' file

See "Why does my regular expression work in X but not in Y?" for more details on the different regular expression flavors.
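The commands in this answer print to standard output, while the question edits the file in place with -i. As a hedged sketch, one way to try the in-place edit safely is to give -i a backup suffix so the original survives next to the edited file. The temporary file and the variable names (workfile, edited, backup) are hypothetical, and the attached-suffix form -i.bak is assumed to be available, as it is in GNU sed.

```shell
# Throwaway working file standing in for textfile.txt (hypothetical content).
workfile=$(mktemp)
printf '%s\n' \
  'Keep me.' \
  'Document aASS000020200307eg370012y' > "$workfile"

# -i.bak edits the file in place and keeps the untouched original
# as "$workfile.bak"; -E enables extended regular expressions.
sed -i.bak -E 's/^Document [a-zA-Z0-9-]{25}$/MYLINEBREAK/' "$workfile"

edited=$(cat "$workfile")
backup=$(cat "$workfile.bak")
printf '%s\n' "$edited"

rm -f "$workfile" "$workfile.bak"
```

Once the result looks right on a copy, the same command can be pointed at the real file, and the .bak file removed afterwards.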
