sed paragraph tags

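How can I use sed to wrap each paragraph of a plain-text file in paragraph tags, putting {p} before and {/p} after it? Paragraphs are separated by blank lines. I can find every blank line in a text file with sed -e 's/^\s*$/<r>/' somefile.txt, but that always inserts the same tag everywhere, and I don't really understand how to change it so that each paragraph gets a matching opening and closing tag. Also, there is no blank line after the last paragraph, so this does nothing for the last paragraph.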

Input text:

Section 5. General Information About Project Gutenberg-tm electronic
works.

Description

Professor Michael S. Hart is the originator of the Project Gutenberg-tm
concept of a library of electronic works that could be freely shared
with anyone.

Project Gutenberg-tm eBooks are often created from several printed
editions, all of which are confirmed as Public Domain in the U.S. unless
a copyright notice is included.

Desired output:

Section 5. General Information About Project Gutenberg-tm electronic
works.
{p}
Description
{/p}
{p}
Professor Michael S. Hart is the originator of the Project Gutenberg-tm
concept of a library of electronic works that could be freely shared
with anyone.
{/p}
{p}
Project Gutenberg-tm eBooks are often created from several printed
editions, all of which are confirmed as Public Domain in the U.S. unless
a copyright notice is included.
{/p}

Answer 1

Since you originally asked for a sed solution, here is one:

sed '/./{H;1h;$! d}
g;/{p}$/d
s#^{p}.*#&\n{/p}#;p
s/.*/{p}/;h;d' somefile.txt

Explanation

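  • Line 1 of the script: append non-empty lines to the hold space (the first line is copied rather than appended, so the hold space does not start with a newline). Only blank lines and the last line of the file go on to the following commands.
  • Line 2: copy the hold space into the pattern space and discard it if it contains no text (only the {p} tag). This handles runs of several blank lines and blank lines at the end of the file.
  • Line 3: if the buffer starts with an opening tag, append the closing tag. Then print it.
  • Line 4: reset the hold space to a new opening tag, then delete the pattern space so nothing more is printed in this cycle.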
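To try the script on text shaped like the question's sample, here is a minimal shell sketch. It assumes GNU sed, because the `\n` in the replacement is a GNU extension; the function name `wrap_paragraphs` and the short sample text are made up for illustration.

```shell
#!/bin/sh
# Minimal sketch: run the sed script from Answer 1 on a made-up sample.
# Assumes GNU sed ("\n" in the s### replacement is a GNU extension).

# Hypothetical helper: reads plain text on stdin, writes tagged text to stdout.
wrap_paragraphs() {
    sed '/./{H;1h;$! d}
g;/{p}$/d
s#^{p}.*#&\n{/p}#;p
s/.*/{p}/;h;d'
}

# The first block (the header) is printed untagged; later blocks get {p}...{/p}.
# The last paragraph is tagged even though no blank line follows it.
wrap_paragraphs <<'EOF'
Header line one
header line two

First paragraph

Second paragraph, line 1
second paragraph, line 2
EOF
```

Because the hold space is flushed only on a blank line or on the last line (`$`), the final paragraph is closed even without a trailing blank line, which was the second problem raised in the question.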

Answer 2

I would suggest an awk approach:

awk 'NR>1 && NF{$0="{p}" RS $0 RS "{/p}"}1' file

Output (for input in which each paragraph is on a single line):

Section 5. General Information About Project Gutenberg-tm electronic works.

{p}
Description
{/p}

{p}
Professor Michael S. Hart is the originator of the Project Gutenberg-tm concept of a library of electronic works that could be freely shared with anyone. For thirty years, he produced and distributed Project Gutenberg-tm eBooks with only a loose network of volunteer support.
{/p}

{p}
Project Gutenberg-tm eBooks are often created from several printed editions, all of which are confirmed as Public Domain in the U.S. unless a copyright notice is included. Thus, we do not necessarily keep eBooks in compliance with any particular paper edition.
{/p}

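RS - awk's record separator, a newline (\n) by default

NR>1 - skips the first (header) line

NF - the number of fields on the current line; it is non-zero only for non-blank lines, so blank lines are left untouched

1 - an always-true pattern with no action, so awk prints every line, modified or not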
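The one-liner above treats each line as a record, so it relies on every paragraph being on a single line; with the wrapped input from the question it would also tag individual lines such as `works.`. A possible variant, sketched here assuming an awk that supports paragraph mode (RS set to the empty string, as POSIX specifies), treats each blank-line-separated block as one record instead. The function name `tag_paragraphs` is hypothetical.

```shell
#!/bin/sh
# Sketch: paragraph-mode variant of Answer 2 for paragraphs wrapped over
# several lines. With RS set to "", each record is a block of non-blank
# lines, and one or more blank lines separate records.

# Hypothetical helper: reads plain text on stdin, writes tagged text to stdout.
tag_paragraphs() {
    awk 'BEGIN { RS = "" }                    # paragraph mode
         { sub(/\n+$/, "") }                  # defensive: drop any trailing newline
         NR > 1 { $0 = "{p}\n" $0 "\n{/p}" }  # leave the header paragraph as-is
         1                                    # print every record (ORS is "\n")
    '
}

tag_paragraphs <<'EOF'
Header line one
header line two

First paragraph


Second paragraph, line 1
second paragraph, line 2
EOF
```

Records are printed with the default ORS of a single newline, so the blank separator lines disappear, as in the desired output from the question; a run of several blank lines counts as a single separator.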
