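How do I use sed to wrap each paragraph of a plain-text file in paragraph tags, with {p} before it and {/p} after it?
Each paragraph is separated by a blank line. I can use the command below to find every blank line in the text file, but it always inserts {p} everywhere, and I don't understand how to alternate between the opening and closing tags. Also, there is no blank line after the last paragraph, so nothing is done for the last paragraph.
sed -e 's/^\s*$/{p}/' somefile.txt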
Input text:
Section 5. General Information About Project Gutenberg-tm electronic
works.

Description

Professor Michael S. Hart is the originator of the Project Gutenberg-tm
concept of a library of electronic works that could be freely shared
with anyone.

Project Gutenberg-tm eBooks are often created from several printed
editions, all of which are confirmed as Public Domain in the U.S. unless
a copyright notice is included.
Desired output:
Section 5. General Information About Project Gutenberg-tm electronic
works.
{p}
Description
{/p}
{p}
Professor Michael S. Hart is the originator of the Project Gutenberg-tm
concept of a library of electronic works that could be freely shared
with anyone.
{/p}
{p}
Project Gutenberg-tm eBooks are often created from several printed
editions, all of which are confirmed as Public Domain in the U.S. unless
a copyright notice is included.
{/p}
Answer 1
Since you originally asked for a sed solution, here is one:
sed '/./{H;1h;$!d}
g;/{p}$/d
s#^{p}.*#&\n{/p}#;p
s/.*/{p}/;h;d' somefile.txt
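Explanation
- Line 1: append each non-empty line to the hold buffer (for the first line, copy instead of appending, so the buffer does not start with a newline). Processing continues past this line only on an empty line or at the end of the file.
- Line 2: discard a buffer that contains no text, which handles multiple consecutive empty lines and an empty line at the end of the file.
- Line 3: if the buffer starts with an opening tag, append a closing tag. Then print it.
- Line 4: fill the hold buffer with a new opening tag.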
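The four steps above can be sketched as a commented script. This is only a minimal sketch, assuming GNU sed (it relies on \n in the replacement text and on comment lines inside the script). The function name wrap_paragraphs and the file name sample.txt are hypothetical and used purely for illustration.

```shell
# Hypothetical helper: wrap every paragraph except the first block in {p} ... {/p}.
wrap_paragraphs() {
  sed '
  # Step 1: non-empty line. Append it to the hold space (copy on line 1),
  # then stop here unless this is the last line of the file.
  /./{H;1h;$!d}
  # Step 2: blank line or end of file. Fetch the collected block and
  # drop it if it is nothing but a bare opening tag.
  g;/{p}$/d
  # Step 3: if the block starts with an opening tag, add the closing tag; print.
  s#^{p}.*#&\n{/p}#;p
  # Step 4: seed the hold space with a fresh opening tag; suppress auto-print.
  s/.*/{p}/;h;d
  ' "$1"
}

# Sample input: a header, a two-line paragraph, two blank lines, a final paragraph.
printf '%s\n' 'Header line' '' 'First paragraph,' 'second line.' '' '' 'Last paragraph.' > sample.txt
wrap_paragraphs sample.txt
```

The sample deliberately contains two consecutive blank lines and no trailing blank line, which are the two edge cases mentioned in the question and in step 2.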
Answer 2
I would suggest an awk approach:
awk 'NR>1 && NF{$0="{p}" RS $0 RS "{/p}"}1' file
Output:
Section 5. General Information About Project Gutenberg-tm electronic works.
{p}
Description
{/p}
{p}
Professor Michael S. Hart is the originator of the Project Gutenberg-tm concept of a library of electronic works that could be freely shared with anyone. For thirty years, he produced and distributed Project Gutenberg-tm eBooks with only a loose network of volunteer support.
{/p}
{p}
Project Gutenberg-tm eBooks are often created from several printed editions, all of which are confirmed as Public Domain in the U.S. unless a copyright notice is included. Thus, we do not necessarily keep eBooks in compliance with any particular paper edition.
{/p}
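RS
- awk's record separator, which defaults to a newline (\n)
NR>1
- skips the first (header) line
NF
- holds the number of fields in the current line; it is nonzero, and therefore true, only for non-empty lines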
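Because RS keeps its default value, the one-liner works line by line, so it fits input in which every paragraph sits on a single line, as in the output shown. For paragraphs that span several lines, one possible variant (not part of the original answer) is awk's paragraph mode, enabled by setting RS to the empty string. The sketch below assumes a POSIX-compatible awk; the function name wrap_paragraphs_awk and the file name sample.txt are hypothetical.

```shell
# Hypothetical helper: paragraph-mode variant for multi-line paragraphs.
wrap_paragraphs_awk() {
  awk 'BEGIN { RS = "" }        # paragraph mode: blank lines separate records
       NR == 1 { print; next }  # leave the first (header) record untouched
       { print "{p}"; print; print "{/p}" }' "$1"
}

# Sample input: a header, a two-line paragraph, two blank lines, a final paragraph.
printf '%s\n' 'Header line' '' 'First paragraph,' 'second line.' '' '' 'Last paragraph.' > sample.txt
wrap_paragraphs_awk sample.txt
```

In paragraph mode, runs of blank lines count as a single separator and trailing newlines at the end of the file are ignored, so repeated blank lines and a missing final blank line need no special handling.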