Separating paragraphs using Linux

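I have a text file containing paragraphs that run together. I need to separate the paragraphs with blank lines. Each paragraph should begin with the pattern >FP0, but because the paragraphs are joined, the pattern does not always sit at the start of a line in the current file. I tried a sed command, but it splits on whole lines that contain >FP0, so the pattern still does not end up at the start of each new paragraph.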

Example paragraphs

>FP004340TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT>FP00598AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA>FP005521GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

The sed command I used is

sed '/>/s/^/\n/'

The output is

>FP004340TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT

TTT>FP00598AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

A>FP005521GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

(There should be no characters before >FP0 at the start of a new paragraph.)

Answer 1

You can use perl instead:

$ perl -pe 's/>/\n\n>/g' file


>FP004340TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT

>FP00598AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

>FP005521GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

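But that adds two extra newlines at the top of the output if the first character of the file is a >. So you can restrict it to replace a > only when it is preceded by another character: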

$ perl -pe 's/(.)>/$1\n\n>/g' file
>FP004340TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT

>FP00598AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

>FP005521GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

Or, with GNU sed:

$ sed -E 's/(.)>/\1\n\n>/g' file
>FP004340TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT

>FP00598AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

>FP005521GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

And with any sed:

sed 's/\(.\)>/\1\
\
>/g' file
>FP004340TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT

>FP00598AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

>FP005521GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

Answer 2

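Your sed script looks for any line that contains a >, but then adds the newline at the beginning of that line (that is what ^ means in a regular expression).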
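
A small sketch of that difference, using a visible | marker in place of the newline so that it behaves the same in any sed. The function names are made up for this illustration:

```shell
# Same shape as the command from the question: select lines that
# contain ">", then substitute at "^", the start of the line.
mark_line_start() { sed '/>/s/^/|/'; }

# Substitute at the ">" itself instead.
mark_before_gt() { sed 's/>/|>/g'; }

printf 'AAA>BBB\n' | mark_line_start   # prints |AAA>BBB
printf 'AAA>BBB\n' | mark_before_gt    # prints AAA|>BBB
```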

Perhaps try this:

sed 's/>/\n&/g' file

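But whether \n produces a literal newline depends on your sed version. The desired behavior is common on many Linux platforms, but sed implementations are not all alike. (Perhaps clarify which distribution and/or sed version you are using, or try a more portable solution such as Awk or Perl.)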

awk -F '>' 'BEGIN { OFS="\n>" } { $1=$1 } 1' file

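The { $1 = $1 } hack forces awk to rebuild the line from its fields, joining them with OFS. If nothing on the line changes, awk simply copies the input to the output; the assignment makes it treat the line as modified.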

If you need more than one newline, put in more than one: change \n to \n\n to get an empty line before each new paragraph.
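
Putting those two points together, here is a hedged sketch of the awk variant with a blank line between paragraphs. The sub() call is an addition that is not in the original command: it drops the separator that would otherwise appear at the top of any line that already starts with >. The name split_awk is hypothetical.

```shell
# Hypothetical helper: reads files given as arguments, or stdin.
split_awk() {
  awk -F '>' 'BEGIN { OFS = "\n\n>" }
    {
      $1 = $1             # rebuild the line, joining fields with OFS
      sub(/^\n\n/, "")    # line began with ">": remove the leading blank line
    }
    1' "$@"
}

# Example run on a short made-up record line:
printf '>FP001AAAA>FP002CCCC\n' | split_awk
```

This uses only standard awk features, so it should not depend on how a particular sed treats \n.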

Answer 3

GNU sed

$ sed 's/>/\n\n&/2g' input_file

POSIX sed

sed -e '
  y/>/\n/
  s/\n/>/
  s//&&>/g
' input_file

Perl

$ perl -pe 's/(?<!^)(?=>)/\n\n/g' input_file

awk

awk -v RS=">" -v ORS= '
NR>1&&sub(/^/,(!k++ ? ORS : "\n\n") RS)
' input_file
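
Whichever command you pick, the requirement from the question (nothing in front of the > that starts a paragraph) can be checked mechanically. A minimal sketch, assuming every paragraph begins with > and using a hypothetical function name:

```shell
# Reads stdin. Succeeds (exit status 0) when every non-empty line
# starts with ">", and fails otherwise.
headers_at_line_start() {
  if grep -v '^$' | grep -qv '^>'; then
    return 1
  fi
  return 0
}

# Example runs on made-up data:
printf '>FP001AA\n\n>FP002CC\n' | headers_at_line_start && echo "clean"
printf '>FP001AA\nA>FP002CC\n' | headers_at_line_start || echo "stray characters before >"
```

Piping the output of any of the commands above into this function gives a quick pass/fail check on a large file.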
