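I have a text file containing paragraphs that have been joined together. I need to separate each paragraph with a blank line. Each paragraph should begin with the pattern >FP0, but because the paragraphs run into one another, the pattern does not always fall at the start of a line in the current file. I tried a sed command, but it splits on lines that contain the >FP0 pattern, so the pattern does not end up at the start of the new paragraph.
Example paragraph: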
>FP004340TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT>FP00598AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA>FP005521GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
The sed command I used is:
sed '/>/s/^/\n/'
The output is:
>FP004340TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
TTT>FP00598AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
A>FP005521GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
(No characters should precede >FP0 at the start of a new paragraph.)
Answer 1
You can use perl instead:
$ perl -pe 's/>/\n\n>/g' file


>FP004340TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT

>FP00598AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

>FP005521GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
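But that adds two extra newlines at the beginning if the first character of the file is a >. So you can restrict the substitution to cases where the > is preceded by another character: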
$ perl -pe 's/(.)>/$1\n\n>/g' file
>FP004340TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT

>FP00598AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

>FP005521GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
Or, with GNU sed:
$ sed -E 's/(.)>/\1\n\n>/g' file
>FP004340TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT

>FP00598AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

>FP005521GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
And with any sed:
sed 's/\(.\)>/\1\
\
>/g' file
>FP004340TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT

>FP00598AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

>FP005521GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
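To try the portable form without the full-length sequences, here is a minimal sketch on a shortened, made-up sample. The record names FP01 to FP03 and the function name split_paragraphs are illustrative only and do not come from the original file.

```shell
#!/bin/sh
# Hypothetical helper: wraps the portable sed command shown above.
# It inserts a blank line before every ">" that has a character in front of it,
# so a ">" at the very start of a line is left alone.
split_paragraphs() {
  sed 's/\(.\)>/\1\
\
>/g'
}

# Shortened sample standing in for the joined records.
printf '%s\n' '>FP01TTT>FP02AAA>FP03GGG' | split_paragraphs
# Prints:
# >FP01TTT
#
# >FP02AAA
#
# >FP03GGG
```

Because the pattern requires a preceding character, input that already has > at the start of a line does not gain extra blank lines there.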
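Answer 2
Your sed script finds any line that contains a >, but adds the newline at the beginning of that line (that is what ^ means in a regular expression).
Perhaps try this instead: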
sed 's/>/\n&/g' file
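Whether \n produces a literal newline depends on your sed version, though. The desired behavior is common on many Linux platforms, but implementations are not all alike. (Perhaps clarify which distribution and/or sed version you use, or try a more portable solution such as Awk or Perl.)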
awk -F '>' 'BEGIN { OFS="\n>" } { $1=$1 } 1' file
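The { $1 = $1 } hack forces awk to rebuild the line. If nothing on the line is modified, awk simply copies the input to the output unchanged; assigning a field to itself makes awk treat the line as modified, so the fields are rejoined with the new OFS.
If you need more than one newline, add more of them: change \n to \n\n to get a blank line before each new paragraph.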
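As a sketch of that \n\n variant: when a line starts with >, the first field is empty, so the rebuilt line begins with the separator itself. The version below trims that leading separator. The function name split_with_awk and the shortened sample records are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: the awk approach above with OFS set to "\n\n>".
# When a line starts with ">", field 1 is empty and the rebuilt line starts
# with "\n\n>". On the first line that prefix is dropped; on later lines it is
# reduced to "\n", which together with the previous line's newline leaves
# exactly one blank line.
split_with_awk() {
  awk -F '>' '
    BEGIN { OFS = "\n\n>" }
    {
      $1 = $1                                   # force $0 to be rebuilt with OFS
      sub(/^\n\n/, (NR == 1 ? "" : "\n"))       # trim the leading separator
    }
    1                                           # print the (modified) line
  '
}

printf '%s\n' '>FP01TTT>FP02AAA' '>FP03GGG' | split_with_awk
# Prints:
# >FP01TTT
#
# >FP02AAA
#
# >FP03GGG
```

Lines that do not begin with > are rejoined the same way but keep their first field in place, since the trimming pattern does not match them.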
Answer 3
GNU sed
$ sed 's/>/\n\n&/2g' input_file
POSIXly sed
sed -e '
y/>/\n/
s/\n/>/
s//&&>/g
' input_file
Perl
$ perl -pe 's/(?<!^)(?=>)/\n\n/g' input_file
awk
awk -v RS=">" -v ORS= '
NR>1&&sub(/^/,(!k++ ? ORS : "\n\n") RS)
' input_file
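The POSIX sed script above is terse, so here is a commented sketch of how its three commands appear to cooperate. It assumes every input line begins with >. The function name posix_split and the shortened sample are made up for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper around the three-command POSIX sed script.
#   y/>/\n/    turn every ">" into a newline; a freshly read line never
#              contains a newline, so newlines now mark the old ">" positions
#   s/\n/>/    restore the first marker to a plain ">" (the leading record)
#   s//&&>/g   the empty regex reuses the last one (\n): each remaining marker
#              becomes two newlines followed by ">"
posix_split() {
  sed -e '
y/>/\n/
s/\n/>/
s//&&>/g
'
}

printf '%s\n' '>FP01TTT>FP02AAA>FP03GGG' | posix_split
# Prints:
# >FP01TTT
#
# >FP02AAA
#
# >FP03GGG
```

If a line does not start with >, the second command restores whichever > comes first without inserting a break before it, so this form suits input where each line opens a new record.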