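I have a lot of transcript text files, and I've already cleaned them up to a certain point. One last bit of cleaning remains.
Some of my *.txt files contain text like this: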
Gary: I said something.
Larry: I said something else.
Mr. John: I said this. And maybe this
and I also said this.
Laura: did i say anything.
I need it to look like this:
Gary: I said something.
Larry: I said something else.
Mr. John: I said this. And maybe this and I also said this.
Laura: did i say anything.
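I want to move every line that does not contain a colon (:) up onto the end of the previous line. In the end, each line should hold exactly one speaker's dialogue, terminated by a newline.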
I looked at this question, but I'm not sure how to apply it here. I'm open to any tool: sed, awk, Python, bash or Perl.
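Since Python is on the list, here is a minimal sketch that applies this rule to every *.txt file in one directory, rewriting each file in place. It assumes the files are UTF-8 and treats any line containing a colon as the start of a new speaker; the function names join_continuations and clean_directory are hypothetical.

```python
from pathlib import Path


def join_continuations(lines):
    """Append every line without a colon to the line before it."""
    out = []
    for line in lines:
        if ":" in line or not out:
            out.append(line)          # a new speaker starts on this line
        else:
            out[-1] += " " + line     # continuation of the previous speech
    return out


def clean_directory(directory="."):
    """Rewrite each *.txt file directly inside `directory` in place."""
    # Assumption: transcripts are UTF-8 text files.
    for path in Path(directory).glob("*.txt"):
        lines = path.read_text(encoding="utf-8").splitlines()
        joined = join_continuations(lines)
        text = "\n".join(joined) + ("\n" if joined else "")
        path.write_text(text, encoding="utf-8")
```

Call clean_directory() with the folder that holds the transcripts; each file ends up with one speaker per line.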
Answer 1
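With sed, you can append the next line to the pattern space, check whether the appended part (from the newly added newline to the end of the pattern space) contains only non-colon characters, and if so, replace that last newline with a space: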
sed -e :a -e '$!N; s/\n\([^:]*\)$/ \1/;ta' -e 'P;D' file.txt
Gary: I said something.
Larry: I said something else.
Mr. John: I said this. And maybe this and I also said this.
Laura: did i say anything.
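The sed substitution turns a newline into a space whenever the text after it contains no colon. The same rule can be written in Python as a single regular-expression substitution over the whole file. This is a sketch, not a port of the sed loop: it uses a lookahead instead of repeatedly appending lines, and it assumes the file fits in memory. The names CONTINUATION and join_speeches are hypothetical.

```python
import re

# A newline followed by a non-empty line that contains no colon.
# The lookahead does not consume that line, so several continuation
# lines in a row are all pulled up onto the speaker's line.
CONTINUATION = re.compile(r"\n(?=[^:\n]+$)", re.MULTILINE)


def join_speeches(text):
    """Replace each newline that precedes a colon-free line with a space."""
    return CONTINUATION.sub(" ", text)


if __name__ == "__main__":
    sample = ("Mr. John: I said this. And maybe this\n"
              "and I also said this.\n"
              "Laura: did i say anything.\n")
    print(join_speeches(sample), end="")
    # Mr. John: I said this. And maybe this and I also said this.
    # Laura: did i say anything.
```

Because the lookahead requires at least one character, the final newline of the file is left alone.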
Answer 2
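How about awk? It keeps a copy of the previous line. If the current line contains no colon (NF == 1 with -F:), it appends the current line to the stored one so that both are printed at once, then sets $0 to the empty string so the joined line is not remembered.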
awk -F: 'NF == 1 {LAST = LAST " " $0; $0 = ""}; LAST {print LAST}; {LAST = $0} END {print LAST}' file
Gary: I said something.
Larry: I said something else.
Mr. John: I said this. And maybe this and I also said this.
Laura: did i say anything.
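The awk answer streams its input while holding back one line. A Python generator can follow the same idea. One difference is deliberate: the one-liner empties its buffer after each join, so a third line of the same speech would be printed on a line of its own, whereas this sketch keeps extending the held line until the next speaker appears. The name merge_lines is hypothetical.

```python
def merge_lines(lines):
    """Yield one line per speaker, holding back the line being built."""
    last = None                  # plays the role of awk's LAST
    for line in lines:
        line = line.rstrip("\n")
        if ":" not in line and last is not None:
            last += " " + line   # no colon: continue the held speech
            continue
        if last is not None:
            yield last           # a new speaker: release the held line
        last = line
    if last is not None:
        yield last               # like awk's END block: flush the last line
```

It accepts any iterable of lines, including an open file, for example: for merged in merge_lines(open("file.txt", encoding="utf-8")): print(merged)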
Answer 3
Another awk attempt (save it as a script and run it with awk -f):
BEGIN{RS=":";ORS=":"; # use ":", i.e. change of speaker, to recognise end of record
FS="\n"} # OFS is still " ", so newlines in input converted to spaces in output
!$NF { ORS="" } # detect last line (no next speaker) and don't append a :
NF>1 {$NF = "\n" $NF} # restore the newline before the speaker's name
{print} # print the result
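This answer treats each speaker's turn as a record instead of working line by line. A rough Python equivalent splits the text just before every line that contains a colon, then flattens the newlines inside each record into spaces. Like every answer here, it assumes a colon only ever marks a speaker. Unlike the awk script, which rebuilds each record with OFS and so leaves a space before each line break, it produces no trailing spaces. The names SPEAKER_START and records are hypothetical.

```python
import re

# Split immediately before any line that contains a colon (a new speaker);
# only the newline itself is consumed, the speaker's line is kept.
SPEAKER_START = re.compile(r"\n(?=[^\n]*:)")


def records(text):
    """Return one string per speaker turn, with inner newlines turned into spaces."""
    chunks = SPEAKER_START.split(text.strip("\n"))
    return [" ".join(chunk.split("\n")) for chunk in chunks if chunk]
```

Writing "\n".join(records(text)) + "\n" back to the file gives the desired layout.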
Answer 4
sed -e '
/:/{$!N;}
/\n.*:/!s/\n/ /
P;D
' file.txt
Gary: I said something.
Larry: I said something else.
Mr. John: I said this. And maybe this and I also said this.
Laura: did i say anything.
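This script relies on sed's pattern-space commands: N appends the next input line after a newline, P prints up to the first newline, and D deletes through the first newline and, if anything is left, restarts the cycle on it without reading new input. To make the cycle easier to follow, here is a small Python emulation of this one script. It is a teaching sketch under those assumptions, not a general sed interpreter, and the name sed_answer4 is hypothetical.

```python
import re


def sed_answer4(lines):
    r"""Emulate the sed script  /:/{$!N;}  /\n.*:/!s/\n/ /  P;D  on a list of lines."""
    output = []
    i = 0            # index of the next unread input line
    pattern = None   # sed's pattern space; None means "read a new line"
    while True:
        if pattern is None:
            if i == len(lines):
                break                      # no input left: sed exits
            pattern = lines[i]
            i += 1
        # /:/{$!N;}: a line with a colon pulls in the next line,
        # unless the line just read was the last one ($!).
        if ":" in pattern and i < len(lines):
            pattern += "\n" + lines[i]
            i += 1
        # /\n.*:/!s/\n/ /: if nothing after the newline has a colon,
        # turn the first newline into a space (no-op when there is none).
        if not re.search(r"\n.*:", pattern, re.DOTALL):
            pattern = pattern.replace("\n", " ", 1)
        # P: print up to the first newline.
        head, newline, rest = pattern.partition("\n")
        output.append(head)
        # D: delete through the first newline; if there was one,
        # start the next cycle with the remainder instead of reading input.
        pattern = rest if newline else None
    return output
```

Running it on ["a: x", "y", "z"] returns ["a: x y", "z"], which shows that this script joins only one continuation line per speaker.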