For lines in a colon-delimited file whose second field spans multiple lines, replace the trailing newlines

I have several files with different contents, but the multi-line pattern is the same in all of them.

Sample input file: cat -n test.txt

     1  adjfkhjdhfkjd: dfkjfkdljgklfjgklfjgkfjghk
     2  rtueitjrkldngfmdn: kldfjkdjgkdjgkldjgkfjhklfjkkjhkgjhklfk
     3  dkljfkdljgkldg: fgdkjgkfdljglkfdjgkldfjgkljkldfjgkljgkflj
     4  a12kjsdk232kjk445kjkjlk34323lkjkjlk3422
     5  98094kjhjkh23434hjhk32453242hkjhkjhkj
     6  bncvmcnbxmbvcnmxbvnxbcnxbnxcmbvnxc: xckdfjgklfdjgklfdjglkfdjgio
     7  iourtiourtioreutiorutoir: i3948j35hj4309457480jkh5jk450
     8  5ui43u3213435
     9  5io4p3i54op3i5op34i5po34i5
    10  54390859043860943853kj5h34jkh534jk543
    11  jkljkljkjkjkjklj: hgjjjjjjjjjjjjj
    12  4i34935kjtkrelthrket: 4539859435943hkjhkjhk34543ll4
    13  jg;lfdjglkfjlkfjghlkfd
    14  f;dgljdfl;hj;df
    15  fglkdjlkjgkldfjgklfdjhklfhjdflkj

Expected output:

adjfkhjdhfkjd: dfkjfkdljgklfjgklfjgkfjghk
rtueitjrkldngfmdn: kldfjkdjgkdjgkldjgkfjhklfjkkjhkgjhklfk
dkljfkdljgkldg: fgdkjgkfdljglkfdjgkldfjgkljkldfjgkljgkflj a12kjsdk232kjk445kjkjlk34323lkjkjlk3422 98094kjhjkh23434hjhk32453242hkjhkjhkj
bncvmcnbxmbvcnmxbvnxbcnxbnxcmbvnxc: xckdfjgklfdjgklfdjglkfdjgio
iourtiourtioreutiorutoir: i3948j35hj4309457480jkh5jk450 5ui43u3213435 5io4p3i54op3i5op34i5po34i5 54390859043860943853kj5h34jkh534jk543
jkljkljkjkjkjklj: hgjjjjjjjjjjjjj
4i34935kjtkrelthrket: 4539859435943hkjhkjhk34543ll4 jg;lfdjglkfjlkfjghlkfd f;dgljdfl;hj;df fglkdjlkjgkldfjgklfdjhklfhjdflkj

Answer 1

You can use a variant of a well-known sed recipe:

sed -e :a -e '$!N;/\n.*:/!s/\n/ /;ta' -e 'P;D' test.txt

That is, a line is appended to the previous one (by replacing the newline with a space) if it does not contain a colon. So given

$ cat -n test.txt
     1  adjfkhjdhfkjd: dfkjfkdljgklfjgklfjgkfjghk
     2  rtueitjrkldngfmdn: kldfjkdjgkdjgkldjgkfjhklfjkkjhkgjhklfk
     3  dkljfkdljgkldg: fgdkjgkfdljglkfdjgkldfjgkljkldfjgkljgkflj
     4  a12kjsdk232kjk445kjkjlk34323lkjkjlk3422
     5  98094kjhjkh23434hjhk32453242hkjhkjhkj
     6  bncvmcnbxmbvcnmxbvnxbcnxbnxcmbvnxc: xckdfjgklfdjgklfdjglkfdjgio
     7  iourtiourtioreutiorutoir: i3948j35hj4309457480jkh5jk450
     8  5ui43u3213435
     9  5io4p3i54op3i5op34i5po34i5
    10  54390859043860943853kj5h34jkh534jk543
    11  jkljkljkjkjkjklj: hgjjjjjjjjjjjjj
    12  4i34935kjtkrelthrket: 4539859435943hkjhkjhk34543ll4
    13  jg;lfdjglkfjlkfjghlkfd
    14  f;dgljdfl;hj;df
    15  fglkdjlkjgkldfjgklfdjhklfhjdflkj

then

$ sed -e :a -e '$!N;/\n.*:/!s/\n/ /;ta' -e 'P;D' test.txt | cat -n
     1  adjfkhjdhfkjd: dfkjfkdljgklfjgklfjgkfjghk
     2  rtueitjrkldngfmdn: kldfjkdjgkdjgkldjgkfjhklfjkkjhkgjhklfk
     3  dkljfkdljgkldg: fgdkjgkfdljglkfdjgkldfjgkljkldfjgkljgkflj a12kjsdk232kjk445kjkjlk34323lkjkjlk3422 98094kjhjkh23434hjhk32453242hkjhkjhkj
     4  bncvmcnbxmbvcnmxbvnxbcnxbnxcmbvnxc: xckdfjgklfdjgklfdjglkfdjgio
     5  iourtiourtioreutiorutoir: i3948j35hj4309457480jkh5jk450 5ui43u3213435 5io4p3i54op3i5op34i5po34i5 54390859043860943853kj5h34jkh534jk543
     6  jkljkljkjkjkjklj: hgjjjjjjjjjjjjj
     7  4i34935kjtkrelthrket: 4539859435943hkjhkjhk34543ll4 jg;lfdjglkfjlkfjghlkfd f;dgljdfl;hj;df fglkdjlkjgkldfjgklfdjhklfhjdflkj
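
To make the one-liner easier to follow, here is a sketch of the same script with one sed command per -e option. The sample data and the variable names (tmp, joined) are made up for illustration and are not from the question.

```shell
# Hypothetical sample in the same shape as test.txt.
tmp=$(mktemp)
printf '%s\n' 'k1: v1' 'k2: v2' 'cont a' 'cont b' 'k3: v3' > "$tmp"

# :a             label to loop back to
# $!N            unless on the last line, append the next line to the pattern space
# /\n.*:/!s...   if the appended line has no colon, replace the newline with a space
# ta             if that substitution happened, jump back to :a and try the next line
# P              print the pattern space up to the first newline
# D              delete up to the first newline and restart without reading new input
joined=$(sed -e ':a' \
    -e '$!N' \
    -e '/\n.*:/!s/\n/ /' \
    -e 'ta' \
    -e 'P' \
    -e 'D' "$tmp")

printf '%s\n' "$joined"
rm -f "$tmp"
```

Note that the test is for a colon anywhere in the following line, so a continuation line that happens to contain a colon would be treated as the start of a new record.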

Answer 2

Using any awk:

$ awk '/:/{if (NR>1) print r; r=$0; next} {r=r OFS $0} END{print r}' file
adjfkhjdhfkjd: dfkjfkdljgklfjgklfjgkfjghk
rtueitjrkldngfmdn: kldfjkdjgkdjgkldjgkfjhklfjkkjhkgjhklfk
dkljfkdljgkldg: fgdkjgkfdljglkfdjgkldfjgkljkldfjgkljgkflj a12kjsdk232kjk445kjkjlk34323lkjkjlk3422 98094kjhjkh23434hjhk32453242hkjhkjhkj
bncvmcnbxmbvcnmxbvnxbcnxbnxcmbvnxc: xckdfjgklfdjgklfdjglkfdjgio
iourtiourtioreutiorutoir: i3948j35hj4309457480jkh5jk450 5ui43u3213435 5io4p3i54op3i5op34i5po34i5 54390859043860943853kj5h34jkh534jk543
jkljkljkjkjkjklj: hgjjjjjjjjjjjjj
4i34935kjtkrelthrket: 4539859435943hkjhkjhk34543ll4 jg;lfdjglkfjlkfjghlkfd f;dgljdfl;hj;df fglkdjlkjgkldfjgklfdjhklfhjdflkj
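
For readers who want to see how that awk script works, the following is the same program spread over several lines with comments. The sample data and the names tmp and awk_joined are illustrative assumptions. Like the original, it assumes the first line of the file contains a colon.

```shell
# Hypothetical sample in the same shape as the question's input.
tmp=$(mktemp)
printf '%s\n' 'k1: v1' 'k2: v2' 'cont a' 'cont b' 'k3: v3' > "$tmp"

awk_joined=$(awk '
/:/ {                      # a line with a colon starts a new record
    if (NR > 1) print r    # flush the previous record (nothing to flush on line 1)
    r = $0                 # start accumulating the new record
    next
}
{ r = r OFS $0 }           # continuation line: append it after OFS (a space by default)
END { print r }            # flush the last record
' "$tmp")

printf '%s\n' "$awk_joined"
rm -f "$tmp"
```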

Answer 3

Using awk

$ awk '{printf "%s", (($1 ~ /:$/) ? ((NR==1) ? "" : ORS) : " ")$0 } END{print ""}' file
# Or
$ awk -v ORS=' ' '{NR!=1 && gsub(/[[:alnum:]]+:/, "\n&")}1; END{printf "\n"}' file | sed 's/ $//'

As suggested by @EdMorton:

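Always do printf "%s", $n rather than printf $n, because the latter fails when the input contains printf formatting characters (e.g. %s). Also, print "" is better than printf "\n", because the former uses whatever value ORS has, while the latter uses a hardcoded value that you hope ORS has.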
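
A small sketch of the first point, using a made-up input line that contains %%. With a fixed "%s" format the data passes through unchanged, while using the data itself as the format lets awk interpret %% as a literal %, so the output silently differs from the input. Other conversions such as %s or %d in the data can produce errors or garbage, depending on the awk implementation.

```shell
# Hypothetical input containing printf formatting characters.
line='load: 100%% busy'

# Safe: the data is an argument to a fixed "%s" format.
safe=$(printf '%s\n' "$line" | awk '{ printf "%s", $0 } END { print "" }')

# Unsafe: the data is used as the format string, so %% collapses to %.
unsafe=$(printf '%s\n' "$line" | awk '{ printf $0 } END { print "" }')

printf 'safe:   %s\n' "$safe"
printf 'unsafe: %s\n' "$unsafe"
```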
