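I have multiple files with different contents, but the multi-line pattern is the same in every file: a line containing a colon starts a record, and the lines after it that have no colon belong to that record.
Example input file:
$ cat -n test.txt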
1 adjfkhjdhfkjd: dfkjfkdljgklfjgklfjgkfjghk
2 rtueitjrkldngfmdn: kldfjkdjgkdjgkldjgkfjhklfjkkjhkgjhklfk
3 dkljfkdljgkldg: fgdkjgkfdljglkfdjgkldfjgkljkldfjgkljgkflj
4 a12kjsdk232kjk445kjkjlk34323lkjkjlk3422
5 98094kjhjkh23434hjhk32453242hkjhkjhkj
6 bncvmcnbxmbvcnmxbvnxbcnxbnxcmbvnxc: xckdfjgklfdjgklfdjglkfdjgio
7 iourtiourtioreutiorutoir: i3948j35hj4309457480jkh5jk450
8 5ui43u3213435
9 5io4p3i54op3i5op34i5po34i5
10 54390859043860943853kj5h34jkh534jk543
11 jkljkljkjkjkjklj: hgjjjjjjjjjjjjj
12 4i34935kjtkrelthrket: 4539859435943hkjhkjhk34543ll4
13 jg;lfdjglkfjlkfjghlkfd
14 f;dgljdfl;hj;df
15 fglkdjlkjgkldfjgklfdjhklfhjdflkj
Expected output:
adjfkhjdhfkjd: dfkjfkdljgklfjgklfjgkfjghk
rtueitjrkldngfmdn: kldfjkdjgkdjgkldjgkfjhklfjkkjhkgjhklfk
dkljfkdljgkldg: fgdkjgkfdljglkfdjgkldfjgkljkldfjgkljgkflj a12kjsdk232kjk445kjkjlk34323lkjkjlk3422 98094kjhjkh23434hjhk32453242hkjhkjhkj
bncvmcnbxmbvcnmxbvnxbcnxbnxcmbvnxc: xckdfjgklfdjgklfdjglkfdjgio
iourtiourtioreutiorutoir: i3948j35hj4309457480jkh5jk450 5ui43u3213435 5io4p3i54op3i5op34i5po34i5 54390859043860943853kj5h34jkh534jk543
jkljkljkjkjkjklj: hgjjjjjjjjjjjjj
4i34935kjtkrelthrket: 4539859435943hkjhkjhk34543ll4 jg;lfdjglkfjlkfjghlkfd f;dgljdfl;hj;df fglkdjlkjgkldfjgklfdjhklfhjdflkj
Answer 1
You can use a variant of a well-known sed recipe:
sed -e :a -e '$!N;/\n.*:/!s/\n/ /;ta' -e 'P;D' test.txt
That is, a line is appended to the previous one (by replacing the newline with a space) if it does not contain a colon. So given
$ cat -n test.txt
1 adjfkhjdhfkjd: dfkjfkdljgklfjgklfjgkfjghk
2 rtueitjrkldngfmdn: kldfjkdjgkdjgkldjgkfjhklfjkkjhkgjhklfk
3 dkljfkdljgkldg: fgdkjgkfdljglkfdjgkldfjgkljkldfjgkljgkflj
4 a12kjsdk232kjk445kjkjlk34323lkjkjlk3422
5 98094kjhjkh23434hjhk32453242hkjhkjhkj
6 bncvmcnbxmbvcnmxbvnxbcnxbnxcmbvnxc: xckdfjgklfdjgklfdjglkfdjgio
7 iourtiourtioreutiorutoir: i3948j35hj4309457480jkh5jk450
8 5ui43u3213435
9 5io4p3i54op3i5op34i5po34i5
10 54390859043860943853kj5h34jkh534jk543
11 jkljkljkjkjkjklj: hgjjjjjjjjjjjjj
12 4i34935kjtkrelthrket: 4539859435943hkjhkjhk34543ll4
13 jg;lfdjglkfjlkfjghlkfd
14 f;dgljdfl;hj;df
15 fglkdjlkjgkldfjgklfdjhklfhjdflkj
then
$ sed -e :a -e '$!N;/\n.*:/!s/\n/ /;ta' -e 'P;D' test.txt | cat -n
1 adjfkhjdhfkjd: dfkjfkdljgklfjgklfjgkfjghk
2 rtueitjrkldngfmdn: kldfjkdjgkdjgkldjgkfjhklfjkkjhkgjhklfk
3 dkljfkdljgkldg: fgdkjgkfdljglkfdjgkldfjgkljkldfjgkljgkflj a12kjsdk232kjk445kjkjlk34323lkjkjlk3422 98094kjhjkh23434hjhk32453242hkjhkjhkj
4 bncvmcnbxmbvcnmxbvnxbcnxbnxcmbvnxc: xckdfjgklfdjgklfdjglkfdjgio
5 iourtiourtioreutiorutoir: i3948j35hj4309457480jkh5jk450 5ui43u3213435 5io4p3i54op3i5op34i5po34i5 54390859043860943853kj5h34jkh534jk543
6 jkljkljkjkjkjklj: hgjjjjjjjjjjjjj
7 4i34935kjtkrelthrket: 4539859435943hkjhkjhk34543ll4 jg;lfdjglkfjlkfjghlkfd f;dgljdfl;hj;df fglkdjlkjgkldfjgklfdjhklfhjdflkj
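To make the recipe easier to follow, here is the same command written with one sed command per line and a comment on each. This is only a sketch for illustration: the five-line input is a made-up stand-in for test.txt, and the variable name joined is hypothetical.

```shell
# Made-up input standing in for test.txt: "key: value" lines plus continuation lines.
joined=$(printf '%s\n' 'k1: v1' 'k2: v2' 'cont a' 'cont b' 'k3: v3' | sed '
:a
# Unless this is the last input line, append the next line to the pattern space.
$!N
# If the appended line has no colon, turn the embedded newline into a space.
/\n.*:/!s/\n/ /
# If that substitution happened, loop back and try to append another line.
ta
# Otherwise print the first line of the pattern space ...
P
# ... delete it, and restart the cycle with what is left.
D
')
printf '%s\n' "$joined"
```

With this input the two continuation lines are folded into the k2 record, so three lines are printed: k1: v1, then k2: v2 cont a cont b, then k3: v3.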
Answer 2
Using any awk:
$ awk '/:/{if (NR>1) print r; r=$0; next} {r=r OFS $0} END{print r}' file
adjfkhjdhfkjd: dfkjfkdljgklfjgklfjgkfjghk
rtueitjrkldngfmdn: kldfjkdjgkdjgkldjgkfjhklfjkkjhkgjhklfk
dkljfkdljgkldg: fgdkjgkfdljglkfdjgkldfjgkljkldfjgkljgkflj a12kjsdk232kjk445kjkjlk34323lkjkjlk3422 98094kjhjkh23434hjhk32453242hkjhkjhkj
bncvmcnbxmbvcnmxbvnxbcnxbnxcmbvnxc: xckdfjgklfdjgklfdjglkfdjgio
iourtiourtioreutiorutoir: i3948j35hj4309457480jkh5jk450 5ui43u3213435 5io4p3i54op3i5op34i5po34i5 54390859043860943853kj5h34jkh534jk543
jkljkljkjkjkjklj: hgjjjjjjjjjjjjj
4i34935kjtkrelthrket: 4539859435943hkjhkjhk34543ll4 jg;lfdjglkfjlkfjghlkfd f;dgljdfl;hj;df fglkdjlkjgkldfjgklfdjhklfhjdflkj
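Since the question mentions several files, one possible adaptation of the awk above handles them in a single invocation by flushing the pending record whenever a new file starts. This is a sketch, not part of the original answer, and it assumes every file begins with a line that starts a new record. The file names one.txt and two.txt, their contents, and the variable names are invented for the example.

```shell
# Invented sample files in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 'a: 1' 'x' 'b: 2' > "$dir/one.txt"
printf '%s\n' 'c: 3' 'y' 'z'    > "$dir/two.txt"

merged=$(awk '
    # First line of each file: flush the record left over from the previous file.
    FNR == 1 { if (NR > 1) print r; r = $0; next }
    # A line containing a colon starts a new record, so print the pending one.
    /:/      { print r; r = $0; next }
    # Any other line is a continuation: append it after OFS (a space by default).
             { r = r OFS $0 }
    # Print the last pending record, if there was any input at all.
    END      { if (NR > 0) print r }
' "$dir/one.txt" "$dir/two.txt")

printf '%s\n' "$merged"
rm -rf "$dir"
```

Without the FNR == 1 rule, a file that starts with a continuation line would be glued onto the last record of the previous file. Running the original command once per file in a shell loop avoids that question entirely.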
Answer 3
Using awk:
$ awk '{printf "%s", (($1 ~ /:$/) ? ((NR==1) ? "" : ORS) : " ")$0 } END{print ""}' file
# Or
$ awk -v ORS=' ' '{NR!=1 && gsub(/[[:alnum:]]+:/, "\n&")}1; END{printf "\n"}' file | sed 's/ $//'
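As @EdMorton suggested:
Always use printf "%s", $n instead of printf $n, because the latter fails when the input contains printf formatting characters (e.g. %s).
Also, print "" is better than printf "\n", because the former uses whatever value ORS has, while the latter uses a hard-coded value that you hope ORS has.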
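Both points can be checked with a small experiment. The sketch below uses a made-up input line and hypothetical variable names. The unsafe form is left commented out because its exact failure (an error message or garbled output) depends on the awk implementation.

```shell
# Made-up input line containing printf format characters.
line='100%s done: 5%d left'

# Safe: the data is passed as an argument to a fixed "%s" format.
safe=$(printf '%s\n' "$line" | awk '{ printf "%s", $0 } END { print "" }')
printf '%s\n' "$safe"

# Unsafe (left commented out): here the data itself is the format string,
# so %s and %d are interpreted as conversions with no matching arguments.
# printf '%s\n' "$line" | awk '{ printf $0 }'

# print "" ends the output with ORS; printf "\n" always emits a newline.
with_print=$(printf 'a\nb\n' | awk -v ORS=';' '{ printf "%s", $0 } END { print "" }')
with_printf=$(printf 'a\nb\n' | awk -v ORS=';' '{ printf "%s", $0 } END { printf "\n" }')
printf '%s\n' "$with_print" "$with_printf"
```

The first command reproduces the input line unchanged. With ORS set to a semicolon, the print "" variant ends its output with that semicolon (ab;), while the printf "\n" variant ends with a plain newline (ab).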