I was trying to extend this question but can't figure out the following:
Say I have a file roll.txt:
echo "'123456789','987651234','129873645','213456789','987612345','543216789','432156789','876543291','213465789','542637819','123456','23456','22234','3456','7890543','34567891,'2345','567'" >> roll.txt
I can put a newline after every sixth comma with this sed command:
sed 's/,/,\n/6; P; D' roll.txt
'123456789','987651234','129873645','213456789','987612345','543216789',
'432156789','876543291','213465789','542637819','123456','23456',
'22234','3456','7890543','34567891,'2345','567'
However, when I try to put two newlines after every sixth comma:
sed 's/,/,\n\n/6; P; D' roll.txt
'123456789','987651234','129873645','213456789','987612345','543216789',

'432156789','876543291','213465789','542637819','123456','23456',



'22234','3456','7890543','34567891,'2345','567'
I get two newlines after the sixth comma, but four newlines after the 12th comma. Why? How can I get exactly two newlines after every sixth comma?
Answer 1
As Steeldriver wrote in the comments, in each cycle you add two lines but print and delete only one. With longer input it gets even worse (3, then 7, then 15 blank lines).
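A minimal sketch to make the accumulation visible (assumptions: GNU sed, which accepts \n in the replacement; the sample line and the count_blank_lines helper are hypothetical). It runs the question's command over a 13-field line and counts the empty output lines. Traced by hand: after the first P;D the pattern space still holds the unprocessed rest of the line, and the comma after the 12th field is the 6th comma both before and after the leftover newline is deleted, so it receives its pair of newlines twice.

```shell
# Hypothetical sample: 13 fields, so the 6th and 12th commas both qualify.
sample='a,b,c,d,e,f,g,h,i,j,k,l,m'

# Hypothetical helper: run a sed script ($1) over the sample and count the
# empty lines in its output. Assumes GNU sed (\n in the replacement text).
count_blank_lines() {
    printf '%s\n' "$sample" | sed "$1" | awk 'length($0) == 0 { n++ } END { print n + 0 }'
}

# The question's command: 1 blank line after the 6th comma, 3 after the 12th.
count_blank_lines 's/,/,\n\n/6; P; D'    # prints 4
```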
So skip the substitution when the pattern space starts with an empty line:
sed '/^\n/!s/,/,\n\n/6; P; D'
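A different technique (not part of this answer) avoids the P;D loop altogether: match each run of six "non-commas then a comma" groups and append two newlines to it, using the g flag so the whole line is handled in one pass. A sketch, assuming GNU sed for \n in the replacement; the function name is hypothetical.

```shell
# Hypothetical helper: insert two newlines after every 6th comma on each line.
# Single pass with the g flag, so no P;D cycle re-processes earlier text.
# \{6\} is a POSIX BRE interval; \n in the replacement is a GNU sed extension.
break_every_6th_comma() {
    sed 's/\(\([^,]*,\)\{6\}\)/\1\n\n/g' "$@"
}

# Usage: break_every_6th_comma roll.txt
```

A trailing group with fewer than six commas is left untouched; if a line ends exactly on a 6th comma, the two newlines are still appended after it.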
Answer 2
With GNU awk for multi-char RS (and RT), you can define each record as up to 6 fields, each made of non-commas followed by a comma:
$ echo "'123456789','987651234','129873645','213456789','987612345','543216789','432156789','876543291','213465789','542637819','123456','23456','22234','3456','7890543','34567891,'2345','567'" |
awk -v RS='([^,]*,){0,6}' 'RT{print RT}'
'123456789','987651234','129873645','213456789','987612345','543216789',
'432156789','876543291','213465789','542637819','123456','23456',
'22234','3456','7890543','34567891,'2345',
If you want to make sure every output line has 6 fields, and ends in , only when the last field is empty, so it's valid CSV, you could do:
$ echo "'123456789','987651234','129873645','213456789','987612345','543216789','432156789','876543291','213465789','542637819','123456','23456','22234','3456','7890543','34567891,'2345','567'" |
awk -v n=6 'BEGIN{RS="([^,]*,){0,"n"}"; FS=OFS=","} RT{$0=gensub(/,$/,"",1,RT); $n=$n; print}'
'123456789','987651234','129873645','213456789','987612345','543216789'
'432156789','876543291','213465789','542637819','123456','23456'
'22234','3456','7890543','34567891,'2345',
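The $n=$n step works because assigning to a field beyond NF extends the record with empty fields. Since gensub() is gawk-only, here is a portable sketch of just that post-processing step using sub() instead; it assumes each input line is one chunk that may end in a comma, and pad_to_n is a hypothetical name.

```shell
# Hypothetical helper: drop one trailing comma, then pad the line to n fields.
# POSIX awk: sub() on $0 re-splits the fields; assigning $n when NF < n
# raises NF to n (the new fields are empty) and rebuilds $0 with OFS.
pad_to_n() {
    awk -v n="$1" 'BEGIN { FS = OFS = "," } { sub(/,$/, ""); $n = $n; print }'
}

printf '%s\n' 'a,b,c,' 'a,b,c,d,e,f,' | pad_to_n 6
# a,b,c,,,
# a,b,c,d,e,f
```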
Answer 3
Using Raku (formerly known as Perl_6)
If you want to group elements together in Raku, you can batch them:
~$ raku -ne 'put join "\n", .split(",").batch(6).map: *.join(",");' roll.txt
'123456789','987651234','129873645','213456789','987612345','543216789'
'432156789','876543291','213465789','542637819','123456','23456'
'22234','3456','7890543','34567891,'2345','567'
So to get two newlines between batches, just join on \n\n:
~$ raku -ne 'put join "\n\n", .split(",").batch(6).map: *.join(",");' roll.txt
'123456789','987651234','129873645','213456789','987612345','543216789'

'432156789','876543291','213465789','542637819','123456','23456'

'22234','3456','7890543','34567891,'2345','567'
Raku's batch is equivalent to calling Raku's rotor(..., :partial). If you want to drop an incomplete final set of 6 elements, call rotor() without :partial instead.
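For systems without Raku, the batch-versus-rotor distinction can be sketched in POSIX awk. This is a swapped-in technique, not part of the answer, and batch_fields with its partial argument is hypothetical: it groups the comma-separated fields n at a time, joins each group with commas, puts a blank line between groups, and optionally drops a short final group.

```shell
# Hypothetical helper mimicking Raku's batch/rotor on comma-separated lines.
#   $1 = group size n
#   $2 = 1 to keep a short last group (like .batch(n) or .rotor(n, :partial)),
#        0 to drop it (like plain .rotor(n))
batch_fields() {
    awk -F, -v n="$1" -v partial="$2" '
    {
        full = int(NF / n) * n                # fields that fill complete groups
        last = (partial + 0) ? NF : full      # index of the last field to print
        for (i = 1; i <= last; i++) {
            printf "%s", $i
            if (i == last)        printf "\n"     # end of output for this line
            else if (i % n == 0)  printf "\n\n"   # blank line between groups
            else                  printf ","      # comma inside a group
        }
    }'
}

# Usage: batch_fields 6 1 < roll.txt    # keep the short last group
#        batch_fields 6 0 < roll.txt    # drop it
```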
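Finally, splitting doesn't always give you the answer you want. In that case you can try combing through the data to extract the elements of interest. The code below gives the same grouping as above and may be conceptually simpler, except that the malformed element '34567891 (it has no closing apostrophe) does not match the pattern and is dropped. The only difficulty is that the ' apostrophe can mess up one-liner quoting, so the character can be declared by its Unicode name, \c[APOSTROPHE]: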
~$ raku -ne 'put join "\n\n", .comb(/ \c[APOSTROPHE] \d+ \c[APOSTROPHE] /).batch(6).map: *.join(",");' roll.txt
'123456789','987651234','129873645','213456789','987612345','543216789'

'432156789','876543291','213465789','542637819','123456','23456'

'22234','3456','7890543','2345','567'
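The comb idea can also be sketched in POSIX awk for comparison (an assumption-laden illustration, not the answer's code; comb_quoted is hypothetical). awk's match() with RSTART/RLENGTH pulls out each '<digits>' token in turn, and the octal escape \047 plays the role of \c[APOSTROPHE], keeping a literal apostrophe out of the single-quoted program. Like the Raku version, it skips the malformed '34567891.

```shell
# Hypothetical helper: extract every '<digits>' token from each line and print
# them n per line, with a blank line between groups.
comb_quoted() {
    awk -v n="$1" '
    BEGIN { re = "\047[0-9]+\047" }           # \047 is the apostrophe character
    {
        s = $0; k = 0; out = ""
        while (match(s, re)) {
            tok = substr(s, RSTART, RLENGTH)  # the matched token
            s = substr(s, RSTART + RLENGTH)   # continue after it
            k++
            if (k > 1) out = out ((k - 1) % n == 0 ? "\n\n" : ",")
            out = out tok
        }
        if (k) print out
    }'
}

# Usage: comb_quoted 6 < roll.txt
```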
https://unix.stackexchange.com/a/611077/227738
https://docs.raku.org/language/regexes
https://raku.org
Answer 4
Using awk:
$ awk -F, '{for (i=1;i<NF;i++) printf "%s", $i FS ((i%6==0) ? ORS ORS: "") }END{print $NF; print ""}' file
'123456789','987651234','129873645','213456789','987612345','543216789',

'432156789','876543291','213465789','542637819','123456','23456',

'22234','3456','7890543','34567891,'2345','567'
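The one-liner above, expanded with comments and with the group size as a parameter. This is a sketch and every_n is a hypothetical name; unlike the original it prints the last field in the main block instead of END, so it also handles multi-line input and does not emit the extra empty line that the original's print "" adds at the very end.

```shell
# Hypothetical helper: after every n-th comma-separated field, emit a blank line.
every_n() {
    awk -F, -v n="$1" '
    {
        # Every field except the last is printed with its comma; after each
        # n-th field, two record separators produce the blank line.
        for (i = 1; i < NF; i++)
            printf "%s%s%s", $i, FS, (i % n == 0 ? ORS ORS : "")
        # The last field has no trailing comma, so print it on its own.
        print $NF
    }'
}

# Usage: every_n 6 < roll.txt
```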