Adding two newlines after every sixth comma

I'm trying to extend this question but can't figure this out:

Suppose I have a file roll.txt

echo "'123456789','987651234','129873645','213456789','987612345','543216789','432156789','876543291','213465789','542637819','123456','23456','22234','3456','7890543','34567891,'2345','567'" >> roll.txt

I can put a newline after every sixth comma with the following sed command:

sed 's/,/,\n/6; P; D' roll.txt
'123456789','987651234','129873645','213456789','987612345','543216789',
'432156789','876543291','213465789','542637819','123456','23456',
'22234','3456','7890543','34567891,'2345','567'

However, when I try to put two newlines after every sixth comma:

sed 's/,/,\n\n/6; P; D' roll.txt
'123456789','987651234','129873645','213456789','987612345','543216789',

'432156789','876543291','213465789','542637819','123456','23456',



'22234','3456','7890543','34567891,'2345','567'

Instead, I get two newlines after the 6th comma and four newlines after the 12th comma. Why? How can I get two newlines after every sixth comma?

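Answer 1

As steeldriver wrote in a comment, in each cycle you add two newlines but print and delete only one line. The leftover text, which now starts with a newline, goes through the substitution again on the next cycle. It gets worse for longer sequences (with 3, 7, and 15 blank lines)...

So don't do the replacement if the pattern space starts with a newline, that is, if its first line is empty: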

sed '/^\n/!s/,/,\n\n/6; P; D'
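
To watch the cycle on a smaller scale, here is a sketch that uses a group size of 2 instead of 6. It assumes GNU sed (the \n in the replacement text is a GNU extension), and the function names split_unguarded and split_guarded are made up for this illustration only.

```shell
#!/bin/sh
# Assumption: GNU sed, because "\n" in the replacement is a GNU extension.
# A group size of 2 (instead of 6) keeps the trace short.

# Unguarded: the substitution runs on every cycle, including the cycles that
# D restarts on the leftover text, so extra newlines pile up.
split_unguarded() { sed 's/,/,\n\n/2; P; D'; }

# Guarded: skip the substitution while the pattern space starts with a
# newline, i.e. while the pending blank line has not been printed yet.
split_guarded() { sed '/^\n/!s/,/,\n\n/2; P; D'; }

printf '%s\n' 'a,b,c,d,e,f,g' | split_guarded
```

With the guarded version each group of two is followed by exactly one blank line. Piping the same input through split_unguarded instead gives 1, then 3, then 7 blank lines after the successive groups, which is the growth described above.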

Answer 2

Using GNU awk for multi-char RS and RT, you can define each record as up to 6 non-commas-then-comma fields:

$ echo "'123456789','987651234','129873645','213456789','987612345','543216789','432156789','876543291','213465789','542637819','123456','23456','22234','3456','7890543','34567891,'2345','567'" |
awk -v RS='([^,]*,){0,6}' 'RT{print RT}'
'123456789','987651234','129873645','213456789','987612345','543216789',
'432156789','876543291','213465789','542637819','123456','23456',
'22234','3456','7890543','34567891,'2345',

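If you want to ensure that every output line has 6 fields and ends with a comma only when the last field is empty, so that it is valid CSV, you could do the following: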

$ echo "'123456789','987651234','129873645','213456789','987612345','543216789','432156789','876543291','213465789','542637819','123456','23456','22234','3456','7890543','34567891,'2345','567'" |
awk -v n=6 'BEGIN{RS="([^,]*,){0,"n"}"; FS=OFS=","} RT{$0=gensub(/,$/,"",1,RT); $n=$n; print}'
'123456789','987651234','129873645','213456789','987612345','543216789'
'432156789','876543291','213465789','542637819','123456','23456'
'22234','3456','7890543','34567891,'2345',

Answer 3

Using Raku (formerly known as Perl_6)

If you want to group elements together in Raku, you can batch them together:

~$  raku -ne 'put join "\n", .split(",").batch(6).map: *.join(",");' roll.txt
'123456789','987651234','129873645','213456789','987612345','543216789'
'432156789','876543291','213465789','542637819','123456','23456'
'22234','3456','7890543','34567891,'2345','567'

So, to get two newlines between each batch, just join on "\n\n":

~$  raku -ne 'put join "\n\n", .split(",").batch(6).map: *.join(",");' roll.txt
'123456789','987651234','129873645','213456789','987612345','543216789'

'432156789','876543291','213465789','542637819','123456','23456'

'22234','3456','7890543','34567891,'2345','567'

Raku's batch function is equivalent to a call to Raku's rotor(..., :partial). If you want to drop an incomplete set of 6 elements at the end, just call rotor().

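Finally, splitting doesn't always give you the answer you want. In that case you can try to comb through the data to extract the elements of interest. The code below is conceptually maybe simpler, and it gives the same output as above for well-formed elements. Note that '34567891, which is missing its closing apostrophe in the sample data, does not match the pattern and is dropped from the output. The only difficulty is that the ' apostrophe can mess up one-liner quoting, so the character can be declared using its Unicode name, \c[APOSTROPHE]: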

~$ raku -ne 'put join "\n\n", .comb(/ \c[APOSTROPHE] \d+ \c[APOSTROPHE] /).batch(6).map: *.join(",");'  roll.txt
'123456789','987651234','129873645','213456789','987612345','543216789'

'432156789','876543291','213465789','542637819','123456','23456'

'22234','3456','7890543','2345','567'

https://unix.stackexchange.com/a/611077/227738
https://docs.raku.org/language/regexes
https://raku.org

Answer 4

Using awk

$ awk -F, '{for (i=1;i<NF;i++) printf "%s", $i FS ((i%6==0) ? ORS ORS: "") }END{print $NF; print ""}' file
'123456789','987651234','129873645','213456789','987612345','543216789',

'432156789','876543291','213465789','542637819','123456','23456',

'22234','3456','7890543','34567891,'2345','567'
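
As a sketch of how the same loop could be parameterized, the wrapper below takes the group size as an argument. The function name group_commas and the variable n are assumptions introduced for this example. Unlike the one-liner above, it prints the last field at the end of each input line rather than in END, and it does not add a trailing blank line.

```shell
#!/bin/sh
# group_commas N: read comma-separated text on stdin and emit a blank line
# after every N-th comma. Hypothetical helper, not part of the answer above.
group_commas() {
    awk -F, -v n="$1" '{
        # Print every field but the last, followed by the comma; after
        # each n-th field also emit two record separators (one blank line).
        for (i = 1; i < NF; i++)
            printf "%s%s%s", $i, FS, (i % n == 0 ? ORS ORS : "")
        # The last field has no trailing comma; print ends the line.
        print $NF
    }'
}

printf '%s\n' 'a,b,c,d,e' | group_commas 2
```

For the real data this would be called as group_commas 6 < roll.txt.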
