Remove multiple commas from a specific column of a tab-delimited file and print the words on new lines

Input file

jayesh  30,20,50,60 30:20:40,60:55  A   AB,KL,CD        SM1,SM2
rahul   10,80,50,90 25:55:60,25     SGF AAAA,BCD,RTY    SM3,SM4,SM4
pravin  89,78,40,20 25:30:55,96:25  M   J               SD10,SD12
sarika  10,20,48    29:50:30,25     T   K,L             SD20,SD39

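I want to remove the commas from column 5 and print each comma-separated word on its own line, with the rest of the row repeated. (Note: each cell in column 5 actually contains many commas; I am only showing a few here.)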

Expected output

jayesh  30,20,50,60      30:20:40,60:55 A   AB   SM1,SM2
jayesh  30,20,50,60      30:20:40,60:55 A   KL   SM1,SM2
jayesh  30,20,50,60      30:20:40,60:55 A   CD   SM1,SM2
rahul   10,80,50,90,120  25:55:60,25    SGF AAAA SM3,SM4,SM4
rahul   10,80,50,90,120  25:55:60,25    SGF BCD  SM3,SM4,SM4
rahul   10,80,50,90,120  25:55:60,25    SGF RTY  SM3,SM4,SM4
pravin  89,78,40,20      25:30:55,96:25 M   J    SD10,SD12
sarika  10,20,48         29:50:30,25    T   K    SD20,SD39
sarika  10,20,48         29:50:30,25    T   L    SD20,SD39

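I tried the following with awk, but it does not give the expected result. (To write the code, I took help from this site: "How to remove commas and print the whole line again for the words after the commas".)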

awk '{
split ($5,w5,",");
for (i in w5) 
{ print $1"\t"$2"\t"$3"\t"$4"\t"w5[i]"\t"$6";}}'

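@sundeep, when I tried the following command on the input file, columns 5 and 6 got mixed together. (I am showing only 6 columns here, but my file has more than 6 columns.)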

When I open the output file in Excel, I get the following output:

Output

$ awk '{ split ($5,w5,","); for (i in w5) { print $1"\t"$2"\t"$3"\t"$4"\t"w5[i]"\t"$6 } }' ip.txt

jayesh  30,20,50,60 30:20:40,60:55  A   "ABSM1,SM2" 
jayesh  30,20,50,60 30:20:40,60:55  A    KL         SM1,SM2
jayesh  30,20,50,60 30:20:40,60:55  A    CD"        SM1,SM2
rahul   10,80,50,90 25:55:60,25     SGF  AAAASM3,SM4,SM4"   
rahul   10,80,50,90 25:55:60,25     SGF  BCD        SM3,SM4,SM4
rahul   10,80,50,90 25:55:60,25     SGF  RTY"       SM3,SM4,SM4
pravin  89,78,40,20 25:30:55,96:25  M    J          SD10,SD12
sarika  10,20,48    29:50:30,25     T    KSD20,SD39"    
sarika  10,20,48    29:50:30,25     T    L"         SD20,SD39

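Answer 1

The awk command used by the OP only has a syntax problem: the stray "; at the end of the print statement. With that removed: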

$ awk '{ split ($5,w5,","); for (i in w5) { print $1"\t"$2"\t"$3"\t"$4"\t"w5[i]"\t"$6 } }' ip.txt
jayesh  30,20,50,60 30:20:40,60:55  A   AB  SM1,SM2
jayesh  30,20,50,60 30:20:40,60:55  A   KL  SM1,SM2
jayesh  30,20,50,60 30:20:40,60:55  A   CD  SM1,SM2
rahul   10,80,50,90 25:55:60,25 SGF AAAA    SM3,SM4,SM4
rahul   10,80,50,90 25:55:60,25 SGF BCD SM3,SM4,SM4
rahul   10,80,50,90 25:55:60,25 SGF RTY SM3,SM4,SM4
pravin  89,78,40,20 25:30:55,96:25  M   J   SD10,SD12
sarika  10,20,48    29:50:30,25 T   K   SD20,SD39
sarika  10,20,48    29:50:30,25 T   L   SD20,SD39

Also, the output field separator (OFS) can be set for cleaner syntax, thanks to @fedorqui for the suggestion:

awk -v OFS='\t' '{ split ($5,w5,","); for (i in w5) { print $1,$2,$3,$4,w5[i],$6 } }' ip.txt

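Or, assigning to $5 and printing the whole record, which also works when the file has more than 6 columns: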

awk -v OFS='\t' '{ split ($5,w5,","); for (i in w5) { $5 = w5[i]; print } }' ip.txt
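
One caveat about the commands above: the order in which for (i in w5) visits the array is not specified by awk, so the pieces of column 5 are not guaranteed to come out in their original order. A minimal sketch of a variant that loops by index instead, assuming the input really is tab-separated (hence -F'\t', which also keeps empty columns in place). The file name sample.tsv and the function name split_col5 are hypothetical, chosen only for this illustration; the two sample rows are taken from the question's input.

```shell
# Hypothetical sample file; two rows copied from the question's input.
printf 'jayesh\t30,20,50,60\t30:20:40,60:55\tA\tAB,KL,CD\tSM1,SM2\n' > sample.tsv
printf 'pravin\t89,78,40,20\t25:30:55,96:25\tM\tJ\tSD10,SD12\n' >> sample.tsv

# Hypothetical helper: print one output row per comma-separated piece of column 5.
split_col5() {
    awk -F'\t' -v OFS='\t' '{
        n = split($5, w5, ",")       # split() returns the number of pieces
        for (i = 1; i <= n; i++) {   # counting loop keeps the original order
            $5 = w5[i]               # overwrite column 5; other columns stay as they are
            print
        }
    }' "$1"
}

split_col5 sample.tsv
```

Because the whole record is printed after only $5 is changed, any columns beyond the sixth are carried along without having to be listed.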


A similar solution with perl:

$ perl -lane 'print join "\t", @F[0..3],$_,@F[5..$#F] foreach split /,/,$F[4]' ip.txt 
jayesh  30,20,50,60 30:20:40,60:55  A   AB  SM1,SM2
jayesh  30,20,50,60 30:20:40,60:55  A   KL  SM1,SM2
jayesh  30,20,50,60 30:20:40,60:55  A   CD  SM1,SM2
rahul   10,80,50,90 25:55:60,25 SGF AAAA    SM3,SM4,SM4
rahul   10,80,50,90 25:55:60,25 SGF BCD SM3,SM4,SM4
rahul   10,80,50,90 25:55:60,25 SGF RTY SM3,SM4,SM4
pravin  89,78,40,20 25:30:55,96:25  M   J   SD10,SD12
sarika  10,20,48    29:50:30,25 T   K   SD20,SD39
sarika  10,20,48    29:50:30,25 T   L   SD20,SD39
