Parsing CSV with AWK or SED

I have been trying for a long time to clean up my CSV file with AWK or SED.

The CSV file looks like this:

id,name,contact-type,contact
"1","toto corp","tel","+123456789"
"1","toto corp","fax","+198765432"
"1","toto corp","site","totocorp.com"
"2","Namek corp","tel","+14377678563"
"2","Namek corp","fax","+19867334565"
"2","Namek corp","site","Namekcorp.com"

I would like output like this:

id,name,tel,fax,site
"1","toto corp","+123456789","+198765432","totocorp.com"
"2","Namek corp","+14377678563","+19867334565","Namekcorp.com"

Thanks in advance for any help!

Answer 1

A gawk solution:

awk 'BEGIN{ FS=OFS=","; print "id,name,tel,fax,site" }
     NR>1 { gsub(/"/,"",$3); a[$1][$3]=$4; if(!a[$1]["name"]) a[$1]["name"]=$2; }
     END { for(i in a) print i,a[i]["name"],a[i]["tel"],a[i]["fax"],a[i]["site"] }' file

Output:

id,name,tel,fax,site
"1","toto corp","+123456789","+198765432","totocorp.com"
"2","Namek corp","+14377678563","+19867334565","Namekcorp.com"

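  • a[$1][$3]=$4 - a multidimensional array (a gawk array of arrays): the id field value is the parent key, and the contact-type field value is the key of the nested array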
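
Arrays of arrays are a gawk extension. As a rough sketch of the same idea for other awk implementations, the two keys can be combined into one composite subscript (val[id, type]), which POSIX awk supports. The function name pivot_contacts and the helper arrays name, val and ids are made up for this illustration; the ids array additionally records the order in which ids first appear, so the output follows the input order.

```shell
# Sketch only: pivot_contacts is a hypothetical helper that reads the CSV on stdin.
# Like the answer above, it assumes no commas inside quoted fields.
pivot_contacts() {
  awk 'BEGIN { FS = OFS = ","; print "id,name,tel,fax,site" }
       NR > 1 {
         type = $3
         gsub(/"/, "", type)                      # strip the quotes: "tel" becomes tel
         if (!($1 in name)) {                     # first row seen for this id
           name[$1] = $2
           ids[++n] = $1                          # remember first-seen order
         }
         val[$1, type] = $4                       # composite key: id SUBSEP contact-type
       }
       END {
         for (i = 1; i <= n; i++) {
           id = ids[i]
           print id, name[id], val[id, "tel"], val[id, "fax"], val[id, "site"]
         }
       }'
}

# Usage with the sample data from the question:
pivot_contacts <<'EOF'
id,name,contact-type,contact
"1","toto corp","tel","+123456789"
"1","toto corp","fax","+198765432"
"1","toto corp","site","totocorp.com"
"2","Namek corp","tel","+14377678563"
"2","Namek corp","fax","+19867334565"
"2","Namek corp","site","Namekcorp.com"
EOF
```

A contact type that is missing for some id simply yields an empty field in that column.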

Answer 2

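Assuming there are no commas inside the double-quoted CSV fields, and that every id has exactly three rows in the order tel, fax, site, we can do the following with a short piece of GNU sed code: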

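sed -ne '
   # skip the header line
   1d
   # first row of a group: drop the contact-type field, save the rest in hold space
   s/,/&\n/2;s/\n[^,]*,//;h
   # next two rows: keep only the last field and append it to hold space
   n;s/.*,//;H
   n;s/.*,//;H
   # join the saved pieces with commas and print
   g;s/\n/,/gp
' yourfile.csv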

Result:

"1","toto corp","+123456789","+198765432","totocorp.com"
"2","Namek corp","+14377678563","+19867334565","Namekcorp.com"

Answer 3

Another GNU awk approach:

$ awk -F, -v OFS="," 'NR==1{h=$0;next}{a[$1FS$2]=(NR==2?FS $4:a[$1FS$2] FS $4)} \
END{print h;for (k in a) print k a[k]}' file

#Output:    
id,name,contact-type,contact
"2","Namek corp","+14377678563","+19867334565","Namekcorp.com"
"1","toto corp","+123456789","+198765432","totocorp.com"

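Because of the way awk iterates over arrays with for (k in a), the rows printed in the END section are not in sorted order. Note also that the header line is copied from the input unchanged, so it still reads id,name,contact-type,contact rather than id,name,tel,fax,site.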
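
If a predictable row order matters, one option is to leave the awk program as it is and sort its output afterwards, passing the header line through untouched. The following is a minimal sketch; sort_body is a hypothetical helper name, and the sort is a plain byte-wise string sort (so "10" would sort before "2"), which is enough for the sample data but may not suit every id scheme.

```shell
# Sketch only: sort_body is a hypothetical helper.
# It copies the first line (the header) as is, then sorts the remaining lines.
sort_body() {
  {
    IFS= read -r header            # consume just the header line
    printf '%s\n' "$header"
    LC_ALL=C sort                  # sort whatever is left on stdin
  }
}

# Usage with the unsorted output shown above:
printf '%s\n' \
  'id,name,contact-type,contact' \
  '"2","Namek corp","+14377678563","+19867334565","Namekcorp.com"' \
  '"1","toto corp","+123456789","+198765432","totocorp.com"' |
  sort_body
```

In practice this would be appended to the awk command as a pipeline stage, for example awk '...' file | sort_body.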

Answer 4

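I know the original poster is looking for a solution using awk or sed, but since those tools are not aware of CSV quoting rules, it may be worth showing a solution that uses a tool which knows how to parse CSV.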

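Using the CSV-aware tool Miller (mlr) and its reshape operation, which creates new fields from the values of the contact field, named after the values in the contact-type field: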

$ mlr --csv reshape -s contact-type,contact file
id,name,tel,fax,site
1,toto corp,+123456789,+198765432,totocorp.com
2,Namek corp,+14377678563,+19867334565,Namekcorp.com

The same again, but pretty-printed in a boxed table format:

$ mlr --c2p --barred reshape -s contact-type,contact file
+----+------------+--------------+--------------+---------------+
| id | name       | tel          | fax          | site          |
+----+------------+--------------+--------------+---------------+
| 1  | toto corp  | +123456789   | +198765432   | totocorp.com  |
| 2  | Namek corp | +14377678563 | +19867334565 | Namekcorp.com |
+----+------------+--------------+--------------+---------------+
