I have been trying for a long time to clean up my CSV file using awk or sed.
The CSV file looks like this:
id,name,contact-type,contact
"1","toto corp","tel","+123456789"
"1","toto corp","fax","+198765432"
"1","toto corp","site","totocorp.com"
"2","Namek corp","tel","+14377678563"
"2","Namek corp","fax","+19867334565"
"2","Namek corp","site","Namekcorp.com"
I would like output like this:
id,name,tel,fax,site
"1","toto corp","+123456789","+198765432","totocorp.com"
"2","Namek corp","+14377678563","+19867334565","Namekcorp.com"
Thanks in advance for any help!
Answer 1
gawk solution (arrays of arrays need GNU awk 4.0 or later):
awk 'BEGIN{ FS=OFS=","; print "id,name,tel,fax,site" }
# skip the input header; strip quotes from contact-type so it can be used as a key
NR > 1 { gsub(/"/,"",$3); a[$1][$3]=$4; if(!a[$1]["name"]) a[$1]["name"]=$2; }
END { for(i in a) print i,a[i]["name"],a[i]["tel"],a[i]["fax"],a[i]["site"] }' file
Output:
id,name,tel,fax,site
"1","toto corp","+123456789","+198765432","totocorp.com"
"2","Namek corp","+14377678563","+19867334565","Namekcorp.com"
a[$1][$3]=$4
- a multidimensional array that uses the id field value as the parent key and the contact-type field value as the key of the nested array
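For awk implementations without arrays of arrays, the same grouping can be roughly emulated with the standard comma subscript, which joins the two keys with SUBSEP. This is only a sketch of that idea, not the code from the answer above; the function name reshape_contacts and the arrays name and val are illustrative.

```shell
# Hypothetical helper (name is illustrative): pivot contact rows with POSIX awk.
reshape_contacts() {
  awk 'BEGIN { FS = OFS = ","; print "id,name,tel,fax,site" }
  NR > 1 {
    gsub(/"/, "", $3)      # strip quotes so the contact type is a clean key
    name[$1] = $2          # one name per id
    val[$1, $3] = $4       # stored under the key: id SUBSEP contact-type
  }
  END {
    for (id in name)       # iteration order is unspecified
      print id, name[id], val[id, "tel"], val[id, "fax"], val[id, "site"]
  }' "$@"
}
```

It can be called as reshape_contacts file, or fed from a pipe. As with the nested-array version, the rows after the header may come out in any order.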
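Answer 2
Assuming there are no commas inside the double-quoted CSV fields, and that each id has exactly three consecutive rows in tel, fax, site order, we can do the following in a simple sed script (GNU sed, since it relies on \n in the replacement):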
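sed -ne '
# skip the header line
1d
# first row of a group: drop the contact-type field, keep id,name,contact
s/,/&\n/2;s/\n[^,]*,//;h
# second and third rows: keep only the last field and append it to the hold space
n;s/.*,//;H
n;s/.*,//;H
# join the collected pieces with commas and print
g;s/\n/,/gp
' yourfile.csv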
Result:
"1","toto corp","+123456789","+198765432","totocorp.com"
"2","Namek corp","+14377678563","+19867334565","Namekcorp.com"
Answer 3
Another GNU awk approach:
$ awk -F, -v OFS="," 'NR==1{h=$0;next}{a[$1FS$2]=(NR==2?FS $4:a[$1FS$2] FS $4)} \
END{print h;for (k in a) print k a[k]}' file
#Output:
id,name,contact-type,contact
"2","Namek corp","+14377678563","+19867334565","Namekcorp.com"
"1","toto corp","+123456789","+198765432","totocorp.com"
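Because of the way awk iterates over arrays, the END block does not print the entries in sorted order. Note also that the header line is the input's original header, printed unchanged.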
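If the input order matters, one option is to remember each key the first time it appears and loop over that list in END. The following is a sketch along those lines, not the answer's original code; the function name pivot_in_order and the order and n variables are illustrative. It also prints a hard-coded reshaped header, which assumes the rows of each id arrive in tel, fax, site order.

```shell
# Hypothetical helper (name is illustrative): keep groups in first-seen order.
pivot_in_order() {
  awk -F, '
  NR == 1 { next }                       # skip the input header
  {
    key = $1 FS $2                       # id,name identifies a group
    if (!(key in a)) order[++n] = key    # record the first appearance
    a[key] = a[key] FS $4                # append this contact value
  }
  END {
    print "id,name,tel,fax,site"         # assumed column order, see above
    for (i = 1; i <= n; i++) print order[i] a[order[i]]
  }' "$@"
}
```

Called as pivot_in_order file, this should list the groups in the order their first row appears in the input.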
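Answer 4
I know the original poster was looking for a solution using awk or sed, but since these tools are unaware of CSV quoting rules, it may be worth showing a solution using a tool that knows how to parse CSV.
Using the CSV-aware tool Miller (mlr) and its reshape operation to create new fields from the contact field, named after the values in the contact-type field: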
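To illustrate the quoting problem, here is a small sketch with a made-up record whose name field contains a comma. Splitting on every comma, as awk -F, does, yields five fields instead of four. The function name count_fields is illustrative.

```shell
# Hypothetical helper: report how many fields awk sees when splitting on commas.
count_fields() {
  awk -F, '{ print NF }'
}

# Made-up record: the quoted name field itself contains a comma.
printf '%s\n' '"1","toto, corp","tel","+123456789"' | count_fields   # prints 5
```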
$ mlr --csv reshape -s contact-type,contact file
id,name,tel,fax,site
1,toto corp,+123456789,+198765432,totocorp.com
2,Namek corp,+14377678563,+19867334565,Namekcorp.com
Again, but in pretty-printed barred format:
$ mlr --c2p --barred reshape -s contact-type,contact file
+----+------------+--------------+--------------+---------------+
| id | name       | tel          | fax          | site          |
+----+------------+--------------+--------------+---------------+
| 1  | toto corp  | +123456789   | +198765432   | totocorp.com  |
| 2  | Namek corp | +14377678563 | +19867334565 | Namekcorp.com |
+----+------------+--------------+--------------+---------------+