我有一个 excel 文件,已转换为 csv。转换后,它看起来像下面的示例(请注意,csv中有100多列。这是缩小版本):
,Product," ",Citty," ",Price
,Name," ",Location," ",Per Unit
,banana," ",CA," ",5.7
,apple," ",FL," ",2.3
我需要编写一个脚本,该脚本将采用第一行和第二行,并根据它们的逗号位置将它们“合并”在一起:
,Product Name," "" ",Citty Location," "" ",Price Per Unit
,banana," ",CA," ",5.7
,apple," ",FL," ",2.3
我已经查看了此处和堆栈溢出的其他问题,但答案似乎与文件前两行的这种奇怪的逐列情况无关。
作为一项额外的不相关任务,我还想删除 csv 中的空列并修复拼写错误,使其看起来像这样:
Product Name,City Location,Price Per Unit
banana,CA,5.7
apple,FL,2.3
(csv 当前在每个实际数据列之间都有一个由引号包围的选项卡,第一列除外,第一列只是空的,后跟逗号)。
我将多次收到带有拼写错误的 csv,因此我想以编程方式修复脚本中的错误。另请注意,列可能并不总是按上面显示的顺序排列,因此我需要在脚本过程中动态检查每个列名称是否有错误。
答案1
尝试这个
$ awk -F, 'NR<2{split(gensub(/Citty/,"City","g",$0),a,FS)}NR==2{for(b=2;b<=NF;b+=2){c=c a[b]" "$b","}print gensub(/,$/,"",1,c)}NR>2{print gensub(/(^,|" *",)/,"","g",$0)}' inp
Product Name,City Location,Price Per Unit
banana,CA,5.7
apple,FL,2.3
$
如果将相同的代码分成几行,则更具可读性:
$ awk -F, '
> NR<2{split(gensub(/Citty/,"City","g",$0),a,FS)}
> NR==2{for(b=2;b<=NF;b+=2){c=c a[b]" "$b","}print gensub(/,$/,"",1,c)}
> NR>2{print gensub(/(^,|" *",)/,"","g",$0)}' inp
Product Name,City Location,Price Per Unit
banana,CA,5.7
apple,FL,2.3
$
如果是第一行,则将该行拆分为 a 内的数组元素。修复城市->城市拼写错误。
如果是第二行,从第二列开始,将第一行的相应列与该列一起打印。对每一列重复此操作,以 2 列增量进行。去掉尾随的,
.
在第二行之后,将任何前导,
或任何替换"<spaces>",
为空字符串,然后打印结果。
在 GNU Awk 4.0.2 上测试正常
答案2
使用 Perl、Text::CSV 和 MoreUtils:
perl -MText::CSV -MList::MoreUtils=pairwise -lne '
BEGIN { $p = Text::CSV->new(); }
@f = $p->fields() if $p->parse($_);
@hdr = map { s/Citty/City/ ; $_ } @f if $. == 1;
@f = pairwise { $a . " " . $b } @hdr, @f if $. == 2;
print join ",", grep { /\w/ } @f if $. > 1;
' file.csv
Product Name,City Location,Price Per Unit
banana,CA,5.7
apple,FL,2.3
忽略grep
不包含至少一个单词字符的字段。
使用 perl >= 5.14.0,您可以简化替换以map s/Citty/City/r @f
使用无损替代修饰符。
答案3
尝试
awk -F, '
{gsub (/,*"[ ]*",*/, ",")
sub (/^,/, "")
sub (/Citty/, "City")
}
NR == 1 {n = split ($0, T)
next
}
NR == 2 {for (;n; n--) $n = T[n] " " $n
}
1
' OFS=, file
Product Name,City Location,Price Per Unit
banana,CA,5.7
apple,FL,2.3