如何逐列合并 csv 的前两行?

如何逐列合并 csv 的前两行?

我有一个 excel 文件,已转换为 csv。转换后,它看起来像下面的示例(请注意,csv中有100多列。这是缩小版本):

,Product,"  ",Citty,"   ",Price
,Name," ",Location,"    ",Per Unit
,banana,"   ",CA,"  ",5.7
,apple,"    ",FL,"  ",2.3

我需要编写一个脚本,该脚本将采用第一行和第二行,并根据它们的逗号位置将它们“合并”在一起:

,Product Name," ""  ",Citty Location,"  ""  ",Price Per Unit
,banana,"   ",CA,"  ",5.7
,apple,"    ",FL,"  ",2.3

我已经查看了此处和堆栈溢出的其他问题,但答案似乎与文件前两行的这种奇怪的逐列情况无关。


作为一项额外的不相关任务,我还想删除 csv 中的空列并修复拼写错误,使其看起来像这样:

Product Name,City Location,Price Per Unit
banana,CA,5.7
apple,FL,2.3

(csv 当前在每个实际数据列之间都有一个由引号包围的选项卡,第一列除外,第一列只是空的,后跟逗号)。

我将多次收到带有拼写错误的 csv,因此我想以编程方式修复脚本中的错误。另请注意,列可能并不总是按上面显示的顺序排列,因此我需要在脚本过程中动态检查每个列名称是否有错误。

答案1

尝试这个

$ awk -F, 'NR<2{split(gensub(/Citty/,"City","g",$0),a,FS)}NR==2{for(b=2;b<=NF;b+=2){c=c a[b]" "$b","}print gensub(/,$/,"",1,c)}NR>2{print gensub(/(^,|" *",)/,"","g",$0)}' inp
Product Name,City Location,Price Per Unit
banana,CA,5.7
apple,FL,2.3
$

如果将相同的代码分成几行,则更具可读性:

$ awk -F, '
> NR<2{split(gensub(/Citty/,"City","g",$0),a,FS)}
> NR==2{for(b=2;b<=NF;b+=2){c=c a[b]" "$b","}print gensub(/,$/,"",1,c)}
> NR>2{print gensub(/(^,|" *",)/,"","g",$0)}' inp
Product Name,City Location,Price Per Unit
banana,CA,5.7
apple,FL,2.3
$

如果是第一行,则将该行拆分为 a 内的数组元素。修复城市->城市拼写错误。

如果是第二行,从第二列开始,将第一行的相应列与该列一起打印。对每一列重复此操作,以 2 列增量进行。去掉尾随的,.

在第二行之后,将任何前导,或任何替换"<spaces>",为空字符串,然后打印结果。

在 GNU Awk 4.0.2 上测试正常

在线尝试一下!

答案2

使用 Perl、Text::CSV 和 MoreUtils:

perl -MText::CSV -MList::MoreUtils=pairwise -lne '
  BEGIN { $p = Text::CSV->new(); } 
  @f = $p->fields() if $p->parse($_);
  @hdr = map { s/Citty/City/ ; $_ } @f if $. == 1;
  @f = pairwise { $a . " " . $b } @hdr, @f if $. == 2;
  print join ",", grep { /\w/ } @f if $. > 1;
' file.csv
Product Name,City Location,Price Per Unit
banana,CA,5.7
apple,FL,2.3

忽略grep不包含至少一个单词字符的字段。

使用 perl >= 5.14.0,您可以简化替换以map s/Citty/City/r @f使用无损替代修饰符。

答案3

尝试

    awk -F, '
            {gsub (/,*"[    ]*",*/, ",")
             sub (/^,/, "")
             sub (/Citty/, "City")
            }

    NR == 1 {n = split ($0, T)
             next
            }
    NR == 2 {for (;n; n--) $n = T[n] " " $n
            }

    1
    ' OFS=, file
Product Name,City Location,Price Per Unit
banana,CA,5.7
apple,FL,2.3

相关内容