Change every ',' except the first to "<COMMA>" on each line of a file (bash)

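I'm using bash and have a CSV file (dat.csv) that should hold only two columns of data (App, Blurb). Because many lines contain several ',' characters, it ends up with many columns instead.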

Example of the problematic dat.csv:

 App , Blurb
 diff, this is the diff program, bla bla bla, yadda yadda
 word, this is ms product, it is not very good, I dont like it
 dd, this is a Linux disk application , its awesome!, bla bla, ttly
 ... 

The problem is that, because the 'Blurb' column contains extra ',' characters, its data spills over into the following columns (C, D, and so on) of dat.csv.
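One quick way to see the spill-over is to count the comma-separated fields on each line. The sketch below is an illustration, not part of the question: `check_two_fields` is a hypothetical helper built on awk's `NF`, it treats every comma as a separator (no CSV quoting), and it flags any line that does not split into exactly two fields.

```shell
#!/bin/sh
# Hypothetical helper (not from the question): report every line that does not
# split into exactly two comma-separated fields. Exits 1 if any line is flagged.
check_two_fields() {
    awk -F, '
        NF != 2 { printf "line %d has %d fields\n", NR, NF; bad = 1 }
        END     { exit bad + 0 }
    ' "$@"
}

# Two rows from the sample: one as it is now, one as the asker wants it.
printf '%s\n' \
    ' diff, this is the diff program, bla bla bla, yadda yadda' \
    ' diff, this is the diff program<COMMA> bla bla bla<COMMA> yadda yadda' |
    check_two_fields || echo "some lines have extra commas"
```

Run against any of the fixed outputs below, it should print nothing and exit 0.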

The goal is to change every ',' on each line except the first to "<COMMA>", so that all of the 'Blurb' data stays in column B.

For example, the desired output:

 App, Blurb
 diff, this is the diff program<COMMA> bla bla bla<COMMA> yadda yadda
 word, this is ms product<COMMA> it is not very good<COMMA> I dont like it
 dd, this is a Linux disk application <COMMA> its awesome!<COMMA> bla bla<COMMA> ttly
 ...

Thanks!

Answer 1

With GNU sed:

sed 's/,/<COMMA>/2g' infile

Or, portably:

sed 's/,/<COMMA>/g; s/<COMMA>/,/' infile
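The GNU form relies on GNU sed letting a number flag combine with `g`: `2g` skips the first match and replaces the second and every later one. The portable form reaches the same result in two steps: replace every comma, then turn the first `<COMMA>` back into a comma. The sketch below wraps both one-liners as shell functions (`gnu_fix` and `portable_fix` are hypothetical names) and runs them on one sample row; it assumes GNU sed is installed for `gnu_fix`.

```shell
#!/bin/sh
# gnu_fix: GNU sed only; "2g" means replace the 2nd match and all later ones.
gnu_fix()      { sed 's/,/<COMMA>/2g' "$@"; }
# portable_fix: replace every comma, then restore the first <COMMA> to a comma.
portable_fix() { sed 's/,/<COMMA>/g; s/<COMMA>/,/' "$@"; }

line=' word, this is ms product, it is not very good, I dont like it'
printf '%s\n' "$line" | gnu_fix       # both print the same converted line
printf '%s\n' "$line" | portable_fix
```

The two differ only if a line already contains the literal text `<COMMA>` before its first comma: `portable_fix` would then turn that existing `<COMMA>` into a comma instead of leaving the first real comma alone.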

Answer 2

You can also do it in POSIX sed, as follows:

sed -e '
    y/,/\n/          ;# change all commas to newlines, which are guaranteed to not be there
    s/\n/,/          ;# then change the first of those newlines to a comma, i.e., restore
    s//<COMMA>/g     ;# and all the remaining newline(s) change to <COMMA>
' dat.csv
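The same "keep the first comma, replace the rest" idea can also be written in POSIX awk. This is a swapped-in alternative, not one of the posted answers: it splits each line at the first comma with `index` and `substr`, then runs `gsub` only on the remainder. `first_comma_only` is a hypothetical name.

```shell
#!/bin/sh
# Alternative sketch using POSIX awk instead of sed (not one of the posted answers).
# first_comma_only reads the files named as arguments, or stdin when none are given.
first_comma_only() {
    awk '{
        i = index($0, ",")            # position of the first comma, 0 if none
        if (i == 0) { print; next }   # no comma: print the line unchanged
        head = substr($0, 1, i)       # everything up to and including that comma
        rest = substr($0, i + 1)      # everything after it
        gsub(/,/, "<COMMA>", rest)    # replace the remaining commas
        print head rest
    }' "$@"
}

printf '%s\n' ' dd, this is a Linux disk application , its awesome!, bla bla, ttly' |
    first_comma_only
```

On the real file this would be used as `first_comma_only dat.csv > dat.fixed.csv`, writing to a new file rather than redirecting onto the input.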

Answer 3

Perhaps you could put quotes around the fields instead; that should tell a CSV parser that the commas inside them are not field separators:

sed 's/"/""/g;                         # escape existing " as ""
     s/[[:space:]]*,[[:space:]]*/","/; # replace the first , and the
                                       # whitespace around it with ","

     s/^[[:space:]]*/"/;               # add a " at the start (and
                                       # get rid of whitespace there)

     s/[[:space:]]*$/"/;               # same at the end
' dat.csv
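To show what this produces, here are the same four substitutions wrapped in a function (the name `quote_fields` is hypothetical) and applied to a made-up row that contains a double quote. Only the first comma becomes a field boundary; later commas stay in place but now sit inside a quoted field.

```shell
#!/bin/sh
# quote_fields: hypothetical wrapper around the four substitutions above:
#   1. double every existing " (CSV escaping)
#   2. replace the first comma and the whitespace around it with ","
#   3. put a " at the start of the line, dropping leading whitespace
#   4. put a " at the end of the line, dropping trailing whitespace
quote_fields() {
    sed -e 's/"/""/g' \
        -e 's/[[:space:]]*,[[:space:]]*/","/' \
        -e 's/^[[:space:]]*/"/' \
        -e 's/[[:space:]]*$/"/' "$@"
}

# A made-up row with an embedded double quote and a later comma
printf '%s\n' ' dd, this is a "Linux" disk app , its awesome!' | quote_fields
# prints: "dd","this is a ""Linux"" disk app , its awesome!"
```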