I need a shell script that converts a CSV file to a pipe (|) delimited file while keeping the commas inside quotes

Sample file (test.csv):

"PRCD-15234","CDOC","12","JUN-20-2016 17:00:00","title, with commas, ","Y!##!"
"PRCD-99999","CDOC","1","Sep-26-2016 17:00:00","title without comma","Y!##!"

Output file:

PRCD-15234|CDOC|12|JUN-20-2016 17:00:00|title, with commas, |Y!##!
PRCD-99999|CDOC|1|Sep-26-2016 17:00:00|title without comma|Y!##!

My script (which does not work) is as follows:

while IFS="," read f1 f2 f3 f4 f5 f6; 
do  
    echo $f1|$f2|$f3|$f4|$f5|$f6;  
done < test.csv

Answer 1

(generate output) | sed -e 's/","/|/g' -e 's/^"//' -e 's/"$//'

Or

sed -e 's/","/|/g' -e 's/^"//' -e 's/"$//' "$file"

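The three expressions do the following:

  • -e 's/","/|/g' = replace every "," delimiter with the new delimiter |

  • -e 's/^"//' = remove the leading " at the start of the line

  • -e 's/"$//' = remove the trailing " at the end of the line

This preserves any quotes that appear inside the title, as long as they do not match the original delimiter pattern ","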
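
As a quick check, the three expressions can be wrapped in a small function and run against the sample rows from the question. This is only a sketch: the function name to_pipe is made up for illustration, and it assumes every field is double-quoted and that no field contains the sequence ",".

```shell
#!/bin/sh
# to_pipe is a hypothetical helper name, not part of any tool.
# It reads fully quoted CSV on stdin and writes pipe-delimited text on stdout.
to_pipe() {
    sed -e 's/","/|/g' -e 's/^"//' -e 's/"$//'
}

# Sample rows copied from the question.
printf '%s\n' \
    '"PRCD-15234","CDOC","12","JUN-20-2016 17:00:00","title, with commas, ","Y!##!"' \
    '"PRCD-99999","CDOC","1","Sep-26-2016 17:00:00","title without comma","Y!##!"' |
    to_pipe
```

The commas inside the fifth field survive because they are never part of a "," sequence.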

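Answer 2

How about:

cat test.csv | sed 's/\",\"/|/g' | sed 's/\"//g'

This assumes the data in the file looks like the sample shown above (I have not considered corner cases), but it works for me.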
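
One corner case worth seeing: the second sed removes every remaining double quote, not only the ones at the start and end of the line. The sketch below uses a made-up row whose middle field contains a doubled quote; the function name strip_all_quotes is hypothetical.

```shell
#!/bin/sh
# strip_all_quotes is a hypothetical name wrapping the pipeline from this answer.
strip_all_quotes() {
    sed 's/","/|/g' | sed 's/"//g'
}

# Made-up row: the middle field contains an escaped (doubled) quote.
printf '%s\n' '"a","5"" disk","c"' | strip_all_quotes
# prints: a|5 disk|c   (the quote inside the field is gone)
```

If quotes inside fields matter, anchoring the removal to the start and end of the line, or using a CSV-aware tool, avoids this.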

Answer 3

This one handles embedded (backslash-escaped) string delimiters:

$ cat /tmp/bla
"PRCD-15234","CDOC","12","JUN-20-2016 17:00:00","title, with commas, ","Y!##!"
"PRCD-99999","CDOC","1","Sep-26-2016 17:00:00","title without comma","Y!##!"
"PRCD-99999","CDOC","1","Sep-26-2016 17:00:00","embedded\",delimiters\",","Y!##!"

$ sed -E 's/"(([^"]*(\\")?)*)",/\1|/g;s/"|(([^"]*(\\")?)*)"/\1/g' /tmp/bla

PRCD-15234|CDOC|12|JUN-20-2016 17:00:00|title, with commas, |Y!##!
PRCD-99999|CDOC|1|Sep-26-2016 17:00:00|title without comma|Y!##!
PRCD-99999|CDOC|1|Sep-26-2016 17:00:00|embedded\",delimiters\",|Y!##!

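Answer 4

Your script does not work because it makes no attempt to parse quoted fields the way a CSV parser would. This means it treats the commas inside quoted fields as delimiters.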
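
To see the mis-split concretely, the following sketch feeds the first sample row to read with IFS set to a comma, as in the question's loop, and prints the last two variables. The variable names match the question; nothing else is assumed.

```shell
#!/bin/sh
# First sample row from the question.
line='"PRCD-15234","CDOC","12","JUN-20-2016 17:00:00","title, with commas, ","Y!##!"'

# read splits on every comma; it knows nothing about the surrounding quotes.
IFS=, read -r f1 f2 f3 f4 f5 f6 <<EOF
$line
EOF

printf 'f5=[%s]\n' "$f5"   # prints: f5=["title]
printf 'f6=[%s]\n' "$f6"   # prints: f6=[ with commas, ","Y!##!"]
```

The fifth field is cut at its first embedded comma, and everything left over lands in the last variable. Separately, the unquoted | characters in the question's echo line are interpreted by the shell as pipeline operators, so that line would need quoting even if the split were correct.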


Using two CSV-aware tools, csvformat (from csvkit) and Miller (mlr):

$ csvformat -D '|' file
PRCD-15234|CDOC|12|JUN-20-2016 17:00:00|title, with commas, |Y!##!
PRCD-99999|CDOC|1|Sep-26-2016 17:00:00|title without comma|Y!##!
$ mlr --csv --ofs pipe cat file
PRCD-15234|CDOC|12|JUN-20-2016 17:00:00|title, with commas, |Y!##!
PRCD-99999|CDOC|1|Sep-26-2016 17:00:00|title without comma|Y!##!
