Sample file (test.csv):
"PRCD-15234","CDOC","12","JUN-20-2016 17:00:00","title, with commas, ","Y!##!"
"PRCD-99999","CDOC","1","Sep-26-2016 17:00:00","title without comma","Y!##!"
Desired output file:
PRCD-15234|CDOC|12|JUN-20-2016 17:00:00|title, with commas, |Y!##!
PRCD-99999|CDOC|1|Sep-26-2016 17:00:00|title without comma|Y!##!
My script (which doesn't work) is:
while IFS="," read f1 f2 f3 f4 f5 f6;
do
echo $f1|$f2|$f3|$f4|$f5|$f6;
done < test.csv
Answer 1
(generate output) | sed -e 's/","/|/g' -e 's/^"//' -e 's/"$//'
or
sed -e 's/","/|/g' -e 's/^"//' -e 's/"$//' "$file"
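For the three expressions:
-e 's/","/|/g' = replace every "," delimiter with the new delimiter |
-e 's/^"//' = remove the leading " mark
-e 's/"$//' = remove the trailing " mark
This preserves any quotes that appear in the title, as long as they don't match the original delimiter pattern ",".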
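The three expressions can be tried without a temporary file. A minimal sketch, assuming a POSIX shell and sed; the function name `csv_to_pipe` is hypothetical, and the second input line (with quotes inside the title) is made up to show that such quotes survive:

```shell
# Hypothetical helper wrapping the three sed expressions from the answer.
csv_to_pipe() {
    sed -e 's/","/|/g' \
        -e 's/^"//' \
        -e 's/"$//'
}

# Feed sample lines through printf instead of reading test.csv.
# The second line is a made-up input with quotes inside the title.
printf '%s\n' \
    '"PRCD-15234","CDOC","12","JUN-20-2016 17:00:00","title, with commas, ","Y!##!"' \
    '"PRCD-99999","CDOC","1","Sep-26-2016 17:00:00","title with a "quote" inside","Y!##!"' |
    csv_to_pipe
# prints:
# PRCD-15234|CDOC|12|JUN-20-2016 17:00:00|title, with commas, |Y!##!
# PRCD-99999|CDOC|1|Sep-26-2016 17:00:00|title with a "quote" inside|Y!##!
```

The quotes around `quote` are kept because neither of them is part of a `","` sequence or sits at the start or end of the line.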
Answer 2
How about:
sed -e 's/","/|/g' -e 's/"//g' test.csv
Assuming the data in the file is formatted as shown above, this works for me (I haven't considered edge cases).
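One such edge case: because the second expression deletes every remaining `"`, a quote inside a field is removed too. A minimal sketch; the helper name `strip_all_quotes` and the second input line are hypothetical:

```shell
# Hypothetical helper: replace each "," with |, then delete every other ".
strip_all_quotes() {
    sed -e 's/","/|/g' -e 's/"//g'
}

# Sample data from the question: works as intended.
printf '%s\n' '"PRCD-99999","CDOC","1","Sep-26-2016 17:00:00","title without comma","Y!##!"' |
    strip_all_quotes
# prints: PRCD-99999|CDOC|1|Sep-26-2016 17:00:00|title without comma|Y!##!

# Made-up line with quotes inside the title: those quotes are lost.
printf '%s\n' '"PRCD-99999","CDOC","1","Sep-26-2016 17:00:00","a "quoted" word","Y!##!"' |
    strip_all_quotes
# prints: PRCD-99999|CDOC|1|Sep-26-2016 17:00:00|a quoted word|Y!##!
```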
Answer 3
This one handles embedded string delimiters:
$ cat /tmp/bla
"PRCD-15234","CDOC","12","JUN-20-2016 17:00:00","title, with commas, ","Y!##!"
"PRCD-99999","CDOC","1","Sep-26-2016 17:00:00","title without comma","Y!##!"
"PRCD-99999","CDOC","1","Sep-26-2016 17:00:00","embedded\",delimiters\",","Y!##!"
$ sed -E 's/"(([^"]*(\\")?)*)",/\1|/g;s/"|(([^"]*(\\")?)*)"/\1/g' /tmp/bla
PRCD-15234|CDOC|12|JUN-20-2016 17:00:00|title, with commas, |Y!##!
PRCD-99999|CDOC|1|Sep-26-2016 17:00:00|title without comma|Y!##!
PRCD-99999|CDOC|1|Sep-26-2016 17:00:00|embedded\",delimiters\",|Y!##!
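The command packs two substitutions into one sed script. Below is a minimal sketch that passes the same two expressions as separate `-e` arguments so each can be commented; the function name `csv_to_pipe_escaped` is hypothetical, and the sketch assumes, as the sample does, that every field is double-quoted and that embedded quotes are written as `\"`:

```shell
# Hypothetical helper: the two substitutions above, split into -e parts.
csv_to_pipe_escaped() {
    # A field body is a run of non-quote characters, optionally followed
    # by an escaped quote (\"), repeated: (([^"]*(\\")?)*)
    # 1) Each quoted field followed by a comma, "body", becomes body|
    # 2) The quotes still left (around the last field) are removed,
    #    keeping the escaped quotes inside the body.
    sed -E \
        -e 's/"(([^"]*(\\")?)*)",/\1|/g' \
        -e 's/"|(([^"]*(\\")?)*)"/\1/g'
}

printf '%s\n' '"PRCD-99999","CDOC","1","Sep-26-2016 17:00:00","embedded\",delimiters\",","Y!##!"' |
    csv_to_pipe_escaped
# prints: PRCD-99999|CDOC|1|Sep-26-2016 17:00:00|embedded\",delimiters\",|Y!##!
```

This relies on POSIX leftmost-longest matching: in step 1 the body could stop at the first `\",`, but the longer match running up to the real closing `",` is the one chosen.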
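Answer 4
Your script doesn't work because it makes no attempt to parse quoted fields the way a CSV parser would, so it treats the commas inside quoted fields as delimiters.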
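A minimal sketch of the failure, using the question's loop with two assumed changes: the output line uses `printf` with quoted arguments (left unquoted, each `|` in the original `echo` line is parsed by the shell as a pipeline), and `read -r` is used. The function name `question_loop` is hypothetical. Even so, the output is wrong, because `read` splits on every comma:

```shell
# Hypothetical function: the question's loop with the output line fixed.
question_loop() {
    while IFS=, read -r f1 f2 f3 f4 f5 f6; do
        printf '%s|%s|%s|%s|%s|%s\n' "$f1" "$f2" "$f3" "$f4" "$f5" "$f6"
    done
}

printf '%s\n' '"PRCD-15234","CDOC","12","JUN-20-2016 17:00:00","title, with commas, ","Y!##!"' |
    question_loop
# The title is cut at its first comma, the rest of the line (commas
# included) ends up in f6, and the double quotes are never removed.
```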
Using two CSV-aware tools, csvformat (from csvkit) and Miller (mlr):
$ csvformat -D '|' file
PRCD-15234|CDOC|12|JUN-20-2016 17:00:00|title, with commas, |Y!##!
PRCD-99999|CDOC|1|Sep-26-2016 17:00:00|title without comma|Y!##!
$ mlr --csv --ofs pipe cat file
PRCD-15234|CDOC|12|JUN-20-2016 17:00:00|title, with commas, |Y!##!
PRCD-99999|CDOC|1|Sep-26-2016 17:00:00|title without comma|Y!##!