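Maybe I'm just unlucky, but my double-quoted, comma-separated CSV file contains double quotes and commas inside the useful text.
So I want to turn this: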
"record 1","name 1","text 1, text 2"
"record 2","name ""2""","text 2"
"record 3","name 3",""
into this:
"record 1","name 1","text 1, text 2"
"record 2","name 2","text 2"
"record 3","name 3",""
Note that I removed the double quotes from name ""2"" (leaving name 2), but kept the double quotes of the empty field in line 3: ,""
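Answer 1
Use csvformat to convert the delimiters to tabs (csvformat -T), delete all double quotes (tr -d '"'), and then convert the delimiters back to commas while quoting every field (the last stage of the pipeline):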
$ csvformat -T file.csv | tr -d '"' | csvformat -t -U1
"record 1","name 1","text 1, text 2"
"record 2","name 2","text 2"
"record 3","name 3",""
csvformat is part of csvkit.
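If csvkit is not installed, roughly the same transformation can be sketched with Python's standard csv module. This is an alternative under stated assumptions, not what csvformat does internally, and the function name strip_inner_quotes is made up for illustration. It parses each record, drops every double quote left inside a field value, and writes the fields back out fully quoted.

```python
import csv
import io


def strip_inner_quotes(text):
    """Parse CSV text, remove double quotes inside field values, re-quote every field.

    Hypothetical helper for illustration only.
    """
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    # QUOTE_ALL quotes every field, so an empty field is written as ""
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in reader:
        writer.writerow([field.replace('"', "") for field in row])
    return out.getvalue()


sample = (
    '"record 1","name 1","text 1, text 2"\n'
    '"record 2","name ""2""","text 2"\n'
    '"record 3","name 3",""\n'
)
print(strip_inner_quotes(sample), end="")
```

Because the csv module does real CSV parsing, this sketch should also cope with newlines inside quoted fields.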
Answer 2
This will work no matter which characters your input contains (except newlines inside quoted fields, but that is a separate problem).
Using GNU awk for FPAT:
$ awk -v FPAT='("[^"]*")+' -v OFS='","' '{
    for ( i=1; i<=NF; i++ ) {
        gsub(/"/,"",$i)
    }
    print "\"" $0 "\""
}' file
"record 1","name 1","text 1, text 2"
"record 2","name 2","text 2"
"record 3","name 3",""
Or the equivalent with any awk:
$ awk -v OFS='","' '{
    orig=$0; $0=""; i=0;
    while ( match(orig,/("[^"]*")+/) ) {
        $(++i) = substr(orig,RSTART,RLENGTH)
        gsub(/"/,"",$i)
        orig = substr(orig,RSTART+RLENGTH)
    }
    print "\"" $0 "\""
}' file
"record 1","name 1","text 1, text 2"
"record 2","name 2","text 2"
"record 3","name 3",""
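To see what the field pattern ("[^"]*")+ used in both awk scripts matches, here is a small Python sketch. The names FIELD, split_fields and clean_line are illustrative and not part of the answer. A field is one or more back-to-back "..." chunks, so an escaped "" inside a field simply continues the match, and a comma inside quotes never ends it.

```python
import re

# Same pattern as the awk FPAT, written with a non-capturing group
FIELD = re.compile(r'(?:"[^"]*")+')


def split_fields(line):
    """Return the raw quoted fields of one CSV line, quotes still attached."""
    return FIELD.findall(line)


def clean_line(line):
    """Drop every double quote inside each field, then re-quote and rejoin."""
    return ",".join('"' + f.replace('"', "") + '"' for f in split_fields(line))


line = '"record 2","name ""2""","text 2"'
print(split_fields(line))
print(clean_line(line))
```

As with the awk versions, this works line by line, so a newline inside a quoted field would still split a record.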