Remove embedded double quotes from a comma-separated, double-quoted CSV

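Maybe I'm just unlucky, but my comma-separated, double-quoted CSV file contains double quotes and commas inside the useful text.

So I want to turn this: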

"record 1","name 1","text 1, text 2"
"record 2","name ""2""","text 2"
"record 3","name 3",""

into this:

"record 1","name 1","text 1, text 2"
"record 2","name 2","text 2"
"record 3","name 3",""

Note that I removed the double quotes from name ""2"" to get name 2, but kept the double quotes of the empty field on line 3: ,""

Answer 1

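Use csvformat to convert the delimiters to tabs (csvformat -T), delete all double quotes (tr -d '"'), and then convert the delimiters back to commas while quoting every field (the last stage of the pipeline):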

$ csvformat -T file.csv | tr -d '"' | csvformat -t -U1
"record 1","name 1","text 1, text 2"
"record 2","name 2","text 2"
"record 3","name 3",""

csvformat is part of csvkit.

Answer 2

This will work no matter which characters your input contains (except newlines inside quoted fields, but that is a different problem).

Using GNU awk for FPAT:

$ awk -v FPAT='("[^"]*")+' -v OFS='","' '{
    for ( i=1; i<=NF; i++ ) {
        gsub(/"/,"",$i)
    }
    print "\"" $0 "\""
}' file
"record 1","name 1","text 1, text 2"
"record 2","name 2","text 2"
"record 3","name 3",""

Or the equivalent with any awk:

$ awk -v OFS='","' '{
    orig=$0; $0=""; i=0;
    while ( match(orig,/("[^"]*")+/) ) {
        $(++i) = substr(orig,RSTART,RLENGTH)
        gsub(/"/,"",$i)
        orig = substr(orig,RSTART+RLENGTH)
    }
    print "\"" $0 "\""
}' file
"record 1","name 1","text 1, text 2"
"record 2","name 2","text 2"
"record 3","name 3",""
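
To see why the pattern ("[^"]*")+ used in both scripts keeps embedded quotes and commas inside a single field, it can help to print what it matches. The following is a minimal sketch, not part of the original answer; show_fields is a hypothetical helper name, and it relies only on match() and substr(), so it should run in any POSIX awk.

```shell
# Hypothetical helper: print every field matched by ("[^"]*")+ on its own line.
show_fields() {
    awk '{
        rest = $0
        # Each match is one or more adjacent "..." chunks, so a doubled
        # quote inside a field just starts another chunk of the same field.
        while ( match(rest, /("[^"]*")+/) ) {
            print substr(rest, RSTART, RLENGTH)
            rest = substr(rest, RSTART + RLENGTH)
        }
    }'
}

printf '%s\n' '"record 2","name ""2""","text 2"' | show_fields
```

For the second sample record this should print "record 2", "name ""2""" and "text 2", one per line. The commas between fields are never part of a match, while a comma inside a quoted field is covered by [^"]*.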

See also: What's the most robust way to efficiently parse CSV using awk?
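
Since the awk scripts above assume one record per line, it may be worth checking the input for quoted fields that span lines before running them. The sketch below is an assumption-laden heuristic rather than a CSV parser: find_unbalanced_lines is a hypothetical helper that flags physical lines with an odd number of double quotes, which usually marks the start or end of a multi-line field. Continuation lines that contain no quotes at all are not flagged.

```shell
# Hypothetical helper: list physical lines whose double-quote count is odd.
find_unbalanced_lines() {
    awk '{
        line = $0
        n = gsub(/"/, "", line)   # gsub returns the number of quotes it removed
        if ( n % 2 ) {
            print NR ": " $0      # line number, then the untouched line
        }
    }' "$@"
}

printf '%s\n' \
    '"record 1","name 1","text 1' \
    'continues here"' \
    '"record 2","name ""2""","text 2"' | find_unbalanced_lines
```

If this prints nothing for a file, every line has balanced quotes and the one-record-per-line assumption is at least plausible. The function also accepts file names as arguments, for example find_unbalanced_lines file.csv.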
