I have a CSV that was exported with double quotes around every header and value. I need those wrapping quotes gone, without removing any double quotes that are actually part of a value. For example:
"HEADER1","HEADER2","HEADER3","HEADER4","HEADER5"
"SOME_ID_0X0","SOME_ID_1X2","false","Some blob value with "double quotes" inside of it"
"SOME_ID_0X0","SOME_ID_1X2","false","Some blob value with "double quotes" inside of it"
"SOME_ID_0X0","SOME_ID_1X2","false","Some blob value with "double quotes" inside of it"
I can remove the leading " on every line with the following command:
$ sed -i.bak 's/^"//g' $1
I can remove all the ones in the middle with this:
$ sed -i.bak 's/","/,/g' $1
And finally I figured I could remove the last " on every line with:
$ sed -i.bak 's/"$//g' $1
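But that last one doesn't work. Can I do the whole job in one line?
Update: I pasted my data into this website to reveal hidden characters, and the result is below.
It looks like some of the comments may be right, but I don't know what that means I still need to change. Also, before trying to remove the quotes, is there a clean way to check whether the CSV contains them at all? Maybe even just test whether the first character is a quote?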
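Answer 1
Use dos2unix, which converts a file from DOS text format to UNIX text format. If the file has DOS (CRLF) line endings, each line ends in a carriage return rather than a quote, which would explain why the "$ pattern never matches.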
dos2unix $1
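Before converting, it may help to confirm that carriage returns are really the problem. The following is a minimal sketch, not part of the original answer; the helper name count_crlf is hypothetical, and it assumes the CSV path is passed as the first argument.

```shell
#!/bin/sh
# count_crlf (hypothetical helper): print the number of lines in the file
# that end with a carriage return, i.e. lines with DOS (CRLF) endings.
count_crlf() {
    # Build a literal CR portably instead of relying on \r support in grep.
    cr=$(printf '\r')
    # grep -c prints the count; "|| true" keeps a zero count from being
    # treated as a failure by callers running under "set -e".
    grep -c "${cr}\$" "$1" || true
}
```

Calling count_crlf on the file and getting a count above zero suggests running dos2unix before the sed commands.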
You can then combine all three sed expressions into one command:
sed -i 's/^"//g;s/","/,/g;s/"$//g' $1
SOME_ID_0X0,SOME_ID_1X2,false,Some blob value with "double quotes" inside of it
SOME_ID_0X0,SOME_ID_1X2,false,Some blob value with "double quotes" inside of it
SOME_ID_0X0,SOME_ID_1X2,false,Some blob value with "double quotes" inside of it
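If dos2unix is not available, the carriage return can be stripped in the same sed pass. This is only a sketch under assumptions: the function name strip_wrapping_quotes is hypothetical, it writes to standard output instead of editing in place, and like the command above it would also rewrite a value that happens to contain the literal sequence "," inside it.

```shell
#!/bin/sh
# strip_wrapping_quotes (hypothetical helper): print the CSV with the
# wrapping quotes removed, leaving quotes inside values untouched.
strip_wrapping_quotes() {
    # Literal CR built with printf so the script does not depend on
    # sed understanding the \r escape.
    cr=$(printf '\r')
    # 1. drop a trailing CR, 2. drop the leading quote,
    # 3. turn every "," separator into a bare comma, 4. drop the trailing quote.
    sed -e "s/${cr}\$//" -e 's/^"//' -e 's/","/,/g' -e 's/"$//' "$1"
}
```

Redirecting the output to a new file keeps the original export intact, which plays the same role as the .bak copy in the question.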
Using AWK:
awk -F ',' -v OFS=',' '{for (i=1;i<=NF;i++) {sub(/^\"/,"",$i); sub(/\"$/,"",$i)} print $0}' "$1"
HEADER1,HEADER2,HEADER3,HEADER4,HEADER5
SOME_ID_0X0,SOME_ID_1X2,false,Some blob value with "double quotes" inside of it
SOME_ID_0X0,SOME_ID_1X2,false,Some blob value with "double quotes" inside of it
SOME_ID_0X0,SOME_ID_1X2,false,Some blob value with "double quotes" inside of it
sub(/^\"/,"",$i) removes the " at the start of each field.
sub(/\"$/,"",$i) removes the " at the end of each field.
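The question also asks whether the wrapping quotes can be detected before anything is removed. One possible check, offered as a sketch rather than a definitive test: the helper name has_wrapping_quotes is hypothetical, and it treats a file as quoted only if every non-empty line starts with " and ends with " (optionally followed by a carriage return).

```shell
#!/bin/sh
# has_wrapping_quotes (hypothetical helper): exit 0 if every non-empty line
# starts with a double quote and ends with one (a trailing CR is tolerated),
# exit 1 otherwise.
has_wrapping_quotes() {
    awk '
        /^\r?$/ { next }                        # skip empty lines
        !/^".*"\r?$/ { bad = 1; exit }          # first unwrapped line: stop
        END { exit bad + 0 }                    # 0 = all wrapped, 1 = not
    ' "$1"
}
```

The exit status can then gate the cleanup, for example by running the sed or awk command only when has_wrapping_quotes succeeds on the file.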