I have a CSV file containing values like the following:
"Basic","""21,21""","[""21"",""21""]","","","","",""
I need to remove the extra double quotes in certain columns, for example columns 2 and 3.
The expected output is:
"Basic","21,21","[21,21]","","","","",""
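How can I do this with awk, sed, or any other Linux tool?
Here are a few more sample lines from the file. The values in that column are always wrapped in [], and the quotes inside the [] must be removed.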
"Basic","""40""","[""40""]","""13F""","[""13F""]","",""
"Basic","""0""","[""0""]","","","""MCOMB""","[""MCOMB""]"
Answer 1
Answer 2
Answer 3
I have a sed solution:
sed -e 's/,"""/,"/g' -e 's/""",/",/g' -e 's/\([^,]\)""/\1/g' -e 's/""\([^,]\)/\1/'
This gives:
"Basic","40","[40]","13F","[13F]","",""
"Basic","0","[0]","","","MCOMB","[MCOMB]"
"Basic","21,21","[21,21]","","","","",""
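The sed expressions are fairly simple.
's/,"""/,"/g'
replaces every occurrence of ,""" with ," (the trailing g applies the substitution to all matches on the line, not just the first). The second expression does the same for """, at the end of a field.
's/\([^,]\)""/\1/g'
finds any non-comma character ([^,]) followed by two double quotes, captures that character with \( \), and replaces the whole match with the captured character (\1). The last expression is the mirror image: it removes "" when it is followed by a non-comma character. As written it has no g flag, so it applies only once per line.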
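Note that trailing whitespace at the end of a line can cause the final "" (an empty last field) to be removed, because the last expression then sees "" followed by a non-comma character.
As @cas pointed out, using a CSV-aware tool is the better choice in the long run.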
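For what it's worth, the two caveats above (trailing whitespace, and the last expression running only once per line) can probably be worked around with a small variation. This is only a sketch, tested against nothing but the sample lines in the question: it strips trailing whitespace (including a stray carriage return) first and adds g to the last expression so that lists with three or more elements are handled. The function name strip_inner_quotes is made up for illustration.

```shell
#!/bin/sh
# strip_inner_quotes: hypothetical helper name, not part of the original answer.
# Reads CSV lines on stdin (or from files given as arguments) and writes the
# cleaned lines to stdout.
strip_inner_quotes() {
    # 1. drop trailing whitespace so a final empty field "" survives
    # 2. ,""" -> ,"    (opening quotes of a field)
    # 3. """, -> ",    (closing quotes of a field)
    # 4. X""  -> X     (X is any non-comma character)
    # 5. ""X  -> X     (with g, so every list element is handled)
    sed -e 's/[[:space:]]*$//' \
        -e 's/,"""/,"/g' \
        -e 's/""",/",/g' \
        -e 's/\([^,]\)""/\1/g' \
        -e 's/""\([^,]\)/\1/g' "$@"
}

# Usage: strip_inner_quotes file > cleaned.csv
```

It is still a regex hack: it assumes fields never legitimately contain a doubled quote that should be kept, so the point about using a real CSV tool still stands.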
Answer 4
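I'm assuming you want to remove every double quote that is part of the data, i.e. not the double quotes that belong to the CSV format itself and are needed to quote embedded quotes, commas, and newlines.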
Using csvformat from csvkit together with tr to remove the quotes inside each field:
$ cat file
"Basic","""40""","[""40""]","""13F""","[""13F""]","",""
"Basic","""0""","[""0""]","","","""MCOMB""","[""MCOMB""]"
"Basic","""21,21""","[""21"",""21""]","","","","",""
$ csvformat -Q "'" file | tr -d '"' | csvformat -q "'"
Basic,40,[40],13F,[13F],,
Basic,0,[0],,,MCOMB,[MCOMB]
Basic,"21,21","[21,21]",,,,,
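The pipeline above first changes the quote character used in the CSV file from double quotes to single quotes. The tr command then deletes all remaining double quotes, which are part of the data. The final csvformat command converts the data back to using double quotes for quoting.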
If you need every field quoted, even the empty ones, add -U 1 to the second invocation of csvformat. By default, the csvkit utilities only quote the fields that need it.
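Spelled out, the quote-everything variant might look like the following. This is a sketch, assuming csvkit is installed and that the data contains no single quotes (the pipeline temporarily uses ' as the quote character, so an apostrophe in a field would confuse it). The function name requote_all is made up for illustration.

```shell
#!/bin/sh
# requote_all: hypothetical wrapper name, not part of csvkit.
# Reads CSV from the given file(s) or from stdin.
requote_all() {
    # -Q "'" : write the CSV with single quotes as the quote character
    # tr     : delete the double quotes that are now plain data
    # -q "'" : read the single-quoted CSV back in
    # -U 1   : quote every output field, including empty ones
    csvformat -Q "'" "$@" | tr -d '"' | csvformat -q "'" -U 1
}

# Usage: requote_all file > cleaned.csv
```

With -U 1 the result should match the fully quoted layout asked for in the question rather than the minimally quoted output shown above.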