我有一个 csv 文件,我想删除其中的任何红色"
或,
。问题是我的所有字段看起来都像这样,"***********","*********",
其中*
= 可以是任何字符,包括,
和"
。我需要删除分隔符之间可能出现的任何,
或,以便我的字符串保留在双引号中。"
","
"This, has a comma","This has "quotes"",",but what if this happens "too""
理想情况下输出应该是
"This has a comma","This has quotes","but what if this happens too"
我试过这个,但它删除了我所有的双引号
awk 'BEGIN{FS=OFS=","} {for (i=1; i<=NF; i++) {gsub("\"","", $i)}}1' input.csv > output.csv
编辑:在此示例输入行上尝试了 aborruso 解决方案。
"33333","23","01-07-31-0-000-026.001"," ","","55540","COUNTY ROAD","","","","COUNTY ROAD","","Y","","","VILLE","YY","33367","","","4.246","-22.331","3","X","6000 ","","134","","000003"," "," ","New New New
New","000000370","0000004111","0000002111","94111"," ","JOHN SMITH","BILL","E BILLY",""," ",", JANE DOE","DOE","BOB","","BILL E BILLY BOB",", JANE DOE JOHN DOE BABY DOE"," ","","313 MAIN BLVD","313","","","MAIN","BLVD","","","","NEW YORK","NY","48555","2335","Z005"," ","1234","0","0","12345","12345","0","12345","1234"," ","1234","","01","",""," "," "," "," "," "," "," "," "," "," "," "," ","0","","0","","","","","","","","0","0","0","0","0","","","","","","","","","1","","","","","","0","","","","","","","","","","","0","0","",""," "," "," "," "," ","01"," ","DISTRICT 9 THE MOVE",""," "," ","DISNEY NETFLIX","","COME ADD 1/4 OF A CUP 1/4 OF A SPOON 13-T00Z-B32C; THE 524.01'(X)350'(Y)TO W R/W OF SO SI'LLY 987.7';WOULDN'T YOU SAY","24313317","Z",""
我得到的输出是这样的
"(error)","(error)","01-07-31-0-000-026.001"," ","","(error)","COUNTY ROAD","","","","COUNTY ROAD","","Y","","","VILLE","YY","(error)
这只是一个片段,但你应该明白我的意思了。
答案1
您可以使用sed这里(GNU sed)4.7:
sed 's/","/\x0/g;s/^"\|"$/\x1/g;s/[",]//g;s/\x0/","/g;s/\x1/"/g' input.csv
","
用 NUL 字符替换字段分隔符(\x0
)"
将开头和结尾的部分替换为 SOH char (\x1
)- 删除
"
所有,
- 恢复字段分隔符和
"
答案2
这不是 awk,但我认为它可能对你有用
如果你使用磨坊主然后运行
echo '"This, has a comma","This has ""quotes""",",but what if this happens ""too"""'| \
mlr --csv -N --quote-all cat then put -S 'for (k in $*){$[k]=gsub($[k],"(,|\")","")}'
你有
"This has a comma","This has quotes","but what if this happens too"
编辑
您已插入输入的文本。这很奇怪,它类似于
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 十三 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 二十五 | 二十六 | 二十七 | 二十八 | 二十九 | 三十 | 31 | 三十二 | 33 | 三十四 | 三十五 | 三十六 | 三十七 | 三十八 | 三十九 | 40 | 41 | 四十二 | 43 | 四十四 | 四十五 | 四十六 | 四十七 | 四十八 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 | 65 | 66 | 67 | 68 | 69 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 | 78 | 79 | 80 | 81 | 82 | 83 | 84 | 85 | 86 | 87 | 88 | 89 | 90 | 91 | 92 | 93 | 94 | 95 | 96 | 97 | 98 | 99 | 100 | 101 | 102 | 103 | 104 | 105 | 106 | 107 | 108 | 109 | 110 | 111 | 112 | 113 | 114 | 115 | 116 | 117 | 118 | 119 | 120 | 121 | 122 | 123 | 124 | 125 | 126 | 127 | 128 | 129 | 130 | 131 | 132 | 133 | 134 | 135 | 136 | 137 | 138 | 139 | 140 | 141 | 142 | 143 | 144 | 145 | 146 | 147 | 148 | 149 | 150 | 151 | 152 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
33333 | 23 | 01-07-31-0-000-026.001 | 55540 | 县道 | 县道 | 是 | 维尔 | 耶耶 | 33367 | 4.246 | -22.331 | 3 | X | 6000 | 134 | 000003 | 新品 新品 新品 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
新的 | 000000370 | 0000004111 | 0000002111 | 94111 | 约翰·史密斯 | 账单 | 比利 | 无名氏 | 美国能源部 | 鲍勃 | 比利·鲍勃 | , 珍妮·多伊 约翰·多伊 宝贝·多伊 | 313 主大道 | 313 | 主要的 | 大道 | 纽约 | 纽约 | 48555 | 2335 | Z005 | 1234 | 0 | 0 | 12345 | 12345 | 0 | 12345 | 1234 | 1234 | 01 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 01 | 第九区 搬家 | 迪士尼 Netflix | 来添加 1/4 杯 1/4 勺 13-T00Z-B32C;524.01'(X)350'(Y) 到 WR/W 如此愚蠢的 987.7';你不觉得吗 | 24313317 | 是 |
运行 mlr 命令,您将获得此 CSV 输出
"33333","23","01-07-31-0-000-026.001"," ","","55540","COUNTY ROAD","","","","COUNTY ROAD","","Y","","","VILLE","YY","33367","","","4.246","-22.331","3","X","6000 ","","134","","000003"," "," ","New New New
New","000000370","0000004111","0000002111","94111"," ","JOHN SMITH","BILL","E BILLY",""," "," JANE DOE","DOE","BOB","","BILL E BILLY BOB"," JANE DOE JOHN DOE BABY DOE"," ","","313 MAIN BLVD","313","","","MAIN","BLVD","","","","NEW YORK","NY","48555","2335","Z005"," ","1234","0","0","12345","12345","0","12345","1234"," ","1234","","01","",""," "," "," "," "," "," "," "," "," "," "," "," ","0","","0","","","","","","","","0","0","0","0","0","","","","","","","","","1","","","","","","0","","","","","","","","","","","0","0","",""," "," "," "," "," ","01"," ","DISTRICT 9 THE MOVE",""," "," ","DISNEY NETFLIX","","COME ADD 1/4 OF A CUP 1/4 OF A SPOON 13-T00Z-B32C; THE 524.01'(X)350'(Y)TO W R/W OF SO SI'LLY 987.7';WOULDN'T YOU SAY","24313317","Z",""
答案3
在 GNU awk 中可以使用FPAT 分隔符规范,它可以正确处理引号内的分隔符,但当然会以牺牲一些性能为代价:
awk 'BEGIN{ FPAT="([^,]*)|(\"[^\"]*\")"; OFS="," } { for (i=1; i<=NF; i++) {gsub("\"|,","", $i)}; print $0; }' input.csv > output.csv
awk 'BEGIN{ FPAT="([^,]*)|(\"[^\"]*\")"; OFS="|" } { $1=$1; print $0; }' input.csv > output2.csv
其中 FPAT 正则表达式的意思是
([^,]*)
- 任何不包含,
(分隔符)或(\"[^\"]*\")
- 引号之间的任何字符串,中间\"
不能有其他引号[^\"]
for (i=1; i<=NF; i++) {gsub("\"|,","", $i)};
- 替换"
或,
输入字段值并使用 OFS 重建记录- 或者,使用
$1=$1
是使用当前 OFS 重建记录使用分隔符|
,然后将文件作为管道分隔符进行处理。
PS 您可以,
用;
或 替换分隔符*
,以+
避免空字段值。