Suppose I have a CSV file:
"col1","col2","col3"
"col4","col5,subtext","col6"
The problem I run into is that cut splits on the comma inside the quoted field:
cut -d, -f1,2 test.txt
"col1","col2"
"col4","col5
The desired output is:
"col1","col2"
"col4","col5,subtext"
Answer 1
The Text::ParseWords module that ships with Perl covers this quite elegantly. Example below.
$ perl -MText::ParseWords -nE '@a=quotewords ",",1,$_;say $a[0],",",$a[1]' <test.txt
"col1","col2"
"col4","col5,subtext"
$
Answer 2
If you have gawk v4 available, there is a nice solution: parse the CSV with awk and ignore commas inside fields.
Example:
gawk -vFPAT='[^,]*|"[^"]*"' '{print $1 "," $2}' test.txt
Answer 3
Another perl solution, assuming all fields are quoted:
$ perl -F'/"\K,(?=")/' -lane 'print "$F[0],$F[1]"' test.txt
"col1","col2"
"col4","col5,subtext"
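-F'/"\K,(?=")/' the field separator is a comma only when it is immediately preceded and followed by "
print "$F[0],$F[1]" prints the first two fields, separated by ,
grep can also be used: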
$ grep -oE '^"[^"]*","[^"]*"' test.txt
"col1","col2"
"col4","col5,subtext"
If N fields are needed, use grep -oE '^("[^"]*",){1}"[^"]*"' with the number inside {} set to N-1.
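The N-1 rule can be sketched as a small shell function. This is only an illustration: the helper name first_n_fields and the scratch file are made up for the example, the sample data assumes every field is fully quoted (including a closing quote on the last one), and -o is a grep extension available in GNU and BSD grep.

```shell
# Sample data, written to a scratch file (illustrative path from mktemp).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
"col1","col2","col3"
"col4","col5,subtext","col6"
EOF

# first_n_fields N FILE: hypothetical helper that prints the first N quoted fields of each line.
first_n_fields() {
  n=$1
  file=$2
  # The count inside {} is N-1: N-1 fields each followed by a comma, then one final field.
  grep -oE "^(\"[^\"]*\",){$((n - 1))}\"[^\"]*\"" "$file"
}

first_n_fields 3 "$tmp"
```

Lines with fewer than N quoted fields produce no output at all, so this suits files where every row has at least N columns.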
Answer 4
You can also try it with awk, as shown below. Like the perl -F approach, this assumes every field is quoted:
awk -F'","' '{printf "%s\",\"%s\"\n", $1, $2 }' test.txt
For example:
user@host$ awk -F'","' '{printf "%s\",\"%s\"\n", $1, $2 }' test.txt
"col1","col2"
"col4","col5,subtext"