Using cut to exclude enclosed delimiters

Suppose I have a CSV file:

"col1","col2","col3"
"col4","col5,subtext","col6"

The problem I'm running into is that cut splits on every comma, including the one inside a quoted field:

cut -d, -f1,2 test.txt
"col1","col2"
"col4","col5

The desired output is:

"col1","col2"
"col4","col5,subtext"

Answer 1

The Text::ParseWords module that ships with Perl handles this quite elegantly. Example:

$ perl -MText::ParseWords -nE '@a=quotewords ",",1,$_;say $a[0],",",$a[1]' <test.txt
"col1","col2"
"col4","col5,subtext"
$

Answer 2

If you have gawk v4 available, there is a nice solution in the question "Parse a csv using awk and ignoring commas inside a field":

Example:

gawk -vFPAT='[^,]*|"[^"]*"' '{print $1 "," $2}' test.txt
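A sketch of the same FPAT approach extended to the first n fields. It assumes GNU awk 4.0 or later, because FPAT is a gawk extension. The variable n and the loop are illustrative additions. FPAT describes what a field looks like rather than what separates fields, so a double-quoted string containing a comma is matched as a single field.

```shell
# Sample data from the question (closing quote on "col6" restored).
printf '%s\n' '"col1","col2","col3"' '"col4","col5,subtext","col6"' > test.txt

# A field is either a run of non-commas or a double-quoted string,
# so the comma inside "col5,subtext" stays part of that field.
gawk -v FPAT='[^,]*|"[^"]*"' -v n=2 '{
    out = $1
    for (i = 2; i <= n && i <= NF; i++)   # append fields 2..n, re-inserting the commas
        out = out "," $i
    print out
}' test.txt
```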

Answer 3

Another Perl solution, assuming every field is quoted:

$ perl -F'/"\K,(?=")/' -lane 'print "$F[0],$F[1]"' test.txt 
"col1","col2"
"col4","col5,subtext"
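  • -F'/"\K,(?=")/': a comma counts as the field separator only when a double quote immediately precedes and follows it; \K keeps the preceding " in the field instead of in the separator
  • print "$F[0],$F[1]" prints the first two fields, joined by a comma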


grep works too:

$ grep -oE '^"[^"]*","[^"]*"' test.txt 
"col1","col2"
"col4","col5,subtext"

If you need N fields, use grep -oE '^("[^"]*",){1}"[^"]*"' and replace the number inside {} with N-1.
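As a worked instance of that rule, the following sketch builds the pattern from a shell variable. It assumes a grep that supports -o and -E, as GNU and BSD grep do. N and pattern are hypothetical names used only here. With N=3 the interval becomes {2}, so the match is two quoted fields each followed by a comma, then one more quoted field.

```shell
# Sample data from the question (closing quote on "col6" restored).
printf '%s\n' '"col1","col2","col3"' '"col4","col5,subtext","col6"' > test.txt

N=3   # hypothetical: number of leading fields to keep
# Expands to ^("[^"]*",){2}"[^"]*" when N=3
pattern='^("[^"]*",){'"$((N - 1))"'}"[^"]*"'
grep -oE "$pattern" test.txt
```

Since this file has only three fields, N=3 prints both lines in full. N=2 reproduces the answer's output.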

Answer 4

You can also try it with awk, like this:

awk -F'","'  '{printf "%s\",\"%s\"\n", $1, $2 }' test.txt 

For example:

user@host$ awk -F'","'  '{printf "%s\",\"%s\"\n", $1, $2 }' test.txt 
"col1","col2"
"col4","col5,subtext"
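The following sketch shows why the printf in this answer re-inserts "," between $1 and $2 and appends a closing quote. It assumes any POSIX awk and prints each field awk produces for one sample line. With FS set to the three-character sequence ",", the separators absorb the inner quotes, and the outer quotes of the line stay attached to the first and last fields.

```shell
# Number and print every field produced by splitting on the literal "," sequence.
printf '%s\n' '"col4","col5,subtext","col6"' |
  awk -F'","' '{ for (i = 1; i <= NF; i++) print i ": " $i }'
```

This prints 1: "col4, then 2: col5,subtext, then 3: col6" on separate lines. So $1 keeps only its opening quote and $2 has no quotes at all.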
