Replace double quotes with awk

Answer 1

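You don't really say much about where the data comes from or what format is expected. If the task can be restated as replacing '"(' with 'chr(34)(' and ')"' with ')chr(34)', or as replacing '"(tst)"' with 'chr(34)(tst)chr(34)', then either of the following two sed commands does the job: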

$ sed -e 's/"(/chr(34)(/' -e 's/)"/)chr(34)/' file
"this is txt1","this is txt2",3,"this txt3","txt4 chr(34)(tst)chr(34)"

$ sed 's/"\((tst)\)"/chr(34)\1chr(34)/' file
"this is txt1","this is txt2",3,"this txt3","txt4 chr(34)(tst)chr(34)"
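The question title asks for awk, so here is a minimal sketch of the same two substitutions in awk, assuming any POSIX awk. The function names quote_to_chr34 and tst_to_chr34 are hypothetical. Like sed's s/// without the g flag, awk's sub() replaces only the first match on each line; use gsub() if every occurrence should be replaced.

```shell
#!/bin/bash
# Hypothetical helper: awk version of the first sed command.
# sub() replaces only the first match per line, like s/// without /g.
quote_to_chr34() {
    awk '{ sub(/"\(/, "chr(34)("); sub(/\)"/, ")chr(34)"); print }'
}

# Hypothetical helper: awk version of the second sed command.
tst_to_chr34() {
    awk '{ sub(/"\(tst\)"/, "chr(34)(tst)chr(34)"); print }'
}

line='"this is txt1","this is txt2",3,"this txt3","txt4 "(tst)""'
printf '%s\n' "$line" | quote_to_chr34
printf '%s\n' "$line" | tst_to_chr34
```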

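The text cannot be parsed as a CSV record, because the last field is not validly quoted. A correctly quoted version of that field would look like "txt4 ""(tst)""" (each embedded double quote is written twice).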
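If you control how the field is written, producing that correctly quoted form is simple: double every embedded quote, then wrap the whole value in quotes. A minimal sketch, assuming you have the raw field value on its own and that it contains no newline (csv_quote is a hypothetical helper name):

```shell
#!/bin/bash
# Hypothetical helper: quote one raw field value CSV-style.
# Doubles every embedded double quote, then wraps the value in quotes.
# Assumes the value contains no newline (awk works line by line).
csv_quote() {
    printf '%s\n' "$1" | awk '{ gsub(/"/, "\"\""); print "\"" $0 "\"" }'
}

csv_quote 'txt4 "(tst)"'    # prints "txt4 ""(tst)"""
```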

Answer 2

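The observation here is that the quotes delimiting a valid CSV field sit at the start of the line, at the end of the line, or next to a comma. So: find each quote together with the character on either side of it, and if neither neighbour is a comma, double the quote. (A quote at the very start or end of the line has no neighbour on one side, so it is never matched and never doubled.)

This is not strictly correct: in valid CSV a comma can appear inside a quoted field right next to an embedded quote, for example "one field,""here". But it works for your data.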

Test:

Paul--) ./awkFixCsv

"this is txt1","this is txt2",3,"this txt3","txt4 "(tst)"" <<< Input
"this is txt1","this is txt2",3,"this txt3","txt4 ""(tst)""" <<< Output

"this is txt1","this is txt2",3,"this txt3","txt4 "(tst)"",""","""","done" <<< Input
"this is txt1","this is txt2",3,"this txt3","txt4 ""(tst)""","""","""""","done" <<< Output

One,Two,"3","Four","Five "and" Six",Seven and Eight,"Nine" <<< Input
One,Two,"3","Four","Five ""and"" Six",Seven and Eight,"Nine" <<< Output
Paul--)

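Here is the code, with the test data as a here-document and the fix as a function. Leave a comment if you are not sure how to fit it into your script.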

#! /bin/bash

AWK='

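function Fix (s, Local, t, u, x) {
    # Local is a dummy parameter: t, u and x are locals, not arguments.
    # Each match is a quote plus one character on each side of it.
    while (match (s, ".\042.")) {
        u = substr (s, RSTART, RLENGTH);
        # x = 1 (double the quote) unless a comma is next to it.
        x = (u ~ /..,/ || u ~ /,../) ? 0 : 1;
        # Copy up to the char before the quote, plus the quote when x = 1.
        t = t substr (s, 1, RSTART + x);
        # Resume at the quote, which is copied again later: doubled if x = 1.
        s = substr (s, RSTART + 1);
    }
    return (t s);
}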

{ print "\n" $0 " <<< Input"; }
{ $0 = Fix( $0); }
{ print $0 " <<< Output"; }
'
    awk "${AWK}" <<[][]
"this is txt1","this is txt2",3,"this txt3","txt4 "(tst)""
"this is txt1","this is txt2",3,"this txt3","txt4 "(tst)"",""","""","done"
One,Two,"3","Four","Five "and" Six",Seven and Eight,"Nine"
[][]
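One possible way to fit this into a script, sketched under the assumption that you want a plain filter: wrap the same awk function in a shell function (fix_csv is a hypothetical name) that reads lines on stdin and writes the repaired lines on stdout, without the Input/Output labels. The last demo line reproduces the caveat mentioned above: a field that is already valid CSV, with a comma next to an embedded quote, comes out corrupted.

```shell
#!/bin/bash
# Hypothetical wrapper: the Fix() awk function from the script above,
# used as a stdin-to-stdout filter without the test labels.
fix_csv() {
    awk '
    function Fix(s,    t, u, x) {            # t, u, x are locals
        while (match(s, ".\042.")) {         # a quote with one char on each side
            u = substr(s, RSTART, RLENGTH)
            x = (u ~ /..,/ || u ~ /,../) ? 0 : 1   # 1 = no comma beside it
            t = t substr(s, 1, RSTART + x)   # also copy the quote now if x = 1
            s = substr(s, RSTART + 1)        # resume at the quote itself
        }
        return t s
    }
    { print Fix($0) }
    '
}

# Typical use (file names are placeholders): fix_csv < data.csv > fixed.csv
printf '%s\n' 'One,Two,"3","Four","Five "and" Six",Seven and Eight,"Nine"' | fix_csv

# The caveat: this field is already valid CSV, but the quote after the comma
# is taken as a field boundary and the next quote is doubled: "a,""" b"
printf '%s\n' '"a,"" b"' | fix_csv
```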

Answer 3

Perl's Text::CSV module is very good at handling malformed CSV like this. In particular, its documentation notes:

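If your CSV data is really bad, like

    1,"foo "bar" baz",42
or
    1,""foo bar baz"",42

there is a way to get this data line parsed and leave the quotes inside the quoted field as-is. This can be achieved by setting allow_loose_quotes and making sure that escape_char is not equal to quote_char.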

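For example, the one-liner below parses with an empty escape_char, so the stray quotes are kept as part of the field, and then sets escape_char to a double quote before printing, so that print writes each embedded quote as "":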

$ echo '"this is txt1","this is txt2",3,"this txt3","txt4 "(tst)""' | perl -MText::CSV -lne '
  BEGIN{$p = Text::CSV->new({escape_char => "", allow_loose_quotes => 1, quote_space => 1})} 
  @row = $p->fields() if $p->parse($_); 
  $p->escape_char("\""); $p->print(*STDOUT,\@row);
'
"this is txt1","this is txt2",3,"this txt3","txt4 ""(tst)"""
