Strip a trailing - from numbers and move it to the front, in the shell

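I have a CSV file in which some numbers are enclosed in double quotes and others are not. I need to fix the minus sign of the numbers only: a trailing minus sign must be removed and added at the front instead.

Sample input: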

column 1, column 2, column 3, column 4, column 5
12-,"455,365.44-","string with quotes-and with a comma in between","4,432",6787

Sample output:

column 1, column 2, column 3, column 4, column 5
-12,"-455,365.44","string with quotes-and with a comma in between","4,432",6787

Answer 1

GNU awk solution:

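awk -v FPAT='[^,"]+|"[^"]+"' '
       NR==1;                      # print the header line unchanged
       NR>1{
           for (i=1; i<=NF; i++) {
               # field is a number (optionally quoted) with a trailing minus
               if ($i~/^"?[0-9]+([0-9,.]+[0-9]+)?-"?$/) {
                   sub(/-/, "", $i);        # remove the trailing minus
                   sub(/[0-9]/, "-&", $i);  # put it before the first digit
               }
               printf "%s%s",$i,(i==NF? ORS:",")
           }
       }' file.csv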
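  • -v FPAT='[^,"]+|"[^"]+"' - defines the regex pattern that describes a field value
  • $i~/^"?[0-9]+([0-9,.]+[0-9]+)?-"?$/ - checks whether the field is a number with a trailing minus sign (the number may be enclosed in double quotes)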
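
To see which fields that test accepts, the same pattern can be tried outside awk. The Ruby sketch below is only an illustration and not part of the answer: fix_sign and TRAILING_MINUS are hypothetical names, and the regex is transcribed by hand into Ruby syntax (with \A and \z in place of ^ and $).

```ruby
# Hypothetical helper mirroring the awk field test and its two sub() calls.
# Assumption: the awk regex is transcribed by hand into Ruby syntax.
TRAILING_MINUS = /\A"?[0-9]+([0-9,.]+[0-9]+)?-"?\z/

def fix_sign(field)
  return field unless field.match?(TRAILING_MINUS)
  # Drop the first "-" (the trailing one, since the pattern allows no other),
  # then put "-" before the first digit, i.e. inside any opening quote.
  field.sub("-", "").sub(/[0-9]/) { |digit| "-#{digit}" }
end

fields = ['12-', '"455,365.44-"',
          '"string with quotes-and with a comma in between"', '"4,432"', '6787']
puts fields.map { |f| fix_sign(f) }.join(",")
```

On the sample row this should rewrite only 12- and "455,365.44-"; the string field contains a - but does not start with a digit, so the pattern rejects it.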

Output:

column 1, column 2, column 3, column 4, column 5
-12,"-455,365.44","string with quotes-and with a comma in between","4,432",6787

Answer 2

CSV data calls for a CSV parser. Ruby has one:

$ cat file.csv
12-,"455,365.44-","string with quotes-and with a comma in between","4,432",6787

$ ruby -rcsv -e '
    CSV.foreach(ARGV.shift) do |row|
        corrected = row.collect {|e| e.sub(/^([\d,.]+)-$/, "-\\1")}
        puts CSV.generate_line(corrected)
    end
' file.csv
-12,"-455,365.44",string with quotes-and with a comma in between,"4,432",6787

The CSV generator decided that the "string with quotes" field did not need to be quoted on output, because it contains no comma.
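
If the quotes have to survive, one possible workaround is the force_quotes option of Ruby's CSV library. The sketch below is an assumption about what a reader might want rather than part of the original answer; fix_row is a hypothetical helper name. Note that force_quotes quotes every field, including the ones that were unquoted in the input.

```ruby
require "csv"

# Hypothetical helper: the same substitution as in the answer, applied to one
# parsed row. nil (empty) fields are passed through untouched.
def fix_row(row)
  row.map { |e| e.nil? ? e : e.sub(/^([\d,.]+)-$/, "-\\1") }
end

line = '12-,"455,365.44-","string with quotes-and with a comma in between","4,432",6787'
row = CSV.parse_line(line)

# force_quotes: true wraps every output field in double quotes.
puts CSV.generate_line(fix_row(row), force_quotes: true)
```

Whether quoting everything is acceptable depends on the consumer of the file; a CSV-aware reader should treat 6787 and "6787" as the same value.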

Answer 3

An awk solution that does not use gawk's FPAT:

NR==1;

NR > 1 {
    $0 = $0","

    while ($0) {
        match($0, / *"[^"]*" *,|[^,]*,/)
        f = substr($0,RSTART,RLENGTH-1)             # save what matched in f
        if (( f ~ /^"[0-9]([0-9,.]+[0-9]+)-"$/ ) ||
            ( f ~ /^[0-9]+[.]?[0-9]+-$/ ) ||
            ( f ~ /^[0-9]+-$/ )) {
            sub(/-/, "", f);
            sub(/[0-9]/, "-&", f);
        }
        $0 = substr($0, RLENGTH+1)                 
        printf "%s%s", f, (0 == NF ? "\n" : ",")
    }
}

The output for the sample file provided is:

column 1, column 2, column 3, column 4, column 5
-12,"-455,365.44","string with quotes-and with a comma in between","4,432",6787

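Answer 4

With a sed implementation that supports -E, assuming that double quotes embedded in string fields are encoded as "" and that those string fields contain no newlines: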

sed -E '
  :1
  s/^(("[^"]*"|[^"])*,)?([0-9.]+)-(,|$)/\1-\3\4/;
  s/^(("[^"]*"|[^"])*,)?"([0-9,.]+)-"(,|$)/\1"-\3"\4/
  t1' < file

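Depending on the nature of the input, you may want to match numbers more strictly. For example, ([0-9.]+)- here matches 12- but also ...-. If that kind of input can occur, you could change it to ([0-9]*\.?[0-9]+)- for instance.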
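
The difference between the two number patterns can be checked quickly in isolation. This is a minimal sketch, assuming the two sed sub-patterns are transcribed into Ruby regexes and anchored to a whole field with \A and \z; LOOSE and STRICT are made-up names used only for this illustration.

```ruby
# Assumption: hand transcription of the two sed sub-patterns, anchored so
# that each regex must match an entire field.
LOOSE  = /\A([0-9.]+)-\z/
STRICT = /\A([0-9]*\.?[0-9]+)-\z/

["12-", "3.14-", ".5-", "...-", "1.2.3-"].each do |field|
  puts format("%-8s loose=%-5s strict=%s",
              field, field.match?(LOOSE), field.match?(STRICT))
end
```

Both patterns should accept 12-, 3.14- and .5-, while ...- and 1.2.3- are accepted only by the looser one.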
