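I have a CSV file in which some numbers are enclosed in double quotes and others are not. I need to fix only the negative sign of the numbers: a trailing minus sign must be removed and placed at the front of the number instead.
Sample input: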
column 1, column 2, column 3, column 4, column 5
12-,"455,365.44-","string with quotes-and with a comma in between","4,432",6787
Sample output:
column 1, column 2, column 3, column 4, column 5
-12,"-455,365.44","string with quotes-and with a comma in between","4,432",6787
Answer 1
GNU awk solution:
awk -v FPAT='[^,"]+|"[^"]+"' '
NR==1; NR>1{
    for (i=1; i<=NF; i++) {
        if ($i~/^"?[0-9]+([0-9,.]+[0-9]+)?-"?$/) {
            sub(/-/, "", $i);
            sub(/[0-9]/, "-&", $i);
        }
        printf "%s%s",$i,(i==NF? ORS:",")
    }
}' file.csv
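-v FPAT='[^,"]+|"[^"]+"'
- a regex pattern that defines what a field value looks like
$i~/^"?[0-9]+([0-9,.]+[0-9]+)?-"?$/
- checks whether the field is a number with a trailing minus sign (the number may be enclosed in double quotes)
Output: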
column 1, column 2, column 3, column 4, column 5
-12,"-455,365.44","string with quotes-and with a comma in between","4,432",6787
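To see which fields the test in the awk script selects, here is a minimal sketch that applies the same regular expression to the five fields of the sample row. It is written in Ruby only because Ruby's regex syntax accepts this pattern unchanged; the constant `TRAILING_MINUS` and the helper `trailing_minus?` are hypothetical names introduced for the illustration, and the field list is hard-coded rather than parsed.

```ruby
# Same pattern as the awk test above; \A and \z anchor the whole field.
TRAILING_MINUS = /\A"?[0-9]+([0-9,.]+[0-9]+)?-"?\z/

# Hypothetical helper: true if the field is a number with a trailing minus,
# optionally wrapped in double quotes.
def trailing_minus?(field)
  field.match?(TRAILING_MINUS)
end

# The five fields of the sample row, quotes kept, as FPAT would yield them.
[
  '12-',
  '"455,365.44-"',
  '"string with quotes-and with a comma in between"',
  '"4,432"',
  '6787'
].each { |f| puts "#{f} => #{trailing_minus?(f)}" }
```

Only the first two fields satisfy the pattern, which is why the hyphen inside the quoted string is left alone.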
Answer 2
CSV data calls for a CSV parser. Ruby has one:
$ cat file.csv
12-,"455,365.44-","string with quotes-and with a comma in between","4,432",6787
$ ruby -rcsv -e '
CSV.foreach(ARGV.shift) do |row|
corrected = row.collect {|e| e.sub(/^([\d,.]+)-$/, "-\\1")}
puts CSV.generate_line(corrected)
end
' file.csv
-12,"-455,365.44",string with quotes-and with a comma in between,"4,432",6787
The CSV generator decides that the "string with quotes" field does not need to be quoted, since it does not contain a comma.
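If the quoting matters downstream, one option is to ask the generator to quote every field. The sketch below assumes the `force_quotes` option of Ruby's `csv` library; `to_csv_line` is a hypothetical wrapper added for the example. Note that this quotes all fields, including those that were unquoted in the input, so it does not reproduce the original quoting exactly.

```ruby
require 'csv'

# Hypothetical helper: serialize one row, optionally quoting every field.
def to_csv_line(row, quote_all: false)
  CSV.generate_line(row, force_quotes: quote_all)
end

# One row as the parser returns it (surrounding quotes already stripped).
row = ['-12', '-455,365.44', 'string with quotes-and with a comma in between', '4,432', '6787']

puts to_csv_line(row)                  # only fields that need quoting are quoted
puts to_csv_line(row, quote_all: true) # every field is quoted
```

Preserving the input's exact mix of quoted and unquoted fields would require tracking the quoting per field, which this sketch does not attempt.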
Answer 3
An awk solution that does not use gawk's FPAT:
NR==1;
NR > 1 {
    $0 = $0","
    while ($0) {
        match($0, / *"[^"]*" *,|[^,]*,/)
        f = substr($0,RSTART,RLENGTH-1)   # save what matched in f
        if (( f ~ /^"[0-9]([0-9,.]+[0-9]+)-"$/ ) ||
            ( f ~ /^[0-9]+[.]?[0-9]+-$/ ) ||
            ( f ~ /^[0-9]+-$/ )) {
            sub(/-/, "", f);
            sub(/[0-9]/, "-&", f);
        }
        $0 = substr($0, RLENGTH+1)
        printf "%s%s", f, (0 == NF ? "\n" : ",")
    }
}
The output for the sample file provided is:
column 1, column 2, column 3, column 4, column 5
-12,"-455,365.44","string with quotes-and with a comma in between","4,432",6787
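Answer 4
With sed implementations that support -E, assuming that double quotes embedded in string fields are encoded as "" and that those string fields do not contain newline characters: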
sed -E '
:1
s/^(("[^"]*"|[^"])*,)?([0-9.]+)-(,|$)/\1-\3\4/;
s/^(("[^"]*"|[^"])*,)?"([0-9,.]+)-"(,|$)/\1"-\3"\4/
t1' < file
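Depending on the nature of the input, you may want to match numbers more strictly. For example, ([0-9.]+)- here matches 12-, but it also matches ...-. If that kind of input can occur, you could change it to ([0-9]*\.?[0-9]+)- for example.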
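The difference between the two number patterns can be checked directly. This is a minimal sketch in Ruby rather than sed, with both patterns anchored to a whole field for simplicity; the names `LOOSE`, `STRICT`, and `classify` are made up for the illustration.

```ruby
# Loose: any run of digits and dots followed by a trailing minus.
LOOSE  = /\A([0-9.]+)-\z/
# Stricter: optional integer part, optional dot, then at least one digit.
STRICT = /\A([0-9]*\.?[0-9]+)-\z/

# Hypothetical helper: returns [loose_match, strict_match] for one field.
def classify(field)
  [field.match?(LOOSE), field.match?(STRICT)]
end

['12-', '3.14-', '.5-', '...-', '1.2.3-'].each do |f|
  loose, strict = classify(f)
  puts "#{f}\tloose=#{loose}\tstrict=#{strict}"
end
```

The stricter pattern still accepts 12-, 3.14- and .5-, but rejects ...- and 1.2.3-, both of which the loose pattern accepts.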