filename.csv
I have a CSV file with the following content.
filename.csv:
"Afghanistan","94.0","81.1"
"Bahamas","42.9","43.2"
"Bolivia (Plurinational State of)","86.7","31.9"
"Brazil","76.7","0.0"
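I want to compute the difference between two columns (column 2 - column 3) and append the result as a fourth column. After that, I want to sort numerically by the fourth column. However, the command I am using does not sort by the fourth column.
The command I used: awk -F'","' '{ print $0, $2 - $3 }' filename.csv | sort -k4 -n
The output I get: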
"Afghanistan","94.0","81.1" 12.9
"Bahamas","42.9","43.2" -0.3
"Bolivia (Plurinational State of)","86.7","31.9" 54.8
"Brazil","76.7","0.0" 76.7
Expected output:
"Bahamas","42.9","43.2","-0.3"
"Afghanistan","94.0","81.1","12.9"
"Bolivia (Plurinational State of)","86.7","31.9","54.8"
"Brazil","76.7","0.0","76.7"
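Any help is appreciated. Thanks!
Answer 1
The problem is that sort, by default, treats each transition from a non-blank to a blank character as a field boundary. Because you did not use -t to define the field separator, it falls back on that default instead of ",". In your awk command you also forgot to put a "," before the new column, so the difference ends up as a whitespace-delimited field rather than a fourth comma-delimited one. If you keep the awk part and only change the sort key to -k2: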
awk -F'","' '{ print $0, $2 - $3 }' filename.csv| sort -k2 -n
you get:
"Bahamas","42.9","43.2" -0.3
"Bolivia (Plurinational State of)","86.7","31.9" 54.8
"Afghanistan","94.0","81.1" 12.9
"Brazil","76.7","0.0" 76.7
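However, this is still not what you expected: the new column is joined with a space rather than a comma, and the spaces inside the Bolivia name shift its field count, so that row lands out of order. Both problems are solved by setting OFS in awk and passing -t to sort:
awk -F'","' 'BEGIN { OFS="," } { print $0, $2 - $3 }' filename.csv | sort -k4 -n -t","
which gives the expected order (the fourth column is left unquoted):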
"Bahamas","42.9","43.2",-0.3
"Afghanistan","94.0","81.1",12.9
"Bolivia (Plurinational State of)","86.7","31.9",54.8
"Brazil","76.7","0.0",76.7
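I am a bit reluctant to use this approach, though, because CSV files often have commas inside text fields, which are protected by quoting those fields. Handling that would require parsing the file differently, writing the output with some other OFS, sorting on that separator, and then changing the OFS back to a comma.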
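One way to sketch that idea, assuming every field is quoted so that -F'","' still splits correctly: put the difference in front of each row as a tab-separated sort key, sort on that key, then cut the key off again. The "Korea, Republic of" row is hypothetical, added only to exercise a comma inside a quoted field, and the data is assumed to contain no tab characters.

```shell
# Sample rows; the "Korea, Republic of" row is a made-up example
# with a comma inside a quoted field.
input='"Afghanistan","94.0","81.1"
"Bahamas","42.9","43.2"
"Korea, Republic of","50.0","45.5"
"Brazil","76.7","0.0"'

# A literal tab, used as a temporary separator that does not occur in the data.
tab=$(printf '\t')

sorted=$(printf '%s\n' "$input" |
  # Emit: <difference> TAB <original row>,<difference>
  awk -F'","' '{ d = $2 - $3; printf "%s\t%s,%s\n", d, $0, d }' |
  # Sort numerically on the key only; LC_ALL=C keeps "." as the decimal point.
  LC_ALL=C sort -t "$tab" -k1,1n |
  # Drop the temporary key, leaving comma-separated rows.
  cut -f2-)

printf '%s\n' "$sorted"
```

Because sort only ever looks at the tab-delimited key, commas and spaces inside the name field no longer affect the field count.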
HTH
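Answer 2
I suggest using OFS to specify the output field separator, so that the new column is also joined with , instead of a space. Then you can use sort's -t option to specify the field separator.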
awk -F'","' -v OFS=',' '{ print $0, $2 - $3 }' filename.csv | sort -t ',' -k4 -n
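The output of this command leaves the fourth column unquoted, while the expected output in the question quotes it. A minimal sketch of one way to close that gap is to add the quotes only after sorting, since a leading " would stop sort -n from reading the key as a number. The sed step is an addition of mine, not part of the answer, and like the sort it assumes no field contains a comma.

```shell
# The four rows from the question.
input='"Afghanistan","94.0","81.1"
"Bahamas","42.9","43.2"
"Bolivia (Plurinational State of)","86.7","31.9"
"Brazil","76.7","0.0"'

quoted=$(printf '%s\n' "$input" |
  # Append the difference as a comma-separated fourth column.
  awk -F'","' -v OFS=',' '{ print $0, $2 - $3 }' |
  # Sort numerically on the fourth comma-delimited field only.
  LC_ALL=C sort -t ',' -k4,4n |
  # Wrap the last field in double quotes once the order is fixed.
  sed 's/,\([^,]*\)$/,"\1"/')

printf '%s\n' "$quoted"
```

With the sample rows this should reproduce the expected output from the question, with Bahamas first and Brazil last.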