Sort by the fourth column

I have a CSV file, filename.csv, with the following contents.

filename.csv:

"Afghanistan","94.0","81.1"
"Bahamas","42.9","43.2"
"Bolivia (Plurinational State of)","86.7","31.9"
"Brazil","76.7","0.0"

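I want to compute the difference between two columns (column 2 - column 3) and append the result as a fourth column. After that, I want to sort numerically by the fourth column. However, the command I am using does not sort by the fourth column.

The command I am using: awk -F'","' '{ print $0, $2 - $3 }' filename.csv | sort -k4 -n

The output I get: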

"Afghanistan","94.0","81.1" 12.9
"Bahamas","42.9","43.2" -0.3
"Bolivia (Plurinational State of)","86.7","31.9" 54.8
"Brazil","76.7","0.0" 76.7

Expected output file:

"Bahamas","42.9","43.2","-0.3"
"Afghanistan","94.0","81.1","12.9"
"Bolivia (Plurinational State of)","86.7","31.9","54.8"
"Brazil","76.7","0.0","76.7"

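Any help is appreciated. Thanks!

Answer 1

The problem is that sort treats each transition from a non-blank to a blank character as a field boundary: because you did not use -t to define the field separator, it uses that default instead of ",". Your awk command also does not put a "," before the new value, so the difference is appended after a space. With whitespace-separated fields the difference is the second field on most lines, so if you change the command to: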

awk -F'","' '{ print $0, $2 - $3 }' filename.csv| sort -k2 -n

you will get:

"Bahamas","42.9","43.2" -0.3
"Bolivia (Plurinational State of)","86.7","31.9" 54.8
"Afghanistan","94.0","81.1" 12.9
"Brazil","76.7","0.0" 76.7
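
To see why the Bolivia row ends up in the wrong place, here is a minimal sketch that counts the whitespace-separated fields on each line. It assumes that awk's default field splitting is a close enough stand-in for sort's default; the names workdir and field_report are only illustrative.

```shell
# Recreate the sample data from the question in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/filename.csv" <<'EOF'
"Afghanistan","94.0","81.1"
"Bahamas","42.9","43.2"
"Bolivia (Plurinational State of)","86.7","31.9"
"Brazil","76.7","0.0"
EOF

# Append the difference after a space (as in the question), then print the
# number of whitespace-separated fields and the last field of every line.
field_report=$(awk -F'","' '{ print $0, $2 - $3 }' "$workdir/filename.csv" |
  awk '{ print NF, $NF }')
echo "$field_report"
rm -r "$workdir"
```

Three of the lines have only two fields, so -k4 finds nothing to compare on them, while the Bolivia line has five fields because of the spaces in the country name, so its difference is not in field 2.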

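However, this is still not what you expected: the Bolivia row contains spaces, so its second whitespace-separated field is not a number, and the new value is separated by a space rather than a comma. Setting the output separator in awk and the field separator in sort fixes both:

awk -F'","' 'BEGIN { OFS="," } { print $0, $2 - $3 }' filename.csv | sort -k4 -n -t","

and you get the expected result: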

"Bahamas","42.9","43.2",-0.3
"Afghanistan","94.0","81.1",12.9
"Bolivia (Plurinational State of)","86.7","31.9",54.8
"Brazil","76.7","0.0",76.7
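
The output above leaves the fourth column unquoted, while the expected output in the question quotes it. Quoting the value before sorting would make sort -n read it as 0, because the field would then start with a double quote. One possible sketch is to sort first and add the quotes afterwards with sed. It assumes no field contains a comma; the names workdir and quoted are illustrative, and LC_ALL=C is my own precaution so that "." is treated as the decimal point.

```shell
# Recreate the sample data from the question in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/filename.csv" <<'EOF'
"Afghanistan","94.0","81.1"
"Bahamas","42.9","43.2"
"Bolivia (Plurinational State of)","86.7","31.9"
"Brazil","76.7","0.0"
EOF

# 1. Append the difference as an unquoted, comma-separated fourth field.
# 2. Sort numerically on that field only (-k4,4n), using "," as separator.
# 3. Wrap the last field in double quotes once the sorting is done.
quoted=$(awk -F'","' 'BEGIN { OFS = "," } { print $0, $2 - $3 }' "$workdir/filename.csv" |
  LC_ALL=C sort -t ',' -k4,4n |
  sed 's/,\([^,]*\)$/,"\1"/')
echo "$quoted"
rm -r "$workdir"
```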

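I am a little reluctant to use this approach, because CSV files tend to have commas inside text fields, which is handled by quoting those fields. Coping with that would take another parsing pass: emit some other OFS, use it as the separator for sort, and then turn it back into a comma afterwards.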
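
A rough sketch of that idea, under the assumptions that no field contains a tab and that every field is quoted as in the sample: put the difference in front of each line as a tab-separated sort key, sort on that key, and then cut the key off again. The row "Example, Country of" is a hypothetical line added only to show a comma inside a quoted field; workdir, tab and result are illustrative names.

```shell
# Sample data from the question plus one hypothetical row whose first
# field contains a comma.
workdir=$(mktemp -d)
cat > "$workdir/filename.csv" <<'EOF'
"Afghanistan","94.0","81.1"
"Bahamas","42.9","43.2"
"Bolivia (Plurinational State of)","86.7","31.9"
"Brazil","76.7","0.0"
"Example, Country of","10.0","4.5"
EOF

tab=$(printf '\t')

# Decorate: emit "<difference><TAB><original line>,"<difference>"".
# Sort: numeric sort on the first tab-separated field only.
# Undecorate: drop the key, keeping everything after the first tab.
result=$(awk -F'","' '{ d = $2 - $3; printf "%s\t%s,\"%s\"\n", d, $0, d }' "$workdir/filename.csv" |
  LC_ALL=C sort -t "$tab" -k1,1n |
  cut -f2-)
echo "$result"
rm -r "$workdir"
```

Because sort never splits on commas here, the comma inside the first field does not shift the sort key. This is not a full CSV parser: awk still splits on the literal "," sequence, so it relies on every field being quoted.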

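HTH

Answer 2

I suggest setting OFS to specify the output field separator, so that awk also uses a comma instead of a space. You can then use sort's -t option to specify the field separator.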

awk -F'","' -v OFS=',' '{ print $0, $2-$3 }' filename.csv | sort -t ',' -k4 -n
