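I have a file with the following contents. I want to sort it by the last column (and, for a different file, by the third-to-last column) while keeping the rest of each line intact.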
ABC,DEF,GHI,-5,-8,-0.6,0.488
XYZ,JKL,MNO,3,-5,0.2,-0.342
STU,WXY,DEF,-1,4,0.01,0.345
If I use this command, it works as expected and shows the correct result:
awk '{print $NF,$0}' FILE | sort -nr | cut -f2- -d' '
XYZ,JKL,MNO,3,-5,0.2,-0.342
STU,WXY,DEF,-1,4,0.01,0.345
ABC,DEF,GHI,-5,-8,-0.6,0.488
But running the same command on a larger file gives wrong results (the file I want to sort has 4M lines). Input:
ABC,DEF,GHI,-5,-8,-0.6,0.0488
XYZ,JKL,MNO,3,-5,0.2,-0.0342
STU,WXY,DEF,-1,4,0.01,0.0345
JKL,JKL,GHI,-2,-3,0.31,-0.0524
QRS,GHI,YUT,-3,-1,0.20,-0.0503
HUR,JTL,ZST,1,1,0.52,-0.0556
FTT,JL,MKI,0,2,0.21,-0.0529
FTC,JKL,ERW,-1,6,0.23,-0.0441
HJI,MHP,VGT,1,-6,0.80,-0.0433
BUT,IOP,HGT,2,2,0.2,-0.0439
XYZ,BGY,MNO,-2,1,0.01,-0.0416
Answer 1
If you know how many fields there are:
$ sort -t, -k7,7n file
XYZ,JKL,MNO,3,-5,0.2,-0.342
STU,WXY,DEF,-1,4,0.01,0.345
ABC,DEF,GHI,-5,-8,-0.6,0.488
Or if you don't:
$ awk 'BEGIN{FS=OFS=","} {print $NF,$0}' file | sort -t, -k1,1n | cut -d, -f2-
XYZ,JKL,MNO,3,-5,0.2,-0.342
STU,WXY,DEF,-1,4,0.01,0.345
ABC,DEF,GHI,-5,-8,-0.6,0.488
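Setting FS explicitly matters here. As a small illustration (the `line` variable is only a stand-in for one row of the sample file), awk's default whitespace splitting sees a comma-separated line without spaces as a single field, so $NF is the whole line rather than the last column. That would probably explain why a decorate step without FS="," gives sort nothing numeric to work with:

```shell
line='ABC,DEF,GHI,-5,-8,-0.6,0.488'

# Default FS (whitespace): the line has no spaces, so it is one field
# and $NF is the entire line.
printf '%s\n' "$line" | awk '{ print NF ": " $NF }'

# FS=",": seven fields, and $NF is the last column.
printf '%s\n' "$line" | awk -F, '{ print NF ": " $NF }'
```

The first command prints a field count of 1, the second a field count of 7 followed by 0.488.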
Sorting by the third-to-last field instead of the last one is then simply:
$ awk 'BEGIN{FS=OFS=","} {print $(NF-2),$0}' file | sort -t, -k1,1n | cut -d, -f2-
ABC,DEF,GHI,-5,-8,-0.6,0.488
XYZ,JKL,MNO,3,-5,0.2,-0.342
STU,WXY,DEF,-1,4,0.01,0.345
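Since the two commands differ only in the offset from NF, they could be folded into one helper. This is just a sketch: the function name `sort_from_end` and its offset argument are made up for illustration, and LC_ALL=C is an added assumption so that "." is read as the decimal point whatever the locale.

```shell
# sort_from_end OFFSET < input
# Decorate each line with field NF-OFFSET, sort numerically on that
# decoration, then strip it again. OFFSET 0 is the last field.
sort_from_end() {
    awk -v off="$1" 'BEGIN { FS = OFS = "," } { print $(NF - off), $0 }' |
        LC_ALL=C sort -t, -k1,1n |
        cut -d, -f2-
}

# Sort the sample rows by the third-to-last field (-8, -5, 4).
printf '%s\n' \
    'ABC,DEF,GHI,-5,-8,-0.6,0.488' \
    'XYZ,JKL,MNO,3,-5,0.2,-0.342' \
    'STU,WXY,DEF,-1,4,0.01,0.345' | sort_from_end 2
```

Calling it with 0 instead of 2 sorts by the last field.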
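If you want to preserve input order when multiple lines have the same value in the sort field, you can use -s if you have GNU sort; otherwise, include the line number as a secondary sort key: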
$ awk 'BEGIN{FS=OFS=","} {print $NF,NR,$0}' file | sort -t, -k1,1n -k2,2n | cut -d, -f3-
XYZ,JKL,MNO,3,-5,0.2,-0.342
STU,WXY,DEF,-1,4,0.01,0.345
ABC,DEF,GHI,-5,-8,-0.6,0.488
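For the GNU sort case, the -s variant might look like the sketch below. The function name `stable_sort_by_field` and the three-row input are invented for illustration (two rows tie on the key), and LC_ALL=C is an added assumption so that "." is parsed as the decimal point.

```shell
# stable_sort_by_field FIELD_NUMBER < input
# Numeric sort on one comma-separated field. -s disables sort's
# last-resort whole-line comparison, so rows with equal keys keep
# their input order.
stable_sort_by_field() {
    LC_ALL=C sort -s -t, -k"$1,$1n"
}

# Made-up rows: B and A tie on field 3, and B comes first in the input.
printf '%s\n' 'B,2,0.5' 'C,3,0.1' 'A,1,0.5' | stable_sort_by_field 3
```

With -s the B row stays ahead of the A row; without it, sort would break the tie by comparing the whole lines and put A first.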
Answer 2
Assuming the input is relatively simple (the delimiter never appears inside a field), you can sort by a given column like this:
$ c=$(< input awk -F, 'NR==1 { print NF; exit }')
$ < input sort -t, -k $c,${c}n --debug
Output:
XYZ,JKL,MNO,3,-5,0.2,-0.342
______
___________________________
STU,WXY,DEF,-1,4,0.01,0.345
_____
___________________________
ABC,DEF,GHI,-5,-8,-0.6,0.488
_____
____________________________
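Here we tell sort to use a comma as the field separator and to sort (only) on one specific field, which the previous step determined to be the last one. With GNU sort, --debug underlines the part of each line that is used as the key.
By the way, your question will most likely be closed, since it has undoubtedly been asked and answered before.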
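The two steps could be wrapped in a small function, sketched below. The name `sort_by_last_field` and the temporary demo file are hypothetical, the field count is taken from the first line only (so every line is assumed to have the same number of fields), and LC_ALL=C is an added assumption so that "." is read as the decimal point.

```shell
# sort_by_last_field FILE
# Numeric sort on the last comma-separated field of FILE.
sort_by_last_field() {
    file=$1
    # Count the fields on the first line only, then stop reading.
    c=$(awk -F, 'NR==1 { print NF; exit }' "$file")
    LC_ALL=C sort -t, -k"$c,${c}n" "$file"
}

# Demo on a temporary copy of the sample rows from the question.
demo=$(mktemp)
printf '%s\n' \
    'ABC,DEF,GHI,-5,-8,-0.6,0.488' \
    'XYZ,JKL,MNO,3,-5,0.2,-0.342' \
    'STU,WXY,DEF,-1,4,0.01,0.345' > "$demo"
sort_by_last_field "$demo"
rm -f "$demo"
```

Reading the file twice keeps the function simple; for a 4M-line file the awk pass exits after the first line, so the extra cost should be negligible.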