Sort values and get the best score (highest number)


I have a file that looks like this:

    7  C00000002 score:  -41.156 nathvy =  49 nconfs =         2251
    8  C00000002 score:  -39.520 nathvy =  49 nconfs =         3129
    9  C00000004 score:  -38.928 nathvy =  24 nconfs =          150
   10  C00000002 score:  -38.454 nathvy =  49 nconfs =         9473
   11  C00000004 score:  -37.704 nathvy =  24 nconfs =          156
   12  C00000001 score:  -37.558 nathvy =  41 nconfs =           51
    2  C00000002 score:  -48.649 nathvy =  49 nconfs =         3878
    3  C00000001 score:  -44.988 nathvy =  41 nconfs =         1988
    4  C00000002 score:  -42.674 nathvy =  49 nconfs =         6740
    5  C00000002 score:  -42.453 nathvy =  49 nconfs =         4553
    6  C00000002 score:  -41.829 nathvy =  49 nconfs =         7559

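The second column contains IDs that are not sorted, and some of them are repeated (for example, C00000001). Each line also carries a different number after score: (these numbers usually start with -).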

What I want to do is:

1) Read the second column (the unsorted IDs) and always pick the first line on which each ID appears. So for C00000001, pick the line with score: -37.558.

2) Once I have only the unique IDs, sort those lines by the number after score:, so that the most negative number comes first and the most positive number comes last.

I want the output printed in the same format (same structure) as the input file.

Answer 1

$ sort -k2,2 -u < filename | sort -k4,4n

    7  C00000002 score:  -41.156 nathvy =  49 nconfs =         2251
    9  C00000004 score:  -38.928 nathvy =  24 nconfs =          150
   12  C00000001 score:  -37.558 nathvy =  41 nconfs =           51

Explanation:

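  1. sort -k2,2 -u: sorts the lines on the second column only and, because of -u, outputs a single line per ID. With GNU sort, -u compares only the key and keeps lines with equal keys in their input order, so the first occurrence of each ID is the line that survives.
  2. sort -k4,4n: sorts the remaining lines numerically by score in ascending order, so the most negative score comes first (no -r is needed).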
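Keeping the first occurrence is GNU sort behaviour; POSIX only says that sort -u suppresses all but one line in each set of lines with equal keys, not which one. A minimal portable sketch, assuming only POSIX awk and sort, makes the first-occurrence rule explicit in awk and leaves only the score ordering to sort. The function name first_per_id is hypothetical.

```shell
#!/bin/sh
# Minimal sketch, assuming POSIX awk and sort; first_per_id is a hypothetical name.
# Usage: first_per_id [file ...]   (reads standard input when no file is given)
first_per_id() {
    # !seen[$2]++ is true only the first time an ID (column 2) appears,
    # so awk prints the first line for each ID and keeps input order.
    # sort -k4,4n then orders those lines by score, most negative first.
    awk '!seen[$2]++' "$@" | sort -k4,4n
}
```

For the file in the question, first_per_id filename prints the lines numbered 7, 9 and 12, in that order.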

Answer 2

Using GNU awk 4.0 or later:

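$ gawk '
    # keep the first line seen for each ID (column 2) and remember its score (column 4)
    !($2 in line) {line[$2] = $0; score[$2] = $4}
    # walk the scores in ascending numeric order and print the saved lines
    END {PROCINFO["sorted_in"] = "@val_num_asc"; for (id in score) print line[id]}
  ' file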
    7  C00000002 score:  -41.156 nathvy =  49 nconfs =         2251
    9  C00000004 score:  -38.928 nathvy =  24 nconfs =          150
   12  C00000001 score:  -37.558 nathvy =  41 nconfs =           51

Answer 3

One more one-liner that is easy to adapt:

for row in $(awk '{print $2}' tmp | sort -u); do awk -v id="$row" '$2 == id {print; exit}' tmp; done | sort -k4,4n

    7  C00000002 score:  -41.156 nathvy =  49 nconfs =         2251
    9  C00000004 score:  -38.928 nathvy =  24 nconfs =          150
   12  C00000001 score:  -37.558 nathvy =  41 nconfs =           51
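Sorting the score field numerically (sort -k4,4n) rather than as text matters once scores differ in sign or in number of digits. A reverse text sort such as sort -r --key=4 only orders the sample above correctly because every score there is negative with two integer digits. The comparison below uses hypothetical sample lines; the exact order of the text sort depends on the locale's collation, so only the numeric order is stated.

```shell
#!/bin/sh
# Hypothetical sample lines: scores with different signs and widths.
sample='a  C1 score:  -5.000
b  C2 score:  -41.156
c  C3 score:  3.250
d  C4 score:  -12.500'

# Text comparison from field 4 to end of line, reversed: the result depends on
# the locale's collation and does not, in general, follow the numeric value.
printf '%s\n' "$sample" | sort -r -k4

# Numeric comparison of field 4 only: -41.156, -12.500, -5.000, 3.250.
printf '%s\n' "$sample" | sort -k4,4n
```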
