当第一个键已经排序时，对 N 个键的数据集进行最佳排序

Question 1

我无法用 sole 做到这一点sort。这可能行不通。

在我的解决方案中，awk处理第一列并sort根据需要运行多次。脚本从 stdin 获取输入，然后打印到 stdout。

#!/usr/bin/awk -f

BEGIN { command = "sort -k 2,2g" }

{
if ( NR==1 ) {
   val=$1
   buf=$0
}
else
if ( $1 < val ) {
   print "Unsorted 1st column detected. Processing last valid chunk and aborting." > "/dev/stderr"
   exit 1
   }
else {
if ( $1 == val )
   buf=buf"\n"$0
else
   {
   print buf | command
   close(command)
   buf=$0
   val=$1
   }
   }
}

END { print buf | command }

笔记：

close(command)至关重要。没有它，所有管道command都会通向单身的 sort。
我认为awk的比较运算符可以很好地处理数字。真的确保解决方案能够正常sort工作，您需要分别检索和sort -c -k 1,1g的退出状态，然后在此基础上构建脚本逻辑。这将在每个输入行上运行两个进程，我预计性能会受到很大影响。val"\n"$1$1"\n"valsort

Answer

我无法用 sole 做到这一点sort。这可能行不通。

在我的解决方案中，awk处理第一列并sort根据需要运行多次。脚本从 stdin 获取输入，然后打印到 stdout。

#!/usr/bin/awk -f

BEGIN { command = "sort -k 2,2g" }

{
if ( NR==1 ) {
   val=$1
   buf=$0
}
else
if ( $1 < val ) {
   print "Unsorted 1st column detected. Processing last valid chunk and aborting." > "/dev/stderr"
   exit 1
   }
else {
if ( $1 == val )
   buf=buf"\n"$0
else
   {
   print buf | command
   close(command)
   buf=$0
   val=$1
   }
   }
}

END { print buf | command }

笔记：

close(command)至关重要。没有它，所有管道command都会通向单身的 sort。
我认为awk的比较运算符可以很好地处理数字。真的确保解决方案能够正常sort工作，您需要分别检索和sort -c -k 1,1g的退出状态，然后在此基础上构建脚本逻辑。这将在每个输入行上运行两个进程，我预计性能会受到很大影响。val"\n"$1$1"\n"valsort

Question 2

Perl 来救援！

它完全按照您的要求进行操作，它将以相同数字开头的输入块传输到仅对第二列进行操作的排序。

#!/usr/bin/perl
use warnings;
use strict;

my $sort;

my $first = -1;
while (<>) {
    my ($x, $y) = split;
    if ($first != $x) {
        die "Unsorted line $." if $first > $x;
        $first = $x;
        open $sort, '|-', 'sort -k2,2n' or die $!;
    }
    print {$sort} $_;
}

唯一的问题可能是初始值$first：如果您的输入在第一列以负数开头，则需要指定一个较小的值。

我使用了n排序而不是g因为它在我的计算机上似乎更快一些。

Answer