Merge 2 rows based on the same column value


I have a file that looks like this.

47196436 47723284 name1 1.77273

42672249 52856963 name2 1.06061
52856963 430695 name2 1.16667

55094959 380983 name3 1.55613

17926380 55584836 name4 1.02461
3213456 34211 name4 1.11
54321 34211 name4 1.23

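The first two columns correspond to primary keys in my table. I am trying to merge the rows so that, when several rows share the same name, all of their keys end up on a single line.

The output I am trying to get is: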

47196436 47723284 name1
42672249 52856963 430695 name2
55094959 380983 name3
17926380 55584836 3213456 34211 54321 name4

I was able to get partway there with the following command.

awk '{ x[$3]=x[$3] " " $2; } 
END { 
   for (k in x) print k,x[k] >"OUTPUT1";  
}' ccc.txt

However, it does not give me the correct output. I need some help modifying the command above.

Answer 1

Clumsy, but it seems to do the job:

awk '$3 != prev {if (NR != 1) print prev; prev=$3; delete a};
!($1 in a){a[$1]++; printf "%s ", $1};
!($2 in a){a[$2]++; printf "%s ", $2}; 
END {print prev}' ccc.txt
47196436 47723284 name1
42672249 52856963 430695 name2
55094959 380983 name3
17926380 55584836 3213456 34211 54321 name4
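
Note that the one-liner above assumes that rows with the same name are adjacent in the file. If that is not guaranteed, a variant along these lines might help. It is only a sketch in POSIX awk wrapped in a shell function; the function name `merge_keys` and the array names `order`, `line` and `seen` are made up for illustration. It collects the keys per name, keeps names and keys in first-seen order, and skips blank lines.

```shell
# merge_keys: hypothetical helper, reads "key1 key2 name value" rows
# from the given files (or from stdin when no file is given).
merge_keys() {
    awk '
    NF >= 3 {                                   # skip blank or short lines
        name = $3
        if (!(name in line)) order[++n] = name  # remember first-seen name order
        for (i = 1; i <= 2; i++)
            if (!seen[name, $i]++)              # add each key once per name
                line[name] = line[name] $i " "
    }
    END {
        for (j = 1; j <= n; j++)                # print names in input order
            print line[order[j]] order[j]
    }' "$@"
}
```

It could be called as `merge_keys ccc.txt > OUTPUT1`. Everything is held in memory until the END block, which is the price paid for not requiring sorted input.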

Answer 2

A Perl approach:

$ perl -ane '$h{$F[2]} .= " ".$F[0]." ".$F[1];
    END {
        for $k (sort keys %h) {
            print $_," " for grep {!$seen{$_}++} split(" ",$h{$k});
            print "$k\n";
        }
    }' file

47196436 47723284 name1
42672249 52856963 430695 name2
55094959 380983 name3
17926380 55584836 3213456 34211 54321 name4

Answer 3

Here is another Perl approach:

$ perl -ane 'foreach(@F[0..1]){$k{$F[2]}{$_}++}
           END{
                foreach $v (sort keys(%k)){
                    print "$_ " foreach(keys(%{$k{$v}})); 
                    print "$v\n"
                }; 
            } ' file

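This produces the following (the order of the keys on each line can vary, because Perl does not guarantee hash key order):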

47723284 47196436 name1
42672249 430695 52856963 name2
380983 55094959 name3
34211 55584836 17926380 54321 3213456 name4

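Explanation

OK, I admit that the Perl script above is not an easy-to-follow example of Perl. I used a lot of tricks, and they obfuscate the code. Here is the same solution, formatted as a script and written in a more verbose style: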

#!/usr/bin/perl 

## This is the hash that will store our values. 
my %k;

## Read through the input file line by line
## saving each line as $line. This is what the -n
## switch to perl means, only there each line is saved
## in the special variable $_.
while (my $line=<>) {
    ## Split the line into the @F array. This is
    ## what the -a switch does.
    #chomp($line);
    my @F=split(/\s+/,$line);


    ## Populate the %k hash that we defined at the beginning.
    ## This is a hash of hashes, it looks like this:
    ##   $hash{key1}{key2}=value
    ## In this case, we are saying:
    ##   $hash{3rd field}{1st field}=1 
    ##   $hash{3rd field}{2nd field}=1 
    ## This just serves to add the 1st and 2nd fields
    ## to the list of fields for this $F[2] (the 3rd field, the name).
    ## A side effect of this is that hash keys are unique so duplicates
    ## are automatically removed.
    $k{$F[2]}{$F[0]}=1;
    $k{$F[2]}{$F[1]}=1;

}

## We have now finished processing the file
## (this is the END{} block above), so let's print.

## This saves the keys of the hash %k in the @names array
## sorted alphabetically.
my @names=(sort keys(%k));


## Go through each of the names, saving
## them as $name
foreach my $name (@names) {
    ## Now, iterate through the values associated 
    ## with the current $name. These are saved as the
    ## keys of the hash %k{$name}
    foreach my $value ( (keys(%{$k{$name}})) ){
      print "$value ";
    } 
    ## Now print the name as well
    print "$name\n";

}

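The script above does exactly the same thing as the one-liner I posted; it is just expanded to use clearer syntax.

Answer 4

If you don't mind using gawk >= 4.0, this (like terdon's answer) will produce the desired output, with optional ordering of the names and keys: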

NF {
    Names[$3][$1] = 1;
    Names[$3][$2] = 1;
} 
END {
    PROCINFO["sorted_in"] = "@ind_str_asc"; # if you want `Name` ordered
    for (Name in Names) { 
        PROCINFO["sorted_in"] = "@ind_num_asc"; # if you want `Key` ordered
        for (Key in Names[Name]) {
            printf("%s ", Key);
        }
        print Name;
    }
}

This gives:

47196436 47723284 name1
430695 42672249 52856963 name2
380983 55094959 name3
34211 54321 3213456 17926380 55584836 name4
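
If gawk 4.0 is not available, roughly the same sorted result can probably be obtained with plain awk plus `sort`. The following is a sketch under that assumption, not a drop-in replacement: `sorted_merge` is a hypothetical function name, the names are ordered bytewise (hence `LC_ALL=C`), and keys are compared numerically, so two keys that differ only in leading zeros would be treated as duplicates.

```shell
# sorted_merge: hypothetical helper, reads "key1 key2 name value" rows
# from the given files (or from stdin when no file is given).
sorted_merge() {
    # 1. Emit one "name key" pair per key, skipping blank or short lines.
    awk 'NF >= 3 { print $3, $1; print $3, $2 }' "$@" |
    # 2. Order by name, then numerically by key, dropping duplicate pairs.
    LC_ALL=C sort -u -k1,1 -k2,2n |
    # 3. Join the keys of each name on one line, followed by the name.
    awk '
    $1 != prev { if (NR > 1) print prev; prev = $1 }
    { printf "%s ", $2 }
    END { if (NR > 0) print prev }'
}
```

For example, `sorted_merge ccc.txt` should print each name once, with its keys in ascending numeric order.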
