Merge 2 lines based on the same column value

Answer 1

Clunky, but it seems to do the job:

$ awk '$3 != prev {if (NR != 1) print prev; prev=$3; delete a};
!($1 in a){a[$1]++; printf "%s ", $1};
!($2 in a){a[$2]++; printf "%s ", $2};
END {print prev}' ccc.txt

47196436 47723284 name1
42672249 52856963 430695 name2
55094959 380983 name3
17926380 55584836 3213456 34211 54321 name4
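
The one-liner above flushes a group whenever the third column changes, so it only merges correctly when all lines sharing a name are adjacent in the input. As a rough sketch for input that is not grouped, something like the following POSIX awk version could work; it keeps names and values in first-seen order. The function name merge_rows and the array names are made up for illustration and are not part of the original answer.

```shell
# Hypothetical helper: merge columns 1 and 2 per name (column 3),
# without requiring lines with the same name to be adjacent.
merge_rows() {
    awk '
    NF >= 3 {
        # Remember each name the first time it appears, to keep input order.
        if (!($3 in line)) { order[++n] = $3; line[$3] = "" }
        # Append columns 1 and 2 unless already recorded for this name.
        for (i = 1; i <= 2; i++)
            if (!(($3, $i) in seen)) {
                seen[$3, $i] = 1
                line[$3] = line[$3] $i " "
            }
    }
    END { for (j = 1; j <= n; j++) print line[order[j]] order[j] }
    ' "$@"
}
```

It can be called as merge_rows ccc.txt, or with the data on standard input.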

Answer 2

A Perl approach:

$ perl -ane '$h{$F[2]} .= " ".$F[0]." ".$F[1];
    END {
        for $k (sort keys %h) {
            my %seen;  # reset per name, so a value shared by two names is kept for both
            print $_," " for grep {!$seen{$_}++} split(" ",$h{$k});
            print "$k\n";
        }
    }' file

47196436 47723284 name1
42672249 52856963 430695 name2
55094959 380983 name3
17926380 55584836 3213456 34211 54321 name4

Answer 3

Here is another Perl approach:

$ perl -ane 'foreach(@F[0..1]){$k{$F[2]}{$_}++}
           END{
                foreach $v (sort keys(%k)){
                    print "$_ " foreach(keys(%{$k{$v}})); 
                    print "$v\n"
                }; 
            } ' file

This produces:

47723284 47196436 name1
42672249 430695 52856963 name2
380983 55094959 name3
34211 55584836 17926380 54321 3213456 name4

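Explanation

OK, I admit that the Perl one-liner above is not an easy piece of Perl to follow. I used quite a few tricks that obscure the code. Here is the same solution, formatted as a script and written in a more verbose style: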

#!/usr/bin/perl 

## This is the hash that will store our values. 
my %k;

## Read through the input file line by line
## saving each line as $line. This is what the -n
## switch to perl means, only there each line is saved
## in the special variable $_.
while (my $line=<>) {
    ## Split the line into the @F array. This is
    ## what the -a switch does.
    #chomp($line);
    my @F=split(/\s+/,$line);


    ## Populate the %k hash that we defined at the beginning.
    ## This is a hash of hashes, it looks like this:
    ##   $hash{key1}{key2}=value
    ## In this case, we are saying:
    ##   $hash{3rd field}{1st field}=1 
    ##   $hash{3rd field}{2nd field}=1 
    ## This just serves to add the 1st and 2nd fields
    ## to the list of fields for this $F[2] (the 3rd field, the name).
    ## A side effect of this is that hash keys are unique so duplicates
    ## are automatically removed.
    $k{$F[2]}{$F[0]}=1;
    $k{$F[2]}{$F[1]}=1;

}

## We have now finished processing the file
## (this is the END{} block above), so let's print.

## This saves the keys of the hash %k in the @names array
## sorted alphabetically.
my @names=(sort keys(%k));


## Go through each of the names, saving
## them as $name
foreach my $name (@names) {
    ## Now, iterate through the values associated
    ## with the current $name. These are saved as the
    ## keys of the inner hash %{$k{$name}}
    foreach my $value ( (keys(%{$k{$name}})) ){
      print "$value ";
    } 
    ## Now print the name as well
    print "$name\n";

}

The script above does exactly the same thing as the one-liner I posted, only expanded to use clearer syntax.

Answer 4

If you don't mind using gawk >= 4.0, this (much like terdon's) will produce the desired output, with optional ordering of the names and keys:

NF {
    Names[$3][$1] = 1;
    Names[$3][$2] = 1;
} 
END {
    PROCINFO["sorted_in"] = "@ind_str_asc"; # if you want `Name` ordered
    for (Name in Names) { 
        PROCINFO["sorted_in"] = "@ind_num_asc"; # if you want `Key` ordered
        for (Key in Names[Name]) {
            printf("%s ", Key);
        }
        print Name;
    }
}

This gives:

47196436 47723284 name1
430695 42672249 52856963 name2
380983 55094959 name3
34211 54321 3213456 17926380 55584836 name4
