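Please advise how to solve this. I have a pair of files that I need to modify so that each keeps only the columns the two have in common, in the same order.
Suppose my files File1 and File2 are as follows:
File1:
R1 C1 C2 C3 C4
R2 1 2 3 4
R3 5 6 7 8
File2:
R6 C4 C3 C6 C7
R7 9 10 11 12
R8 13 14 15 16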
The output I am looking for is mod_File1 and mod_File2:
mod_File1:
R1 C3 C4
R2 3 4
R3 7 8
mod_File2:
R6 C3 C4
R7 10 9
R8 14 13
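The core of the task is the intersection of the two header lines, kept in one fixed order (File1's order in the expected output above, which here also happens to be alphabetical). A minimal sketch of just that step, assuming whitespace-separated fields and that the first field of every line is a row label rather than a data column:

```shell
#!/bin/sh
# Sample input copied from the question.
printf 'R1 C1 C2 C3 C4\nR2 1 2 3 4\nR3 5 6 7 8\n' > File1
printf 'R6 C4 C3 C6 C7\nR7 9 10 11 12\nR8 13 14 15 16\n' > File2

# NR == FNR holds only while awk reads its first file operand (File2 here):
# remember the column names in the File2 header, skipping field 1 (row label).
# Then, on the File1 header, print the names File2 also has, in File1 order.
common=$(awk '
NR == FNR { if (FNR == 1) for (i = 2; i <= NF; i++) in2[$i]; next }
FNR == 1 {
    s = ""
    for (i = 2; i <= NF; i++)
        if ($i in in2) s = (s == "" ? $i : s " " $i)
    print s
}
' File2 File1)

echo "$common"
```

The `NR == FNR` idiom is a common awk shortcut for "still in the first file"; note that it misfires if the first file is empty.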
This is what I tried:
awk '
FNR==1 {F++}
F==1 {
    if (NR==1)
        for (i=2;i<NF;i++)
        {
            col1[$i];
        }
    next
}
F==2 {
    if (NR==1)
        for (i=2;i<NF;i++)
        {
            col2[$i];
        }
    next
}
F=3 { NR==1 {
        for (i=2;i<NF;i++)
            if ($i in cols2)
                c1[i];
    }
    NR>1 { for (j in c1)
        print $j >> mod_file1
    }
F=4 { NR==1 {
        for (i=2;i<NF;i++)
            if ($i in cols1)
                c1[i];
    }
    NR>1 { for (j in c1)
        print $j >> mod_file2
    }
' file1 file1 file2 file2
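For comparison, the main problems in the attempt above: `F=3` and `F=4` assign rather than compare; `NR==1` is true only on the very first line read overall (the per-file line number is `FNR`); `i<NF` skips the last column; the arrays are filled as `col1`/`col2` but tested as `cols1`/`cols2`; a pattern cannot be nested inside an action as in `{ NR==1 {`; an unquoted `mod_file1` is an empty variable, not a file name; `for (j in c1)` visits indices in an unspecified order and `print $j` puts each field on its own line; and the operands `file1 file1 file2 file2` make the second pass read file1 again. A reworked sketch of the same multi-pass idea, assuming whitespace-separated fields and unique column names within each header, reads File2, File1, File2 in that order:

```shell
#!/bin/sh
# Sample input copied from the question.
printf 'R1 C1 C2 C3 C4\nR2 1 2 3 4\nR3 5 6 7 8\n' > File1
printf 'R6 C4 C3 C6 C7\nR7 9 10 11 12\nR8 13 14 15 16\n' > File2

awk '
# A new pass starts whenever the per-file line number resets to 1.
FNR == 1 { pass++ }

# Pass 1 (File2): remember which column names File2 has.
pass == 1 {
    if (FNR == 1)
        for (i = 2; i <= NF; i++) in2[$i]
    next
}

# Pass 2 (File1): on the header, keep the columns that File2 also has,
# in File1 order, recording each one by name and field number.
pass == 2 && FNR == 1 {
    n = 0
    for (i = 2; i <= NF; i++)
        if ($i in in2) { n++; name[n] = $i; col[n] = i }
}
pass == 2 { out = "mod_File1" }

# Pass 3 (File2 again): look up where each common column sits in File2.
pass == 3 && FNR == 1 {
    for (i = 2; i <= NF; i++) pos[$i] = i
    for (k = 1; k <= n; k++) col[k] = pos[name[k]]
}
pass == 3 { out = "mod_File2" }

# Passes 2 and 3: row label first, then the common columns in a fixed order.
{
    line = $1
    for (k = 1; k <= n; k++) line = line " " $(col[k])
    print line > out
}
' File2 File1 File2
```

Keeping File1's column order is a design choice; any fixed order works as long as both output files use it. Within one awk run, `>` truncates a file on first use and appends afterwards, so rerunning the script overwrites the old output.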
Answer 1
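It's a bit more complicated than it looks. There may well be a library that does this better (Perl has plenty of math libraries).
But this should do what you want: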
#!/usr/bin/perl
use strict;
use warnings;
#read file 1
open( my $file1, "<", "data1.txt" ) or die $!;
my $header_line = <$file1>;
chomp($header_line);
my ( $column1, @headers1 ) = split( ' ', $header_line );
my %results;
my %headers_in_file1 = map { $_ => 1 } @headers1;
for (<$file1>) {
    my ( $column, @values ) = split;
    my %these_results;
    @these_results{@headers1} = @values;
    $results{$column} = \%these_results;
}
close ( $file1);
#read file 2
open( my $file2, "<", "data2.txt" ) or die $!;
$header_line = <$file2>;
chomp($header_line);
my ( $column2, @headers2 ) = split( ' ', $header_line );
my %results2;
my %headers_in_file2 = map { $_ => 1 } @headers2;
for (<$file2>) {
    my ( $column, @values ) = split;
    my %these_results;
    @these_results{@headers2} = @values;
    $results2{$column} = \%these_results;
}
close ( $file2 );
#figure out the columns in both
my %in_both;
foreach my $header ( @headers1, @headers2 ) {
    if ( $headers_in_file1{$header}
        and $headers_in_file2{$header} )
    {
        $in_both{$header}++;
    }
}
#sort out headers for output.
my @output_headers = sort keys %in_both;
print join( " ", $column1, @output_headers ), "\n";
foreach my $row ( sort keys %results ) {
    print $row, " ";
    for my $header (@output_headers) {
        print $results{$row}{$header}, " ";
    }
    print "\n";
}
print "Second\n";
print join( " ", $column2, @output_headers ), "\n";
foreach my $row ( sort keys %results2 ) {
    print $row, " ";
    for my $header (@output_headers) {
        print $results2{$row}{$header}, " ";
    }
    print "\n";
}
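The Perl script above prints both tables to standard output, separated by a line reading `Second`, and leaves a trailing space on each data row, while the question asks for two separate files. One way to split that output is sketched below. A stand-in shell function replaces actually running the script; its text is what the script should print for the question's data, and the script name `answer.pl` in the comment is hypothetical.

```shell
#!/bin/sh
# Stand-in for the Perl answer's stdout on the question's data; in real use,
# pipe the script itself instead (e.g. perl answer.pl | awk ...).
perl_answer_output() {
    printf '%s\n' 'R1 C3 C4' 'R2 3 4 ' 'R3 7 8 ' 'Second' \
                  'R6 C3 C4' 'R7 10 9 ' 'R8 14 13 '
}

# Lines before "Second" go to mod_File1 and lines after it to mod_File2;
# sub() strips the trailing space the Perl script leaves on data rows.
perl_answer_output | awk -v out=mod_File1 '
$0 == "Second" { out = "mod_File2"; next }
{ sub(/ +$/, ""); print > out }
'
```

Note that the Perl script orders the common columns alphabetically (`sort keys %in_both`), which matches the expected output for this sample but is not necessarily File1's column order in general.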