Compare the values of one column against all values in another column

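I have 2 input files. Each line of File1 should be compared with each line of File2.

The logic is:

  1. If Column1 of File1 does not match Column1 of File2 (any of the values under it), print the entire line to the output file. Likewise, compare each value of Column1 in File1 with each value of Column1 in File2.

  2. If Column1 matches in both files, print the entire line of File1 only if its Column2 value is greater than N+10 or less than N-10, where N is the Column2 value in File2. Compare all lines this way.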

File1:

Contig1  23
Contig1  42
Contig2  68
Contig3  89
Contig3  102
Contig7  79

File2:

Contig1  40
Contig1  49
Contig3  90
Contig2  90
Contig20 200
Contig1  24

Expected output:

Contig2  68
Contig3  102
Contig7  79

Any solution is fine, even one that does not use awk or sed.

I found a similar question, but I am not quite sure what I have to do.

Here is the code:

awk 'NR==FNR {
    lines[NR,"col1"] = $1
    lines[NR,"col2"] = $2
    lines[NR,"line"] = $0
    next
}
(lines[FNR,"col1"] != $1) {
    print lines[FNR,"line"]
    next
}
(lines[FNR,"col2"]+10 < $2 || lines[FNR,"col2"]-10 > $2) {
    print lines[FNR,"line"]
}' file1 file2

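Answer 1

The script below does the following, which I think is what you want:

  1. If a contig from file1 is not present in file2, print all lines for that contig.
  2. If it is present in file2, then for each of its values in file1, print the value only if it is less than N-10 or greater than N+10 for every value N of that contig in file2.
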
#!/usr/bin/env perl

use strict;
use warnings;

my (%file1, %file2);

## read file1, the 1st argument
open(F1, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!\n";
while(<F1>){
    chomp;
    ## Split the line on whitespace into the @F array.
    my @F=split(/\s+/); 

    ## Save all lines in the %file1 hash.
    ## $F[0] is the contig name and $F[1] the value.
    ## The hash will store a list of all values
    ## associated with this contig.
    push @{$file1{$F[0]}},$F[1];
}
close(F1);
## read file2, the second argument
open(F2, '<', $ARGV[1]) or die "Cannot open $ARGV[1]: $!\n";
while(<F2>){
    ## remove newlines
    chomp;
    ## save the fields into array @F
    my @F=split(/\s+/); 
    ## Again, save all values associated with each
    ## contig into the %file2 hash. 
    push @{$file2{$F[0]}},$F[1];
}
close(F2);

## For each of the contigs in file1
## (sorted, because hash keys come back in no particular order)
foreach my $contig (sort keys(%file1)) {
    ## If this contig exists in file 2
    if(defined $file2{$contig}){
        ## get the list of values for that contig
        ## in each of the two files
        my @f2_vals=@{$file2{$contig}};
        my @f1_vals=@{$file1{$contig}};
        ## For each of file1's values for this contig
        val1:foreach my $val1 (@f1_vals) {
                ## For each of file2's values for this contig
                foreach my $val2 (@f2_vals) {
                    ## If this value lies within 10 of a file2 value,
                    ## skip it and move on to the next value from file1.
                    if ($val1 >= $val2-10 && $val1 <= $val2+10) {
                        next val1;
                    }
                }
                ## We only get here if the value was more than 10
                ## away from every file2 value for this contig,
                ## so print it.
                print "$contig $val1\n";
            }
    }
    ## If this contig is not in file2, print the
    ## lines from file1. This will print all lines
    ## from file1 whose contig was not in file2.
    else {
        print "$contig $_\n" for @{$file1{$contig}}
    }
}

Save it in a text file (e.g. foo.pl), make it executable (chmod a+x foo.pl) and run it like this:

./foo.pl file1 file2

On your example, it returns:

$ ./foo.pl file1 file2
Contig2 68
Contig3 102
Contig7 79
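
Since the question started from awk, the same logic can also be sketched in awk. The attempt in the question pairs the files up by line number (FNR), so line 1 of file1 is only ever compared with line 1 of file2. The sketch below groups file2's values by contig name instead. The function name filter_contigs is made up for illustration, and the sketch assumes whitespace-separated columns, numeric values in column 2, and a non-empty file2 (the NR==FNR idiom cannot tell the two files apart otherwise).

```shell
#!/bin/sh
# Hypothetical helper (name chosen for illustration only).
# Usage: filter_contigs FILE1 FILE2
filter_contigs() {
    # file2 is passed to awk first so that file1 can then be
    # streamed and printed in its original line order.
    awk '
        # Ignore blank or incomplete lines in either file.
        NF < 2 { next }

        # First file on the awk command line (file2):
        # remember every value seen for each contig.
        NR == FNR {
            n[$1]++
            vals[$1, n[$1]] = $2
            next
        }

        # Second file (file1): check whether any file2 value for
        # this contig lies within 10 of the current value.
        {
            near = 0
            for (i = 1; i <= n[$1]; i++) {
                d = $2 - vals[$1, i]
                if (d >= -10 && d <= 10) { near = 1; break }
            }
            # Contigs missing from file2 have n[$1] == 0, so the
            # loop never runs and the line is printed.
            if (!near) print
        }
    ' "$2" "$1"
}
```

After defining the function (for example by sourcing the file), run it as filter_contigs file1 file2. On the example data it should print the Contig2 68, Contig3 102 and Contig7 79 lines, with their original spacing and in file1's order.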
