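Please help me write a shell script for the following. For each lane (col1), I need to count how many variables (col3) are consistent across the samples (col2). For example, variable1 in lane1 counts as a consistent variable because its value (col4) is the same in all three samples. By contrast, variables 2 and 3 in lane2 are inconsistent.
Input: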
lane1 sample1 variable1 ab
lane1 sample2 variable1 ab
lane1 sample3 variable1 ab
lane1 sample1 variable2 cd
lane1 sample2 variable2 cd
lane1 sample3 variable2 cd
lane1 sample1 variable3 gh
lane1 sample2 variable3 ab
lane1 sample3 variable3 gh
lane2 sample1 variable1 ac
lane2 sample2 variable1 ac
lane2 sample3 variable1 ac
lane2 sample1 variable2 gt
lane2 sample2 variable2 gt
lane2 sample3 variable2 ac
lane2 sample1 variable3 ga
lane2 sample2 variable3 ga
lane2 sample3 variable3 ac
Output
Number of consistent and inconsistent variables across all three samples:
#Consistent #Inconsistent
lane1 2 1
lane2 1 2
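The rule above amounts to "a variable is consistent in a lane when col4 has exactly one distinct value across the samples". Below is a minimal shell pipeline sketch of that rule. It assumes whitespace-separated input on stdin, and the function name `count_consistent` is hypothetical. It deliberately does not check that every sample is present for every variable.

```shell
#!/bin/sh
# count_consistent (hypothetical name): reads whitespace-separated
# "lane sample variable value" lines on stdin and prints one line per lane:
# "lane consistent inconsistent".
count_consistent() {
    # 1. Keep lane, variable and value (skip blank/short lines), then drop
    #    duplicate triples so each distinct value is listed once.
    #    LC_ALL=C gives byte-wise sorting, so "variable1" and "variable10"
    #    never interleave and uniq sees each lane/variable group contiguously.
    awk 'NF >= 4 { print $1, $3, $4 }' |
        LC_ALL=C sort -u |
        # 2. Count the distinct values of each lane/variable pair.
        awk '{ print $1, $2 }' |
        uniq -c |
        # 3. uniq -c prints "count lane variable": one distinct value means
        #    consistent, more than one means inconsistent.
        awk '{
                 lanes[$2] = 1
                 if ($1 == 1) cons[$2]++; else incons[$2]++
             }
             END {
                 for (l in lanes) print l, cons[l] + 0, incons[l] + 0
             }' |
        LC_ALL=C sort -k1,1
}
```

With the data above saved as, say, `input.txt`, running `echo '#Consistent #Inconsistent'; count_consistent < input.txt` should reproduce the expected table with space-separated columns.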
Answer 1
Perl solution (reads the file(s) named on the command line, or standard input):
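#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

my (%values, %seen);
while (<>) {
    next unless /\S/;    # Skip empty and whitespace-only lines
    my ($lane, $sample, $var, $val) = split;
    # Each lane/sample/variable combination may appear only once,
    # whatever its value
    die "Duplicate $lane $sample $var\n" if $seen{$lane}{$sample}{$var}++;
    # Record every distinct value seen for this lane/variable pair
    $values{$lane}{$var}{$val}{$sample} = 1;
}

my %results;
say join "\t", 'lane', '#Consistent', '#Inconsistent';
for my $lane (sort keys %values) {    # sort for a stable output order
    for my $var (keys %{ $values{$lane} }) {
        # A variable is consistent when it has exactly one distinct value
        my $count = keys %{ $values{$lane}{$var} };
        if (1 == $count) {
            ++$results{$lane}{consistent};
        } else {
            ++$results{$lane}{inconsistent};
        }
    }
    # Default to 0 so a lane with no (in)consistent variables prints a
    # number instead of triggering an "uninitialized value" warning
    say join "\t", $lane,
        map { $results{$lane}{$_} // 0 } qw{ consistent inconsistent };
}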
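Since the question asked for a shell script, here is a sketch of the same idea as the Perl answer in a single awk pass. It records the distinct values of each lane/variable pair, rejects duplicate lane/sample/variable rows, and classifies each pair by its number of distinct values. The function name `lane_consistency` is hypothetical, and it reads the files passed as arguments or standard input.

```shell
#!/bin/sh
# lane_consistency (hypothetical name): single-pass awk version of the Perl
# answer. Reads "lane sample variable value" lines from the files given as
# arguments (or stdin) and prints "lane consistent inconsistent" per lane.
lane_consistency() {
    awk '
        NF < 4 { next }                  # skip blank or incomplete lines
        {
            lane = $1; sample = $2; var = $3; val = $4
            # Like the Perl die: each lane/sample/variable may appear once.
            if ((lane, sample, var) in seen) {
                printf "Duplicate %s %s %s\n", lane, sample, var > "/dev/stderr"
                failed = 1
                exit 1
            }
            seen[lane, sample, var] = 1
            # Count each distinct value of a lane/variable pair only once.
            if (!((lane, var, val) in vals)) {
                vals[lane, var, val] = 1
                ndistinct[lane, var]++
            }
        }
        END {
            if (failed) exit 1          # END still runs after exit; skip it
            for (key in ndistinct) {
                split(key, parts, SUBSEP)
                lanes[parts[1]] = 1
                if (ndistinct[key] == 1) cons[parts[1]]++
                else incons[parts[1]]++
            }
            for (l in lanes) print l, cons[l] + 0, incons[l] + 0
        }
    ' "$@" | LC_ALL=C sort -k1,1
}
```

The final `sort` is only there because awk's `for (key in array)` order is unspecified. As a side effect, the function's exit status is that of `sort`, so a duplicate row shows up as the error message on stderr and empty output rather than a non-zero status.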