Check whether strings exist in a list and, if they do, write a third file

Answer 1

This can be expressed directly in awk:

awk 'FNR==NR { h[$1]; next } { for(i=2; i<=NF; i++) $i = ($i in h)? 1 : 0 } 1' mylist.tab data.tab

Or in a more readable format:

parse.awk

# Collect mylist.tab into the `h` associative array
FNR==NR {
  h[$1]
  next
}

# For all but the first column in data.tab check and record if it is in `h`
{ 
  for(i=2; i<=NF; i++) 
    $i = ($i in h) ? 1 : 0 
}

# Short for { print $0 }
1

Run it like this:

awk -f parse.awk mylist.tab data.tab

Output:

Info_1 0 1 1
Info_2 1 0
Info_3 1
Info_4 1 0 0 0 1
Info_5

Or, for tab-separated columns:

awk -v OFS='\t' -f parse.awk mylist.tab data.tab

Output:

Info_1  0   1   1
Info_2  1   0
Info_3  1
Info_4  1   0   0   0   1
Info_5
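
For readers less familiar with awk, the same two-pass idea (load the list into a lookup structure, then rewrite every field after the first) can be sketched in Python. This is only an illustration: the function name `flag_rows` and the sample data are hypothetical and are not the asker's real files.

```python
def flag_rows(list_lines, data_lines, sep=" "):
    """Replace every field after the first with 1 if it is in the list, else 0."""
    # First pass: remember the first whitespace-separated field of each
    # list line, like `h[$1]` in the awk script.
    wanted = {line.split()[0] for line in list_lines if line.strip()}

    out = []
    for line in data_lines:
        fields = line.split()
        if not fields:
            # Empty lines are passed through unchanged.
            out.append("")
            continue
        # Second pass: keep column 1, turn the other columns into flags.
        flags = ["1" if f in wanted else "0" for f in fields[1:]]
        out.append(sep.join([fields[0]] + flags))
    return out


# Hypothetical sample data, made up for this sketch.
sample_list = ["apple", "cherry"]
sample_data = ["Info_1 pear apple cherry", "Info_2 apple fig", "Info_3"]

for row in flag_rows(sample_list, sample_data):
    print(row)
# Info_1 0 1 1
# Info_2 1 0
# Info_3
```

Passing `sep="\t"` plays the same role as `-v OFS='\t'` in the awk invocation.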

Answer 2

Perl to the rescue!

Save the list elements in a hash, then read the table, split each line on whitespace, and check the hash to print 0 or 1.

#!/usr/bin/perl
use warnings;
use strict;

my %in_list;
open my $LIST, '<', 'mylist.tab' or die $!;
while (<$LIST>) {
    chomp;
    $in_list{$_} = 1;
}

open my $TAB, '<', 'data.tab' or die $!;
while (<$TAB>) {
    my @cells = split;
    print shift @cells, "\t";
    print join "\t", map $in_list{$_} ? 1 : 0, @cells;
    print "\n";
}

Answer 3

Use sed to generate a sed script from mylist.tab, then run it on data.tab:

sed \
    -e '1i s/^[ \\t]*//' \
    -e 's@\(.*\)@s/\\([ \\t]\\)\1\\b/\\11/@g' \
    -e '$as/\\([ \\t]\\)[^ \\t]\\{2,\\}\\b/\\10/g' mylist.tab \
    > /tmp/x.sed 
sed -f /tmp/x.sed data.tab

Note that I assume every string in "mylist.tab" is at least 2 characters long.
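
One plausible reading of that assumption: the final catch-all rule only zeroes tokens of 2 or more characters, so that the single-character 1 flags written by the earlier rules survive. The Python sketch below imitates the generated script to show this. The function name `apply_generated_script`, the `min_len` parameter and the sample lines are hypothetical, and `re.escape` is an addition of this sketch (the sed version inserts the list strings into the pattern unescaped).

```python
import re


def apply_generated_script(line, words, min_len=2):
    """Imitate the generated sed script on one line of data.tab."""
    # s/^[ \t]*//  : strip leading blanks
    line = re.sub(r"^[ \t]*", "", line)
    # One rule per list entry: s/\([ \t]\)WORD\b/\11/  (first match only)
    for w in words:
        line = re.sub(r"([ \t])" + re.escape(w) + r"\b", r"\g<1>1", line, count=1)
    # Catch-all: s/\([ \t]\)[^ \t]\{2,\}\b/\10/g
    line = re.sub(r"([ \t])[^ \t]{%d,}\b" % min_len, r"\g<1>0", line)
    return line


words = ["apple", "cherry"]  # hypothetical list contents

print(apply_generated_script("Info_1 pear apple cherry", words))
# Info_1 0 1 1

# With a minimum length of 1 the catch-all also overwrites the 1 flags.
print(apply_generated_script("Info_1 pear apple cherry", words, min_len=1))
# Info_1 0 0 0

# A one-character token that is not in the list is left untouched.
print(apply_generated_script("Info_2 x apple", words))
# Info_2 x 1
```

The last call suggests the same length limit matters for the strings in data.tab as well, not only for those in the list.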

Answer 4

Another perl solution

$ perl -lne 'if(!$#ARGV){ $h{$_}=1 }
             else{ s/\h\K\H+/$h{$&} ? 1 : 0/ge; print }
            ' mylist.tab data.tab
Info_1    0     1     1
Info_2    1     0
Info_3    1
Info_4    1     0     0    0    1
Info_5

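if(!$#ARGV){ $h{$_}=1 } builds a hash of the words in mylist.tab. While the first file is being read, data.tab is the only name left in @ARGV, so $#ARGV is 0 and the condition is true.
s/\h\K\H+/$h{$&} ? 1 : 0/ge runs on each line of data.tab: every run of non-blank characters that follows a blank is replaced with 1 if it is a key in the hash, and with 0 otherwise. The \h\K prefix behaves like a positive lookbehind for a blank, which keeps the first column from matching.
The modified line is then printed.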
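
The substitution step can be approximated in Python, whose `re` module has neither `\K` nor `\h`. In this sketch a real lookbehind `(?<=[ \t])` stands in for `\h\K`, and `[^ \t]+` stands in for `\H+`, so only space and tab count as blanks here. The function name `flag_line` and the sample line are hypothetical.

```python
import re


def flag_line(line, wanted):
    """Replace each token that follows a blank with 1 or 0, keeping spacing."""
    # (?<=[ \t]) requires a blank just before the token without consuming it,
    # so the first column (which has no blank in front of it) never matches.
    return re.sub(
        r"(?<=[ \t])[^ \t]+",
        lambda m: "1" if m.group(0) in wanted else "0",
        line,
    )


print(flag_line("Info_1    pear     apple", {"apple"}))
# Info_1    0     1
```

Because only the tokens are replaced, the original spacing between columns is preserved, as in the perl output above.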
