grep and print all patterns in one line

I have a file containing a list of patterns like this:

K00001

K00003

K00005

I want to grep for my patterns in a tab-separated table like the following (the original table has no blank lines) and print the matches:

K00001  ko00010_Glycolysis__Gluconeogenesis

K00003  ko00010_Glycolysis__Gluconeogenesis

K00005  ko00010_Glycolysis__Gluconeogenesis

K00001  ko00020_Citrate_cycle_(TCA_cycle)

K00003  ko00020_Citrate_cycle_(TCA_cycle)

K00005  ko00020_Citrate_cycle_(TCA_cycle)

to get this: one line for each pattern in my pattern file, with all of that pattern's matches joined together:

K00001_ko00010_Glycolysis__Gluconeogenesis;K00001_ko00020_Citrate_cycle_(TCA_cycle)
K00003_ko00010_Glycolysis__Gluconeogenesis;K00003_ko00020_Citrate_cycle_(TCA_cycle)
K00005_ko00010_Glycolysis__Gluconeogenesis;K00005_ko00020_Citrate_cycle_(TCA_cycle)

Answer 1

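This solution uses awk. We pass two file names as arguments and use the if (FNR == NR) idiom to do different things depending on whether we are reading the first file or the second. Two associative arrays hold the keys and the output lines.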
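To see the FNR == NR idiom in isolation, here is a minimal sketch. The file names keys.txt and data.tsv and their contents are made up for this demo and are not part of the question's data.

```shell
# Hypothetical sample files, created only for this demo.
tmpdir=$(mktemp -d)
printf 'K00001\nK00003\n' > "$tmpdir/keys.txt"
printf 'K00001\tx\nK00002\ty\nK00003\tz\n' > "$tmpdir/data.tsv"

# NR counts lines across all input files; FNR restarts at 1 for each file.
# So FNR == NR is true only while awk is reading the first file.
idiom_demo=$(awk -F '\t' '
    FNR == NR  { seen[$1] = 1; next }   # first file: remember each key
    $1 in seen { print $1 "_" $2 }      # second file: print rows whose key was seen
' "$tmpdir/keys.txt" "$tmpdir/data.tsv")

printf '%s\n' "$idiom_demo"
rm -r "$tmpdir"
```

One caveat: if the first file is empty, FNR == NR is also true while reading the second file, so the idiom assumes a non-empty key file.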

Here is the file a.awk:

# usage: awk -f a.awk keyfile1 datafile2
BEGIN {
    FS = "\t"                               # set field separator to TAB
}
{
    if (FNR == NR) {                        # if looking at first (key) file
        k[$1]=$1                            # just save each key
    } else {                                # if looking at second file
        if ($1 in k) {                      # if first col is one that we want
            output=$1 "_" $2                # prepare output line
            if (out[$1]=="")                # if first time we've seen this key
                out[$1]=output              # store output as is
            else                            # and when we find more matches for this key
                out[$1]=out[$1] ";" output  # we append ";" and the output
        }
    }
}
END {                                       # at the end
    for (i in out)                          # print all the output lines
        print out[i]
}

Here is how to use it:

$ awk -f a.awk file1 file2
K00001_ko00010_Glycolysis__Gluconeogenesis;K00001_ko00020_Citrate_cycle_(TCA_cycle)
K00003_ko00010_Glycolysis__Gluconeogenesis;K00003_ko00020_Citrate_cycle_(TCA_cycle)
K00005_ko00010_Glycolysis__Gluconeogenesis;K00005_ko00020_Citrate_cycle_(TCA_cycle)
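Note that awk does not define the order in which for (i in out) visits array elements, so the lines may come out in a different order than shown. If the output should follow the order of the key file, one possible variant is sketched below. It is not part of the original answer: the array names want and order and the counter n are introduced here for illustration, and the sample files are recreated in a temporary directory.

```shell
# Recreate the sample data from the question in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' K00001 K00003 K00005 > "$tmpdir/file1"
for pathway in 'ko00010_Glycolysis__Gluconeogenesis' 'ko00020_Citrate_cycle_(TCA_cycle)'; do
    for key in K00001 K00003 K00005; do
        printf '%s\t%s\n' "$key" "$pathway"
    done
done > "$tmpdir/file2"

ordered=$(awk -F '\t' '
    FNR == NR {                         # key file
        if (!($1 in want)) {            # skip duplicate keys
            want[$1] = 1
            order[++n] = $1             # remember the position of each key
        }
        next
    }
    $1 in want {                        # data file, wanted key
        item = $1 "_" $2
        if ($1 in out) out[$1] = out[$1] ";" item
        else           out[$1] = item
    }
    END {                               # print in key-file order
        for (i = 1; i <= n; i++)
            if (order[i] in out) print out[order[i]]
    }
' "$tmpdir/file1" "$tmpdir/file2")

printf '%s\n' "$ordered"
rm -r "$tmpdir"
```

Keys that never appear in the data file are skipped, which matches the behavior of a.awk.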
