grep / parsing text


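I need to parse drug names out of Medline abstracts. I was hoping to do this by taking the output of grep -wf and grep -owf and then combining them with paste, but the outputs don't line up, because grep -owf prints a separate output line for every match, even when several matches are on the same line.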

Pattern file:

DrugA
DrugB
DrugC
DrugD

File to parse:

In our study, DrugA and DrugB were found to be effective.  DrugA was more effective than DrugB.
In our study, DrugC was found to be effective
In our study, DrugX was found to be effective

Desired output:

DrugA    In our study, DrugA and DrugB were found to be effective.  DrugA was more effective than DrugB.
DrugB    In our study, DrugA and DrugB were found to be effective.  DrugA was more effective than DrugB.
DrugC    In our study, DrugC was found to be effective

Answer 1

How about an awk approach?

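awk '
    # First file (patterns): keep the drug names in file order (for-in order
    # is unspecified) and remember the longest one for column alignment
    NR == FNR {
        pat[++n] = $0
        if (length($0) > w) w = length($0)
        next
    }
    # Second file (data): print the line once for each drug name it contains
    {
        for (i = 1; i <= n; i++)
            if ($0 ~ pat[i])
                printf "%-*s %s\n", w, pat[i], $0
    }
' pattern_file.txt data_file.txt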
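The awk above treats each drug name as a regular expression and matches it anywhere in the line, so DrugA would also match inside a longer name such as DrugAB, unlike the grep -w used in the question. A minimal sketch of a whole-word variant, assuming plain POSIX awk and treating letters, digits and underscore as word characters (roughly what grep -w does). The names drug_words and has_word are made up for illustration, and the output is tab-separated instead of padded:

```shell
# Hypothetical helper: drug_words PATTERN_FILE DATA_FILE
# Prints "drug<TAB>line" for each drug name found as a whole word.
drug_words() {
    awk '
        # 1 if w occurs in s bounded by non-word characters (or line edges)
        function has_word(s, w,    off, p, pre, post) {
            off = 0
            while ((p = index(substr(s, off + 1), w)) > 0) {
                p += off                              # position in the full line
                pre  = (p > 1) ? substr(s, p - 1, 1) : ""
                post = substr(s, p + length(w), 1)    # "" at end of line
                if (pre !~ /[A-Za-z0-9_]/ && post !~ /[A-Za-z0-9_]/)
                    return 1
                off = p                               # retry from the next character
            }
            return 0
        }
        # Pattern file: keep non-empty drug names in file order
        NR == FNR { if (length($0) > 0) pat[++n] = $0; next }
        # Data file: literal (not regex) whole-word test for each drug name
        {
            for (i = 1; i <= n; i++)
                if (has_word($0, pat[i]))
                    print pat[i] "\t" $0
        }
    ' "$1" "$2"
}
```

Using index() instead of ~ also means drug names containing regex metacharacters such as ( or + are matched literally.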

Answer 2

Strictly speaking this isn't grep alone, but it does work:

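# Run grep once per drug name (-w for whole words, -- in case a name starts
# with "-") and prefix each matching line with that name and a tab
while IFS= read -r pattern; do
    grep -w -- "$pattern" input | awk -v drug="$pattern" 'BEGIN {OFS="\t"} { print drug, $0 }'
done < "patterns"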
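The loop above runs one grep per drug name, which rereads the input for every pattern. Staying with the grep -owf idea from the question, a single-pass sketch can add -n so that every match carries its line number, then let awk join the matches back to their lines. This assumes a grep that supports -o and -w (GNU and BSD grep both do); drug_lines is a hypothetical helper name, and the phase=1/phase=2 operands are POSIX awk command-line assignments used so that an empty grep result is still handled correctly:

```shell
# Hypothetical helper: drug_lines PATTERN_FILE DATA_FILE
# Prints "drug<TAB>line" for every whole-word match, with a single grep pass.
drug_lines() {
    grep -now -f "$1" -- "$2" |
    awk '
        # Phase 1 (stdin): grep output of the form "lineno:drug"
        phase == 1 {
            line = $0; sub(/:.*/, "", line)       # text before the first colon
            drug = $0; sub(/^[^:]*:/, "", drug)   # text after the first colon
            if (!((line, drug) in seen)) {        # report each drug once per line
                seen[line, drug] = 1
                list[line] = (line in list) ? list[line] SUBSEP drug : drug
            }
            next
        }
        # Phase 2 (data file): print the line once per drug found on it
        phase == 2 && (FNR in list) {
            n = split(list[FNR], d, SUBSEP)
            for (i = 1; i <= n; i++)
                print d[i] "\t" $0
        }
    ' phase=1 - phase=2 "$2"
}
```

Within a line, drugs come out in the order grep finds them (left to right), which for the sample data happens to match the pattern-file order.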

Answer 3

A sed approach:

sed 's|.*|/&/{h;s/^/&\\t/p;g}|' pattern_file | sed -nf - input
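The first sed turns each line of the pattern file into a small sed program, and the second sed runs that program over the input with -n, so only lines printed by an explicit p appear. For a pattern DrugA the generated command is /DrugA/{h;s/^/DrugA\t/p;g}: on lines matching DrugA, h copies the line to the hold space, the s command prefixes the drug name and a tab and prints the result, and g copies the original line back so the next pattern sees it unchanged. Interpreting \t as a tab and reading the script with -f - are GNU sed behaviour. The helper name make_sed_script below is hypothetical; it only prints the generated script so it can be inspected:

```shell
# Hypothetical helper: print the sed program that the answer pipes into "sed -nf -"
make_sed_script() {
    # Each pattern line P becomes: /P/{h;s/^/P\t/p;g}
    sed 's|.*|/&/{h;s/^/&\\t/p;g}|' "$1"
}
```

Running make_sed_script pattern_file | sed -nf - input reproduces the one-liner above.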
