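I need to parse drug names out of Medline abstracts. My plan was to take the output of grep -wf and grep -owf and combine the two with paste, but the outputs don't line up: grep -owf writes one line per match, even when several matches are on the same input line.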
Pattern file:
DrugA
DrugB
DrugC
DrugD
File to parse:
In our study, DrugA and DrugB were found to be effective. DrugA was more effective than DrugB.
In our study, DrugC was found to be effective
In our study, DrugX was found to be effective
Desired output:
DrugA In our study, DrugA and DrugB were found to be effective. DrugA was more effective than DrugB.
DrugB In our study, DrugA and DrugB were found to be effective. DrugA was more effective than DrugB.
DrugC In our study, DrugC was found to be effective
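A quick way to see the mismatch is to rebuild the sample files and count the lines each grep variant prints. This is a minimal sketch assuming bash and GNU grep; the scratch directory and file names are only for illustration.

```shell
#!/usr/bin/env bash
set -eu

# Recreate the sample files from the question in a throwaway directory.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' DrugA DrugB DrugC DrugD > "$dir/patterns"
printf '%s\n' \
  'In our study, DrugA and DrugB were found to be effective. DrugA was more effective than DrugB.' \
  'In our study, DrugC was found to be effective' \
  'In our study, DrugX was found to be effective' > "$dir/input"

# grep -wf prints each matching line once: 2 lines here.
grep -wf "$dir/patterns" "$dir/input" | wc -l

# grep -owf prints every match on its own line:
# 4 matches from line 1 plus 1 from line 2, so 5 lines.
grep -owf "$dir/patterns" "$dir/input" | wc -l
```

Because the second stream is longer than the first, paste ends up pairing drugs with the wrong lines.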
Answer 1
Maybe an awk approach?
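awk '
# First file (patterns): store each drug name and track the longest name
NR == FNR {
    a[$0] = 1
    n = length($0)
    w = n > w ? n : w
    next
}
# Second file (data): print the line once for each drug it contains
{
    for (i in a)
        if ($0 ~ i)
            printf "%-*s %s\n", w, i, $0
}
' pattern_file.txt data_file.txt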
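The awk above treats each drug name as a regular expression matched anywhere in the line, so DrugA would also hit a longer name such as DrugAB, and for (i in a) visits the names in an unspecified order. Below is a minimal sketch of a whole-word variant that keeps the pattern-file order. tag_drugs is a hypothetical helper name, and it assumes every drug name is a single word made of letters, digits and underscores.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: tag_drugs PATTERN_FILE INPUT_FILE
# Prints "drug<TAB>line" once for every (line, drug) pair, drugs in pattern-file order.
# Assumes each drug name is a single word of letters, digits or underscores.
tag_drugs() {
  awk '
    BEGIN { OFS = "\t" }
    # Pass 1 (pattern file): remember the drug names in file order, skipping blanks.
    NR == FNR { if ($0 != "") order[++np] = $0; next }
    # Pass 2 (input): split the line into words, then look each drug up exactly.
    {
      split("", seen)                          # portable way to empty an array
      nw = split($0, words, /[^A-Za-z0-9_]+/)
      for (k = 1; k <= nw; k++) seen[words[k]] = 1
      for (p = 1; p <= np; p++)
        if (order[p] in seen) print order[p], $0
    }
  ' "$1" "$2"
}

# Demo on the sample data from the question.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' DrugA DrugB DrugC DrugD > "$dir/patterns"
printf '%s\n' \
  'In our study, DrugA and DrugB were found to be effective. DrugA was more effective than DrugB.' \
  'In our study, DrugC was found to be effective' \
  'In our study, DrugX was found to be effective' > "$dir/input"

tag_drugs "$dir/patterns" "$dir/input"
```

Looking names up in a per-line word set means DrugA is reported once for line 1 even though it appears twice, which matches the desired output.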
Answer 2
Strictly speaking it isn't grep alone, but this does work:
while IFS= read -r pattern; do
    # -w: match whole words only; -F: treat the drug name as a fixed string, not a regex
    grep -wF -- "$pattern" input | awk -v drug="$pattern" 'BEGIN {OFS="\t"} {print drug, $0}'
done < "patterns"
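The loop above reads the input once per drug. With a long drug list, a single grep pass can do the same job, which is essentially the paste idea from the question made to work: -n tags each match with its line number, so matches can be joined back to their lines instead of being pasted side by side. This is a minimal sketch assuming GNU grep (for -o and -w); tag_lines is a hypothetical name, and the awk step keeps the whole input file in memory.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: tag_lines PATTERN_FILE INPUT_FILE
# Prints "drug<TAB>line" once per (line, drug) pair, in order of first appearance.
tag_lines() {
  # -n prefixes each match with its line number, -o prints only the match,
  # -w requires whole-word matches, -F treats the patterns as fixed strings.
  grep -nowFf "$1" "$2" |
    awk '
      BEGIN { OFS = "\t" }
      # Pass 1 (the input file): keep every line, indexed by line number.
      NR == FNR { text[FNR] = $0; next }
      # Pass 2 (grep output on stdin, "LINENO:DRUG"): emit each pair only once.
      !seen[$0]++ {
        i = index($0, ":")
        print substr($0, i + 1), text[substr($0, 1, i - 1)]
      }
    ' "$2" -
}

# Demo on the sample data from the question.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' DrugA DrugB DrugC DrugD > "$dir/patterns"
printf '%s\n' \
  'In our study, DrugA and DrugB were found to be effective. DrugA was more effective than DrugB.' \
  'In our study, DrugC was found to be effective' \
  'In our study, DrugX was found to be effective' > "$dir/input"

tag_lines "$dir/patterns" "$dir/input"
```

The input file is passed to awk before the grep output ("-"), so the NR == FNR test still selects the right file when grep finds no matches at all.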
Answer 3
A sed approach:
sed 's|.*|/&/{h;s/^/&\\t/p;g}|' pattern_file | sed -nf - input
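The first sed turns each drug name into a small sed program: when a line matches, save it to the hold space (h), prepend the name and a tab, print it (p), then restore the original line (g) so the next drug's test sees it unchanged. The second sed reads that generated program from standard input (-f -). The sketch below prints the generated program and then runs the whole pipeline on the sample data. It assumes GNU sed, which accepts -f - and expands \t to a tab in the replacement; as in Answer 1, the names are matched as regexes without word boundaries.

```shell
#!/usr/bin/env bash
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' DrugA DrugB DrugC DrugD > "$dir/pattern_file"
printf '%s\n' \
  'In our study, DrugA and DrugB were found to be effective. DrugA was more effective than DrugB.' \
  'In our study, DrugC was found to be effective' \
  'In our study, DrugX was found to be effective' > "$dir/input"

# Step 1: show the generated program, one block per drug, for example
#   /DrugA/{h;s/^/DrugA\t/p;g}
sed 's|.*|/&/{h;s/^/&\\t/p;g}|' "$dir/pattern_file"

# Step 2: run it with -n so only the explicit p commands produce output.
sed 's|.*|/&/{h;s/^/&\\t/p;g}|' "$dir/pattern_file" | sed -nf - "$dir/input"
```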