How do I extract lines from a text file that contain strings listed in another file, in the order of the search list?

File 1: sourcefile.txt

Hello, It's the beginning of the sentence. 
it is the beginpoint of my career.
The end is always far.
We can start our beginpoint anytime we want.
The time we utilise to make our life good should be more.
This text doesn't mean anything.
I am writing this to include my three points:
beginpoint
time
end

File 2: strings.txt

beginpoint
end
time

Desired output:

it is the beginpoint of my career
We can start our beginpoint anytime we want.
beginpoint
The end is always far.
end
The time we utilise to make our life good should be more.
time

I used

grep -w -F -f strings.txt sourcefile.txt > outputfile.txt

and got this output:

it is the beginpoint of my career.
The end is always far.
We can start our beginpoint anytime we want.
The time we utilise to make our life good should be more.
beginpoint
time
end

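So these are the lines I want, but I would like them grouped in the order of the search terms rather than in the order they appear in the source file.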

Answer 1

One way is to call grep once for each line of strings.txt:

$ while IFS= read -r line; do grep -wF "$line" sourcefile.txt; done < strings.txt
it is the beginpoint of my career.
We can start our beginpoint anytime we want.
beginpoint
The end is always far.
end
The time we utilise to make our life good should be more.
time

This can be slow if strings.txt is long; see "Why is using a shell loop to process text considered bad practice?"

If your sed supports the e flag (a GNU extension):

$ sed 's/.*/grep -wF '"'&'"' sourcefile.txt/e' strings.txt
it is the beginpoint of my career.
We can start our beginpoint anytime we want.
beginpoint
The end is always far.
end
The time we utilise to make our life good should be more.
time
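
Both commands above start one grep process per search string. As a rough sketch of a different technique (not the one used in this answer), a single awk process can hold both files in memory and print the matches string by string. It assumes the source file fits in memory, that a "word" means a run of letters, digits and underscores, and that your awk supports POSIX character classes. The temporary directory and sample data exist only to make the example self-contained.

```shell
#!/bin/sh
# Demo data in a scratch directory; file names follow the question.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

cat > sourcefile.txt <<'EOF'
Hello, It's the beginning of the sentence.
it is the beginpoint of my career.
The end is always far.
We can start our beginpoint anytime we want.
The time we utilise to make our life good should be more.
This text doesn't mean anything.
I am writing this to include my three points:
beginpoint
time
end
EOF

printf '%s\n' beginpoint end time > strings.txt

result=$(awk '
    # First file: remember each search string in the order it appears.
    NR == FNR { strs[++n] = $0; next }
    # Second file: keep every line in memory.
    { lines[++m] = $0 }
    END {
        # Outer loop follows strings.txt order, inner loop follows source order.
        for (i = 1; i <= n; i++)
            for (j = 1; j <= m; j++) {
                # Split the line into words and compare each one literally.
                cnt = split(lines[j], words, /[^[:alnum:]_]+/)
                for (k = 1; k <= cnt; k++)
                    if (words[k] == strs[i]) { print lines[j]; break }
            }
    }
' strings.txt sourcefile.txt)

printf '%s\n' "$result"
```

The nested loops still compare every string against every line, so this saves process start-up cost rather than comparisons.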

Answer 2

Assuming your list of strings contains no whitespace, as in your example:

$ awk -F'[^[:alnum:]_]+' '
    NR==FNR { strs[$0]; next }
    { for (str in strs) for (i=1; i<=NF; i++) if ($i==str) print str, FNR, $0 }
' file2 file1 | sort -k1,1 -k2,2n | cut -d' ' -f3-
it is the beginpoint of my career.
We can start our beginpoint anytime we want.
beginpoint
The end is always far.
end
The time we utilise to make our life good should be more.
time

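The above works by printing not just each line that contains a matching string, but the matching string and the line number of the match followed by the line itself. The line number preserves the relative order of lines after sorting; it would not be needed if we used GNU sort for its -s option. The output is then sorted, and the decoration added in the first step is removed. Here it is one step at a time: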

$ awk -F'[^[:alnum:]_]+' 'NR==FNR{strs[$0];next} {for (str in strs) for (i=1; i<=NF; i++) if ($i==str) print str, FNR, $0}' file2 file1
beginpoint 2 it is the beginpoint of my career.
end 3 The end is always far.
beginpoint 4 We can start our beginpoint anytime we want.
time 5 The time we utilise to make our life good should be more.
beginpoint 8 beginpoint
time 9 time
end 10 end

$ awk -F'[^[:alnum:]_]+' 'NR==FNR{strs[$0];next} {for (str in strs) for (i=1; i<=NF; i++) if ($i==str) print str, FNR, $0}' file2 file1 | sort -k1,1 -k2,2n
beginpoint 2 it is the beginpoint of my career.
beginpoint 4 We can start our beginpoint anytime we want.
beginpoint 8 beginpoint
end 3 The end is always far.
end 10 end
time 5 The time we utilise to make our life good should be more.
time 9 time

$ awk -F'[^[:alnum:]_]+' 'NR==FNR{strs[$0];next} {for (str in strs) for (i=1; i<=NF; i++) if ($i==str) print str, FNR, $0}' file2 file1 |
    sort -k1,1 -k2,2n | cut -d' ' -f3-
it is the beginpoint of my career.
We can start our beginpoint anytime we want.
beginpoint
The end is always far.
end
The time we utilise to make our life good should be more.
time
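
Note that the pipeline sorts on the search string itself, so the groups come out in alphabetical order; that matches the order in the example list only because the list happens to be alphabetical. A possible variation, sketched here under the same no-whitespace assumption, decorates each line with the string's line number in the list instead and sorts numerically. The `pos` array name, the scratch directory, and the reordered sample list (time, beginpoint, end) are illustrative choices, not part of the answer.

```shell
#!/bin/sh
# Demo data in a scratch directory; file names follow the question.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

cat > sourcefile.txt <<'EOF'
Hello, It's the beginning of the sentence.
it is the beginpoint of my career.
The end is always far.
We can start our beginpoint anytime we want.
The time we utilise to make our life good should be more.
This text doesn't mean anything.
I am writing this to include my three points:
beginpoint
time
end
EOF

# Deliberately not in alphabetical order.
printf '%s\n' time beginpoint end > strings.txt

result=$(awk -F'[^[:alnum:]_]+' '
    # First file: record each string with its position in the list.
    NR == FNR { pos[$0] = FNR; next }
    {
        # Second file: emit "position line-number line" for each matching string.
        for (str in pos)
            for (i = 1; i <= NF; i++)
                if ($i == str) { print pos[str], FNR, $0; break }
    }
' strings.txt sourcefile.txt |
    sort -k1,1n -k2,2n |   # order by list position, then by source line number
    cut -d' ' -f3-)        # strip the two decoration fields

printf '%s\n' "$result"
```

Because both sort keys are numeric and every decorated pair is unique, the result does not depend on the locale or on sort being stable.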
