Extract the unique strings from each line that contains a given phrase

Here is a sample block of text from a file:


Now is the time for all blah:1; to come to the aid
Now is the time for all blah:1; to come to the aid
Now is the time for all blah:1; to come to the aid
Now is the time for all blah:10; to come to the aid
Go to your happy place  blah:100; to come to the aid
Go to your happy place  blah:4321; to come to the aid
Go to your happy place  blah:4321; to come to the aid
Now is the time for all blah:4321; to come to the aid
Now is the time for all blah:9876; to come to the aid
Now is the time for all blah:108636; to come to the aid
Now is the time for all blah:1194996; to come to the aid

Question: how can I extract all of the unique numbers from the lines that contain "is the"?

I tried grep -o -P -u '(?<=blah:).*(?=;)', but it doesn't like the semicolon.

Answer 1

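You are looking for the \K directive, which drops everything matched so far from the reported match, so that -o prints only what follows it.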

grep -oP 'is the.*?blah:\K\d+'

Then pipe the output through sort -u to remove the duplicates.
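
Putting the two steps together might look like the sketch below. It assumes GNU grep built with PCRE support (the -P option is not available in every grep), and the temporary file created with mktemp is only a stand-in for your real file. The -n option to sort is an addition here so that the numbers come out in numeric rather than lexical order.

```shell
# Stand-in for the real input file: the sample lines from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Now is the time for all blah:1; to come to the aid
Now is the time for all blah:1; to come to the aid
Now is the time for all blah:1; to come to the aid
Now is the time for all blah:10; to come to the aid
Go to your happy place  blah:100; to come to the aid
Go to your happy place  blah:4321; to come to the aid
Go to your happy place  blah:4321; to come to the aid
Now is the time for all blah:4321; to come to the aid
Now is the time for all blah:9876; to come to the aid
Now is the time for all blah:108636; to come to the aid
Now is the time for all blah:1194996; to come to the aid
EOF

# -o prints only the matched text; -P enables PCRE so that \K is understood.
# Everything up to and including "blah:" is matched but then discarded by \K.
# sort -n orders numerically, -u drops duplicates.
unique_numbers=$(grep -oP 'is the.*?blah:\K\d+' "$sample" | sort -nu)
printf '%s\n' "$unique_numbers"

rm -f "$sample"
```

The lines starting with "Go to your happy place" do not contain "is the", so 100 never appears in the result.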

Answer 2

Using sed:

$ sed -n '/is the/s/^.*blah:\([0-9]*\);.*$/\1/p' file | sort -u
1
10
108636
1194996
4321
9876

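The substitution replaces the whole content of every line that contains the string is the with the number found between blah: and ;. Lines that do not contain that string are ignored: -n turns off automatic printing, and the p flag prints only the lines on which the substitution succeeded.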
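
To see the filtering at work, here is a small sketch of the same command on a trimmed copy of the sample, fed through a variable instead of a file. It uses -E (extended regular expressions, supported by GNU and BSD sed) so the capture group needs no backslashes; that flag and the variable names are choices made for this example, not part of the command above.

```shell
# Trimmed copy of the sample; the third line does not contain "is the".
input='Now is the time for all blah:1; to come to the aid
Now is the time for all blah:10; to come to the aid
Go to your happy place  blah:100; to come to the aid
Now is the time for all blah:10; to come to the aid'

# /is the/ restricts the substitution to matching lines.
# The group ([0-9]+) captures the digits between "blah:" and ";",
# and \1 replaces the entire line with that capture.
sed_result=$(printf '%s\n' "$input" \
  | sed -nE '/is the/s/^.*blah:([0-9]+);.*$/\1/p' \
  | sort -u)
printf '%s\n' "$sed_result"
```

The "Go to your happy place" line is never printed, so 100 is absent from the output and the duplicate 10 is collapsed by sort -u.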

Answer 3

grep "is the" file | awk -F':' '{print $2}' | awk -F';' '{print $1}' | sort -u
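
As a variation on the same field-splitting idea, the filtering, splitting, and de-duplication can probably be folded into a single awk call. This is only a sketch: it assumes that no colon or semicolon appears earlier in the line than blah:, and the names input, awk_result, and seen are made up for the example.

```shell
# Trimmed copy of the sample, including one line without "is the".
input='Now is the time for all blah:1; to come to the aid
Now is the time for all blah:1; to come to the aid
Go to your happy place  blah:4321; to come to the aid
Now is the time for all blah:10; to come to the aid
Now is the time for all blah:4321; to come to the aid
Now is the time for all blah:10; to come to the aid'

# -F'[:;]' splits each line on either ":" or ";", so $2 is the number.
# /is the/ selects the lines of interest; seen[$2]++ is zero only the
# first time a value occurs, so each number is printed once.
awk_result=$(printf '%s\n' "$input" \
  | awk -F'[:;]' '/is the/ && !seen[$2]++ { print $2 }')
printf '%s\n' "$awk_result"
```

Unlike sort -u, this keeps the numbers in the order in which they first appear.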

Answer 4

Try this:

grep "is the" file | sed 's/.*blah://;s/;.*//' | sort -u

Explanation

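  1. grep selects every line that contains "is the" (anywhere in the line)
  2. sed deletes everything up to and including "blah:" and everything from ";" onward (you can write it as sed -e 's/.*blah://' -e 's/;.*//' to make it easier to follow)
  3. sort -u sorts the lines and removes the duplicates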
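
The three steps can be run one at a time to see what each stage produces. The sketch below uses a trimmed copy of the sample held in a variable; the names input, step1, step2, and step3 exist only for this illustration.

```shell
# Trimmed copy of the sample; the second line lacks "is the".
input='Now is the time for all blah:10; to come to the aid
Go to your happy place  blah:100; to come to the aid
Now is the time for all blah:1; to come to the aid
Now is the time for all blah:10; to come to the aid'

# Step 1: keep only the lines containing "is the".
step1=$(printf '%s\n' "$input" | grep "is the")

# Step 2: strip everything through "blah:" and everything from ";" on.
step2=$(printf '%s\n' "$step1" | sed -e 's/.*blah://' -e 's/;.*//')

# Step 3: sort the remaining numbers and drop duplicates.
step3=$(printf '%s\n' "$step2" | sort -u)

printf 'after sed:\n%s\nafter sort -u:\n%s\n' "$step2" "$step3"
```

After the sed step the values are still in file order with duplicates (10, 1, 10); only the final sort -u reduces them to 1 and 10.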
