Here is an example block of text from a file:
Now is the time for all blah:1; to come to the aid
Now is the time for all blah:1; to come to the aid
Now is the time for all blah:1; to come to the aid
Now is the time for all blah:10; to come to the aid
Go to your happy place blah:100; to come to the aid
Go to your happy place blah:4321; to come to the aid
Go to your happy place blah:4321; to come to the aid
Now is the time for all blah:4321; to come to the aid
Now is the time for all blah:9876; to come to the aid
Now is the time for all blah:108636; to come to the aid
Now is the time for all blah:1194996; to come to the aid
Question: How can I extract all the unique numbers from the lines that contain "is the"?
I tried grep -o -P -u '(?<=blah:).*(?=;)', but it doesn't like the semicolon.
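Answer 1
You are looking for the \K directive, which makes the regex forget what it has matched so far, so that only the part after it is reported.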
grep -oP 'is the.*?blah:\K\d+' file
Then pipe the output through sort -u to remove duplicates.
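A minimal, self-contained way to try Answer 1 on the sample from the question, assuming GNU grep built with PCRE support (BSD/macOS grep has no -P). It writes the sample to a temporary file (the variable names are only illustrative) and also shows that the lookaround pattern from the question works once the "is the" filtering is done by a separate grep.

```shell
#!/bin/sh
# Sketch only: assumes GNU grep with PCRE (-P) support.
# The temp file and variable names are illustrative, not part of the answer.
file=$(mktemp)
cat > "$file" <<'EOF'
Now is the time for all blah:1; to come to the aid
Now is the time for all blah:1; to come to the aid
Now is the time for all blah:1; to come to the aid
Now is the time for all blah:10; to come to the aid
Go to your happy place blah:100; to come to the aid
Go to your happy place blah:4321; to come to the aid
Go to your happy place blah:4321; to come to the aid
Now is the time for all blah:4321; to come to the aid
Now is the time for all blah:9876; to come to the aid
Now is the time for all blah:108636; to come to the aid
Now is the time for all blah:1194996; to come to the aid
EOF

# \K drops "is the ... blah:" from the reported match, so -o prints only the digits.
with_k=$(grep -oP 'is the.*?blah:\K\d+' "$file" | sort -u)

# The question's lookbehind/lookahead pattern, applied only to "is the" lines.
with_lookaround=$(grep 'is the' "$file" | grep -oP '(?<=blah:)\d+(?=;)' | sort -u)

printf '%s\n' "$with_k"
[ "$with_k" = "$with_lookaround" ] && echo "both pipelines agree"
rm -f "$file"
```

Note that 100 never shows up: it only occurs on a "Go to your happy place" line, which the "is the" filter excludes.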
Answer 2
Using sed:
$ sed -n '/is the/s/^.*blah:\([0-9]*\);.*$/\1/p' file | sort -u
1
10
108636
1194996
4321
9876
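For every line that contains is the, the substitution replaces the whole line with the digits between blah: and ;, and the p flag prints the result. Because of -n, lines that do not contain that string are not printed at all.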
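One thing worth noting about the output above: sort -u compares lines as text, which is why 4321 and 9876 come after 1194996. A small sketch, using a hypothetical subset of the sample lines and an illustrative helper function, contrasting that with sort -n -u, which compares the values numerically:

```shell
#!/bin/sh
# Sketch only: the sample lines are a hypothetical subset of the question's data.
file=$(mktemp)
printf '%s\n' \
  'Now is the time for all blah:4321; to come to the aid' \
  'Now is the time for all blah:1194996; to come to the aid' \
  'Go to your happy place blah:100; to come to the aid' \
  'Now is the time for all blah:10; to come to the aid' \
  'Now is the time for all blah:10; to come to the aid' > "$file"

# extract is an illustrative helper running Answer 2's sed command:
# -n suppresses default output, /is the/ restricts the substitution,
# and the p flag prints only lines where it succeeded.
extract() {
  sed -n '/is the/s/^.*blah:\([0-9]*\);.*$/\1/p' "$file"
}

# Text comparison: "1194996" sorts before "4321" because '1' < '4'.
text_order=$(extract | sort -u)
# Numeric comparison, still dropping duplicates.
numeric_order=$(extract | sort -n -u)

printf 'text order:\n%s\n\nnumeric order:\n%s\n' "$text_order" "$numeric_order"
rm -f "$file"
```

Both -n and -u are standard sort options, so the numeric variant should work with any POSIX sort.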
Answer 3
grep "is the" file | awk -F':' '{print $2}' | awk -F';' '{print $1}' | sort -u
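The same job can also be done in a single awk process. This sketch assumes, like Answer 3, that the only ':' and ';' in each line are the ones around the number. A bracket-expression field separator splits on both characters at once, and the common !seen[$2]++ idiom drops duplicates while keeping first-appearance order instead of sorting (seen is just an illustrative array name).

```shell
#!/bin/sh
# Sketch only: sample lines and names are illustrative.
# Assumes each line has exactly one ':' (after "blah") and one ';' after the number.
file=$(mktemp)
printf '%s\n' \
  'Now is the time for all blah:1; to come to the aid' \
  'Now is the time for all blah:1; to come to the aid' \
  'Go to your happy place blah:100; to come to the aid' \
  'Now is the time for all blah:9876; to come to the aid' \
  'Now is the time for all blah:10; to come to the aid' > "$file"

# -F'[:;]' makes $2 the text between "blah:" and ";".
# !seen[$2]++ is true only the first time a given value is seen.
unique=$(awk -F'[:;]' '/is the/ && !seen[$2]++ { print $2 }' "$file")

printf '%s\n' "$unique"
rm -f "$file"
```

If sorted output is wanted instead of first-appearance order, dropping the seen[] test and piping into sort -u gives the same result as Answer 3.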
Answer 4
Try this:
grep "is the" file | sed 's/.*blah://;s/;.*//' | sort -u
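Explanation:
grep: gets all lines that contain "is the" (anywhere in the line)
sed: deletes everything up to and including "blah:" and everything from ";" onward (writing it as sed -e 's/.*blah://' -e 's/;.*//' makes the two steps easier to follow)
sort -u: sorts the lines and removes duplicates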
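To see what the explanation describes, this sketch (the sample lines and the stage variable names are illustrative) captures the output after each stage of the pipeline:

```shell
#!/bin/sh
# Sketch only: sample lines and variable names are illustrative.
file=$(mktemp)
printf '%s\n' \
  'Now is the time for all blah:10; to come to the aid' \
  'Go to your happy place blah:100; to come to the aid' \
  'Now is the time for all blah:10; to come to the aid' \
  'Now is the time for all blah:1; to come to the aid' > "$file"

# Stage 1: keep only the lines containing "is the".
stage1=$(grep "is the" "$file")
# Stage 2: strip everything up to and including "blah:", then everything from ";" on.
stage2=$(printf '%s\n' "$stage1" | sed -e 's/.*blah://' -e 's/;.*//')
# Stage 3: sort and drop duplicate lines.
stage3=$(printf '%s\n' "$stage2" | sort -u)

printf 'after grep:\n%s\n\nafter sed:\n%s\n\nafter sort -u:\n%s\n' \
  "$stage1" "$stage2" "$stage3"
rm -f "$file"
```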