Grep the words around a marker

I have some lines in my file, for example:

This is one word1:word2 of the lines    
This is another word3:word4 of the lines    
Line without a match    
Yet another line word5:word6 for test

I need to grep for : and return the word before and the word after it.

The output I need from grepping the lines above is:

word1:word2
word3:word4
word5:word6

Answer 1

With GNU grep:

start cmd:> echo "This is one word1:word2 of the lines" |
  grep -Eo '[[:alnum:]]+:[[:alnum:]]+'
word1:word2

start cmd:> echo "This is one wordx:wordy of the lines" |
  grep -Eo '[[:alpha:]]*:[[:alpha:]]*'
wordx:wordy

start cmd:> echo "This is one wo_rdx:wo_rdy of the lines" |
  grep -Eo '[[:alpha:]_]*:[[:alpha:]_]*'
wo_rdx:wo_rdy
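The second and third commands use * instead of +, so either side of the colon may be empty. A minimal sketch of how that differs on a line like wordnot::wordnot, assuming a grep that supports -o (GNU and BSD grep do); the helper names pairs_plus and pairs_star are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers comparing the two quantifiers used above.

# '+' requires at least one letter on each side of the colon.
pairs_plus() {
  grep -Eo '[[:alpha:]]+:[[:alpha:]]+'
}

# '*' allows an empty side, so "word:" or ":word" fragments can match.
pairs_star() {
  grep -Eo '[[:alpha:]]*:[[:alpha:]]*'
}

# grep exits with status 1 when nothing matches, hence '|| true'.
echo 'this is not wordnot::wordnot' | pairs_plus || true   # prints nothing
echo 'this is not wordnot::wordnot' | pairs_star           # wordnot: then :wordnot
```

So + is the safer choice when both neighbouring words must be present.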

Answer 2

POSIXly (though beware that some tr implementations, such as GNU's, don't handle multibyte characters correctly):

tr -s '[:space:]_' '[\n*]' << 'EOF' |
  grep -xE '[[:alnum:]_]+:[[:alnum:]_]+'
This is one word1:word2 of the lines and another is word:word   
This is another word3:word4 of the lines  and this is not wordnot::wordnot
Line without a match    
Yet another line word5:word6 for test
This is one wo_rdx:wo_rdy of the lines
This is one wordx:wordy of the lines
not/a:match
EOF

Gives:

word1:word2
word:word
word3:word4
word5:word6
rdx:wo
wordx:wordy
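The tr step turns every whitespace character and every _ into a newline and squeezes repeats, so each word ends up on its own line before grep -x tests it as a whole line. That is also why wo_rdx:wo_rdy comes out as rdx:wo. A small sketch showing the intermediate stream; the helper names split_words and pairs_posix are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: the tr half of the pipeline above, on its own.
# Each whitespace character or '_' becomes '\n'; -s squeezes runs of '\n'.
split_words() {
  tr -s '[:space:]_' '[\n*]'
}

# Full pipeline: keep only lines that are entirely word:word.
pairs_posix() {
  split_words | grep -xE '[[:alnum:]_]+:[[:alnum:]_]+'
}

printf 'one wo_rdx:wo_rdy two\n' | split_words
# one
# wo
# rdx:wo
# rdy
# two
```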

Answer 3

For all the cases you want, you can use GNU grep with PCRE support (-P) and its word-character class (\w), like this:

grep -oP '\w+:\w+' file

Input file:

This is one word1:word2 of the lines and another is word:word   
This is another word3:word4 of the lines  and this is not wordnot::wordnot
Line without a match    
Yet another line word5:word6 for test
This is one wo_rdx:wo_rdy of the lines
This is one wordx:wordy of the lines

Output:

word1:word2
word:word
word3:word4
word5:word6
wo_rdx:wo_rdy
wordx:wordy

As you can see, grep doesn't match wordnot::wordnot, because it has an extra : between the two words.
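If your grep was built without PCRE support (so -P is unavailable), an ERE using the POSIX class [[:alnum:]_] expresses the same \w+:\w+ idea, at least for ASCII input (in UTF-8 locales [[:alnum:]] and \w may differ on non-ASCII letters). A sketch under that assumption; -o is not POSIX but GNU and BSD grep support it, and the helper name word_pairs is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: ERE spelling of '\w+:\w+' (letters, digits and '_').
word_pairs() {
  grep -oE '[[:alnum:]_]+:[[:alnum:]_]+'
}

# Prints word1:word2, word:word, word3:word4 and wo_rdx:wo_rdy, one per line.
word_pairs << 'EOF'
This is one word1:word2 of the lines and another is word:word
This is another word3:word4 of the lines  and this is not wordnot::wordnot
Line without a match
This is one wo_rdx:wo_rdy of the lines
EOF
```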

Answer 4

With grep:

grep -oP '[^:\s]+:[^:\s]+' file

or

grep -oP '\S+?:\S+' file

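The commands above fetch not only strings like foo:bar but also ones like ?foo:bar?, since any non-space character other than : counts as part of a word.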
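To see the difference concretely, here is a sketch using ERE equivalents of two of the patterns, with [^:[:space:]] standing in for [^:\s] and [[:alnum:]_] for \w, so it runs without -P. The helper names nonspace_pairs and word_pairs are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers; ERE spellings so no PCRE support is needed.

# Answer 4 style: anything except ':' and whitespace on each side.
nonspace_pairs() {
  grep -oE '[^:[:space:]]+:[^:[:space:]]+'
}

# Answer 3 style: only word characters (letters, digits, '_') on each side.
word_pairs() {
  grep -oE '[[:alnum:]_]+:[[:alnum:]_]+'
}

echo 'see ?foo:bar? here' | nonspace_pairs   # ?foo:bar?
echo 'see ?foo:bar? here' | word_pairs       # foo:bar
```

Which one you want depends on whether punctuation glued to the words should be kept.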
