How can I find lines in which a specific word is repeated N times?

Given this input:

How to get This line that this word repeated 3 times in THIS line?
But not this line which is THIS word repeated 2 times.
And I will get This line with this here and This one
A test line with four this and This another THIS and last this

I want this output:

How to get This line that this word repeated 3 times in THIS line?
And I will get This line with this here and This one

That is, print each whole line that contains the word "this" exactly three times (matching case-insensitively).

Answer 1

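In Perl, replace "this" with itself case-insensitively and count the number of replacements (s///g returns how many substitutions were made):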

$ perl -ne 's/(this)/$1/ig == 3 && print' <<EOF
How to get This line that this word repeated 3 times in THIS line?
But not this line which is THIS word repeated 2 times.
And I will get This line with this here and This one
A test line with four this and This another THIS and last this
EOF
How to get This line that this word repeated 3 times in THIS line?
And I will get This line with this here and This one

Or count the matches instead:

perl -ne 'my $c = () = /this/ig; $c == 3 && print'

If you have GNU awk, there is a very simple way:

gawk -F'this' -v IGNORECASE=1 'NF == 4'

The number of fields is one more than the number of separators, so three occurrences of "this" give four fields.
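
The same field-counting idea can be sketched in Python. This is only an illustration, not part of the original answer; the helper name has_n_occurrences_by_split is made up for the example.

```python
import re

def has_n_occurrences_by_split(line, word, n):
    """Hypothetical helper: True if `word` occurs exactly n times in `line`."""
    # Splitting on the word (case-insensitively) yields one more field
    # than there are separators, just like awk's NF with -F'this'.
    fields = re.split(re.escape(word), line, flags=re.IGNORECASE)
    return len(fields) == n + 1

sample = "And I will get This line with this here and This one"
# Three separators produce four fields.
print(len(re.split("this", sample, flags=re.IGNORECASE)))
print(has_n_occurrences_by_split(sample, "this", 3))
```

Like the awk version, this counts substrings, so "this" inside a longer word would also act as a separator.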

Answer 2

Assuming your source file is tmp.txt:

grep -iv '.*this.*this.*this.*this' tmp.txt | grep -i '.*this.*this.*this.*'

The grep on the left outputs every line of tmp.txt that does not contain 4 or more occurrences of "this" (case-insensitive).

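That result is piped to the grep on the right, which keeps only the lines with 3 or more occurrences. Together the two filters leave the lines with exactly 3.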

Update: thanks to @Muru, here is a better version of this solution:

grep -Eiv '(.*this){4,}' tmp.txt | grep -Ei '(.*this){3}'

To match N occurrences, replace 4 with N+1 and 3 with N.
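
As a rough illustration of that N / N+1 substitution, here is a Python sketch that builds both patterns from a count and applies the same two-stage filter. The function name exactly_n is an assumption made for this example, not something from the answer.

```python
import re

def exactly_n(lines, word, n):
    """Hypothetical helper: keep lines with at least n but not n+1 occurrences."""
    w = re.escape(word)
    # Equivalent of grep -Ei '(.*this){n}': at least n occurrences.
    at_least_n = re.compile(f"(?:.*{w}){{{n}}}", re.IGNORECASE)
    # Equivalent of the pattern passed to grep -Eiv: n+1 or more occurrences.
    at_least_n_plus_1 = re.compile(f"(?:.*{w}){{{n + 1}}}", re.IGNORECASE)
    return [
        line for line in lines
        if at_least_n.search(line) and not at_least_n_plus_1.search(line)
    ]

text = """How to get This line that this word repeated 3 times in THIS line?
But not this line which is THIS word repeated 2 times.
And I will get This line with this here and This one
A test line with four this and This another THIS and last this"""

for line in exactly_n(text.splitlines(), "this", 3):
    print(line)
```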

Answer 3

In Python, this does the job:

#!/usr/bin/env python3

s = """How to get This line that this word repeated 3 times in THIS line?
But not this line which is THIS word repeated 2 times.
And I will get This line with this here and This one
A test line with four this and This another THIS and last this"""

for line in s.splitlines():
    if line.lower().count("this") == 3:
        print(line)

Output:

How to get This line that this word repeated 3 times in THIS line?
And I will get This line with this here and This one

Or, to read from a file given as an argument:

#!/usr/bin/env python3
import sys

file = sys.argv[1]

with open(file) as src:
    lines = [line.strip() for line in src.readlines()]

for line in lines:
    if line.lower().count("this") == 3:
        print(line)

  • Paste the script into an empty file, save it as find_3.py, and run it with:

    python3 /path/to/find_3.py <file_withlines>

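Of course, the word "this" can be replaced by any other word (or any other string or part of a line), and the required number of occurrences per line can be set to any other value, by editing this line: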

    if line.lower().count("this") == 3:
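
One caveat: str.count counts substrings, so "this" inside a longer word such as "thistle" is counted too. If only whole words should count, a sketch along the following lines may help. The names count_word and lines_with_count are hypothetical, and the \b boundaries assume the search word starts and ends with a letter, digit or underscore.

```python
import re

def count_word(line, word, whole_word=True):
    """Hypothetical helper: count case-insensitive occurrences of `word`."""
    pattern = re.escape(word)
    if whole_word:
        # \b only matches at word boundaries, so "thistle" is not counted.
        pattern = rf"\b{pattern}\b"
    return len(re.findall(pattern, line, flags=re.IGNORECASE))

def lines_with_count(lines, word, n, whole_word=True):
    """Hypothetical helper: keep the lines where `word` occurs exactly n times."""
    return [line for line in lines if count_word(line, word, whole_word) == n]

example = "This thistle is this"
print(count_word(example, "this"))                    # whole words only
print(count_word(example, "this", whole_word=False))  # substrings, like str.count
```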

Edit

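If the file is large (hundreds of thousands or millions of lines), the code below will be faster; it reads the file line by line instead of loading the whole file at once: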

#!/usr/bin/env python3
import sys
file = sys.argv[1]

with open(file) as src:
    for line in src:
        if line.lower().count("this") == 3:
            print(line.strip())

Answer 4

Assuming the lines are stored in a file named FILE:

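# IFS= and -r keep leading whitespace and backslashes in each line intact
while IFS= read -r line; do
    # grep -o prints each match on its own line, so count lines
    if [ "$(grep -oi "this" <<< "$line" | wc -l)" -eq 3 ]; then
        echo "$line"
    fi
done < FILE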
