For the given input:
How to get This line that this word repeated 3 times in THIS line?
But not this line which is THIS word repeated 2 times.
And I will get This line with this here and This one
A test line with four this and This another THIS and last this
I want this output:
How to get This line that this word repeated 3 times in THIS line?
And I will get This line with this here and This one
That is, get every whole line that contains exactly three occurrences of the word "this" (case-insensitive match).
Answer 1
In perl, replace this with itself case-insensitively and count the number of replacements:
$ perl -ne 's/(this)/$1/ig == 3 && print' <<EOF
How to get This line that this word repeated 3 times in THIS line?
But not this line which is THIS word repeated 2 times.
And I will get This line with this here and This one
A test line with four this and This another THIS and last this
EOF
How to get This line that this word repeated 3 times in THIS line?
And I will get This line with this here and This one
Using a count of matches instead:
perl -ne 'my $c = () = /this/ig; $c == 3 && print'
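For readers more at home in Python, both perl idioms have rough counterparts in the standard `re` module. This is only a sketch of the same two ideas, not part of the original answer; the variable names `replaced` and `matched` are made up for illustration.

```python
import re

line = "How to get This line that this word repeated 3 times in THIS line?"

# Idiom 1: substitute each match with itself and read back the replacement count.
_, replaced = re.subn(r"(this)", r"\1", line, flags=re.IGNORECASE)

# Idiom 2: collect all matches and count them.
matched = len(re.findall(r"this", line, flags=re.IGNORECASE))

print(replaced, matched)
```

Both counts agree, so either can be compared against 3 to decide whether to print the line.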
If you have GNU awk, there is a very simple way:
gawk -F'this' -v IGNORECASE=1 'NF == 4'
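The number of fields is always one more than the number of separators, so three occurrences of "this" give four fields.
Answer 2
Assuming your source file is tmp.txt,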
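The same field-counting idea can be sketched in Python, assuming `re.split` with `re.IGNORECASE` stands in for awk's `-F` and `IGNORECASE=1`. The helper name `count_by_fields` is hypothetical.

```python
import re

def count_by_fields(line, word="this"):
    # Split on the word, ignoring case; n separators always yield n + 1 fields.
    fields = re.split(re.escape(word), line, flags=re.IGNORECASE)
    return len(fields) - 1

line = "And I will get This line with this here and This one"
print(count_by_fields(line) == 3)
```

A line with no occurrence splits into a single field, so the helper returns 0 for it.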
grep -iv '.*this.*this.*this.*this' tmp.txt | grep -i '.*this.*this.*this.*'
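The grep on the left outputs all lines of tmp.txt that do not contain 4 or more case-insensitive occurrences of "this".
The result is piped to the grep on the right, which outputs every line of that result containing 3 or more occurrences. Together the two filters leave exactly the lines with 3 occurrences.
Update: thanks to @Muru, here is a better version of this solution,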
grep -Eiv '(.*this){4,}' tmp.txt | grep -Ei '(.*this){3}'
To match exactly n occurrences, replace 4 with n+1 and 3 with n.
Answer 3
In Python, this does the job:
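To make the n / n+1 substitution concrete, here is a minimal Python sketch of the same two-stage filter. It assumes Python's `re` syntax is close enough to `grep -E` for this pattern; the function name `has_exactly` is hypothetical.

```python
import re

def has_exactly(line, n, word="this"):
    # Mirror the two grep stages: reject n + 1 or more, then require at least n.
    w = re.escape(word)
    too_many = re.search(r"(?:.*%s){%d}" % (w, n + 1), line, re.IGNORECASE)
    enough = re.search(r"(?:.*%s){%d}" % (w, n), line, re.IGNORECASE)
    return enough is not None and too_many is None

lines = [
    "How to get This line that this word repeated 3 times in THIS line?",
    "But not this line which is THIS word repeated 2 times.",
    "And I will get This line with this here and This one",
    "A test line with four this and This another THIS and last this",
]
for line in lines:
    if has_exactly(line, 3):
        print(line)
```

Calling `has_exactly(line, 2)` instead selects only the second sample line.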
#!/usr/bin/env python3
s = """How to get This line that this word repeated 3 times in THIS line?
But not this line which is THIS word repeated 2 times.
And I will get This line with this here and This one
A test line with four this and This another THIS and last this"""
for line in s.splitlines():
    if line.lower().count("this") == 3:
        print(line)
Output:
How to get This line that this word repeated 3 times in THIS line?
And I will get This line with this here and This one
Or, to read from a file, with the file given as an argument:
#!/usr/bin/env python3
import sys
file = sys.argv[1]
with open(file) as src:
    lines = [line.strip() for line in src.readlines()]
for line in lines:
    if line.lower().count("this") == 3:
        print(line)
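Paste the script into an empty file, save it as find_3.py, and run it with the command: python3 /path/to/find_3.py <file_withlines>
Of course, the word "this" can be replaced by any other word (or any other string or part of a line), and the number of occurrences per line can be set to any other value in the line: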
if line.lower().count("this") == 3:
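As a sketch of that parameterization, the word and the count can be pulled out into arguments. The function name `lines_with_count` and its parameters are hypothetical, not part of the original script.

```python
def lines_with_count(lines, word="this", n=3):
    # Yield lines where `word` occurs exactly n times, ignoring case.
    needle = word.lower()
    for line in lines:
        if line.lower().count(needle) == n:
            yield line.rstrip("\n")

sample = [
    "How to get This line that this word repeated 3 times in THIS line?",
    "But not this line which is THIS word repeated 2 times.",
    "And I will get This line with this here and This one",
    "A test line with four this and This another THIS and last this",
]
for hit in lines_with_count(sample, word="this", n=2):
    print(hit)
```

Note that `str.count` counts substrings, so "this" inside a longer word such as "thistle" is counted too. If only whole words should count, a regular expression with word boundaries would be needed instead.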
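Edit
If the file is large (hundreds of thousands or millions of lines), the code below will be faster; it reads the file line by line instead of loading the whole file at once: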
#!/usr/bin/env python3
import sys
file = sys.argv[1]
with open(file) as src:
    for line in src:
        if line.lower().count("this") == 3:
            print(line.strip())
Answer 4
Assuming the lines are stored in a file named FILE:
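# IFS= and -r keep leading whitespace and backslashes in each line intact
while IFS= read -r line; do
    # grep -o prints each match on its own line; wc -w counts them
    if [ $(grep -oi "this" <<< "$line" | wc -w) = 3 ]; then
        echo "$line"
    fi
done <FILE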