How do I filter out emails from specific domains?

Answer 1

In a very small script:

#!/usr/bin/env python3
import sys

# list domains to be removed
rm = [
    'gmail.com', 'hotmail', 'yahoo', 'aol', 'rediffmail.com',
    'msn', 'outlook', 'inbox.com', 'icloud.com', 'mail.com',
    'zoho.com', 'yandex', 'live'
    ]
# prefix every entry with "@" once, up front
patterns = ["@" + d for d in rm]

# read the file per line
with open(sys.argv[1]) as f:
    for line in f:
        # see if not any of the @domains is in the line
        if not any(p in line for p in patterns):
            # then print the line
            print(line.strip())

Usage

Copy the script into an empty file and save it as filter_doms.py.

Run it with the input file as an argument:

python3 /path/to/filter_doms.py input_file > output_file

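Explanation

I believe the code and its comments explain everything :) Each line is printed only if it contains none of the listed entries prefixed with "@". The test is a plain substring match, so a short entry such as 'hotmail' covers hotmail.com, hotmail.co.uk and any other domain that starts with that text.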
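To make the matching rule concrete, here is a minimal sketch of the same test pulled out into a function and applied to a few made-up lines. The function name keep, the shortened domain list and the sample addresses are hypothetical and only serve as illustration.

```python
# Hypothetical helper illustrating the substring test used by the script.
rm = ['gmail.com', 'hotmail', 'yahoo']   # shortened list, for illustration only
patterns = ["@" + d for d in rm]


def keep(line):
    """Return True if none of the @domain patterns occurs in the line."""
    return not any(p in line for p in patterns)


samples = [
    "alice@gmail.com",         # dropped: contains "@gmail.com"
    "bob@hotmail.co.uk",       # dropped: "@hotmail" matches any hotmail domain
    "carol@example.org",       # kept
    "dave@yahoomail.example",  # also dropped: "@yahoo" is only a prefix match
]
kept = [s for s in samples if keep(s)]
print(kept)  # ['carol@example.org']
```

The last sample shows the trade-off of short entries: they catch every variant of a provider's domain, but also any unrelated domain that happens to start with the same text.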

As requested in a comment, a version that ignores decoding errors:

#!/usr/bin/env python3
import sys

# list domains to be removed
rm = [
    'gmail.com', 'hotmail', 'yahoo', 'aol', 'rediffmail.com',
    'msn', 'outlook', 'inbox.com', 'icloud.com', 'mail.com',
    'zoho.com', 'yandex', 'live'
    ]
patterns = ["@" + d for d in rm]

# errors="ignore" drops bytes that are not valid UTF-8 instead of raising
with open(sys.argv[1], "r", encoding="utf-8", errors="ignore") as read:
    for line in read:
        if not any(p in line for p in patterns):
            print(line.strip())

Usage is exactly the same.
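What errors="ignore" does can be seen on a single byte string. This is a small sketch, assuming the input contains a Latin-1 encoded character that is not valid UTF-8; the sample address is made up.

```python
# A Latin-1 "e acute" (byte 0xE9) is not valid UTF-8 when followed by "@".
raw = b"caf\xe9@example.org\n"

try:
    raw.decode("utf-8")          # strict decoding raises on the bad byte
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With errors="ignore" the offending byte is silently dropped.
decoded = raw.decode("utf-8", errors="ignore")
print(strict_ok, repr(decoded))
```

Note that the dropped byte changes the text of the line. If a visible marker is preferable, errors="replace" substitutes the Unicode replacement character instead of discarding the byte.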

Answer 2

You can use grep's -f option, like this:

grep -vhf patternfile file file1

Put all the patterns in patternfile, one per line:

@gmail.com
@hotmail
@yahoo
@aol
@rediffmail.com
..

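The -v option inverts the match, so only lines that match none of the patterns are printed. The -h option suppresses the file name prefix on output when more than one input file is given. Because the patterns are regular expressions, a dot matches any character; add -F if they should be treated as fixed strings.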
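For comparison, the effect of -v and -f can be sketched in Python. This is an approximation, not how grep is implemented: it assumes the patterns are simple enough to mean the same thing in Python's re module as in grep's basic regular expressions. The function name filter_lines and the sample data are hypothetical.

```python
import re


def filter_lines(pattern_lines, input_lines):
    """Yield the input lines that match none of the patterns (like grep -v -f)."""
    # Strip the trailing newline from each pattern; empty lines are skipped in this sketch.
    stripped = (p.rstrip("\n") for p in pattern_lines)
    regexes = [re.compile(p) for p in stripped if p]
    for line in input_lines:
        if not any(r.search(line) for r in regexes):
            yield line


# Sample data standing in for patternfile and the input files.
patterns = ["@gmail.com\n", "@hotmail\n", "@yahoo\n"]
lines = ["alice@gmail.com\n", "bob@example.org\n", "carol@hotmail.co.uk\n"]

result = list(filter_lines(patterns, lines))
print(result)  # ['bob@example.org\n']
```

With real files, the same function could be fed open file objects, for example filter_lines(open("patternfile"), open("file")), since both iterate line by line.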
