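This worked in procmail, but procmail appears to have been abandoned in September 2001. I had a rule that could detect when someone wrote my name in the "To:" header using emoji or non-Latin characters encoded as utf-8. When I tried the same thing in "Pigeonhole", Dovecot's Sieve implementation, I was frustrated because it seems to discard some of the data.
Ref. Sieve rules in RFC5228
Ref. Dovecot Pigeonhole implementation
What I tried: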
require ["fileinto"];
if header :contains ["to", "from"] "=?utf-8?B?" { fileinto "Junk"; }
elsif address :contains :all ["to", "from"] "=?utf-8?B?" { fileinto "Junk"; }
With this sample data:
From: "=?utf-8?B?TWluaSBXdQ==?=" <[email protected]>
To: "=?utf-8?B?Q1VTVA==?=" <[email protected]>
Subject: =?utf-8?B?UmU6TWljcm9jaGlwIFRleGFzIE9mZmVy?=
Date: Mon, 20 Mar 2023 16:12:50 +0900
Hello potential customer! Please stop whatever you're
doing and pay attention to me!
What I get:
sieve-test -Tlevel=matching -t - /tmp/badmail.sieve /tmp/badmail.txt
## Started executing script 'badmail'
2: header test
2: starting `:contains' match with `i;ascii-casemap' comparator:
2: extracting `to' headers from message
2: matching value `"CUST" <[email protected]>'
2: with key `=?utf-8?B?' => 0
2: extracting `from' headers from message
2: matching value `"Mini Wu" <[email protected]>'
2: with key `=?utf-8?B?' => 0
2: finishing match with result: not matched
2: jump if result is false
2: jumping to line 3
3: address test
3: starting `:contains' match with `i;ascii-casemap' comparator:
3: extracting `to' headers from message
3: parsing address header value `"=?utf-8?B?Q1VTVA==?=" <[email protected]>'
3: address value `[email protected]'
3: extracting `all' part from address <[email protected]>
3: matching value `[email protected]'
3: with key `=?utf-8?B?' => 0
3: extracting `from' headers from message
3: parsing address header value `"=?utf-8?B?TWluaSBXdQ==?=" <[email protected]>'
3: address value `[email protected]'
3: extracting `all' part from address <[email protected]>
3: matching value `[email protected]'
3: with key `=?utf-8?B?' => 0
3: finishing match with result: not matched
3: jump if result is false
3: jumping to line 3
## Finished executing script 'badmail'
Implicit keep: store message in folder: INBOX
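It logs the "=?utf-8?B?..." in the trace output, so I know that it knows. But both the "header" test and the "address" test discard that data before matching: the header test matches against the decoded value, and the address test matches only the address part. I also tried :comparator "i;octet" instead of the default "i;ascii-casemap", with the same result.
How can I test against the raw headers rather than these interpreted values?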
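Answer 1
So.. you do not actually want to distinguish "emoji or non-Latin characters", but rather the specific way the characters are transmitted on the wire?
I cannot think of a way to make Sieve return the raw bytes. You could work around that by doing the matching in the mail server instead, e.g. by using the Postfix (RFC2047-ignorant) header_checks feature to add a custom header, such as: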
# header_checks = pcre:/etc/postfix/maps/remember_header_encoding
# pcre is case insensitive by default
/^To:.*=\?utf-8\?B\?/ PREPEND X-Preserve-For-Sieve: RFC2047 marker in header To:
Then check in Sieve whether such a marker header exists.
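A minimal sketch of the Sieve side might look like this. It assumes the message passed through the Postfix instance with the header_checks rule shown earlier before it was handed to Dovecot, so the X-Preserve-For-Sieve header is already present. The "Junk" folder name is taken from the question. This has not been tested against a live Pigeonhole setup.

```sieve
require ["fileinto"];

# Assumption: Postfix prepended X-Preserve-For-Sieve when its
# header_checks rule saw "=?utf-8?B?" in the raw To: header.
# "exists" is part of the base Sieve language (RFC 5228), so it
# needs no extra "require".
if exists "X-Preserve-For-Sieve" {
    fileinto "Junk";
}
```

The sieve-test invocation from the question can be reused to check the result, using a sample message that already contains the marker header. Note that this test trusts any header with that name, including one supplied by the sender.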
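Even if it works today, I doubt this whole approach will remain a reliable classification criterion for the foreseeable future. Relaying SMTP servers (including the one that hands the message to Sieve) may add encoding that was not there before while transforming the message‡. Some mail clients add encoding where none is needed, and others fail to add it even when they should. So an encoding difference that the sender never intended may not show up consistently across messages of the same kind.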
‡ Ordinary mail has few alternatives to superfluous encoding: Dovecot does not yet guarantee 8-bit-clean transport such as SMTPUTF8.