Sieve rule to match on raw header values

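This worked in procmail, but procmail seems to have been abandoned since September 2001. I had a rule that caught cases where my name was written with emoji or non-Latin characters, using UTF-8 in the "To:" header. When I tried the same thing in "Pigeonhole", Dovecot's Sieve implementation, I got frustrated, because it seems to throw away some of the data.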

Ref. Sieve, RFC 5228
Ref. Dovecot's Pigeonhole implementation

What I tried:

require ["fileinto"];
if header :contains ["to", "from"] "=?utf-8?B?" {   fileinto "Junk"; }
elsif address :contains :all ["to", "from"] "=?utf-8?B?" {   fileinto "Junk"; }

Using this sample data:

From: "=?utf-8?B?TWluaSBXdQ==?=" <[email protected]>
To: "=?utf-8?B?Q1VTVA==?=" <[email protected]>
Subject: =?utf-8?B?UmU6TWljcm9jaGlwIFRleGFzIE9mZmVy?=
Date: Mon, 20 Mar 2023 16:12:50 +0900

Hello potential customer! Please stop whatever you're
doing and pay attention to me!

What I get:

sieve-test -Tlevel=matching -t - /tmp/badmail.sieve /tmp/badmail.txt

      ## Started executing script 'badmail'
   2: header test
   2:   starting `:contains' match with `i;ascii-casemap' comparator:
   2:   extracting `to' headers from message
   2:   matching value `"CUST" <[email protected]>'
   2:     with key `=?utf-8?B?' => 0
   2:   extracting `from' headers from message
   2:   matching value `"Mini Wu" <[email protected]>'
   2:     with key `=?utf-8?B?' => 0
   2:   finishing match with result: not matched
   2: jump if result is false
   2:   jumping to line 3
   3: address test
   3:   starting `:contains' match with `i;ascii-casemap' comparator:
   3:   extracting `to' headers from message
   3:   parsing address header value `"=?utf-8?B?Q1VTVA==?=" <[email protected]>'
   3:   address value `[email protected]'
   3:   extracting `all' part from address <[email protected]>
   3:   matching value `[email protected]'
   3:     with key `=?utf-8?B?' => 0
   3:   extracting `from' headers from message
   3:   parsing address header value `"=?utf-8?B?TWluaSBXdQ==?=" <[email protected]>'
   3:   address value `[email protected]'
   3:   extracting `all' part from address <[email protected]>
   3:   matching value `[email protected]'
   3:     with key `=?utf-8?B?' => 0
   3:   finishing match with result: not matched
   3: jump if result is false
   3:   jumping to line 3
      ## Finished executing script 'badmail'

Implicit keep:  store message in folder: INBOX

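It records the "=?utf-8?B?..." in the trace output, so I know it has the data. But both the "header" test and the "address" test throw that data away before matching. I also tried :comparator "i;octet" instead of the default "i;ascii-casemap", with the same result.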
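To see why neither test can match, here is a small Python sketch (not Pigeonhole code) that applies standard RFC 2047 decoding, via Python's `email.header` module, to the two display names from the sample message. It assumes, as the trace suggests, that Pigeonhole decodes encoded-words before comparing; the helper name `decode_display_name` is hypothetical.

```python
from email.header import decode_header, make_header

# The key used in the question's Sieve script.
ENCODED_WORD_MARKER = "=?utf-8?B?"


def decode_display_name(raw: str) -> str:
    """Decode RFC 2047 encoded-words to text (hypothetical helper)."""
    # decode_header() splits the value into (bytes-or-str, charset) chunks;
    # make_header() reassembles them and str() renders the decoded text.
    return str(make_header(decode_header(raw)))


# Display names copied from the sample message's To: and From: headers.
for raw in ["=?utf-8?B?Q1VTVA==?=", "=?utf-8?B?TWluaSBXdQ==?="]:
    decoded = decode_display_name(raw)
    # The marker exists in the raw text but not in the decoded text, so a
    # :contains test applied after decoding has nothing to find.
    print(f"{raw} -> {decoded!r}, marker after decoding: {ENCODED_WORD_MARKER in decoded}")
```

The decoded values are exactly the `"CUST"` and `"Mini Wu"` strings that appear in the header-test lines of the trace.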

How can I test the raw header instead of these interpreted values?

Answer 1

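So... you don't actually want to distinguish "emoji or non-Latin characters", but rather the specific way the characters were transmitted on the wire?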
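To make that distinction concrete, here is a minimal Python sketch separating the two criteria: whether a header value was RFC 2047-encoded on the wire (checked on the raw text) versus whether the decoded text actually contains non-ASCII characters such as emoji. Both function names are hypothetical, and only the base64 UTF-8 form is checked, to mirror the question's key.

```python
from email.header import decode_header, make_header


def is_encoded_on_wire(raw: str) -> bool:
    """True if the raw header text contains a base64 UTF-8 encoded-word."""
    return "=?utf-8?b?" in raw.lower()


def has_non_ascii_text(raw: str) -> bool:
    """True if the *decoded* text contains characters outside 7-bit ASCII."""
    decoded = str(make_header(decode_header(raw)))
    return any(ord(ch) > 127 for ch in decoded)


# Display names from the sample message.
for raw in ["=?utf-8?B?Q1VTVA==?=", "=?utf-8?B?TWluaSBXdQ==?="]:
    print(raw, is_encoded_on_wire(raw), has_non_ascii_text(raw))
```

For both sample names the first check is true and the second is false: they were encoded in transit but decode to plain ASCII.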

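I can't think of a way to make Sieve return the raw bytes. You could work around this by doing the matching in the mail server instead, for example by using Postfix's (RFC 2047-ignorant) header_checks feature to add a custom header, e.g.: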

# header_checks = pcre:/etc/postfix/maps/remember_header_encoding
#  pcre is case insensitive by default
/^To:.*=\?utf-8\?B\?/   PREPEND X-Preserve-For-Sieve: RFC2047 marker in header To:

Then check for the presence of that marker header in Sieve.
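As a rough illustration of this two-stage idea, here is a Python sketch that emulates both sides. A regular expression equivalent to the Postfix pcre rule runs over raw, undecoded header lines and prepends the marker header; the "Sieve side" then only checks whether that header exists. In actual Sieve that check would be something like `if exists "X-Preserve-For-Sieve" { fileinto "Junk"; }` after `require ["fileinto"];`. The function names and sample addresses are hypothetical, this is not how Postfix or Pigeonhole are implemented, and each header is assumed to be a single unfolded line.

```python
import re
from typing import List

# Same pattern as the Postfix rule; re.IGNORECASE mirrors the
# case-insensitive default of Postfix pcre tables.
TO_MARKER = re.compile(r"^To:.*=\?utf-8\?B\?", re.IGNORECASE)
MARKER_HEADER = "X-Preserve-For-Sieve: RFC2047 marker in header To:"


def prepend_marker(raw_headers: List[str]) -> List[str]:
    """Stage 1 (MTA): inspect raw header lines, prepend the marker on a match."""
    out = []
    for line in raw_headers:
        if TO_MARKER.search(line):
            out.append(MARKER_HEADER)
        out.append(line)
    return out


def sieve_would_file_as_junk(headers: List[str]) -> bool:
    """Stage 2 (Sieve side): only asks whether the marker header exists."""
    return any(h.lower().startswith("x-preserve-for-sieve:") for h in headers)


sample = [
    'From: "=?utf-8?B?TWluaSBXdQ==?=" <sender@example.com>',  # hypothetical address
    'To: "=?utf-8?B?Q1VTVA==?=" <me@example.com>',            # hypothetical address
]
print(sieve_would_file_as_junk(prepend_marker(sample)))
```

The point of the split is that stage 1 sees the bytes before any RFC 2047 decoding, so stage 2 never needs access to the raw value.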


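Even if that works today, I doubt this whole approach will be a reliable classification criterion for the foreseeable future. Relaying SMTP servers (including the one handing the message to Sieve) may add encoding that was not there before while converting the message. Some mail clients add encoding where it is not needed, while others fail to add it even where they should. Any difference you do detect may not apply consistently to messages of the same kind.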
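To illustrate the concern, this Python sketch takes one display name in three forms it might plausibly arrive in after passing through different clients or relays (the variants are made up for illustration, not observed traffic). All three decode to the same text, yet a raw-marker check like the Postfix rule's flags only the base64 form. The helper names are hypothetical.

```python
from email.header import decode_header, make_header

# One display name, three plausible wire forms (illustrative values).
VARIANTS = [
    "=?utf-8?B?Q1VTVA==?=",  # base64 encoded-word, as in the sample message
    "=?UTF-8?Q?CUST?=",      # quoted-printable encoded-word
    "CUST",                  # plain ASCII, no encoding at all
]


def decode_name(raw: str) -> str:
    """Decoded text, i.e. roughly what a Sieve header test compares against."""
    return str(make_header(decode_header(raw)))


def has_b_marker(raw: str) -> bool:
    """Raw-text check, case-insensitive like the Postfix pcre rule."""
    return "=?utf-8?b?" in raw.lower()


for raw in VARIANTS:
    print(f"{raw:24} decoded={decode_name(raw)!r} flagged={has_b_marker(raw)}")
```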


‡ Apart from superfluous encoding, ordinary mail rarely has any other option: Dovecot does not yet guarantee 8-bit-clean transport such as SMTPUTF8.