Should I use utf-8 or "utf-8" as the charset value in email headers?

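When sending email to an Outlook.com account that has forwarding enabled, I found that the forwarded messages were being rejected.

Comparing the message as sent with the copy in the Outlook inbox, I found that Microsoft is essentially rewriting parts of the message body.

For example: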

This is a multi-part message in MIME format.
--=_5226908e44ebc0462f06052400644d2f
Content-Type: multipart/alternative;
 boundary="=_926d2a45bc543e1972443c87118fa61a"

--=_926d2a45bc543e1972443c87118fa61a
Content-Transfer-Encoding: base64
Content-Type: text/plain; charset=utf-8

SGF2aW5nIGFub3RoZXIgZ28gYXQgZm9yd2FyZGluZyBhbiBlbWFpbCB2aWEgT3V0bG9vay4NCg0K
DQo=
--=_926d2a45bc543e1972443c87118fa61a
Content-Transfer-Encoding: base64
Content-Type: text/html; charset=utf-8

becomes the following; note the quotes around the charset value:

--=_5226908e44ebc0462f06052400644d2f
Content-Type: multipart/alternative;
boundary="=_926d2a45bc543e1972443c87118fa61a"

--=_926d2a45bc543e1972443c87118fa61a
Content-Transfer-Encoding: base64
Content-Type: text/plain; charset="utf-8"

SGF2aW5nIGFub3RoZXIgZ28gYXQgZm9yd2FyZGluZyBhbiBlbWFpbCB2aWEgT3V0bG9vay4NCg0K
DQo=
--=_926d2a45bc543e1972443c87118fa61a
Content-Transfer-Encoding: base64
Content-Type: text/html; charset="utf-8"

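Now, leaving aside that the mail RFCs explicitly forbid modifying the body (which breaks the DKIM signature), I have to ask: what is the correct way to write charset=utf-8 in an email header?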
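
To illustrate why the rewrite matters for DKIM: the MIME part headers sit inside what DKIM treats as the message body, so changing them changes the body hash. Below is a minimal sketch in Python, assuming "simple" body canonicalization and SHA-256. The function name simple_body_hash and the sample bytes are hypothetical and not taken from any real signer:

```python
import base64
import hashlib


def simple_body_hash(body: bytes) -> str:
    """Return a DKIM-style body hash using 'simple' body canonicalization.

    Simple canonicalization only ignores empty lines at the end of the
    body; the SHA-256 digest of the result is then base64-encoded.
    """
    # Drop every trailing CRLF, then terminate with exactly one CRLF.
    while body.endswith(b"\r\n"):
        body = body[:-2]
    canonical = body + b"\r\n"
    return base64.b64encode(hashlib.sha256(canonical).digest()).decode("ascii")


# Hypothetical fragment of the MIME body as sent, and after the rewrite.
sent = b"Content-Type: text/plain; charset=utf-8\r\n\r\nDQo=\r\n"
rewritten = sent.replace(b"charset=utf-8", b'charset="utf-8"')

print(simple_body_hash(sent) == simple_body_hash(rewritten))  # False
```

Two quote characters are enough to change the hash, so a verifier checking the original signature would see a body hash mismatch.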

Answer 1

Section 5.1 of RFC 2045 gives the grammar for constructing a valid Content-Type header in a MIME message:

5.1.  Syntax of the Content-Type Header Field

   In the Augmented BNF notation of RFC 822, a Content-Type header field
   value is defined as follows:

     content := "Content-Type" ":" type "/" subtype
                *(";" parameter)
                ; Matching of media type and subtype
                ; is ALWAYS case-insensitive.

     type := discrete-type / composite-type

     discrete-type := "text" / "image" / "audio" / "video" /
                      "application" / extension-token

     composite-type := "message" / "multipart" / extension-token

     extension-token := ietf-token / x-token

     ietf-token := <An extension token defined by a
                    standards-track RFC and registered
                    with IANA.>

     x-token := <The two characters "X-" or "x-" followed, with
                 no intervening white space, by any token>

     subtype := extension-token / iana-token

     iana-token := <A publicly-defined extension token. Tokens
                    of this form must be registered with IANA
                    as specified in RFC 2048.>

     parameter := attribute "=" value

     attribute := token
                  ; Matching of attributes
                  ; is ALWAYS case-insensitive.

     value := token / quoted-string

     token := 1*<any (US-ASCII) CHAR except SPACE, CTLs,
                 or tspecials>

     tspecials :=  "(" / ")" / "<" / ">" / "@" /
                   "," / ";" / ":" / "\" / <">
                   "/" / "[" / "]" / "?" / "="
                   ; Must be in quoted-string,
                   ; to use within parameter values

Note that value is defined as token / quoted-string.
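
The token rule can be checked mechanically. Here is a rough sketch in Python that follows the grammar quoted above; the helper names is_token and format_param are made up for illustration and are not part of any library:

```python
# tspecials from RFC 2045 section 5.1; none of these may appear in a token.
TSPECIALS = '()<>@,;:\\"/[]?='


def is_token(value: str) -> bool:
    """True if value matches: 1*<any US-ASCII CHAR except SPACE, CTLs, tspecials>."""
    if not value:
        return False
    for ch in value:
        code = ord(ch)
        # Reject controls (0-31), SPACE (32), DEL (127) and non-ASCII.
        if code <= 32 or code >= 127:
            return False
        if ch in TSPECIALS:
            return False
    return True


def format_param(attribute: str, value: str) -> str:
    """Write a parameter bare when the value is a token, quoted otherwise."""
    if is_token(value):
        return f"{attribute}={value}"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{attribute}="{escaped}"'


print(format_param("charset", "utf-8"))
# charset=utf-8
print(format_param("boundary", "=_926d2a45bc543e1972443c87118fa61a"))
# boundary="=_926d2a45bc543e1972443c87118fa61a"
```

This also shows why the boundary in the question is quoted in both versions of the message: it starts with "=", which is one of the tspecials, so it cannot be written as a bare token.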

Just below that grammar, the same section spells this out in prose, with an example:

   Note that the value of a quoted string parameter does not include the
   quotes.  That is, the quotation marks in a quoted-string are not a
   part of the value of the parameter, but are merely used to delimit
   that parameter value.  In addition, comments are allowed in
   accordance with RFC 822 rules for structured header fields.  Thus the
   following two forms

     Content-type: text/plain; charset=us-ascii (Plain text)

     Content-type: text/plain; charset="us-ascii"

   are completely equivalent.

As you can see, the quotes are not required when the value is already a token (1*<any (US-ASCII) CHAR except SPACE, CTLs, or tspecials>), but the quoted form is still valid.
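
If you want to see the equivalence in practice, here is a small sketch using Python's standard email package. The two sample messages are made up for the demonstration:

```python
from email import message_from_string

# Two otherwise identical messages; only the quoting of charset differs.
unquoted = message_from_string(
    "Content-Type: text/plain; charset=utf-8\n\nbody\n"
)
quoted = message_from_string(
    'Content-Type: text/plain; charset="utf-8"\n\nbody\n'
)

# The parser strips the delimiting quotes, so both yield the same value.
print(unquoted.get_content_charset())  # utf-8
print(quoted.get_content_charset())    # utf-8
```

A conforming parser should treat the two forms the same way, which fits the RFC text saying the quotation marks are not part of the parameter value.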

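Answer 2

Good question. In my experience, HTML email headers are not much different from HTML (web server) headers, so I would stick with the unquoted version, like this: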

Content-Type: text/html; charset=utf-8

Digging into the RFC for MIME encoding (RFC 2047), I found this:

2. Syntax of encoded-words

   An 'encoded-word' is defined by the following ABNF grammar.  The
   notation of RFC 822 is used, with the exception that white space
   characters MUST NOT appear between components of an 'encoded-word'.

   encoded-word = "=?" charset "?" encoding "?" encoded-text "?="

   charset = token    ; see section 3

   encoding = token   ; see section 4

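It never says whether a quoted token value is valid. So I would assume Microsoft is somehow rewriting the headers to use quoted values? I have nothing to go on beyond the evidence provided, but I would defer to the unquoted value rather than default to what Microsoft is doing.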
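
For reference, the encoded-word grammar quoted above describes encoded text inside header values such as Subject lines, where the charset sits between the "?" separators as a bare token. A minimal sketch with Python's standard email.header module, using a made-up sample value:

```python
from email.header import decode_header

# "SGVsbG8=" is the base64 encoding of "Hello"; the sample is illustrative.
encoded = "=?utf-8?B?SGVsbG8=?="

# decode_header returns a list of (decoded_bytes, charset) pairs.
print(decode_header(encoded))  # [(b'Hello', 'utf-8')]
```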
