穆特的侧边栏,无法正确读取希腊字符(使用希腊字符命名的邮箱)。反正两者内部都不存在这样的问题索引和寻呼机,其中希腊字符/单词/名称显示得很好。
吉尔斯发表评论后更新
所讨论的设置用于两个不同的系统(工作站和笔记本电脑,都运行 Funtoo 和 GNU bash,版本 4.2.45(1)-release),包括用于从 IMAP 服务器读取电子邮件/向 IMAP 服务器写入电子邮件的 mutt 。的回应locale
是
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE=POSIX
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
并且 mutt 的任何配置文件中都没有设置任何语言环境变量。
消息实际上是通过以下方式同步的离线地图并使用发送后缀。离线地图,将邮箱(名称)列表记录在一个文件(即名为 的文件mailboxes
)中,该文件源自 mutt 的配置文件之一(通过指示source ~/.mutt/mailboxes
)。查看文件的内容mailboxes
,发现希腊名字已经被“误解”。
无论如何,希腊名字通过网络邮件客户端(RoundCube)显示得很好,该客户端也访问相同的 IMAP 服务器,从而访问可疑邮箱。
问题
- 为什么会发生这种情况?
- 是
offlineimap
配置错误的问题吗? - 怎么解决呢?
剩下的问题?[2015 年 3 月](另见下面的答案)
但是,本地存储库文件夹名称(目录名称)仍然处于不可读状态。即,上面显示的希腊文件夹名称(Yποτροφία)实际上是目录&A6UDwAO,A8QDwQO,A8YDrwOx-
.这是否意味着文件夹名称转换发生在文件夹名称和消息同步之后,而不是同步期间? 或者,是否需要删除这些目录(来自本地存储库)并通过 Offlineimap 强制另一次同步(注意不要让相应的邮箱文件夹从远程存储库中删除)?
答案1
简而言之,
“问题”源于以下事实:IMAP4 使用修改后的 UTF-7 编码对文件夹名称进行编码。离线地图在创建本地存储库之前,不会将文件夹名称转换为可读的名称(例如,以 UTF-8 格式)。反过来,这会派生出不可读的文件夹名称,如该问题的屏幕截图中所示的名称。因此,它既不是穆特也不离线地图本身的处理不当或配置错误。
以下博客文章和 git 存储库详细讨论并解决了该问题:
解决方案
本质上,一个蟒蛇脚本(下面提供)有助于导出可读的文件夹名称,被输入到offlineimap的配置文件中(即offlineimaprc
,如中所述离线IMAP手册)。此外,还有一行用于正确文件夹名称转换的指导性代码(使用 python 脚本中定义的函数),即
# Name translation from UTF7 to UTF8
nametrans = lambda foldername: foldername.decode('imap4-utf-7').encode('utf-8')
被添加到 offlineimap 配置文件中的远程存储库选项部分下。
更新(2015 年 4 月)
另一个规则是必需的反向操作参见文件夹过滤和名称翻译。该说明可能类似于
# Name translation, reverse!
nametrans = lambda foldername: foldername.decode('utf-8').encode('imap4-utf-7')
在该mailboxes
文件中,offlineimap 创建的希腊名称正确显示。这解决了 mutt 内部的问题,并且文件夹名称按预期显示(在本例中为希腊名称)。
剩下的问题?
但是,本地存储库文件夹名称(目录名称)仍然处于不可读状态。即,上面显示的希腊文件夹名称(Yποτροφία)实际上是目录&A6UDwAO,A8QDwQO,A8YDrwOx-
.这是否意味着文件夹名称转换发生在文件夹名称和消息同步之后而不是同步期间? 或者,是否需要删除这些目录(来自本地存储库)并通过 Offlineimap 强制另一次同步(注意不要让相应的邮箱文件夹从远程存储库中删除)?
处理国际邮箱名称(IMAP、UTF-7)的 Python 脚本:
# vim:fileencoding=utf-8
r"""
Imap folder names are encoded using a special version of utf-7 as defined in RFC
2060 section 5.1.3.
From: http://piao-tech.blogspot.com/2010/03/get-offlineimap-working-with-non-ascii.html
5.1.3. Mailbox International Naming Convention
By convention, international mailbox names are specified using a
modified version of the UTF-7 encoding described in [UTF-7]. The
purpose of these modifications is to correct the following problems
with UTF-7:
1) UTF-7 uses the "+" character for shifting; this conflicts with
the common use of "+" in mailbox names, in particular USENET
newsgroup names.
2) UTF-7's encoding is BASE64 which uses the "/" character; this
conflicts with the use of "/" as a popular hierarchy delimiter.
3) UTF-7 prohibits the unencoded usage of "\"; this conflicts with
the use of "\" as a popular hierarchy delimiter.
4) UTF-7 prohibits the unencoded usage of "~"; this conflicts with
the use of "~" in some servers as a home directory indicator.
5) UTF-7 permits multiple alternate forms to represent the same
string; in particular, printable US-ASCII chararacters can be
represented in encoded form.
In modified UTF-7, printable US-ASCII characters except for "&"
represent themselves; that is, characters with octet values 0x20-0x25
and 0x27-0x7e. The character "&" (0x26) is represented by the two-
octet sequence "&-".
All other characters (octet values 0x00-0x1f, 0x7f-0xff, and all
Unicode 16-bit octets) are represented in modified BASE64, with a
further modification from [UTF-7] that "," is used instead of "/".
Modified BASE64 MUST NOT be used to represent any printing US-ASCII
character which can represent itself.
"&" is used to shift to modified BASE64 and "-" to shift back to US-
ASCII. All names start in US-ASCII, and MUST end in US-ASCII (that
is, a name that ends with a Unicode 16-bit octet MUST end with a "-
").
For example, here is a mailbox name which mixes English, Japanese,
and Chinese text: ~peter/mail/&ZeVnLIqe-/&U,BTFw-
"""
import binascii
import codecs
# encoding
def modified_base64(s):
s = s.encode('utf-16be')
return binascii.b2a_base64(s).rstrip(b'\n=').replace(b'/', b',').decode('ascii')
def doB64(_in, r):
if _in:
r.append('&%s-' % modified_base64(''.join(_in)))
del _in[:]
def encoder(s):
r = []
_in = []
for c in s:
ordC = ord(c)
if 0x20 <= ordC <= 0x25 or 0x27 <= ordC <= 0x7e:
doB64(_in, r)
r.append(c)
elif c == '&':
doB64(_in, r)
r.append('&-')
else:
_in.append(c)
doB64(_in, r)
return (''.join(r).encode('ascii'), len(s))
# decoding
def modified_unbase64(s):
b = binascii.a2b_base64(s.replace(b',', b'/') + b'===')
return str(b, 'utf-16be')
def decoder(s):
r = []
decode = []
for c in s:
if c == b'&' and not decode:
decode.append(b'&')
elif c == b'-' and decode:
if len(decode) == 1:
r.append('&')
else:
r.append(modified_unbase64(b''.join(decode[1:])))
decode = []
elif decode:
decode.append(c)
else:
r.append(c.decode('ascii'))
if decode:
r.append(modified_unbase64(b''.join(decode[1:])))
bin_str = ''.join(r)
return (bin_str, len(s))
class StreamReader(codecs.StreamReader):
def decode(self, s, errors='strict'):
return decoder(s)
class StreamWriter(codecs.StreamWriter):
def decode(self, s, errors='strict'):
return encoder(s)
def imap4_utf_7(name):
if name == 'imap4-utf-7':
return (encoder, decoder, StreamReader, StreamWriter)
codecs.register(imap4_utf_7)