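I have a folder full of emails saved from an IMAP account that I have since closed.
The file names are the subject lines of the emails.
Unfortunately, wherever a non-ASCII encoding was used, the file names show the subject lines in their raw internal form: they are prefixed with =_ and the name of the encoding used: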
=_UTF-8_Q_Auftragsbest=C3=A4tigung_(Kundennummer__)_=_20100819_150312_37.eml
=_windows-1252_Q_Best=E4tigung=3A_Wir_haben_Ihre_=_20100819_150310_28.eml
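Does anybody know a tool that can fix this in bulk at the file system level?
A solution would have to 1. remove the =_ENCODING prefix and 2. if possible, convert the encoded characters in the file name to their proper file system equivalents (umlauts and so on).
I am on Windows 7 or XP, but I am prepared to take this to a Linux VM, because it is a large folder and an automated solution would be great.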
Answer 1
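I wrote myself a PHP script for this. I want to share it in case someone else runs into a similar problem. It works for me and for the encodings I needed (you may have to extend the encodings array).
The script recursively walks the specified directory structure and converts MIME-encoded file names to UTF-8.
It does not produce completely perfect results: a few special characters are converted twice or not at all. As far as I can tell, that is either a bug in the IMAP exporter or incorrect encoding information in the email itself.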
mb_decode_mimeheader() is the core of the whole thing.
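To see what that function contributes on its own, here is a minimal sketch that feeds it two well-formed RFC 2047 encoded words, the form the subject lines had before the exporter swapped the question marks for underscores. The sample strings are illustrative (shortened from the file names in the question), and the mbstring extension is assumed to be loaded.

```php
<?php
// The result is returned in the internal encoding, so set it first.
mb_internal_encoding("UTF-8");

// Well-formed encoded words, as they would appear in a Subject header.
// These samples are assumptions made for illustration.
$samples = array(
    "=?UTF-8?Q?Auftragsbest=C3=A4tigung?=",
    "=?windows-1252?Q?Best=E4tigung?=",
);

$decoded = array();
foreach ($samples as $raw) {
    // Reads the charset from the encoded word, decodes the =XX escapes
    // and converts the text to UTF-8.
    $decoded[] = mb_decode_mimeheader($raw);
}

echo implode("\n", $decoded), "\n";
```

The exported file names carry _ wherever the header had ?, which is why the script turns underscores back into question marks before decoding.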
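Released into the public domain; no warranty of any kind. Requires PHP 5.2.
It should run both from the CLI and through a web server; I tested it in the browser.
Make a backup before running a script like this on your data.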
<?php
/* Directory to parse */
$dir = "D:/IMAP";
/* Extensions to parse. Leave empty to process all files */
$extensions = array("eml");
/* Set to true to actually run the renaming; false only simulates it */
define("GO", true);
/* No need to change past this point */
/* Output content type header if not in CLI.
   php_sapi_name() returns lowercase "cli", so compare against that. */
if (strtolower(php_sapi_name()) != "cli")
    header("Content-type: text/plain; charset=utf-8");
$FixNames = new FixEmlNames($dir, $extensions);
$FixNames->fixAll();
class FixEmlNames
{
    /* List of possible encodings here */
    private $encodings = array("iso-8859-1", "iso-8859-15", "windows-1252", "utf-8");
    /* Encoding prefix. The exporter exports e.g. =_iso-8859-1_ with underscores
       instead of question marks */
    private $encoding_prefix = "=_";
    /* Encoding postfix */
    private $encoding_postfix = "_";
    /* Temporary storage for files */
    private $files;
    /* Array of file extensions to process. Leave empty to parse all files and directories */
    private $extensions = array();
    /* Count of renamed files */
    private $count = 0;
    /* Count of failed renames */
    private $failed = 0;
    /* Count of skipped renames */
    private $skipped = 0;
    /* Transform forbidden characters in host OS */
    private $transform_characters = array(":" => "_", "?" => "_", ">" => "_");

    function __construct($dir, $extensions = array("eml"))
    {
        $this->files = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($dir));
        $this->extensions = $extensions;
    }
    function fixAll()
    {
        echo "Starting....\n";
        while ($this->files->valid())
        {
            if (!$this->files->isDot())
            {
                $path = $this->files->key();
                $ext = pathinfo($path, PATHINFO_EXTENSION);
                if ((count($this->extensions) == 0) or (in_array($ext, $this->extensions)))
                    $this->renameOne($path);
            }
            $this->files->next();
        }
        echo "Done. ";
        /* Show stats */
        $status = array();
        if ($this->count > 0) $status[] = $this->count." OK";
        if ($this->failed > 0) $status[] = $this->failed." failed";
        if ($this->skipped > 0) $status[] = $this->skipped." skipped";
        echo implode(", ", $status);
    }
    function renameOne($fullPath)
    {
        $filename = pathinfo($fullPath, PATHINFO_BASENAME);
        $is_mime = false;
        // See whether file name is MIME encoded or not
        foreach ($this->encodings as $encoding)
        {
            if (stristr($filename, $this->encoding_prefix.$encoding.$this->encoding_postfix))
                $is_mime = true;
        }
        // No MIME encoding? Skip.
        if (!$is_mime)
        {
            # uncomment to see skipped files
            # echo "Skipped: $filename\n";
            $this->skipped++;
            return true;
        }
        mb_internal_encoding("UTF-8");
        $filename = str_replace("_", "?", $filename); // Question marks were converted to underscores
        $filename = mb_decode_mimeheader($filename);
        $filename = str_replace("?", "_", $filename);
        // Remove forbidden characters
        $filename = strtr($filename, $this->transform_characters);
        // Rename
        if (constant("GO") == true)
        {
            // We catch the error manually
            $old = error_reporting(0);
            $success = rename($fullPath, realpath(dirname($fullPath)).DIRECTORY_SEPARATOR.$filename);
            error_reporting($old);
            if ($success)
            {
                echo "OK: $filename\n";
                $this->count++;
                return true;
            }
            else
            {
                $error = error_get_last();
                $message = $error["message"];
                $this->failed++;
                echo "Failed renaming $fullPath. Error message: ".$message."\n";
                return false;
            }
        }
        else
        {
            $this->count++;
            echo "Simulation: $filename\n";
            return true;
        }
    }
}
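The $transform_characters map above covers three characters. Windows also rejects \ / * " < and | in file names, so a decoded subject that contains one of those would likely make rename() fail. A minimal sketch of a wider replacement follows; sanitizeForWindows is a hypothetical helper name, not part of the script.

```php
<?php
/**
 * Hypothetical helper: replace every character that Windows does not
 * allow in a file name with an underscore. All search characters are
 * ASCII, so applying this to a UTF-8 string is safe.
 * Pass the bare file name only, never a full path.
 */
function sanitizeForWindows($filename)
{
    $forbidden = array('\\', '/', ':', '*', '?', '"', '<', '>', '|');
    return str_replace($forbidden, '_', $filename);
}
```

It could be called in renameOne() in place of the strtr() line, or the extra characters could simply be added to $transform_characters.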
Answer 2
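Since you are willing to move to Linux, you could install a PHP server there and write a fairly simple script to re-encode the files. How hard that is depends on whether you have programmed before. You can find documentation at php.net.
These are the functions you will need: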
<?php
// Outline only: manual-style signatures and placeholders, in the order you would use them.
// opendir(string $path [, resource $context])
// readdir([resource $dir_handle])
// file_get_contents(<file name taken from readdir()>)
// preg_replace(<regular expression that removes the =ENCODING part of the file name>)
// mb_convert_encoding(string $str, string $to_encoding [, mixed $from_encoding])
// file_put_contents(<the new file name>.eml)
?>
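The function list above can be turned into a short dry run. This is a sketch under several assumptions, not the answerer's code: planRenames and its charset whitelist are hypothetical, the =XX escapes are decoded by hand because mb_convert_encoding() alone does not undo them, only Q-encoded names of the form =_CHARSET_Q_... are handled, and the plan is meant to be fed to rename() rather than copying contents with file_get_contents() and file_put_contents().

```php
<?php
/**
 * Hypothetical helper (not part of the answer): map each MIME-encoded
 * file name directly inside $dir to a decoded UTF-8 name.
 * Nothing is renamed; the caller decides what to do with the plan.
 */
function planRenames($dir)
{
    // Charsets we are willing to pass to mb_convert_encoding(); an assumption.
    $known = array("iso-8859-1", "iso-8859-15", "windows-1252", "utf-8");
    $plan = array();
    $handle = opendir($dir);
    if ($handle === false) {
        return $plan;
    }
    while (($name = readdir($handle)) !== false) {
        // Expect "=_CHARSET_Q_text"; anything else is left alone.
        if (!preg_match('/^=_([A-Za-z0-9-]+)_Q_(.+)$/i', $name, $m)
            || !in_array(strtolower($m[1]), $known)) {
            continue;
        }
        // Drop the "_=_" left over from the closing "?=" marker.
        $text = str_replace("_=_", "_", $m[2]);
        // Turn "=E4" style escapes back into raw bytes of the source charset.
        $bytes = preg_replace_callback('/=([0-9A-Fa-f]{2})/', function ($h) {
            return chr(hexdec($h[1]));
        }, $text);
        // Convert those bytes from the named charset to UTF-8.
        $utf8 = mb_convert_encoding($bytes, "UTF-8", $m[1]);
        // A decoded ":" is not allowed in Windows file names.
        $plan[$name] = str_replace(":", "_", $utf8);
    }
    closedir($handle);
    return $plan;
}
```

Printing the returned pairs first and calling rename() on each pair only once the output looks right keeps the operation reversible. As with any bulk rename, work on a backup copy.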