在 Windows 上确定和更改文件名编码

Question 1

我可以使用下一个简单的 Powershell 脚本重现您的问题

$RatedName = "šöü"                            # set sample string
$FormDName = $RatedName.Normalize("FormD")    # its Canonical Decomposition
$FormCName = $FormDName.Normalize("FormC")    #     followed by Canonical Composition
                                              # list each string character by character
($RatedName,$FormDName,$FormCName) | ForEach-Object {
    $charArr = [char[]]$_ 
    "$_"      # display string in new line for better readability
              # display each character together with its Unicode codepoint
    For( $i=0; $i -lt $charArr.Count; $i++ ) { 
        $charInt = [int]$charArr[$i]
        # next "Try-Catch-Finally" code snippet adopted from my "Alt KeyCode Finder"
        #                                       http://superuser.com/a/1047961/376602
        Try {    
            # Get-CharInfo module downloadable from http://poshcode.org/5234
            #        to add it into the current session: use Import-Module cmdlet
            $charInt | Get-CharInfo |% {
                $ChUCode = $_.CodePoint
                $ChCtgry = $_.Category
                $ChDescr = $_.Description
            }
        }
        Catch {
            $ChUCode = "U+{0:x4}" -f $charInt
            if ( $charInt -le 0x1F -or ($charInt -ge 0x7F -and $charInt -le 0x9F)) 
                 { $ChCtgry = "Control" } else { $ChCtgry = "" }
            $ChDescr = ""
        }
        Finally { $ChOut = $charArr[$i] }
        "{0} {1,-2} {2} {3,5} {4}" -f $i, $charArr[$i], $ChUCode, $charInt, $ChDescr
    }
}
# create sample files
$RatedName | Out-File "D:\test\1097217Rated$RatedName.txt" -Encoding utf8
$FormDName | Out-File "D:\test\1097217FormD$FormDName.txt" -Encoding utf8
$FormCName | Out-File "D:\test\1097217FormC$FormCName.txt" -Encoding utf8


""                                 # very artless draft of possible solution
Get-ChildItem "D:\test\1097217*" | ForEach-Object {
    $y = $_.Name.Normalize("FormC")
    if ( $y.Length -ne $_.Name.Length ) {
        Rename-Item -NewName $y -LiteralPath $_ -WhatIf
    } else {
        "       : file name is already normalized $_"
    }
}

上面的脚本是更新如下：第一显示有关组合/分解 Unicode 字符的更多信息，即它们的 Unicode 名称（参见Get-CharInfo 模块（英文）：第二嵌入非常朴实的草稿可能的解决方案。
输出从cmd提示：

==> powershell -c D:\PShell\SU\1097217.ps1
šöü
0 š  U+0161   353 Latin Small Letter S With Caron
1 ö  U+00F6   246 Latin Small Letter O With Diaeresis
2 ü  U+00FC   252 Latin Small Letter U With Diaeresis
šöü
0 s  U+0073   115 Latin Small Letter S
1 ̌  U+030C   780 Combining Caron
2 o  U+006F   111 Latin Small Letter O
3 ̈  U+0308   776 Combining Diaeresis
4 u  U+0075   117 Latin Small Letter U
5 ̈  U+0308   776 Combining Diaeresis
šöü
0 š  U+0161   353 Latin Small Letter S With Caron
1 ö  U+00F6   246 Latin Small Letter O With Diaeresis
2 ü  U+00FC   252 Latin Small Letter U With Diaeresis

       : file name is already normalized D:\test\1097217FormCšöü.txt
What if: Performing the operation "Rename File" on target "Item: D:\test\1097217
FormDšöü.txt Destination: D:\test\1097217FormDšöü.txt".
       : file name is already normalized D:\test\1097217Ratedšöü.txt

==> dir /b D:\test\1097217*
1097217FormCšöü.txt
1097217FormDšöü.txt
1097217Ratedšöü.txt

事实上，以上dir输出看起来1097217FormDsˇo¨u¨.txt就像在cmd窗口和我的 unicode 感知浏览器中一样组成字符串如上所列，但unicode 分析器真实展现人物以及最新形象：

然而，下一个示例充分展示了问题：for循环发生了变化结合强调普通的那些：

==> for /F "delims=" %G in ('dir /b /S D:\test\1097217*') do @echo %~nxG & dir /B %~fG
1097217FormCšöü.txt
1097217FormCšöü.txt
1097217FormDsˇo¨u¨.txt
File Not Found
1097217Ratedšöü.txt
1097217Ratedšöü.txt

==>

这是非常朴实无华的草稿可能的解决方案（参见上面的输出）：

""                                 # very artless draft of possible solution
Get-ChildItem "D:\test\1097217*" | ForEach-Object {
    $y = $_.Name.Normalize("FormC")
    if ( $y.Length -ne $_.Name.Length ) {
        Rename-Item -NewName $y -LiteralPath $_ -WhatIf
    } else {
        "       : file name is already normalized $_"
    }
}

~~（去做：仅在必要时调用Rename-Item）：~~

~~Get-ChildItem "D:\test\1097217*" | ForEach-Object { $y = $_.Name.Normalize("FormC") if ($true) { ### ToDo Rename-Item -NewName $y -LiteralPath $_ -WhatIf } }~~

~~及其输出~~（同样，这里呈现组合字符串下图显示cmd窗口看起来没有偏差):

What if: Performing the operation "Rename File" on target "Item: D:\test\1097217 FormCšöü.txt Destination: D:\test\1097217FormCšöü.txt". What if: Performing the operation "Rename File" on target "Item: D:\test\1097217 FormDšöü.txt Destination: D:\test\1097217FormDšöü.txt". What if: Performing the operation "Rename File" on target "Item: D:\test\1097217 Ratedšöü.txt Destination: D:\test\1097217Ratedšöü.txt".

更新cmd输出

Answer

我可以使用下一个简单的 Powershell 脚本重现您的问题

$RatedName = "šöü"                            # set sample string
$FormDName = $RatedName.Normalize("FormD")    # its Canonical Decomposition
$FormCName = $FormDName.Normalize("FormC")    #     followed by Canonical Composition
                                              # list each string character by character
($RatedName,$FormDName,$FormCName) | ForEach-Object {
    $charArr = [char[]]$_ 
    "$_"      # display string in new line for better readability
              # display each character together with its Unicode codepoint
    For( $i=0; $i -lt $charArr.Count; $i++ ) { 
        $charInt = [int]$charArr[$i]
        # next "Try-Catch-Finally" code snippet adopted from my "Alt KeyCode Finder"
        #                                       http://superuser.com/a/1047961/376602
        Try {    
            # Get-CharInfo module downloadable from http://poshcode.org/5234
            #        to add it into the current session: use Import-Module cmdlet
            $charInt | Get-CharInfo |% {
                $ChUCode = $_.CodePoint
                $ChCtgry = $_.Category
                $ChDescr = $_.Description
            }
        }
        Catch {
            $ChUCode = "U+{0:x4}" -f $charInt
            if ( $charInt -le 0x1F -or ($charInt -ge 0x7F -and $charInt -le 0x9F)) 
                 { $ChCtgry = "Control" } else { $ChCtgry = "" }
            $ChDescr = ""
        }
        Finally { $ChOut = $charArr[$i] }
        "{0} {1,-2} {2} {3,5} {4}" -f $i, $charArr[$i], $ChUCode, $charInt, $ChDescr
    }
}
# create sample files
$RatedName | Out-File "D:\test\1097217Rated$RatedName.txt" -Encoding utf8
$FormDName | Out-File "D:\test\1097217FormD$FormDName.txt" -Encoding utf8
$FormCName | Out-File "D:\test\1097217FormC$FormCName.txt" -Encoding utf8


""                                 # very artless draft of possible solution
Get-ChildItem "D:\test\1097217*" | ForEach-Object {
    $y = $_.Name.Normalize("FormC")
    if ( $y.Length -ne $_.Name.Length ) {
        Rename-Item -NewName $y -LiteralPath $_ -WhatIf
    } else {
        "       : file name is already normalized $_"
    }
}

上面的脚本是更新如下：第一显示有关组合/分解 Unicode 字符的更多信息，即它们的 Unicode 名称（参见Get-CharInfo 模块（英文）：第二嵌入非常朴实的草稿可能的解决方案。
输出从cmd提示：

==> powershell -c D:\PShell\SU\1097217.ps1
šöü
0 š  U+0161   353 Latin Small Letter S With Caron
1 ö  U+00F6   246 Latin Small Letter O With Diaeresis
2 ü  U+00FC   252 Latin Small Letter U With Diaeresis
šöü
0 s  U+0073   115 Latin Small Letter S
1 ̌  U+030C   780 Combining Caron
2 o  U+006F   111 Latin Small Letter O
3 ̈  U+0308   776 Combining Diaeresis
4 u  U+0075   117 Latin Small Letter U
5 ̈  U+0308   776 Combining Diaeresis
šöü
0 š  U+0161   353 Latin Small Letter S With Caron
1 ö  U+00F6   246 Latin Small Letter O With Diaeresis
2 ü  U+00FC   252 Latin Small Letter U With Diaeresis

       : file name is already normalized D:\test\1097217FormCšöü.txt
What if: Performing the operation "Rename File" on target "Item: D:\test\1097217
FormDšöü.txt Destination: D:\test\1097217FormDšöü.txt".
       : file name is already normalized D:\test\1097217Ratedšöü.txt

==> dir /b D:\test\1097217*
1097217FormCšöü.txt
1097217FormDšöü.txt
1097217Ratedšöü.txt

事实上，以上dir输出看起来1097217FormDsˇo¨u¨.txt就像在cmd窗口和我的 unicode 感知浏览器中一样组成字符串如上所列，但unicode 分析器真实展现人物以及最新形象：

然而，下一个示例充分展示了问题：for循环发生了变化结合强调普通的那些：

==> for /F "delims=" %G in ('dir /b /S D:\test\1097217*') do @echo %~nxG & dir /B %~fG
1097217FormCšöü.txt
1097217FormCšöü.txt
1097217FormDsˇo¨u¨.txt
File Not Found
1097217Ratedšöü.txt
1097217Ratedšöü.txt

==>

这是非常朴实无华的草稿可能的解决方案（参见上面的输出）：

""                                 # very artless draft of possible solution
Get-ChildItem "D:\test\1097217*" | ForEach-Object {
    $y = $_.Name.Normalize("FormC")
    if ( $y.Length -ne $_.Name.Length ) {
        Rename-Item -NewName $y -LiteralPath $_ -WhatIf
    } else {
        "       : file name is already normalized $_"
    }
}

~~（去做：仅在必要时调用Rename-Item）：~~

~~Get-ChildItem "D:\test\1097217*" | ForEach-Object { $y = $_.Name.Normalize("FormC") if ($true) { ### ToDo Rename-Item -NewName $y -LiteralPath $_ -WhatIf } }~~

~~及其输出~~（同样，这里呈现组合字符串下图显示cmd窗口看起来没有偏差):

What if: Performing the operation "Rename File" on target "Item: D:\test\1097217 FormCšöü.txt Destination: D:\test\1097217FormCšöü.txt". What if: Performing the operation "Rename File" on target "Item: D:\test\1097217 FormDšöü.txt Destination: D:\test\1097217FormDšöü.txt". What if: Performing the operation "Rename File" on target "Item: D:\test\1097217 Ratedšöü.txt Destination: D:\test\1097217Ratedšöü.txt".

更新cmd输出

Question 2

根据 JosefZ 的脚本，这里有一个以递归方式工作的修改版本：

Get-ChildItem "X:\" -Recurse | ForEach-Object {
    $y = $_.Name.Normalize("FormC")
    $file = $_.Fullname
    if ( $y.Length -ne $_.Name.Length ) {
        Rename-Item -LiteralPath "$file" -NewName "$y" -WhatIf
        Write-Host "renamed file $file"
    }
}

测试后删除-WhatIf。我遇到了路径太长的问题，但那是另一篇文章的主题。

Answer

根据 JosefZ 的脚本，这里有一个以递归方式工作的修改版本：

Get-ChildItem "X:\" -Recurse | ForEach-Object {
    $y = $_.Name.Normalize("FormC")
    $file = $_.Fullname
    if ( $y.Length -ne $_.Name.Length ) {
        Rename-Item -LiteralPath "$file" -NewName "$y" -WhatIf
        Write-Host "renamed file $file"
    }
}

测试后删除-WhatIf。我遇到了路径太长的问题，但那是另一篇文章的主题。

Question 3

问题出在这个标签页地区控制面板：

这不仅影响屏幕字体，还影响文件系统（基本上如您所述）。

截图来自我的机器。如果我将语言环境更改为英语，所有特殊的斯洛伐克国家字符，如ľôščťž 在文件名中会变成垃圾，而有些甚至会完全阻止打开文件（已测试……），没有解决方法（直到代码页被恢复）。但是，这个问题不会出现在更常见的国家字符上，例如áíé在许多语言中都可以看到。

这也会影响一些离线媒体，例如尝试打开使用不同语言环境制作的备份。

最简单的解决方案是让所有访问资源的机器保持相同的语言环境。

解决方法是确定哪台机器具有不同的语言环境，然后从该机器对所有文件名中的所有国家字符（例如č-> c、ž-> z）进行批量替换。 Total Commander（文件管理器）可以一次性替换整个目录树中的每一对字符。然后，您可以将该机器恢复为英语（请注意，它可能无法读取自己的备份），或保持原样，要求用户不要在文件名中使用国家字符。

（但在此之前，您可以尝试一件事：您可以chcp在具有不同语言环境的机器上运行，了解正在使用的代码页（例如 852），然后在其他机器上尝试chcp 852。不确定这是否能令人满意地解决问题。）

Answer