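On Windows, I need to find all files in a directory that contain a UTF-8 BOM (byte order mark). Which tool can do this, and how?
It can be a PowerShell script, some text editor's advanced search feature, or anything else.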
Answer 1
Here is an example PowerShell script. It looks in the C: path for any files whose first 3 bytes are 0xEF, 0xBB, 0xBF.
Function ContainsBOM
{
    # Pass through only the piped-in files whose first three bytes are EF BB BF.
    return $input | where {
        $contents = [System.IO.File]::ReadAllBytes($_.FullName)
        $_.Length -gt 2 -and $contents[0] -eq 0xEF -and $contents[1] -eq 0xBB -and $contents[2] -eq 0xBF }
}
get-childitem "C:\*.*" | where {!$_.PsIsContainer } | ContainsBOM
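Is "ReadAllBytes" really necessary? Maybe reading only the first few bytes would be better?
Good point. Here is an updated version that reads only the first 3 bytes.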
Function ContainsBOM
{
    return $input | where {
        # Read just the first three bytes instead of the whole file.
        $contents = new-object byte[] 3
        $stream = [System.IO.File]::OpenRead($_.FullName)
        $stream.Read($contents, 0, 3) | Out-Null
        $stream.Close()
        $contents[0] -eq 0xEF -and $contents[1] -eq 0xBB -and $contents[2] -eq 0xBF }
}
# Files shorter than three bytes cannot hold a BOM, so they are filtered out before the check.
get-childitem "C:\*.*" | where {!$_.PsIsContainer -and $_.Length -gt 2 } | ContainsBOM
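For comparison, the same "read only the first three bytes" idea can be sketched in Python. This is a minimal sketch, not part of the original answer: the helper names `has_utf8_bom` and `find_bom_files` and the demo file names are hypothetical, and like the PowerShell version it only looks at files directly inside the given directory.

```python
import codecs
import os
import tempfile


def has_utf8_bom(path):
    """Return True if the file starts with the UTF-8 BOM bytes EF BB BF."""
    with open(path, "rb") as handle:
        # Read at most 3 bytes; a shorter file simply cannot match.
        return handle.read(3) == codecs.BOM_UTF8


def find_bom_files(directory):
    """Yield paths of regular files directly inside `directory` that start with a BOM."""
    for entry in sorted(os.listdir(directory)):
        path = os.path.join(directory, entry)
        # Skip subdirectories, mirroring the PsIsContainer filter.
        if os.path.isfile(path) and has_utf8_bom(path):
            yield path


if __name__ == "__main__":
    # Demo in a throwaway directory so the sketch does not depend on what is in C:.
    with tempfile.TemporaryDirectory() as demo_dir:
        with open(os.path.join(demo_dir, "with_bom.txt"), "wb") as handle:
            handle.write(codecs.BOM_UTF8 + b"hello")
        with open(os.path.join(demo_dir, "plain.txt"), "wb") as handle:
            handle.write(b"hello")
        for path in find_bom_files(demo_dir):
            print(os.path.basename(path))
```

To scan a real folder, call `find_bom_files` with its path; swapping `os.listdir` for `os.walk` would make the search recursive.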
Answer 2
As a side note, here is a PowerShell script I use to remove the UTF-8 BOM characters from my source files:
$files=get-childitem -Path . -Include @("*.h","*.cpp") -Recurse
foreach ($f in $files)
{
(Get-Content $f.PSPath) |
Foreach-Object {$_ -replace "\xEF\xBB\xBF", ""} |
Set-Content $f.PSPath
}
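Because Get-Content decodes each file to text before -replace runs, whether the pattern ever sees the raw BOM bytes depends on the PowerShell version and its encoding defaults. A byte-level approach avoids that question. The following is a minimal Python sketch of that alternative, not the answerer's method: the function names `strip_utf8_bom` and `strip_bom_in_tree` are hypothetical, and the default extensions simply mirror the `*.h` and `*.cpp` filter above.

```python
import codecs
import os
import tempfile


def strip_utf8_bom(path):
    """Remove a leading UTF-8 BOM from `path` in place. Return True if one was removed."""
    with open(path, "rb") as handle:
        data = handle.read()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    # Rewrite the file without its first three bytes; the rest is left untouched.
    with open(path, "wb") as handle:
        handle.write(data[len(codecs.BOM_UTF8):])
    return True


def strip_bom_in_tree(top, extensions=(".h", ".cpp")):
    """Strip the BOM from files under `top` whose names end with one of `extensions`."""
    changed = []
    for root, _dirs, files in os.walk(top):
        for name in files:
            # Compare case-insensitively, as a Windows wildcard filter would.
            if name.lower().endswith(extensions):
                path = os.path.join(root, name)
                if strip_utf8_bom(path):
                    changed.append(path)
    return changed


if __name__ == "__main__":
    # Demo on a throwaway tree with a hypothetical source file.
    with tempfile.TemporaryDirectory() as demo_dir:
        sample = os.path.join(demo_dir, "sample.cpp")
        with open(sample, "wb") as handle:
            handle.write(codecs.BOM_UTF8 + b"int main() { return 0; }\n")
        for path in strip_bom_in_tree(demo_dir):
            print("stripped BOM from", os.path.basename(path))
```

Since the files are rewritten in place, it is worth running this on a copy or a version-controlled tree first.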
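Answer 3
If you are on a corporate computer (like me) with restricted permissions and cannot run PowerShell scripts, you can use a portable Notepad++ with the Python Script plugin to do the job, using the following script: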
import os

# Folder to process
filePathSrc = "C:\\Temp\\UTF8"

# Binary and archive extensions to leave untouched (compared case-insensitively)
skipped = ('.jar', '.ear', '.gif', '.jpg', '.jpeg', '.xls', '.png', '.cab', '.ico')

for root, dirs, files in os.walk(filePathSrc):
    for fn in files:
        if not fn.lower().endswith(skipped):
            path = os.path.join(root, fn)
            # notepad and console are objects provided by the Python Script plugin
            notepad.open(path)
            console.write(path + "\r\n")
            notepad.runMenuCommand("Encoding", "Convert to UTF-8 without BOM")
            notepad.save()
            notepad.close()
Credit goes to https://pw999.wordpress.com/2013/08/19/mass-convert-a-project-to-utf-8-using-notepad/
Answer 4
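PowerShell, testing the first two bytes. The right-hand side of an operator such as -eq is converted to a string, so the two-byte array is compared as '239 187' (0xEF 0xBB). Note that -AsByteStream requires PowerShell 6 or later.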
dir -file |
% { $utf8bom = '239 187' -eq (get-content $_.fullname -AsByteStream)[0..1]
[pscustomobject]@{name=$_.name; utf8bom=$utf8bom} }
name utf8bom
---- -------
foo False
script.ps1 True
script.ps1~ False