For some reason (it happened before I ever got involved with the project), every file on my client's website has two duplicate copies. This effectively triples the size of the site.
The files look much like this:
wp-comments-post.php | 3,982 bytes
wp-comments-post (john smith's conflicted copy 2012-01-12).php | 3,982 bytes
wp-comments-post (JohnSmith's conflicted copy 2012-01-14).php | 3,982 bytes
The host the site is on offers no bash or SSH access.
What do you think would be the simplest and least time-consuming way to delete these duplicate files?
Answer 1
I wrote a duplicate finder script in PowerShell that uses the WinSCP .NET assembly.
The latest and enhanced version of the script is now available as the WinSCP extension
Find duplicate files in SFTP/FTP server.
The script first iterates over the remote directory tree looking for files of the same size. When it finds any, by default it downloads the files and compares them locally.
If you know that the server supports a protocol extension for calculating checksums, you can make the script more efficient by adding the -remoteChecksumAlg switch, which makes it ask the server for the checksums instead, saving the file downloads.
powershell.exe -File find_duplicates.ps1 -sessionUrl ftp://user:[email protected]/ -remotePath /path
The script follows:
param (
    # Use Generate URL function to obtain a value for -sessionUrl parameter.
    $sessionUrl = "sftp://user:mypassword;[email protected]/",
    [Parameter(Mandatory)]
    $remotePath,
    $remoteChecksumAlg = $Null
)

function FileChecksum ($remotePath)
{
    if (!($checksums.ContainsKey($remotePath)))
    {
        if ($remoteChecksumAlg -eq $Null)
        {
            Write-Host "Downloading file $remotePath..."
            # Download file
            $localPath = [System.IO.Path]::GetTempFileName()
            $transferResult = $session.GetFiles($remotePath, $localPath)

            if ($transferResult.IsSuccess)
            {
                $stream = [System.IO.File]::OpenRead($localPath)
                $checksum = [BitConverter]::ToString($sha1.ComputeHash($stream))
                $stream.Dispose()

                Write-Host "Downloaded file $remotePath checksum is $checksum"

                Remove-Item $localPath
            }
            else
            {
                Write-Host ("Error downloading file ${remotePath}: " +
                    $transferResult.Failures[0])
                $checksum = $False
            }
        }
        else
        {
            Write-Host "Request checksum for file $remotePath..."
            $buf = $session.CalculateFileChecksum($remoteChecksumAlg, $remotePath)
            $checksum = [BitConverter]::ToString($buf)
            Write-Host "File $remotePath checksum is $checksum"
        }

        $checksums[$remotePath] = $checksum
    }

    return $checksums[$remotePath]
}

function FindDuplicatesInDirectory ($remotePath)
{
    Write-Host "Finding duplicates in directory $remotePath ..."

    try
    {
        $directoryInfo = $session.ListDirectory($remotePath)

        foreach ($fileInfo in $directoryInfo.Files)
        {
            $remoteFilePath = ($remotePath + "/" + $fileInfo.Name)

            if ($fileInfo.IsDirectory)
            {
                # Skip references to current and parent directories
                if (($fileInfo.Name -ne ".") -and
                    ($fileInfo.Name -ne ".."))
                {
                    # Recurse into subdirectories
                    FindDuplicatesInDirectory $remoteFilePath
                }
            }
            else
            {
                Write-Host ("Found file $($fileInfo.FullName) " +
                    "with size $($fileInfo.Length)")

                if ($sizes.ContainsKey($fileInfo.Length))
                {
                    $checksum = FileChecksum($remoteFilePath)

                    foreach ($otherFilePath in $sizes[$fileInfo.Length])
                    {
                        $otherChecksum = FileChecksum($otherFilePath)

                        if ($checksum -eq $otherChecksum)
                        {
                            Write-Host ("Checksums of files $remoteFilePath and " +
                                "$otherFilePath are identical")
                            $duplicates[$remoteFilePath] = $otherFilePath
                        }
                    }
                }
                else
                {
                    $sizes[$fileInfo.Length] = @()
                }

                $sizes[$fileInfo.Length] += $remoteFilePath
            }
        }
    }
    catch [Exception]
    {
        Write-Host "Error processing directory ${remotePath}: $($_.Exception.Message)"
    }
}

try
{
    # Load WinSCP .NET assembly
    Add-Type -Path "WinSCPnet.dll"

    # Setup session options from URL
    $sessionOptions = New-Object WinSCP.SessionOptions
    $sessionOptions.ParseUrl($sessionUrl)

    $session = New-Object WinSCP.Session
    $session.SessionLogPath = "session.log"

    try
    {
        # Connect
        $session.Open($sessionOptions)

        $sizes = @{}
        $checksums = @{}
        $duplicates = @{}
        $sha1 = [System.Security.Cryptography.SHA1]::Create()

        # Start recursion
        FindDuplicatesInDirectory $remotePath
    }
    finally
    {
        # Disconnect, clean up
        $session.Dispose()
    }

    # Print results
    Write-Host

    if ($duplicates.Count -gt 0)
    {
        Write-Host "Duplicates found:"

        foreach ($path1 in $duplicates.Keys)
        {
            Write-Host "$path1 <=> $($duplicates[$path1])"
        }
    }
    else
    {
        Write-Host "No duplicates found."
    }

    exit 0
}
catch [Exception]
{
    Write-Host "Error: $($_.Exception.Message)"
    exit 1
}
(I'm the author of WinSCP)
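For readers who do have a shell somewhere (say, against a local mirror of the site), the same size-then-checksum strategy can be rehearsed in plain shell. This is a minimal sketch assuming GNU coreutils/findutils; the directory and file names are invented for illustration, not taken from the question:

```shell
dir=$(mktemp -d)
printf 'same content\n' > "$dir/a.php"
printf 'same content\n' > "$dir/a (conflicted copy).php"
printf 'other stuff!\n' > "$dir/b.php"   # same size, different content

# Step 1: collect file sizes that occur more than once --
# only same-size files can possibly be duplicates.
sizes=$(find "$dir" -type f -printf '%s\n' | sort -n | uniq -d)

# Step 2: checksum only those same-size files and report lines that
# share a SHA-1 (sha1sum prints a 40-char hash first, hence -w40).
dups=$(for s in $sizes; do
           find "$dir" -type f -size "${s}c" -exec sha1sum {} +
       done | sort | uniq -w40 --all-repeated=separate)
echo "$dups"
rm -rf "$dir"
```

The size pre-filter mirrors what the PowerShell script does: checksum work is only spent on files that could possibly be duplicates, so b.php above is hashed but correctly excluded from the report.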
Answer 2
Edit: mount the remote FTP file system at a local mount point using ftpfs, then use any of the other methods detailed here.
If all the files follow that naming pattern, then you can, for example:
rbos@chili:~/tmp$ touch asdf.php
rbos@chili:~/tmp$ touch "asdf (blah blah blah).php"
rbos@chili:~/tmp$ touch "asdf (blah blah rawr).php"
rbos@chili:~/tmp$ find | grep "(.*)"
./asdf (blah blah rawr).php
./asdf (blah blah blah).php
to match the files, then pipe the list into xargs or a loop to review it:
find | grep "(.*)" | while read i; do echo "$i";done | less
Once you've verified that the list is accurate, swap the echo for rm.
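Before swapping in rm, the pipeline can be rehearsed end to end on throwaway files. A minimal sketch (the names are invented to match the example above; -r is added to read as a safety tweak so backslashes in file names aren't mangled):

```shell
dir=$(mktemp -d) && cd "$dir"
touch "asdf.php" "asdf (blah blah blah).php"

# Dry run: only print what the loop WOULD act on.
find . | grep "(.*)" | while read -r i; do echo "$i"; done

# After eyeballing the list, the same loop with rm in place of echo deletes.
find . | grep "(.*)" | while read -r i; do rm "$i"; done

remaining=$(ls)
echo "$remaining"
```

Only the parenthesized "conflicted copy" file is removed; asdf.php survives.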
Answer 3
You can use FSlint to find the duplicate files.
Answer 4
Run this: find /yourdir -name "*conflicted copy*" -type f -ls
If the files it lists are the ones you want to delete, change -ls to -delete and run it again.
I'd suggest backing up your base directory with tar before doing this, though...
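The -ls to -delete swap can also be rehearsed safely first. A minimal sketch, with a temporary directory standing in for /yourdir and a file name modeled on the question:

```shell
yourdir=$(mktemp -d)   # stands in for the real web root
touch "$yourdir/wp-comments-post.php"
touch "$yourdir/wp-comments-post (john smith's conflicted copy 2012-01-12).php"

# First pass: -ls only prints the matches, nothing is removed.
find "$yourdir" -name "*conflicted copy*" -type f -ls

# Second pass: same predicates, but -delete removes exactly those files.
find "$yourdir" -name "*conflicted copy*" -type f -delete

survivors=$(ls "$yourdir")
echo "$survivors"
```

Because both passes use identical -name and -type predicates, whatever the first pass printed is precisely what the second pass deletes.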
Edit: I just realized you don't have access to a shell session, so this won't work for you...
You'd probably need something like this: http://www.go4expert.com/forums/showthread.php?t=2348 to dump the file list recursively, and then create another script that deletes only the files you want.