What is the best way to delete duplicate files on a web-hosted FTP server?

For some reason (it happened before I got involved with this project), every file on my client's website has two duplicate copies, effectively tripling the size of the site.

The files look very much like this:

wp-comments-post.php    |    3,982 bytes
wp-comments-post (john smith's conflicted copy 2012-01-12).php    |    3,982 bytes
wp-comments-post (JohnSmith's conflicted copy 2012-01-14).php    |    3,982 bytes

The site is on a host that has no bash or SSH access.

What do you think would be the simplest and least time-consuming way to delete these duplicate files?

Answer 1

I have written a duplicate-finder script in PowerShell that uses the WinSCP .NET assembly.

An up-to-date and enhanced version of this script is now available as a WinSCP extension:
Find duplicate files in SFTP/FTP server

The script first iterates the remote directory tree and looks for files with the same size. When it finds any, by default it downloads the files and compares them locally.

If you know that the server supports a protocol extension for calculating checksums, you can improve the script's efficiency by adding the -remoteChecksumAlg switch, which makes the script ask the server for the checksum and spares the file download.

powershell.exe -File find_duplicates.ps1 -sessionUrl ftp://user:password@example.com/ -remotePath /path
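
For example, if the server supports the SFTP check-file extension with the SHA-1 algorithm (an assumption; support varies by server), the call might look like:

powershell.exe -File find_duplicates.ps1 -sessionUrl sftp://user:password@example.com/ -remotePath /path -remoteChecksumAlg sha-1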

The script follows:

param (
    # Use Generate URL function to obtain a value for -sessionUrl parameter.
    $sessionUrl = "sftp://user:mypassword;fingerprint=ssh-rsa-xx-xx-xx@example.com/",
    [Parameter(Mandatory)]
    $remotePath,
    $remoteChecksumAlg = $Null
)

function FileChecksum ($remotePath)
{
    if (!($checksums.ContainsKey($remotePath)))
    {
        if ($remoteChecksumAlg -eq $Null)
        {
            Write-Host "Downloading file $remotePath..."
            # Download file
            $localPath = [System.IO.Path]::GetTempFileName()
            $transferResult = $session.GetFiles($remotePath, $localPath)

            if ($transferResult.IsSuccess)
            {
                $stream = [System.IO.File]::OpenRead($localPath)
                $checksum = [BitConverter]::ToString($sha1.ComputeHash($stream))
                $stream.Dispose()

                Write-Host "Downloaded file $remotePath checksum is $checksum"

                Remove-Item $localPath
            }
            else
            {
                Write-Host ("Error downloading file ${remotePath}: " +
                    $transferResult.Failures[0])
                $checksum = $False
            }
        }
        else
        {
            Write-Host "Request checksum for file $remotePath..."
            $buf = $session.CalculateFileChecksum($remoteChecksumAlg, $remotePath)
            $checksum = [BitConverter]::ToString($buf)
            Write-Host "File $remotePath checksum is $checksum"
        }

        $checksums[$remotePath] = $checksum
    }

    return $checksums[$remotePath]
}

function FindDuplicatesInDirectory ($remotePath)
{
    Write-Host "Finding duplicates in directory $remotePath ..."

    try
    {
        $directoryInfo = $session.ListDirectory($remotePath)

        foreach ($fileInfo in $directoryInfo.Files)
        {
            $remoteFilePath = ($remotePath + "/" + $fileInfo.Name) 

            if ($fileInfo.IsDirectory)
            {
                # Skip references to current and parent directories
                if (($fileInfo.Name -ne ".") -and
                    ($fileInfo.Name -ne ".."))
                {
                    # Recurse into subdirectories
                    FindDuplicatesInDirectory $remoteFilePath
                }
            }
            else
            {
                Write-Host ("Found file $($fileInfo.FullName) " +
                    "with size $($fileInfo.Length)")

                if ($sizes.ContainsKey($fileInfo.Length))
                {
                    $checksum = FileChecksum($remoteFilePath)

                    foreach ($otherFilePath in $sizes[$fileInfo.Length])
                    {
                        $otherChecksum = FileChecksum($otherFilePath)

                        # Treat files whose checksum could not be computed ($False) as non-matching
                        if ($checksum -and ($checksum -eq $otherChecksum))
                        {
                            Write-Host ("Checksums of files $remoteFilePath and " +
                                "$otherFilePath are identical")
                            $duplicates[$remoteFilePath] = $otherFilePath
                        }
                    }
                }
                else
                {
                    $sizes[$fileInfo.Length] = @()
                }

                $sizes[$fileInfo.Length] += $remoteFilePath
            }
        }
    }
    catch [Exception]
    {
        Write-Host "Error processing directory ${remotePath}: $($_.Exception.Message)"
    }
}

try
{
    # Load WinSCP .NET assembly
    Add-Type -Path "WinSCPnet.dll"

    # Setup session options from URL
    $sessionOptions = New-Object WinSCP.SessionOptions
    $sessionOptions.ParseUrl($sessionUrl)

    $session = New-Object WinSCP.Session
    $session.SessionLogPath = "session.log"

    try
    {
        # Connect
        $session.Open($sessionOptions)

        $sizes = @{}
        $checksums = @{}
        $duplicates = @{}

        $sha1 = [System.Security.Cryptography.SHA1]::Create()

        # Start recursion
        FindDuplicatesInDirectory $remotePath
    }
    finally
    {
        # Disconnect, clean up
        $session.Dispose()
    }

    # Print results
    Write-Host

    if ($duplicates.Count -gt 0)
    {
        Write-Host "Duplicates found:"

        foreach ($path1 in $duplicates.Keys)
        {
            Write-Host "$path1 <=> $($duplicates[$path1])"
        }
    }
    else
    {
        Write-Host "No duplicates found."
    }

    exit 0
}
catch [Exception]
{
    Write-Host "Error: $($_.Exception.Message)"
    exit 1
}

(I am the author of WinSCP.)
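
The script above only reports duplicates; removing them is a separate step. A minimal deletion sketch against the same WinSCP .NET assembly might look like the following (it assumes you have reviewed the report and saved the paths to delete, one per line, in a list file; the duplicates.txt name and the parameters are illustrative, not part of the script above):

param (
    [Parameter(Mandatory)]
    $sessionUrl,
    $listPath = "duplicates.txt"
)

# Load WinSCP .NET assembly
Add-Type -Path "WinSCPnet.dll"

# Setup session options from URL
$sessionOptions = New-Object WinSCP.SessionOptions
$sessionOptions.ParseUrl($sessionUrl)

$session = New-Object WinSCP.Session
try
{
    $session.Open($sessionOptions)

    foreach ($path in Get-Content $listPath)
    {
        # EscapeFileMask keeps special characters in file names
        # from being interpreted as wildcards
        $result = $session.RemoveFiles($session.EscapeFileMask($path))

        if ($result.IsSuccess)
        {
            Write-Host "Removed $path"
        }
        else
        {
            Write-Host "Failed to remove ${path}: $($result.Failures[0])"
        }
    }
}
finally
{
    $session.Dispose()
}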

Answer 2

Edit: mount the remote FTP file system at a local mount point using ftpfs, then use any of the other methods detailed here.
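
For example, with curlftpfs (one common FUSE-based implementation of this idea; the host, credentials, and mount point below are placeholders):

curlftpfs ftp://user:password@example.com/ /mnt/ftp

The find commands that follow can then be run against /mnt/ftp as if the files were local.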

If all the files match that naming pattern, you can, for example,

rbos@chili:~/tmp$ touch asdf.php
rbos@chili:~/tmp$ touch "asdf (blah blah blah).php"
rbos@chili:~/tmp$ touch "asdf (blah blah rawr).php"
rbos@chili:~/tmp$ find | grep "(.*)"
./asdf (blah blah rawr).php
./asdf (blah blah blah).php

to match the files, then feed the list into xargs or a loop to check it:

find | grep "(.*)" | while read i; do echo "$i";done | less

Once you have confirmed that the list is accurate, replace echo with rm.
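
That is, the destructive form of the same pipeline would be (the quoting of "$i" is what protects the spaces and parentheses in the names):

find | grep "(.*)" | while read i; do rm "$i"; done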

Answer 3

You can use FSlint to find the duplicate files.

Answer 4

Run this:

find /yourdir -name "*conflicted copy*" -type f -ls

If the files it lists are the ones you want to delete, change -ls to -delete and run it again.
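
That is, with /yourdir standing in for your document root, the deletion pass would be:

find /yourdir -name "*conflicted copy*" -type f -delete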

I would suggest backing up your base directory with tar before doing this, though...

Edit: I just realized you have no access to a shell session, so this will not work for you...

You may need something like this: http://www.go4expert.com/forums/showthread.php?t=2348 to recursively dump a file listing, and then another script that deletes only the files you want.
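
For the deletion step, a minimal client-side sketch in PowerShell using .NET's built-in FtpWebRequest could look like this (the host, credentials, and the duplicates.txt list of absolute remote paths are placeholders; plain FTP is assumed):

# Placeholders: adjust the server, credentials, and list file to your setup
$server = "ftp://example.com"
$credentials = New-Object System.Net.NetworkCredential("user", "password")

foreach ($path in Get-Content "duplicates.txt")
{
    # Issue an FTP DELE command for each listed path
    $request = [System.Net.FtpWebRequest]::Create("$server$path")
    $request.Method = [System.Net.WebRequestMethods+Ftp]::DeleteFile
    $request.Credentials = $credentials

    $response = $request.GetResponse()
    Write-Host "Deleted ${path}: $($response.StatusDescription.Trim())"
    $response.Close()
}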
