Find and delete duplicate files on OS X using a script

Question 1

First, you have to reorder the first command line so that the order of the files found by the find command is preserved:

find . -size 20 ! -type d -exec cksum {} \; | tee /tmp/f.tmp | cut -f 1,2 -d ' ' | sort | uniq -d | grep -hif - /tmp/f.tmp > duplicates.txt

(Note: for testing on my machine, I used find . -type f -exec cksum {} \;)
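
To make the stages of that pipeline easier to follow, here is a small sketch that runs the same cut, sort, uniq -d and grep steps on sample data. The sample lines are made up and only mimic the "checksum, byte count, file name" format that cksum prints; the sketch also uses grep -F (fixed strings) rather than the -hi options above, which is a simplification for the demo.

```shell
# Sample lines in cksum's output format: "<checksum> <byte count> <file name>".
# The values are invented for illustration only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
111 20 ./a.txt
222 20 ./b.txt
111 20 ./c.txt
333 20 ./d.txt
EOF

# 1. cut keeps only the key (checksum + size)
# 2. sort | uniq -d prints each key that occurs more than once, one time
# 3. grep -f - selects the full lines for those keys, in the file's own order
dup_lines=$(cut -f 1,2 -d ' ' "$sample" | sort | uniq -d | grep -F -f - "$sample")

printf '%s\n' "$dup_lines"
rm -f "$sample"
```

Only the two lines sharing the key "111 20" are selected, and they come out in the order they appear in the sample file, which is what keeping find's order means here.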

Second, one way to print all but the first duplicate is to use an auxiliary file, say /tmp/f2.tmp. Then we can do the following:

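while IFS= read -r line; do
    # cksum prints "<checksum> <size> <file name>"
    checksum=$(printf '%s\n' "$line" | cut -f 1,2 -d' ')
    # "3-" keeps file names that contain spaces intact
    file=$(printf '%s\n' "$line" | cut -f 3- -d' ')

    # -q: quiet, -x: match the whole line, -F: treat the checksum as a fixed string
    if grep -qxF "$checksum" /tmp/f2.tmp; then
        # /tmp/f2.tmp already contains the checksum
        # print the file name
        # (printf is safer than echo, when for example "$file" starts with "-")
        printf '%s\n' "$file"
    else
        echo "$checksum" >> /tmp/f2.tmp
    fi
done < duplicates.txt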

Before running this, make sure that /tmp/f2.tmp exists and is empty, for example with the following commands:

rm -f /tmp/f2.tmp
touch /tmp/f2.tmp
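
As an alternative to the helper file, the same "print every duplicate except the first" filter can be sketched with awk, which keeps the checksums it has already seen in memory. This is a different technique from the loop above, not a restatement of it. The function name print_extra_duplicates is hypothetical, and the sketch assumes input lines in the duplicates.txt format (checksum, size, file name).

```shell
# Hypothetical helper: reads cksum-style lines ("<checksum> <size> <name>")
# from the given files (or stdin) and prints the file name of every line
# whose checksum+size pair has already been seen.
print_extra_duplicates() {
    awk '{
        key = $1 " " $2
        if (key in seen) {
            # Strip the first two fields so names containing spaces survive.
            name = $0
            sub(/^[^ ]+ [^ ]+ /, "", name)
            print name
        } else {
            seen[key] = 1
        }
    }' "$@"
}
```

It could be called as print_extra_duplicates duplicates.txt, with the output redirected to a file for review before anything is deleted.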

Hope this helps =)

Question 2

Another option is to use fdupes:

brew install fdupes
fdupes -r .

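fdupes -r . recursively finds duplicate files under the current directory. Add -d to delete duplicates; you will be prompted for which files to keep. If you add -dN instead, fdupes will always keep the first file in each set and delete the others.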
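
Because -dN deletes without asking, it may be worth listing first and deleting as a separate, explicit step. The wrapper below is a minimal sketch of that workflow; dedupe_dir and its list/delete modes are hypothetical names, and it uses only the fdupes options mentioned above (-r, -d, -N).

```shell
# Hypothetical wrapper: list duplicates by default, delete only when asked to.
dedupe_dir() {
    dir=${1:-.}
    mode=${2:-list}

    if ! command -v fdupes >/dev/null 2>&1; then
        echo "fdupes not found; install it first (e.g. brew install fdupes)" >&2
        return 1
    fi

    case $mode in
        list)   fdupes -r "$dir" ;;        # only print the duplicate sets
        delete) fdupes -r -d -N "$dir" ;;  # keep the first file of each set
        *)      echo "usage: dedupe_dir [dir] [list|delete]" >&2; return 2 ;;
    esac
}
```

For example, dedupe_dir ~/Pictures shows the duplicate sets, and dedupe_dir ~/Pictures delete removes all but the first file in each set.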

Question 3

I wrote a script that renames your files to match a hash of their contents.

It uses a subset of the file's bytes, so it's fast, and if there's a collision it appends a counter to the name, like this:

3101ace8db9f.jpg
3101ace8db9f (1).jpg
3101ace8db9f (2).jpg

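This way you can easily review and delete the duplicates yourself, without having to put too much trust in someone else's software to handle your photos.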

Script: https://gist.github.com/SimplGy/75bb4fd26a12d4f16da6df1c4e506562
