BASH: find duplicate files (Mac/Linux compatible)

I am looking for a Mac-compatible bash script that finds duplicate files in a directory.

Answer 1

Not sure about Mac compatibility, but it Works For Me (TM):

#!/bin/bash
# Pairwise byte-for-byte comparison of every file under the given directory.
# Note: `readlink -f` and `cmp --quiet` are GNU options; on macOS install
# GNU coreutils/diffutils, or replace `--quiet` with `-s`.
# Pass an absolute directory so the -not -path exclusion below matches
# the canonicalized path that find prints.
[ -n "$1" ] || exit 1
exec 9< <( find "$1" -type f -print0 )
while IFS= read -r -d '' -u 9
do
    # Canonicalize the path; the trailing x protects a trailing newline
    # in the file name from being stripped by command substitution.
    file_path="$(readlink -fn -- "$REPLY"; echo x)"
    file_path="${file_path%x}"
    exec 8< <( find "$1" -type f -not -path "$file_path" -print0 )
    while IFS= read -r -d '' -u 8 OTHER
    do
        cmp --quiet -- "$REPLY" "$OTHER"
        case $? in
            0)
                # Identical: emit a shell-safe `cmp` command for verification
                echo -n "cmp -- "
                printf %q "${REPLY}"
                echo -n ' '
                printf %q "${OTHER}"
                echo ""
                break
                ;;
            2)
                echo "\`cmp\` failed!" >&2
                exit 2
                ;;
            *)
                :
                ;;
        esac
    done
done

The result is a set of commands you can run to check whether the matches are correct :)

Edit: the latest version copes with very strange file names, such as:

$'/tmp/--$`\\! *@ \a\b\E\E\f\r\t\v\\"\' \n'
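The `printf %q` calls are what make the emitted `cmp` lines safe to paste back into a shell, even for hostile names like the one above. A quick round-trip demo (bash-specific; the sample name is made up):

```shell
#!/bin/bash
# printf %q escapes a string so the shell can re-read it literally.
name=$'a file\nwith "quotes" & spaces'
quoted=$(printf '%q' "$name")
echo "$quoted"
# Re-parsing the quoted form gives back the original string.
eval "decoded=$quoted"
[ "$decoded" = "$name" ] && echo "round-trip OK"
```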

Answer 2

It works for me on my Mac; it catches duplicate files by their md5 values:

find ./ -type f -exec md5 {} \; | awk -F '=' '{print $2 "\t" $1}' | sort
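Sorting by hash puts identical files on adjacent lines, but you still have to spot the duplicates by eye. The same idea works with tools available out of the box on both systems: sort by checksum and keep only lines whose checksum repeats. A minimal, portable sketch using POSIX cksum (the file names are made up for the demo):

```shell
#!/bin/sh
# Group files by checksum; print only files whose checksum occurs more than once.
dir=$(mktemp -d)
printf 'same\n'  > "$dir/a.txt"
printf 'same\n'  > "$dir/b.txt"   # duplicate of a.txt
printf 'other\n' > "$dir/c.txt"   # unique
# cksum prints: <checksum> <size> <path>
dups=$(find "$dir" -type f -exec cksum {} \; |
       sort -n |
       awk '$1 == prev { print prevline; print } { prev = $1; prevline = $0 }' |
       sort -u)
printf '%s\n' "$dups"
rm -rf "$dir"
```

Only a.txt and b.txt survive the filter; c.txt is dropped because its checksum appears once.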

Answer 3

This finds duplicate files under a directory. The approach is primitive, but it works.

#!/bin/bash

CKSUMPROG=md5sum        # on macOS, "md5 -r" produces the same "hash filename" layout
TMPFILE=${TMPDIR:-/tmp}/duplicate.$$
trap 'rm -f "$TMPFILE"' EXIT INT

if [ ! -d "$1" ]
then
    echo "usage: $0 directory" >&2
    exit 1
fi

PRINTBLANK=
# dump fingerprints of all target files into a tmpfile
# (-print0 / xargs -0 keeps file names with spaces intact)
find "$1" -type f -print0 2> /dev/null | xargs -0 $CKSUMPROG > "$TMPFILE" 2> /dev/null

# get fingerprints from the tmpfile and keep the ones with duplicates,
# i.e. multiple files with the same contents (most-duplicated first)
for DUPEMD5 in $(cut -d ' ' -f 1 "$TMPFILE" | sort | uniq -c | sort -rn | grep -v '^  *1 ' | sed 's/^ *[1-9][0-9]* //')
do
    if [ -z "$PRINTBLANK" ]
    then
        PRINTBLANK=1
    else
        echo
        echo
    fi

    # print the file names for this fingerprint (strip the leading hash,
    # which is safer than awk '{print $2}' for names containing spaces)
    grep "^${DUPEMD5} " "$TMPFILE" | sed 's/^[^ ]*  *//'
done
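The count-filter-strip pipeline in the for loop can be shortened with uniq -d, which prints one copy of each repeated line in sorted input (at the cost of losing the most-duplicates-first ordering). A quick illustration with made-up hash values:

```shell
#!/bin/sh
# uniq -d keeps only the lines that appear more than once in sorted input.
dupes=$(printf '%s\n' aaa bbb aaa ccc bbb aaa | sort | uniq -d)
printf '%s\n' "$dupes"
# prints:
# aaa
# bbb
```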

Answer 4

I would also suggest the command-line tool fdupes https://macappstore.org/fdupes/ (on macOS it can also be installed with Homebrew: brew install fdupes).

You can run it on / scan a directory like this:

fdupes -r /directory/to/scan/for/file/dups

# You can redirect the results to a file
fdupes -r /directory/to/scan/for/file/dups > /tmp/duplicates.txt

# There is also a delete flag (-d) that removes duplicates as they are found.
# I would use it with caution: I only add this flag after checking
# the results of the plain scan above by eye.
fdupes -r -d /directory/to/scan/for/file/dups
