Question 1

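If you are happy to simply use a command-line tool rather than write a shell script, the fdupes program is available on most distributions and does exactly this.

There is also fslint, a GUI-based tool with the same functionality.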
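
As a rough illustration of the command-line route, the sketch below wraps fdupes in a small function. The function name list_duplicates is made up for this example; -r (recurse into subdirectories) is a standard fdupes option, and the guard only reports when the program is not installed.

```shell
#!/bin/bash
# Hypothetical wrapper; the name is illustrative and not part of fdupes.
list_duplicates() {
    local dir=${1:-.}
    if ! command -v fdupes >/dev/null 2>&1; then
        echo "fdupes is not installed" >&2
        return 127
    fi
    # -r: descend into subdirectories. Sets of identical files are
    # printed as groups separated by blank lines. Nothing is deleted.
    fdupes -r "$dir"
}
```

Calling list_duplicates with a directory only lists the groups, so it is safe to try before deciding what to remove.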

Question 2

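This solution finds duplicates in O(n) time. A checksum is generated for each file, and each file in turn is compared against the set of already-known checksums by way of an associative array.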

#!/bin/bash
#
# Usage:  ./delete-duplicates.sh  [<files...>]
#
declare -A filecksums

# No args, use files in current directory
test 0 -eq $# && set -- *

for file in "$@"
do
    # Files only (also no symlinks)
    [[ -f "$file" ]] && [[ ! -h "$file" ]] || continue

    # Generate the checksum
    cksum=$(cksum <"$file" | tr ' ' _)

    # Have we already got this one?
    if [[ -n "${filecksums[$cksum]}" ]] && [[ "${filecksums[$cksum]}" != "$file" ]]
    then
        echo "Found '$file' is a duplicate of '${filecksums[$cksum]}'" >&2
        echo rm -f "$file"
    else
        filecksums[$cksum]="$file"
    fi
done

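If you don't specify any files (or wildcards) on the command line, it uses the set of files in the current directory. It will compare files across multiple directories, but it does not recurse into the directories themselves.

The "first" file in the set is always treated as the definitive version. File times, permissions and ownership are not considered. Only the content is.

When you are sure it does what you want, remove the echo from the rm -f "$file" line. Note that if you replace that line with ln -f "${filecksums[$cksum]}" "$file", the duplicates are hard-linked instead. This saves the same disk space, but you don't lose the file names.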
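
The hard-link variant can be sketched as a function. This is an illustration under assumptions, not the answer's script: link_duplicates is a hypothetical name, sha256sum (GNU coreutils) is swapped in for cksum, and cmp -s is added as a byte-for-byte check, because a matching checksum on its own does not prove that two files are identical.

```shell
#!/bin/bash
# Hypothetical helper: replace duplicates with hard links to the first copy seen.
# Assumes bash 4+ (associative arrays) and sha256sum from GNU coreutils.
link_duplicates() {
    local -A seen=()      # checksum -> first file seen with that checksum
    local file sum first
    for file in "$@"; do
        # Regular files only, no symlinks
        [[ -f "$file" && ! -h "$file" ]] || continue
        # sha256sum prints "<hash>  -" for stdin; keep only the hash
        sum=$(sha256sum <"$file") || continue
        sum=${sum%% *}
        first=${seen[$sum]-}
        if [[ -z "$first" ]]; then
            seen[$sum]=$file
        elif [[ ! "$first" -ef "$file" ]] && cmp -s -- "$first" "$file"; then
            # Same content and not yet the same inode: link in place
            ln -f -- "$first" "$file"
        fi
    done
}
```

A call such as link_duplicates dir1/* dir2/* keeps every file name. Hard links only work within a single filesystem, so ln fails for pairs that span two of them.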

Question 3

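The main problem in the script seems to be that i takes the actual file names as its values, while j is just a number. Putting the names into an array and using both i and j as indices should work: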

files=(*)
count=${#files[@]}
for (( i=0 ; i < count ;i++ )); do 
    for (( j=i+1 ; j < count ; j++ )); do
        if diff -q "${files[i]}" "${files[j]}"  >/dev/null ; then
            echo "${files[i]} and ${files[j]} are the same"
        fi
    done
done

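(Seems to work with Bash and with ksh/ksh93 on Debian.)

The assignment a=(this that) initializes the array a with the two elements this and that (at indices 0 and 1). Word splitting and globbing work as usual, so files=(*) initializes files with the names of all files in the current directory (except dotfiles). "${files[@]}" expands to all elements of the array, and the hash sign asks for a length, so ${#files[@]} is the number of elements in the array. (Note that ${files} would be the first element of the array, and ${#files} is the length of that first element, not of the array!)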
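
The expansions described above can be checked on the two-element array from the text. A minimal sketch for Bash:

```shell
#!/bin/bash
a=(this that)

printf '%s\n' "${a[@]}"   # every element, one per line: this, then that
echo "${#a[@]}"           # number of elements: 2
echo "${a}"               # same as ${a[0]}: this
echo "${#a}"              # length of the string "this": 4
```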

for i in `/folder/*`

Surely the backticks here are a typo? You would be running the first file as a command, with the rest of the files passed to it as arguments.
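
For comparison, a loop over the glob itself needs no command substitution at all. A minimal sketch, where list_regular_files is a hypothetical name and the directory is passed as an argument instead of being hard-coded:

```shell
#!/bin/bash
# Hypothetical helper: print the regular files directly inside a directory.
list_regular_files() {
    local dir=$1 f
    for f in "$dir"/*; do
        # Skips subdirectories, and also the literal pattern that the
        # shell leaves in place when nothing matches
        [[ -f "$f" ]] || continue
        printf '%s\n' "$f"
    done
}
```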

Question 4

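By the way, using a checksum or a hash is a good idea. My script doesn't use one. But if the files are small and there aren't many of them (say 10-20 files), this script runs quite fast. With 100 or more files of 1000 lines each, it takes more than 10 seconds.

Usage: ./duplicate_removing.sh files/*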
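
The slowdown comes from comparing every file with every later file, which is n(n-1)/2 comparisons for n files. The sketch below only does that arithmetic; pair_count is a hypothetical helper, and it says nothing about how long a single diff takes.

```shell
#!/bin/bash
# Number of comparisons a compare-every-pair loop makes for n files.
pair_count() {
    local n=$1
    echo $(( n * (n - 1) / 2 ))
}

for n in 10 20 100; do
    printf '%3d files -> %d comparisons\n' "$n" "$(pair_count "$n")"
done
```

For 20 files that is 190 comparisons, and for 100 files it is 4950. This is why the checksum approach, which reads each file once, scales better.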

#!/bin/bash

for target_file in "$@"; do
    shift
    for candidate_file in "$@"; do
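        # Use diff's exit status: 0 means identical, 1 means different, 2 means an error.
        # Testing for empty output would report unreadable files as copies.
        if diff -q "$target_file" "$candidate_file" >/dev/null 2>&1; then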
            echo the "$target_file" is a copy "$candidate_file"
            echo rm -v "$candidate_file"
        fi
    done
done

Testing

Create random files: ./creating_random_files.sh

#!/bin/bash

file_amount=10
files_dir="files"

mkdir -p "$files_dir"

while ((file_amount)); do
    content=$(shuf -i 1-1000)
    echo "$RANDOM" "$content" | tee "${files_dir}/${file_amount}".txt{,.copied} > /dev/null
    ((file_amount--))
done
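
The generator gets its two identical files from brace expansion: the word ending in .txt{,.copied} expands to two file names, and tee writes the same input to both. A small sketch of just the expansion, with a made-up file name:

```shell
#!/bin/bash
name="files/3.txt"   # illustrative name only
# The empty first alternative leaves the word unchanged;
# the second alternative appends the suffix.
printf '%s\n' "$name"{,.copied}
```

This prints files/3.txt and files/3.txt.copied on separate lines.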

Run ./duplicate_removing.sh files/* and get the output:

the files/10.txt is a copy files/10.txt.copied
rm -v files/10.txt.copied
the files/1.txt is a copy files/1.txt.copied
rm -v files/1.txt.copied
the files/2.txt is a copy files/2.txt.copied
rm -v files/2.txt.copied
the files/3.txt is a copy files/3.txt.copied
rm -v files/3.txt.copied
the files/4.txt is a copy files/4.txt.copied
rm -v files/4.txt.copied
the files/5.txt is a copy files/5.txt.copied
rm -v files/5.txt.copied
the files/6.txt is a copy files/6.txt.copied
rm -v files/6.txt.copied
the files/7.txt is a copy files/7.txt.copied
rm -v files/7.txt.copied
the files/8.txt is a copy files/8.txt.copied
rm -v files/8.txt.copied
the files/9.txt is a copy files/9.txt.copied
rm -v files/9.txt.copied
