What is wrong with my bash script?

Question

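It looks like you:

get a list of files
then get their sizes, and so on
then generate md5sums for the files that share a size
and print the ones that have the same md5sum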

I am not going to try to fix the awk code. Instead, note that you are reimplementing the functionality of the fdupes command. From its man page:

Searches the given path for duplicate files. Such files are found by
comparing file sizes and MD5 signatures, followed by a byte-by-byte
comparison.

I strongly recommend using it instead of writing a complicated script for this.
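For illustration, a minimal sketch of calling fdupes from a script. The wrapper name list_dupes is made up for this example, and it assumes fdupes may or may not be installed, so it checks first:

```shell
# Hypothetical wrapper (not part of fdupes): run fdupes on a directory if it is available.
list_dupes() {
  local dir=${1:-.}
  if command -v fdupes >/dev/null 2>&1; then
    # fdupes prints each set of identical files as a group of paths,
    # with a blank line between groups. Add -r to recurse into subdirectories.
    fdupes "$dir"
  else
    echo "fdupes is not installed" >&2
    return 127
  fi
}
```

Calling `list_dupes ~/Downloads` would then list the duplicate groups found directly in that directory.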

Failing that, dropping the size check makes it much easier to find the duplicates:

$ md5sum * | sort -k1,1 | uniq -w32 -D
b1946ac92492d2347c6235b4d2611184  file1
b1946ac92492d2347c6235b4d2611184  file2
b1946ac92492d2347c6235b4d2611184  file3

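Every md5sum hash is 32 characters wide, so it is easy to tell uniq to compare only those first 32 characters (-w32) and to print all the duplicates it finds (-D).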
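When there are several sets of duplicates, the output is easier to read if each set is separated by a blank line. The sketch below assumes GNU coreutils (both -w and --all-repeated are GNU uniq options); the function name dupes_by_md5 is hypothetical, and the loop skips directories so that md5sum does not complain about them:

```shell
# Hypothetical helper: list regular files in one directory (non-recursive)
# that share an MD5 hash, one blank-line-separated group per hash.
dupes_by_md5() {
  local dir=${1:-.}
  (
    cd -- "$dir" || exit 1
    for f in *; do
      # Hash regular files only; directories and unmatched globs are skipped.
      [ -f "$f" ] && md5sum -- "$f"
    done |
      sort -k1,1 |                        # bring identical hashes next to each other
      uniq -w32 --all-repeated=separate   # compare only the 32-character hash
  )
}
```

The subshell keeps the cd from affecting the caller. `uniq -D` is the short form of `--all-repeated` without a separator, so this differs from the pipeline above only in the grouping.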

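If you absolutely must have the size check, things get more involved, though still simpler than your script. find can print file sizes itself, so there is no need to bring ls into the mix: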

find . -maxdepth 1 -type f -printf "%s/%P\n" |
  awk -F/ '              # use / as the delimiter; it cannot appear in a file name
  s[$1]++ {              # this file size has been seen before
    if (n[$1] != "") {   # the first name with this size has not been printed yet
      print n[$1]        # print it, then clear it
      n[$1] = ""
    }
    print $2             # print this file name, whose size is a duplicate
    next
  } {n[$1] = $2}         # first time a size is seen: remember the file name'

This awk command prints every file whose size is shared with at least one other file.
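To see what the size filter does in isolation, it can be fed hand-written "size/name" lines instead of real find output. This is only a sketch: the function name size_filter and the sample file names are invented for the example.

```shell
# Hypothetical wrapper around the awk filter: reads "size/name" lines on stdin
# and prints the names whose size occurs more than once.
size_filter() {
  awk -F/ '
    s[$1]++ {                  # this size has been seen before
      if (n[$1] != "") {       # the first name for this size is still pending
        print n[$1]
        n[$1] = ""
      }
      print $2                 # the current name shares its size with an earlier one
      next
    }
    { n[$1] = $2 }             # first time this size is seen: remember the name
  '
}

# Three files of 6 bytes and one of 9 bytes.
printf '%s\n' '6/a.txt' '9/b.txt' '6/c.txt' '6/d.txt' | size_filter
# prints:
# a.txt
# c.txt
# d.txt
```

b.txt is dropped because no other file has its size, so it never needs to be hashed.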

Now simply append the md5sum | sort | uniq pipeline from earlier:

find . -maxdepth 1 -type f -printf "%s/%P\n" |
  awk -F/ 's[$1]++ {if (n[$1] != ""){print n[$1]} print $2; n[$1] = ""; next} {n[$1] = $2}' |
  xargs -d '\n' md5sum |
  sort -k1,1 |
  uniq -w32 -D
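For reuse, the whole pipeline can be wrapped in a function. What follows is a sketch under a few assumptions: GNU find, xargs, and uniq are available, and the name dupes_in_dir is hypothetical. Two small additions that are not in the pipeline above: `xargs -r` avoids running md5sum when no file survives the size filter, and `--` keeps a file name that starts with a dash from being read as an option.

```shell
# Hypothetical helper: print "hash  name" for files in one directory
# (non-recursive) that have both the same size and the same MD5 hash.
dupes_in_dir() {
  local dir=${1:-.}
  (
    # %P prints names relative to the starting point, so run from inside the
    # directory; that way md5sum can open the names find reports.
    cd -- "$dir" || exit 1
    find . -maxdepth 1 -type f -printf '%s/%P\n' |
      awk -F/ 's[$1]++ { if (n[$1] != "") { print n[$1]; n[$1] = "" } print $2; next } { n[$1] = $2 }' |
      xargs -r -d '\n' md5sum -- |
      sort -k1,1 |
      uniq -w32 -D
  )
}
```

With this in place, `dupes_in_dir /some/dir` hashes only the files that share a size with another file, which is the point of doing the size check first.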

Answer 1