Batch deletion of smaller files from group of files via unix command line

Question 1

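This script assumes that there is a clear gap in size between the group of small files and the group of large files. Specifically, the smallest of the large files must be at least twice the size of the largest of the small files.

Call the script "imagedirstats" and run it in a loop like this: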

find /path/to/main/branch -type d | while IFS= read -r subdir; do (cd "$subdir" && ~/bin/imagedirstats ); done

Here is the script:

#!/bin/bash
# from http://superuser.com/questions/135951/batch-deletion-of-smaller-files-from-group-of-files-via-unix-command-line
# by Dennis Williamson - 2010-04-29

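prevn=0     # size of the previous non-empty file; 0 means none seen yet
factor=4    # how close to the largest of the small files to set the threshold, 4 == one fourth of the way above
min=1000    # ignore files below this size

while read -r n
do
    # skip the ratio test until a non-empty file has been seen; this prevents
    # division by zero and keeps the first file from ending the loop by itself
    if (( prevn > 0 ))
    then
        (( ratio = n / prevn ))
        if (( ratio > 1 && n > min ))
        then
            break
        fi
    fi
    if (( n > 0 ))
    then
        prevn=$n
    fi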
done < <(find . -maxdepth 1 -name "*.jpg" -printf "%s\n" | sort -n)
# for OS X, comment out the preceding line and uncomment this one:
# done < <(find . -maxdepth 1 -name "*.jpg" -exec stat -f "%z" {} + | sort -n)

# the following line would be the GNU equivalent using stat(1) instead of -printf
# it's included here for reference:
# done < <(find . -maxdepth 1 -name "*.jpg" -exec stat -c "%s" {} + | sort -n)

(( size = (n - prevn) / factor + prevn ))

echo "Smallest of the large: $n"
echo "Largest of the small: $prevn"
echo "Ratio: $ratio"
echo "Threshold: $size"

if (( ratio < 2 ))
then
    read -r -p "Warning: ratio too small. Delete anyway? Only 'Yes' will proceed: " reply
    if [[ $reply != "Yes" ]]
    then
        echo "Cancelled" >&2
        exit 1
    fi
fi

# uncomment the delete on the following line to actually do the deletion

find . -maxdepth 1 -name "*.jpg" -size -${size}c # -delete

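Edit: Moved the warning prompt so that the useful information is displayed first. Fixed a missing fi.

Edit 2: Made the two find commands consistent. Added a commented-out variant for OS X. Added information on running the script.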
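
As a rough illustration of how the threshold is derived, the sketch below applies the same idea to a list of sizes read from standard input instead of to real files. The function name find_threshold and its two optional arguments (factor and min) are assumptions made for this example and are not part of the script. The sketch also assumes that the ratio test is skipped until a first non-empty file has been seen.

```shell
#!/bin/bash
# Hypothetical helper (find_threshold is not part of the original script).
# Reads file sizes in bytes, one per line, on stdin and prints the threshold:
# the point one "factor"-th of the way from the largest small file up to the
# smallest large file. Prints nothing if no gap is found.
find_threshold() {
    local factor=${1:-4} min=${2:-1000}
    sort -n | {
        prevn=0    # last non-empty size seen; 0 means none yet
        while read -r n
        do
            if (( prevn > 0 ))
            then
                # integer division: a quotient above 1 means n >= 2 * prevn
                if (( n / prevn > 1 && n > min ))
                then
                    echo $(( (n - prevn) / factor + prevn ))
                    break
                fi
            fi
            if (( n > 0 )); then prevn=$n; fi
        done
    }
}

# Thumbnails of 12-15 kB next to full images of 90 kB and more.
# The gap lies between 15000 and 90000, so this prints 33750.
printf '%s\n' 12000 90000 15000 110000 14000 95000 | find_threshold
```

With factor=4 the threshold sits a quarter of the way into the gap, so it stays much closer to the small files than to the large ones. Passing a smaller factor, for example find_threshold 2, moves it toward the middle of the gap.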

Question 2

If you find that there is a specific cutoff, for example that all of the large pictures are bigger than 200KB, then you can do the following:

find */*.jpg -size -200k -delete

You may want to make a backup first.
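
In the same cautious spirit, it can help to list what would be removed before running anything with -delete. The sketch below is only an illustration: the function name list_small_jpgs and its arguments are assumptions, and it takes the cutoff in bytes with the c suffix because GNU find rounds file sizes up to whole units when the k suffix is used.

```shell
#!/bin/bash
# Hypothetical helper (not from the answer): print the *.jpg files exactly one
# directory level below "$1" that are smaller than "$2" bytes.
# Nothing is deleted; the matching names are only listed, sorted.
list_small_jpgs() {
    local top=${1:-.} cutoff=${2:-204800}
    find "$top" -mindepth 2 -maxdepth 2 -type f -name '*.jpg' \
        -size "-${cutoff}c" -print | sort
}

# Example: review the list first, and only then rerun find with -delete.
# list_small_jpgs . 204800
```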

Question 3

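If the file sizes are not consistent, are the image dimensions consistent?

You can use the identify tool that comes with ImageMagick to get the image dimensions. With some simple bash scripting, you can then process the images based on their dimensions.

To get the width and height of an image, use identify:

identify -format '%wx%h' filename

You will get output like this:

[john@awesome:~]$ identify -format '%wx%h' W4.JPG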
1680x1050

Then you can use the cut utility to get the numbers in your script:

[john@awesome:~]$ identify -format '%wx%h' W4.JPG | cut -d'x' -f1
1680
[john@awesome:~]$ identify -format '%wx%h' W4.JPG | cut -d'x' -f2
1050
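
Put together, the two steps above might look like the following minimal sketch. The function names split_dims and is_narrower_than are made up for this example, it assumes single-frame images such as JPEGs, and it uses bash parameter expansion in place of cut to split the string.

```shell
#!/bin/bash
# Hypothetical helper (not from the answer): split a "WIDTHxHEIGHT" string,
# as printed by identify -format '%wx%h', into "WIDTH HEIGHT".
split_dims() {
    local dims=$1
    echo "${dims%%x*} ${dims#*x}"
}

# Hypothetical helper: succeed if the image "$1" is narrower than "$2" pixels.
# Requires ImageMagick's identify; returns 2 if identify fails.
is_narrower_than() {
    local file=$1 max_width=$2 dims width
    dims=$(identify -format '%wx%h' "$file") || return 2
    width=${dims%%x*}
    (( width < max_width ))
}

# Example: list (but do not delete) the JPEGs narrower than 800 pixels.
# for f in *.jpg; do is_narrower_than "$f" 800 && echo "$f"; done
```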

Answer