How to run a command only when a specific file has a specific size

Question 1

If you have GNU stat, you can use its --printf option to get the file's size.

For example:

size=$(stat --printf '%s' /cache/myfile.csv)
if [ "$size" -gt 5368709120 ] ; then  # 5 GiB = 5 * 1024 * 1024 * 1024
  echo "file is > 5GB"
fi

See man stat for details.

BSD stat (e.g. on FreeBSD and macOS) has a similar formatting option, -f:

size=$(stat -f '%z' /cache/myfile.csv)
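Because the GNU and BSD variants take different options, a script that must run on both can try each in turn. The following is only a minimal sketch: file_size is a hypothetical helper name, and it assumes that a stat which rejects GNU's -c option is the BSD one, with wc -c as a last resort.

```shell
# file_size FILE: print the size of FILE in bytes.
# file_size is a hypothetical helper name, not a standard command.
file_size() {
  # GNU stat: -c FORMAT is the short form of --format; %s is the size in bytes.
  # BSD stat: -f FORMAT; %z is the size in bytes.
  # Last resort: wc -c reads the whole file, so it is slow on large files.
  fs_size=$(stat -c '%s' -- "$1" 2>/dev/null) ||
    fs_size=$(stat -f '%z' -- "$1" 2>/dev/null) ||
    fs_size=$(wc -c < "$1") ||
    return 1
  # Arithmetic expansion strips the padding some wc implementations add.
  echo "$((fs_size))"
}
```

It could then be used as size=$(file_size /cache/myfile.csv) in place of either stat call.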

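Alternatively, you can use perl's built-in stat function or its -s file test operator (similar to bash's -s file test, except that it returns the size of the file rather than just true if the file exists and is non-empty). perl's stat function returns a 13-element list (array) of metadata about a file, containing the following data (copied from perldoc -f stat):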

[...] Not all fields are supported on all filesystem types. Here are
the meanings of the fields: 

  0 dev      device number of filesystem
  1 ino      inode number
  2 mode     file mode  (type and permissions)
  3 nlink    number of (hard) links to the file 
  4 uid      numeric user ID of file's owner
  5 gid      numeric group ID of file's owner
  6 rdev     the device identifier (special files only) 
  7 size     total size of file, in bytes
  8 atime    last access time in seconds since the epoch
  9 mtime    last modify time in seconds since the epoch
 10 ctime    inode change time in seconds since the epoch (*)
 11 blksize  preferred I/O size in bytes for interacting with the
             file (may vary from file to file)
 12 blocks   actual number of system-specific blocks allocated
             on disk (often, but not always, 512 bytes each) 

(The epoch was at 00:00 January 1, 1970 GMT.)

Field 7 is the one we want.

To just return the size of the file (for later use in a shell command or script), use either stat or -s:

# stat
perl -e 'print scalar((stat(shift))[7])' /cache/myfile.csv

# -s
perl -e 'print -s shift' /cache/myfile.csv

Or do it all in perl:

# stat
perl -e 'print "File is > 5 GiB\n" if (stat(shift))[7] > 5*1024*1024*1024' /cache/myfile.csv

# -s
perl -e 'print "File is > 5 GiB\n" if -s shift > 5*1024*1024*1024' /cache/myfile.csv

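See perldoc -f stat and perldoc -f -X (and help test in bash).

By the way, perl's shift function removes the first element of an array (by default @ARGV, the array of command-line arguments, if none is specified) and returns its value. It is often used in a loop to process all elements of an array, but here we are only interested in the first argument (the filename). See perldoc -f shift for details, including notes about lexical scope and use within subroutines.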

Question 2

To use the file size as a precondition, you can use stat or find:

[ -n "$(find /cache/myfile.csv -prune -size +5G 2>/dev/null)" ] && echo "file is > 5GB"

Or, if the target command (echo here) is short, put it in the -exec part of find:

find /cache/myfile.csv -prune -size +5G -exec echo "file is > 5GB" \;

The -prune is there in case myfile.csv is a file of type directory, to prevent find from descending into it.
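The find-based test can be wrapped in a small function so that the path, the threshold and the command are all passed as arguments. This is a sketch only: run_if_larger is a hypothetical name, the size argument is handed to find -size unchanged (so suffixes such as G depend on the find implementation), and a path beginning with - would be misread by find as an option.

```shell
# run_if_larger FILE SIZE COMMAND [ARGS...]
# Runs COMMAND only if find reports FILE as larger than SIZE.
# run_if_larger is a hypothetical helper name.
run_if_larger() {
  ril_file=$1
  ril_size=$2
  shift 2
  # find prints the path only when the -size test matches, so a non-empty
  # result means the file exists and exceeds the threshold.
  if [ -n "$(find "$ril_file" -prune -size "+$ril_size" 2>/dev/null)" ]; then
    "$@"
  fi
}
```

For the example in this answer, the call would be run_if_larger /cache/myfile.csv 5G echo "file is > 5GB".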

Question 3

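If you need to process the file in a shell, both versions below execute the shell's commands only when all conditions are met: it is a regular file, it is named myfile.csv, and it is > 5G: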

find /cache -name 'myfile.csv' -type f -size +5G -exec bash -c '
    echo "$1 is > 5GB"
' bash {} \;

or

find /cache -name 'myfile.csv' -type f -size +5G -exec bash -c '
    for file; do echo "$file is > 5GB"; done
' bash {} +

Question 4

Note that some shells have built-in support for this, for example tcsh:

SHELL=/bin/tcsh
* * * * * if (-Z /cache/myfile.csv > 5*1024*1024*1024) echo 'file is > 5GiB'

Or with zsh, here using a glob qualifier and an anonymous function, though zsh also has a stat builtin that predates both GNU and BSD stat:

SHELL=/bin/zsh
* * * * * (){ if (($#)) echo 'file is > 5GiB'; } /cache/myfile.csv(NLG+5)

(Note that, as with find -size +5G, we are talking about gibibytes here (1 GiB = 1,073,741,824 bytes), not gigabytes (1 GB = 1,000,000,000 bytes).)
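To make the difference between the two units concrete, both thresholds can be computed with shell arithmetic. This is a small illustration rather than part of the original answer; the variable names are arbitrary, and it assumes a shell with 64-bit arithmetic.

```shell
gib_limit=$((5 * 1024 * 1024 * 1024))   # 5 GiB in bytes
gb_limit=$((5 * 1000 * 1000 * 1000))    # 5 GB in bytes

echo "5 GiB = $gib_limit bytes"
echo "5 GB  = $gb_limit bytes"
echo "difference = $((gib_limit - gb_limit)) bytes"
```

A file of 5,200,000,000 bytes, for instance, is larger than 5 GB but still smaller than 5 GiB, so none of the tests above would fire for it.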

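For symlinks, tcsh will get the size of the file it eventually resolves to, while zsh's LG+5 qualifier, like find's -size, will check the size of the symlink itself. Change it to -LG+5 to check the size after symlink resolution. zsh's stat builtin gives you information after symlink resolution by default; use -L to change that. In GNU and BSD stat it is the other way round, as it is in find, where -L tells it to follow symlinks.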
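The effect of -L on find can be checked with a throwaway file and a symlink to it. The sketch below uses a temporary directory and a 2000-byte file purely for illustration; the names big and link are arbitrary.

```shell
dir=$(mktemp -d)
printf '%2000s' '' > "$dir/big"     # a file of exactly 2000 bytes (all spaces)
ln -s "$dir/big" "$dir/link"        # a symlink pointing at it

# Default behaviour: the symlink itself is examined. Its size is the length
# of the target path, which is far below 1000 bytes, so nothing matches.
without_L=$(find "$dir/link" -prune -size +1000c)

# With -L the symlink is followed, so the 2000-byte target is measured.
with_L=$(find -L "$dir/link" -prune -size +1000c)

echo "without -L: ${without_L:-no match}"
echo "with -L:    ${with_L:-no match}"

rm -rf "$dir"
```

So if /cache/myfile.csv might be a symlink, adding -L to the find commands above makes them test the size of the file it points to.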

For more ways to get the size of a file, see How can I get the size of a file in a bash script?

Answer