Piping ls and awk to rsync

Question 1

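This perl script should do what you want: given a NUL-separated list of filenames (e.g. from find -print0), it outputs a list of the most recently modified filenames, as long as the total size of those files does not exceed 1GB (the default). You can specify the maximum size in gigabytes on the command line; this can be any valid number, integer or floating point.

The NUL separator means this will work with any filenames, even if they contain spaces or newlines.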
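To see why the NUL separator matters, here is a small sketch that counts the same files both ways. The scratch directory and the file names are made up for the demonstration, and it assumes a find that supports -print0 (GNU and BSD find both do).

```shell
# Scratch directory with one ordinary file and one file whose name contains a newline.
tmpdir=$(mktemp -d)
touch "$tmpdir/plain.mp4"
touch "$tmpdir/$(printf 'two\nlines.mp4')"

# Newline-delimited output: the second name is split across two lines,
# so two files look like three entries.
newline_count=$(( $(find "$tmpdir" -type f | wc -l) ))

# NUL-delimited output: keep only the NUL bytes and count them, one per file.
nul_count=$(( $(find "$tmpdir" -type f -print0 | tr -cd '\000' | wc -c) ))

echo "newline-delimited entries: $newline_count"
echo "NUL-delimited entries:     $nul_count"

rm -rf "$tmpdir"
```

Only the NUL-delimited count matches the real number of files, which is why the scripts here read their input with perl -0.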

$ cat select-newest-one-gig.pl
#! /usr/bin/perl -0

use strict;

my $gigs = shift || 1;

my $maxsize = $gigs * 1024 * 1024 * 1024 ;  # limit in bytes (1GB by default)
my $total = 0;

# a hash to contain the list of input filenames and their modtimes
my %filemtimes=();

# hash to contain the list of input filenames and their sizes
my %filesizes=();

# a hash to contain a list of filenames to output.
# use a hash for this so we don't need to write a `uniq` function.
my %outfiles=();

while (<>) {
   chomp;

   # element 7 of the list returned by stat() is the size in bytes.
   # element 9 of the list returned by stat() is the mtime in secs since epoch

   my ($size,$mtime) = (stat($_))[7,9];
   $filesizes{$_} = $size;
   $filemtimes{$_} = $mtime;
}

# iterate through the %filemtimes hash in order of reverse mtime (newest first)
foreach (sort { $filemtimes{$b} <=> $filemtimes{$a} } keys %filemtimes) {
   my $size = $filesizes{$_};

   # add it to our list of filenames to print if it won't exceed $maxsize
   if (($size + $total) <= $maxsize) {
       $total += $size;
       $outfiles{$_}++;
   }
}

# now iterate through the %filesizes hash in order of reverse size
# (largest first) just in case we can squeeze in a few more files.
foreach (sort { $filesizes{$b} <=> $filesizes{$a} } keys %filesizes) {
   # skip files that were already selected so their size isn't counted twice
   next if exists $outfiles{$_};
   my $size = $filesizes{$_};
   if (($size + $total) <= $maxsize) {
       $total += $size;
       $outfiles{$_}++;
   }
}

# now print our list of files.  choose one of the following, for
# newline separated filenames or NUL-separated.   
#print join("\n", sort keys %outfiles), "\n";
print join("\000", sort keys %outfiles), "\000";

Save this as select-newest-one-gig.pl and make it executable with chmod +x.

Run it like this (e.g. for a maximum total file size of 10GB):

find /volume1/cctv/ -type f -iname '*.mp4' -print0 | ./select-newest-one-gig.pl 10

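This perl script could easily be modified to take one or more filename extensions (e.g. .mp4) as arguments, and then run find itself and iterate over its output instead of using while (<>). Note that system() does not return the command's output, so this would need something like a piped open(). It's probably simpler to just pipe the output of find into it. Why reinvent the wheel?

The following perl script will list (or delete, if you uncomment the last line) files that exist in the rsync target directory but are not listed on stdin. It assumes NUL-separated input, so it is safe even if filenames contain newlines.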
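If you do want to pass several extensions without touching the perl script, a small shell wrapper around find is probably enough. This is only a sketch: find_by_ext is a hypothetical helper name, not part of either script, and -iname and -print0 are assumed to be available in your find.

```shell
# find_by_ext DIR EXT... : print NUL-separated regular files under DIR whose
# names end in any of the given extensions (case-insensitive).
# (hypothetical helper, written for illustration only)
find_by_ext() {
    dir=$1
    shift
    if [ "$#" -eq 0 ]; then
        # no extensions given: list every regular file
        find "$dir" -type f -print0
        return
    fi
    first=1
    # The word list of the for loop is expanded once, up front, so it is safe
    # to rebuild the positional parameters into find arguments inside it.
    for ext in "$@"; do
        if [ "$first" -eq 1 ]; then
            set --
            first=0
        else
            set -- "$@" -o
        fi
        set -- "$@" -iname "*.$ext"
    done
    find "$dir" -type f \( "$@" \) -print0
}

# Demonstration on a scratch directory (made-up file names).
demo=$(mktemp -d)
touch "$demo/a.mp4" "$demo/b.MKV" "$demo/c.txt"
matched=$(( $(find_by_ext "$demo" mp4 mkv | tr -cd '\000' | wc -c) ))
echo "matched files: $matched"
rm -rf "$demo"
```

It would then be used as find_by_ext /volume1/cctv/ mp4 mkv | ./select-newest-one-gig.pl 10, which keeps the selection logic in one place and leaves the file matching to find.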

$ cat unlink-others.pl
#! /usr/bin/perl -0

use strict;

my @files=();

# first arg is target dir, with default
my $targetdir = shift || '/path/to/rsync/target/dir/';

while (<>) {
    chomp;
    s/^.*\///s;  # strip path (/s so that "." also matches newlines in names)
    push @files, quotemeta($_);
}
my $regexp=join("|",@files);

opendir(my $dh, $targetdir) || die "can't opendir $targetdir: $!\n";
my @delete = grep { ! /^($regexp)$/o && -f "$targetdir/$_" } readdir($dh);
closedir $dh;

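print join(", ",@delete),"\n";
# uncomment next line if you're sure it will only delete what you want
# (readdir returns bare names, so the target dir is prepended before unlinking)
# unlink map { "$targetdir/$_" } @delete;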

Use it like this:

find /volume1/cctv/ -type f -iname '*.mp4' -print0 | \
    ./select-newest-one-gig.pl 10 > /tmp/files.list

rsync --from0 --files-from /tmp/files.list ... /path/to/rsync/target/dir/

./unlink-others.pl /path/to/rsync/target/dir/ < /tmp/files.list
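Before enabling the unlink line, it may be worth checking what the NUL-separated list actually contains. The following sketch builds a stand-in list with two made-up paths instead of reading the real /tmp/files.list, counts the entries, and prints them in readable form. The newline conversion is for display only and would be misleading for names that contain newlines.

```shell
# Stand-in for /tmp/files.list, holding two hypothetical paths.
list=$(mktemp)
printf '%s\000' '/volume1/cctv/a.mp4' '/volume1/cctv/sub dir/b.mp4' > "$list"

# One NUL byte per entry, so counting NULs counts the files.
entries=$(( $(tr -cd '\000' < "$list" | wc -c) ))
echo "$entries entries:"

# Human-readable view of the list.
tr '\000' '\n' < "$list"

# The bare names that unlink-others.pl compares against the target directory.
basenames=$(tr '\000' '\n' < "$list" | sed 's|.*/||')
echo "$basenames"

rm -f "$list"
```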
