How to find missing files based on sequential numbering in a folder

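I have folders containing DPX images, and I'd like to be able to check whether the files are numbered sequentially.

The filenames can range from:

Frame 0000000.dpx to Frame 9999999.dpx

A folder is unlikely to contain this complete range, and may start and end on any number within the sequence above. The starting number is always lower than the ending number.

Any help would be greatly appreciated :-)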

Answer 1

#!/usr/bin/perl

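# change into the directory given as the first arg
# (defaults to .) so that the -e tests below look in
# the right place, then get a sorted list of all
# files named <digits>.dpx into array @files
my $dirname = shift // '.';
chdir($dirname) or die "Can't chdir to '$dirname': $!\n";
opendir(my $dir, '.') or die "Can't read '$dirname': $!\n";
my @files = sort grep { /^\d+\.dpx$/ } readdir($dir);
closedir($dir);
die "No matching .dpx files in '$dirname'\n" unless @files;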

# get the numeric value of the first and last
# element of the array
my ($first) = split /\./, $files[0];
my ($last)  = split /\./, $files[-1];

#print "$first\n$last\n";

# find and print any missing filenames
foreach my $i ($first..$last) {
  my $f = sprintf("%08i.dpx",$i);
  print "File '$f' is missing\n" unless -e $f
};

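Save it as, e.g., find-missing.pl and make it executable with chmod +x find-missing.pl

First, I need to randomly create a bunch of matching files for a test run (ten or fewer files is enough for this test):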

$ for i in {0..9} ; do
    [ "$RANDOM" -gt 16384 ] && printf "%08i.dpx\0" "$i" ;
  done | xargs -0r touch

$ ls -l *.dpx
-rw-r--r-- 1 cas cas 0 Feb 24 13:30 00000000.dpx
-rw-r--r-- 1 cas cas 0 Feb 24 13:30 00000001.dpx
-rw-r--r-- 1 cas cas 0 Feb 24 13:30 00000003.dpx
-rw-r--r-- 1 cas cas 0 Feb 24 13:30 00000005.dpx
-rw-r--r-- 1 cas cas 0 Feb 24 13:30 00000006.dpx
-rw-r--r-- 1 cas cas 0 Feb 24 13:30 00000007.dpx
-rw-r--r-- 1 cas cas 0 Feb 24 13:30 00000008.dpx

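In bash, $RANDOM gives a random number between 0 and 32767, so the for loop has roughly a 50% chance of creating any given file. In this run, you can see that everything except 00000002.dpx, 00000004.dpx and 00000009.dpx was created.

Then run the perl script: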

$ ./find-missing.pl .
File '00000002.dpx' is missing
File '00000004.dpx' is missing

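Note: it doesn't mention 00000009.dpx because that is beyond the highest-numbered file found. If you want it to, you could hard-code $last to a suitable value, or take it from a command-line argument.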
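As a rough illustration of the "take the range from the command line" idea, here is a minimal shell sketch. It is not the perl script above: the function name report_missing and its argument order are made up for this example, and it assumes the same bare eight-digit NNNNNNNN.dpx naming as the test files.

```shell
# Hypothetical helper, not part of the perl script above: report every
# missing zero-padded .dpx name in DIR for an explicit FIRST..LAST range.
# Usage: report_missing DIR FIRST LAST [DIGITS]
report_missing() (
  dir=$1
  digits=${4:-8}
  fmt="%0${digits}d.dpx"
  # Strip leading zeros so that zero-padded arguments such as 00000009
  # are never interpreted as octal constants by shell arithmetic.
  zeros=${2%%[!0]*}; i=${2#"$zeros"};    i=${i:-0}
  zeros=${3%%[!0]*}; last=${3#"$zeros"}; last=${last:-0}
  while [ "$i" -le "$last" ]; do
    name=$(printf "$fmt" "$i")
    [ -e "$dir/$name" ] || printf "File '%s' is missing\n" "$name"
    i=$((i + 1))
  done
)
```

With the test files created above, report_missing . 0 9 should list 00000002.dpx, 00000004.dpx and also 00000009.dpx, because the end of the range is now given explicitly instead of being taken from the last file found.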


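Here's a second version that works with filenames starting with "Frame ". It also allows configuration via variables at the top of the script. BTW, there's no reason why these couldn't be taken from the command line instead (from the @ARGV array, or by using an option-parsing module like Getopt::Std or Getopt::Long):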

#!/usr/bin/perl

# Configuration variables
my $digits = 7;
my $prefix = 'Frame ';
my $suffix = '.dpx';

# Format string for printf
my $fmt = "$prefix%0${digits}i$suffix";

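# Change into the directory given as the first arg
# (defaults to the current dir, ".") so that the -e
# tests below look in the right place, then get a
# sorted list of all files named $prefix<digits>$suffix
# into array @files. \Q...\E quotes regex
# metacharacters such as the "." in ".dpx".
my $dirname = shift // '.';
chdir($dirname) or die "Can't chdir to '$dirname': $!\n";
opendir(my $dir, '.') or die "Can't read '$dirname': $!\n";
my @files = sort grep { /^\Q$prefix\E\d+\Q$suffix\E$/ } readdir($dir);
closedir($dir);
die "No matching files in '$dirname'\n" unless @files;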

# Get the numeric value of the first and last
# element of the array by removing the filename
# prefix (e.g. "Frame ") and suffix (e.g. ".dpx"):
my ($first, $last);
($first = $files[0])  =~ s/^$prefix|$suffix$//g;
($last  = $files[-1]) =~ s/^$prefix|$suffix$//g;

#print "$first\n$last\n";

# find and print any missing filenames
foreach my $i ($first..$last) {
  my $f = sprintf($fmt, $i);
  print "File '$f' is missing\n" unless -e $f
};

BTW, ($first = $files[0]) =~ s/^$prefix|$suffix$//g; is a common perl idiom for assigning a value to a variable and modifying it with an s/// substitution in one step. It's equivalent to:

$first = $files[0];
$first =~ s/^$prefix|$suffix$//g;
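For comparison, the same "strip the prefix and suffix to get the number" step can be sketched in plain shell with parameter expansion. This is only an illustration: frame_number is a hypothetical helper name, and its defaults simply mirror the $prefix and $suffix values used in the perl script.

```shell
# Hypothetical helper: print the frame number embedded in a file name,
# e.g. "Frame 0000042.dpx" -> 42.
# Usage: frame_number NAME [PREFIX] [SUFFIX]
frame_number() (
  name=$1
  prefix=${2-"Frame "}
  suffix=${3-".dpx"}
  num=${name#"$prefix"}   # drop the leading prefix
  num=${num%"$suffix"}    # drop the trailing suffix
  # Remove leading zeros so later shell arithmetic does not treat
  # the value as an octal constant; an all-zero string becomes 0.
  zeros=${num%%[!0]*}
  num=${num#"$zeros"}
  printf '%s\n' "${num:-0}"
)
```

For example, frame_number 'Frame 0000042.dpx' prints 42, and passing an empty prefix as the second argument handles bare names such as 00000019.dpx.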

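To print the total number of files (and the number of missing files), change the last code block in either of the versions above (everything from # find and print any missing filenames onward) to the following. Note that the first version has no $fmt variable, so keep its sprintf("%08i.dpx",$i) call there: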

# find and print any missing filenames
my $missing = 0;
foreach my $i ($first..$last) {
  my $f = sprintf($fmt, $i);
  if (! -e $f) {
    print "File '$f' is missing\n";
    $missing++;
  };
};

printf "\nTotal Number of files: %i\n", scalar @files;
printf "Number of missing files: %i\n", $missing;

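This produces output like the following (the sample shows eight-digit numbers, so it was apparently run with $digits = 8 rather than the 7 set above):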

$ ./find-missing2.pl 
File 'Frame 00000002.dpx' is missing
File 'Frame 00000003.dpx' is missing

Total Number of files: 7
Number of missing files: 2
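As mentioned earlier, the prefix, digit count and suffix don't have to be hard-coded. Below is a minimal shell sketch of the same check with those values taken from options via getopts. Everything in it is an assumption made for illustration: the function name find_missing_frames and the option letters -p, -d and -s are invented here and don't come from the perl script.

```shell
# Hypothetical sketch; names and option letters are illustrative only.
# Usage: find_missing_frames [-p PREFIX] [-d DIGITS] [-s SUFFIX] [DIR]
find_missing_frames() (
  prefix='Frame '; digits=7; suffix='.dpx'
  OPTIND=1
  while getopts 'p:d:s:' opt; do
    case $opt in
      p) prefix=$OPTARG ;;
      d) digits=$OPTARG ;;
      s) suffix=$OPTARG ;;
      *) echo 'usage: find_missing_frames [-p PREFIX] [-d DIGITS] [-s SUFFIX] [DIR]' >&2
         exit 2 ;;
    esac
  done
  shift $((OPTIND - 1))
  cd "${1:-.}" || exit 1

  # Scan matching files, remembering the lowest and highest frame number.
  first=''; last=''; total=0; missing=0
  for f in "$prefix"*"$suffix"; do
    [ -e "$f" ] || continue                    # glob matched nothing
    n=${f#"$prefix"}; n=${n%"$suffix"}
    case $n in ''|*[!0-9]*) continue ;; esac   # skip non-numeric names
    zeros=${n%%[!0]*}; n=${n#"$zeros"}; n=${n:-0}
    total=$((total + 1))
    if [ -z "$first" ] || [ "$n" -lt "$first" ]; then first=$n; fi
    if [ -z "$last" ] || [ "$n" -gt "$last" ]; then last=$n; fi
  done
  [ "$total" -gt 0 ] || { echo 'No matching files' >&2; exit 1; }

  # Walk the range and report every name that does not exist.
  i=$first
  while [ "$i" -le "$last" ]; do
    name=$(printf "%s%0${digits}d%s" "$prefix" "$i" "$suffix")
    if [ ! -e "$name" ]; then
      printf "File '%s' is missing\n" "$name"
      missing=$((missing + 1))
    fi
    i=$((i + 1))
  done
  printf '\nTotal Number of files: %d\n' "$total"
  printf 'Number of missing files: %d\n' "$missing"
)
```

The body runs in a subshell (note the parentheses around it), so the cd and the variables don't leak into the calling shell. Something like find_missing_frames -p '' -d 8 /some/dir would then cover the unprefixed eight-digit names from the first test.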

Answer 2

A brute-force approach.

Here are the contents of a sample directory:

/tmp/dpx-test
-rw-------.  1 root root    0 Feb 23 21:02 0
-rw-------.  1 root root    0 Feb 23 18:59 0000000
-rw-------.  1 root root    0 Feb 23 21:03 0000000.aaa
-rw-------.  1 root root    0 Feb 23 18:57 0000000.dpx
-rw-------.  1 root root    0 Feb 23 18:58 0000001.dpx
-rw-------.  1 root root    0 Feb 23 18:58 0000002.dpx
-rw-------.  1 root root    0 Feb 23 18:58 0000003.dpx
-rw-------.  1 root root    0 Feb 23 18:58 0000004.dpx
-rw-------.  1 root root    0 Feb 23 18:58 0000005.dpx
-rw-------.  1 root root    0 Feb 23 18:58 0000006.dpx
-rw-------.  1 root root    0 Feb 23 18:58 0000007.dp
-rw-------.  1 root root    0 Feb 23 18:58 0000008.dpx
-rw-------.  1 root root    0 Feb 23 18:58 0000009.dpx
-rw-------.  1 root root    0 Feb 23 21:00 000000x.dpx
-rw-------.  1 root root    0 Feb 23 20:56 0000011.dpx
-rw-------.  1 root root    0 Feb 23 18:59 0000019.dpx
-rw-------.  1 root root    0 Feb 23 21:02 0000022.dpy
drwx------.  2 root root    6 Feb 23 19:05 x
-rw-------.  1 root root    0 Feb 23 21:00 x000999.dpx
-rw-------.  1 root root    0 Feb 23 18:59 xxxx
[user1:/dpx-test:]#
[user1:/tmp/dpx-test:]#
[user1:/tmp/dpx-test:]# ls -1 [0-9][0-9][0-9][0-9][0-9][0-9][0-9].dpx | wc -l
11
[user1:/tmp/dpx-test:]#
[user1:/tmp/dpx-test:]#
[user1:/tmp/dpx-test:]# ls -1 [0-9][0-9][0-9][0-9][0-9][0-9][0-9].dpx
0000000.dpx
0000001.dpx
0000002.dpx
0000003.dpx
0000004.dpx
0000005.dpx
0000006.dpx
0000008.dpx
0000009.dpx
0000011.dpx
0000019.dpx

Here is the script:

#!/bin/bash
D1="$1"  # Name of folder to check is passed in as argument #1
EXT='dpx'
echo "Checking for all missing ???????.dpx files."; echo
pushd "${D1}" || exit 1
      # This ASSUMES two or more '???????.dpx' files are in the given directory.
      # -maxdepth 1 keeps find from descending into subdirectories.
      # Gets first numbered file
      FstDPX="$(find . -maxdepth 1 -type f -name '[0-9][0-9][0-9][0-9][0-9][0-9][0-9].dpx' | sort | head -n 1 | cut -d '/' -f2 | cut -d'.' -f1)"
      # Gets last numbered file
      LstDPX="$(find . -maxdepth 1 -type f -name '[0-9][0-9][0-9][0-9][0-9][0-9][0-9].dpx' | sort | tail -n 1 | cut -d '/' -f2 | cut -d'.' -f1)"
      echo "First known file is:  ${FstDPX}.${EXT}"
      echo "Last  known file is:  ${LstDPX}.${EXT}"
      DPXcount="$(find . -maxdepth 1 -type f -name "[0-9][0-9][0-9][0-9][0-9][0-9][0-9].${EXT}" | wc -l)"
      echo "Total number of '???????.dpx' files in $(pwd), is:  ${DPXcount}.";  echo
      # Two files are enough to define a range that may contain gaps.
      if [ "${DPXcount}" -ge 2 ]; then
           # Convert value without leading zeros, and manually increment by 1(Fst) or 0(Lst).
           [ "${FstDPX}" == '0000000' ] && Fdpx="$(echo ${FstDPX} | awk '{print $0 + 1}')" \
                                        || Fdpx="$(echo ${FstDPX} | awk '{print $0 + 0}')"
           echo "FstDPX(${FstDPX}) ---- Fdpx(${Fdpx})  //First one to test for existance of//."
           Ldpx="$(echo ${LstDPX} | awk '{print $0 + 0}')"
           echo "LstDPX(${LstDPX}) ---- Ldpx(${Ldpx})."
           IDX="${Fdpx}"  # Established starting point to iterate through.
           echo "IDX(${IDX}) -- Fdpx(${Fdpx}) -- Ldpx(${Ldpx})"; echo

           echo "Now iterating through the directory and listing only those missing."; echo
           # Loop through UNTIL we've reached the end.
           until [ "${IDX}" -gt "${Ldpx}" ]; do
                   # Convert back to number with leading zeros.
                   IDXz=$(printf "%07d\n" ${IDX})
                   # Test if the file exists; report it if it does not.
                   [  ! -e "${IDXz}.dpx" ] && echo "File  '${IDXz}.dpx'  is missing"
                   let "IDX=IDX+1"
           done
      else
           echo; echo "Not enough '???????.dpx' files to process."; echo
      fi
popd

It produces the following output:

Checking for all missing ???????.dpx files.

/tmp/dpx-test ~
First known file is:  0000000.dpx
Last  known file is:  0000019.dpx
Total number of '???????.dpx' files in /tmp/dpx-test, is:  11.

FstDPX(0000000) ---- Fdpx(1)  //First one to test for existance of//.
LstDPX(0000019) ---- Ldpx(19).
IDX(1) -- Fdpx(1) -- Ldpx(19)

Now iterating through the directory and listing only those missing.

File  '0000007.dpx'  is missing
File  '0000010.dpx'  is missing
File  '0000012.dpx'  is missing
File  '0000013.dpx'  is missing
File  '0000014.dpx'  is missing
File  '0000015.dpx'  is missing
File  '0000016.dpx'  is missing
File  '0000017.dpx'  is missing
File  '0000018.dpx'  is missing
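The same list of gaps can also be obtained without an explicit test loop, by treating it as a set difference: generate the names that should exist and subtract the names that do. The sketch below is a different technique from the script above, not a rewrite of it. missing_by_comm is a hypothetical name, FIRST and LAST are passed as plain integers, and it assumes a seq that supports -f (GNU and BSD seq do) plus seven-digit names as in this answer.

```shell
# Hypothetical alternative: list gaps as "expected names minus actual names".
# Usage: missing_by_comm DIR FIRST LAST   (FIRST/LAST as plain integers)
missing_by_comm() (
  cd "$1" || exit 1
  exp=$(mktemp) && act=$(mktemp) || exit 1
  # Expected names, one per line (%.0f avoids exponent notation for
  # large frame numbers); comm needs both inputs sorted the same way.
  seq -f '%07.0f.dpx' "$2" "$3" | LC_ALL=C sort > "$exp"
  # Names actually present that match the seven-digit pattern.
  ls | grep -E '^[0-9]{7}\.dpx$' | LC_ALL=C sort > "$act"
  # comm -23 prints the lines that appear only in the first file.
  LC_ALL=C comm -23 "$exp" "$act"
  rm -f "$exp" "$act"
)
```

For the sample directory shown earlier, missing_by_comm /tmp/dpx-test 0 19 should print the same nine names as the loop, from 0000007.dpx through 0000018.dpx, without the surrounding message text.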
