How do I find over 200K files with different names and output the results?

Question 1

I would do something like this (assuming GNU tools):

find /mnt/SAN/documents -type f -print0 | awk -F / '
  NR == FNR{check[$0]; next}
  $NF in check {print "found:", $0; delete check[$NF]}
  END {
    for (i in check)
      print "Not found:", i
  }' filename.list RS='\0' -

This reports only the first match for each name in filename.list, then lists the names that were never found.

Or, to report every occurrence:

find /mnt/SAN/documents -type f -print0 | awk -F / '
  NR == FNR{check[$0]; notfound[$0]; next}
  $NF in check {print "found:", $0; delete notfound[$NF]}
  END {
    for (i in notfound)
      print "Not found:", i
  }' filename.list RS='\0' -
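If you want to try the idea on a throwaway directory first, here is a minimal sketch of the "report every occurrence" variant wrapped in a function. The name check_names is made up for illustration, and unlike the commands above it reads newline-delimited find output instead of -print0, so it assumes no file name contains a newline. In exchange it should run with any POSIX awk, not just GNU awk.

```shell
# check_names LIST DIR  (hypothetical helper, not part of any tool)
# LIST: one wanted base name per line; DIR: directory tree to search.
# Assumption: no file name contains a newline.
check_names() {
    find "$2" -type f | awk -F / '
        # First input (the list): remember every wanted name.
        NR == FNR { check[$0]; notfound[$0]; next }
        # Second input (find output): the last /-separated field is the base name.
        $NF in check { print "found:", $0; delete notfound[$NF] }
        # Whatever was never seen is reported at the end (order is unspecified).
        END { for (i in notfound) print "Not found:", i }
    ' "$1" -
}
```

You would call it as check_names filename.list /mnt/SAN/documents > report.txt. Like the original, it relies on NR == FNR to detect the first input, so an empty list file will confuse it.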

Question 2

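Use something like

find /mnt/SAN/documents/ -type f | perl -ple 's,^.*/,,' | sort > files_currently_present

to generate a sorted list of the files currently on disk, without their paths, and then use

comm -2 -3 <(sort filelist_from_database) files_currently_present

to compare it with the list from the database and produce the list of missing files. Note that comm expects both inputs to be sorted, hence the sort calls.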
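For what it's worth, the two steps can be folded into one function. This is only a sketch: missing_names is a made-up name, it uses sed instead of perl to strip the directory part, and it assumes that the list holds one base name per line and that no file name contains a newline.

```shell
# missing_names LIST DIR  (hypothetical helper)
# Prints the names from LIST that no regular file under DIR carries.
# Assumption: one base name per line in LIST, no newlines in file names.
missing_names() {
    mn_tmp=$(mktemp) || return 1
    # Base names currently on disk, sorted and de-duplicated as comm requires.
    find "$2" -type f | sed 's,^.*/,,' | sort -u > "$mn_tmp"
    # -23 keeps only the lines unique to the first input (the wanted list).
    sort -u "$1" | comm -23 - "$mn_tmp"
    rm -f "$mn_tmp"
}
```

Something like missing_names filelist_from_database /mnt/SAN/documents > missing.txt would then write the missing names to a file.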

Question 3

The simplest approach is to use a shell loop that reads the file names from a file and runs multiple find commands in the background:

while IFS= read -r file; do
    find /mnt/SAN/documents/ -type f -name "$file" &
done < fileList.txt > foundFiles.txt

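However, this will launch more than 200K instances of find and will probably bring your machine to its knees. A better approach is to build one complex find command that is given every file name: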

$ printf 'find /mnt/SAN/documents/ -type f \\( '; while IFS= read -r file; do printf -- '-name "%s" -o ' "$file"; done < fileList.txt | sed 's/-o $/\\)\n/'
find /mnt/SAN/documents/ -type f \( -name "49" -o -name "50" -o -name "51" -o -name "52" \)

You can then run the command itself, either by copying and pasting it or with eval:

eval "$(printf 'find /mnt/SAN/documents/ -type f \\( '; \
    while IFS= read -r file; do
        printf -- '-name "%s" -o ' "$file"; done < fileList.txt |
            sed 's/-o $/\\)\n/')"

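However, this will also break if there are too many files, because the command line becomes too long, so you need to run it in batches: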

for i in $(seq 1 100 $(wc -l < fileList.txt)); do
    k=$((i+99));
    printf 'find /mnt/SAN/documents/ -type f \\( ';
    sed -n "$i,${k}p" fileList.txt |
    while IFS= read -r file; do
        printf -- '-name "%s" -o ' "$file";
    done | sed 's/-o $/\\)\n/';
done

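This creates a separate find command for each batch of 100 file names from the list. You can execute them with eval as shown above, or simply save them to a file and run that file: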

for i in $(seq 1 100 $(wc -l < fileList.txt)); do
    k=$((i+99));
    printf 'find /mnt/SAN/documents/ -type f \\( ';
    sed -n "$i,${k}p" fileList.txt |
    while IFS= read -r file; do
        printf -- '-name "%s" -o ' "$file";
    done | sed 's/-o $/\\)\n/';
done > script.sh && bash script.sh > foundFiles.txt
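Generating shell code and re-parsing it gets fragile once a name contains quotes, $ or backticks. As a variation on the batching idea (not what the commands above do), the sketch below passes each batch to find as real arguments, so nothing is ever eval'ed. The function names find_names and find_listed are invented for this example. It still assumes the names contain no glob characters such as *, ? or [, because -name treats its argument as a pattern.

```shell
# find_names DIR NAME...  (hypothetical helper)
# Runs a single find for all the given base names.
find_names() {
    fn_dir=$1
    shift
    [ "$#" -gt 0 ] || return 0
    fn_first=1
    # Rebuild the positional parameters as: -name A -o -name B ...
    for fn_name do
        if [ "$fn_first" -eq 1 ]; then
            set -- -name "$fn_name"
            fn_first=0
        else
            set -- "$@" -o -name "$fn_name"
        fi
    done
    # The parentheses make -type f apply to every -name alternative.
    find "$fn_dir" -type f \( "$@" \)
}

# find_listed LIST DIR [BATCH]  (hypothetical helper)
# Reads base names from LIST and searches DIR in batches (default 100).
find_listed() {
    fl_list=$1
    fl_dir=$2
    fl_batch=${3:-100}
    set --
    # The || test also picks up a last line that lacks a trailing newline.
    while IFS= read -r fl_name || [ -n "$fl_name" ]; do
        set -- "$@" "$fl_name"
        if [ "$#" -ge "$fl_batch" ]; then
            find_names "$fl_dir" "$@"
            set --
        fi
    done < "$fl_list"
    # Flush the final, possibly smaller batch.
    [ "$#" -eq 0 ] || find_names "$fl_dir" "$@"
}
```

You could run it as find_listed fileList.txt /mnt/SAN/documents/ 100 > foundFiles.txt. Each batch still walks the whole tree once, exactly like the generated script.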

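Note that Stéphane's approach, which starts from the existing files and checks for the missing ones, is almost certainly better (unless there are more existing files than missing ones). Similarly, you could first build a list of all existing files and then use comm to compare it with your list of target files (since you say you have a list of files, I am assuming your file names never contain newlines):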

find /mnt/SAN/documents/ -type f | sed 's,.*/,,' | sort > found
comm -13 found <(sort fileList.txt)

The comm command will print all lines that are in fileList.txt but not in found.

Answer