Bash - Check the files in a directory against a list of partial file names

Question 1

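Iterate over the files and build an associative array keyed by the UUID contained in each file name (I used parameter expansion to extract the UUID). Then read the list, look up each UUID in the associative array, and report whether a file was recorded for it.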

#!/bin/bash
uuid_list=...

declare -A file_for
for file in *_*_* ; do
    uuid=${file%%_*}
    file_for[$uuid]=1
done

while read -r uuid name ; do
    [[ $uuid = \#* ]] && continue
    if [[ ${file_for[$uuid]} ]] ; then
        echo "File for $name has arrived."
    else
        echo "File for $name missing!"
    fi
done < "$uuid_list"
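
To make the extraction step concrete, here is a minimal sketch of what the ${file%%_*} expansion does with a file name shaped like the ones in this thread. The helper name uuid_of is hypothetical and only used for illustration; it is not part of the script above.

```shell
#!/bin/bash
# uuid_of is a hypothetical helper used only for illustration.
# ${1%%_*} removes the longest suffix matching "_*", that is, everything
# from the first underscore onward, which leaves the leading UUID.
uuid_of() {
    printf '%s\n' "${1%%_*}"
}

uuid_of 'd6f60016-0011-49c4-8fca-e2b3496ad5a7_20160204_023-ERROR'
# prints: d6f60016-0011-49c4-8fca-e2b3496ad5a7
```

Using %% rather than % matters here: with a single % only the shortest suffix (the part after the last underscore) would be removed.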

Question 2

Here is a more "bashy" and concise approach:

#!/bin/bash

## Directory to search (assumed to be the current directory,
## as in the perl version below)
source_directory="."

## Read the UUIDs into the array 'uuids'. Using awk
## lets us both skip comments and only keep the UUID
mapfile -t uuids < <(awk '!/^\s*#/{print $1}' uuids.txt)

## Iterate over each UUID
for uuid in "${uuids[@]}"; do
        ## Set the special array $@ (the positional parameters: $1, $2 etc)
        ## to the glob matching the UUID. This will be all file/directory
        ## names that start with this UUID.
        set -- "${source_directory}"/"${uuid}"*
        ## If no files matched the glob, no file named $1 will exist
        [[ -e "$1" ]] && echo "YES : $1" || echo "PANIC $uuid"
done
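
The -e test is needed because, by default, bash leaves a glob that matches nothing unexpanded, so $1 ends up holding the literal pattern. A minimal sketch of that behaviour in a scratch directory; the function name first_match and the file name prefixes are made up for illustration:

```shell
#!/bin/bash
# Scratch directory with a single file; all names here are hypothetical.
dir=$(mktemp -d)
touch "$dir/aaa_1_x"

first_match() {
    local uuid=$1
    # Expand the glob into this function's positional parameters.
    set -- "$dir/$uuid"*
    # With no match, $1 is still the literal pattern, which does not exist.
    if [ -e "$1" ]; then
        echo "YES : ${1##*/}"
    else
        echo "PANIC $uuid"
    fi
}

first_match aaa
first_match zzz
rm -r "$dir"
```

Run as is, this prints "YES : aaa_1_x" followed by "PANIC zzz".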

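Note that while the approach above is neat and works fine for a handful of files, its speed depends on the number of UUIDs, and it will be very slow if you need to process a lot of them. If that is the case, either use @choroba's solution or, for something truly fast, avoid the shell and call perl: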

#!/bin/bash

source_directory="."
perl -lne 'BEGIN{
            opendir(D,"'"$source_directory"'"); 
            foreach(readdir(D)){ /((.+?)_.*)/; $f{$2}=$1; }
           } 
           s/\s.*//; $f{$_} ? print "YES: $f{$_}" : print "PANIC: $_"' uuids.txt

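To illustrate the time differences, I tested my bash approach, choroba's, and my perl on a file with 20000 UUIDs, 18001 of which had a corresponding file name. Note that each test was run with the script's output redirected to /dev/null.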

My bash (~3.5 min)

real   3m39.775s
user   1m26.083s
sys    2m13.400s

Choroba's (bash, ~0.7 sec)

real   0m0.732s
user   0m0.697s
sys    0m0.037s

My perl (~0.1 sec):

real   0m0.100s
user   0m0.093s
sys    0m0.013s

Question 3

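This is pure Bash (i.e. no external commands), and it is the most concise way I could think of.

Performance-wise, though, it is really not much better than what you currently have.

It reads each line from path/to/file; for each line, it stores the first field in $uuid and prints a message if no file matching the pattern path/to/directory/$uuid* is found: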

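#! /bin/bash
[ -z "$2" ] && { printf 'Not enough arguments.\n' >&2; exit 1; }

# Only the first field of each line is kept in $uuid
while read -r uuid _; do
    # Collect the matches in an array so that several matching
    # files do not break the test
    matches=( "$2/$uuid"* )
    [ -f "${matches[0]}" ] || printf '%s missing in %s\n' "$uuid" "$2"
done <"$1"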

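Invoke it with path/to/script path/to/file path/to/directory.

Sample output, using the example input file from the question on a test directory hierarchy containing the example files from the question: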

% tree
.
├── path
│   └── to
│       ├── directory
│       │   └── d6f60016-0011-49c4-8fca-e2b3496ad5a7_20160204_023-ERROR
│       └── file
└── script.sh

3 directories, 3 files
% ./script.sh path/to/file path/to/directory
d5873483-5b98-4895-ab09-9891d80a13da* missing in path/to/directory
be0ed6a6-e73a-4f33-b755-47226ff22401* missing in path/to/directory
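
Another way to ask "does anything start with this prefix?" while staying in pure Bash is the compgen builtin with -G, which returns success when the glob has at least one match, however many files match. This is a minimal sketch rather than the script above; has_match and the scratch file names are hypothetical:

```shell
#!/bin/bash
# has_match DIR PREFIX: succeed if at least one name in DIR starts with PREFIX.
has_match() {
    # compgen -G prints the glob's matches and fails when there are none.
    compgen -G "$1/$2*" > /dev/null
}

# Scratch directory in which two files share the prefix u1.
dir=$(mktemp -d)
touch "$dir/u1_20160204_a" "$dir/u1_20160205_b"

for uuid in u1 u2; do
    has_match "$dir" "$uuid" || printf '%s missing in %s\n' "$uuid" "$dir"
done
rm -r "$dir"
```

Only u2 is reported as missing; the two files that share the u1 prefix do not cause an error.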

Question 4

My approach is to take the UUIDs from the file and then use find:

awk '{print $1}' listfile.txt  | while read fileName;do find /etc -name "$fileName*" -printf "%p FOUND\n" 2> /dev/null;done

For readability:

awk '{print $1}' listfile.txt  | \
    while read fileName;do \
    find /etc -name "$fileName*" -printf "%p FOUND\n" 2> /dev/null;
    done

An example using a list of files in /etc/, looking for the file names passwd, group, fstab and THISDONTEXIST:

$ awk '{print $1}' listfile.txt  | while read fileName;do find /etc -name "$fileName*" -printf "%p FOUND\n" 2> /dev/null; done
/etc/pam.d/passwd FOUND
/etc/cron.daily/passwd FOUND
/etc/passwd FOUND
/etc/group FOUND
/etc/iproute2/group FOUND
/etc/fstab FOUND

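Since you mentioned that the directory is flat, you can use the -printf "%f\n" option to print only the file name itself.

What this does not do is list the missing files. A small drawback of find is that it does not tell you when it fails to find a file, only when it matches something. What we can do, however, is check the output: if it is empty, a file is missing.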

awk '{print $1}' listfile.txt  | while read fileName;do RESULT="$(find /etc -name "$fileName*" -printf "%p\n" 2> /dev/null )"; [ -z "$RESULT"  ] && echo "$fileName not found" || echo "$fileName found"  ;done

More readable:

awk '{print $1}' listfile.txt  | \
   while read fileName;do \
   RESULT="$(find /etc -name "$fileName*" -printf "%p\n" 2> /dev/null )"; \
   [ -z "$RESULT"  ] && echo "$fileName not found" || \
   echo "$fileName found"  
   done

Here is how it performs as a small script:

skolodya@ubuntu:$ ./listfiles.sh                                               
passwd found
group found
fstab found
THISDONTEXIST not found

skolodya@ubuntu:$ cat listfiles.sh                                             
#!/bin/bash
awk '{print $1}' listfile.txt  | \
   while read fileName;do \
   RESULT="$(find /etc -name "$fileName*" -printf "%p\n" 2> /dev/null )"; \
   [ -z "$RESULT"  ] && echo "$fileName not found" || \
   echo "$fileName found"  
   done
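
Because only the presence of a match matters here, find can be told to stop at the first hit instead of listing every match. The following is a sketch under a few assumptions: GNU find (which -printf already requires), a flat directory (hence -maxdepth 1), and a hypothetical wrapper function check_list that takes the list file and the directory as arguments.

```shell
#!/bin/bash
# check_list LISTFILE DIR: for the first field of each line in LISTFILE,
# report whether DIR contains a name starting with it.
check_list() {
    while read -r fileName rest; do
        # Skip blank lines, which would otherwise match every file.
        [ -n "$fileName" ] || continue
        # -print -quit makes GNU find stop after the first match.
        result=$(find "$2" -mindepth 1 -maxdepth 1 -name "$fileName*" -print -quit 2>/dev/null)
        if [ -n "$result" ]; then
            echo "$fileName found"
        else
            echo "$fileName not found"
        fi
    done < "$1"
}

# Demonstration on scratch data (all names are made up for the example).
dir=$(mktemp -d)
list=$(mktemp)
touch "$dir/passwd" "$dir/group-"
printf '%s\n' passwd group THISDONTEXIST > "$list"
check_list "$list" "$dir"
rm -r "$dir" "$list"
```

On the scratch data this reports passwd and group as found and THISDONTEXIST as not found.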

Since it is a flat directory, stat can be used as an alternative; however, the code below will not work recursively if you decide to add subdirectories:

$ awk '{print $1}' listfile.txt  | while read fileName;do  stat /etc/"$fileName"* 1> /dev/null ;done        
stat: cannot stat ‘/etc/THISDONTEXIST*’: No such file or directory

If we take the stat idea and run with it, we can use stat's exit code as an indication of whether a file exists. Effectively, we want to do this:

$ awk '{print $1}' listfile.txt  | while read fileName;do  if stat /etc/"$fileName"* &> /dev/null;then echo "$fileName found"; else echo "$fileName NOT found"; fi ;done

Sample run:

skolodya@ubuntu:$ awk '{print $1}' listfile.txt  | \                                                         
> while read FILE; do                                                                                        
> if stat /etc/"$FILE" &> /dev/null  ;then                                                                   
> echo "$FILE found"                                                                                         
> else echo "$FILE NOT found"                                                                                
> fi                                                                                                         
> done
passwd found
group found
fstab found
THISDONTEXIST NOT found
