How can I move all files matching a name into a new folder if the number of matching files is greater than 10?

Question 1

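Check whether this works for you, and I will add an explanation of how it works. I tested it in dash.

Note: file names must not contain spaces or newlines.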

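#!/bin/dash

# Minimum number of files that must share a prefix before they are moved.
limit=1

# List the names, strip trailing digits/hyphens plus the extension to get a
# prefix, count each prefix, keep those at or above the limit, and sort in
# reverse so "temp-new-file" is handled before the shorter "temp".
printf "%s\n" * |
sed 's/[-0-9]*\..*$//' |
uniq -c |
awk -v lim="${limit}" '$1 >= lim {print $2}' |
sort -r |
while read -r i; do
    for j in "${i}"*; do
        # Only regular files are moved; directories are skipped.
        [ -f "$j" ] || continue

        # Directory name: the prefix plus everything after the first dot.
        dir=${i}.${j#*.}

        [ -d "$dir" ] || mkdir "$dir"
        mv -v "$j" "$dir"
    done
done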
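
As a rough illustration of what the sed, uniq -c, awk and sort -r stages compute, here is a Python 3 sketch. The names prefix_of and prefixes_over_limit are hypothetical and exist only for this example. Unlike uniq -c, it counts prefixes across the whole list and not just adjacent lines, so treat it as an approximation of the pipeline, not a replacement for it.

```python
import re
from collections import Counter


def prefix_of(name):
    """Mimic sed 's/[-0-9]*\\..*$//': drop trailing digits/hyphens and
    everything from the first dot that follows them."""
    return re.sub(r'[-0-9]*\..*$', '', name, count=1)


def prefixes_over_limit(names, limit):
    """Return prefixes shared by at least `limit` names, in reverse order
    so that a longer prefix comes before a shorter one it extends."""
    counts = Counter(prefix_of(name) for name in names)
    return sorted((p for p, c in counts.items() if c >= limit), reverse=True)


if __name__ == "__main__":
    sample = [
        "aaa.txt", "temp-098723.log", "temp-new-file-342.log",
        "temp-new-file123.log", "test1.sh",
    ]
    for name in sample:
        print(name, "->", prefix_of(name))
```

The reverse ordering matters because the shell loop globs "${i}"*: if temp were processed first, it would also pick up the temp-new-file files.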

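There is one problem here: the case where a file name is equal to the future directory name, for example aaa.txt. In the aaa.txt case the file name has no extra characters, so nothing is stripped from it, and the new directory name ends up identical to the file name, which causes these errors: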

mkdir: cannot create directory ‘aaa.txt’: File exists
mv: 'aaa.txt' and 'aaa.txt' are the same file

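One workaround is to check whether the proposed directory name equals the file name and, if it does, add a number to the future directory name, for example aaa1.txt.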
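
The workaround could look roughly like the Python 3 sketch below. It only picks a name and does not touch the file system. The function pick_dir_name and its parameters are hypothetical, and the numbering scheme (aaa1.txt, aaa2.txt, ...) is just one possible choice.

```python
def pick_dir_name(prefix, extension, taken):
    """Return "<prefix>.<extension>", or "<prefix><n>.<extension>" with the
    smallest n >= 1 when the plain name is already in `taken`."""
    candidate = "{}.{}".format(prefix, extension)
    n = 1
    while candidate in taken:
        candidate = "{}{}.{}".format(prefix, n, extension)
        n += 1
    return candidate


if __name__ == "__main__":
    # aaa.txt already exists as a file, so the directory gets a number.
    print(pick_dir_name("aaa", "txt", {"aaa.txt"}))
```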

Demonstration

Before running the script:

$ tree
.
├── aaa.txt
├── temp-098723.log
├── temp-123197.log
├── temp-203981.log
├── temp-734692.log
├── temp-new-file123.log
├── temp-new-file-2323-12.log
├── temp-new-file-342.log
├── test1.sh
├── test2.sh
└── test3.sh

0 directories, 11 files

After running script.sh:

$ tree
.
├── aaa.txt
├── temp.log
│   ├── temp-098723.log
│   ├── temp-123197.log
│   ├── temp-203981.log
│   └── temp-734692.log
├── temp-new-file.log
│   ├── temp-new-file123.log
│   ├── temp-new-file-2323-12.log
│   └── temp-new-file-342.log
└── test.sh
    ├── test1.sh
    ├── test2.sh
    └── test3.sh

3 directories, 11 files

Question 2

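I may be misunderstanding what you are asking here, but as I said, I think this question has some subtleties that call for a relatively sophisticated solution, i.e. I am not sure how simple a script can be and still do what you want. For example, let's look closely at your example file list: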

aaa.txt
temp-203981.log
temp-098723.log
temp-123197.log
temp-734692.log
test1.sh
test2.sh
test3.sh

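According to your question, the prefixes you want extracted from this list are temp and test; aaa is excluded because only one file has aaa as its prefix and your example threshold is three. But why isn't te a prefix, given that 7 files start with te? Or, since you seem to want to group files by file-name suffix first, why is one of the new subdirectories temp.log and not t.log or temp-.log? I hope this discussion makes clear that if you really want your program to determine the potential prefixes on its own, without taking a list of prefixes as an argument, then your problem statement contains some ambiguities that need to be resolved (and some corresponding choices that need to be made).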
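
To make the ambiguity concrete, this small Python 3 sketch counts how many of the example files start with each candidate prefix. The helper count_with_prefix is a hypothetical name used only here.

```python
def count_with_prefix(names, prefix):
    """Count the names that start with the given prefix."""
    return sum(1 for name in names if name.startswith(prefix))


names = [
    "aaa.txt",
    "temp-203981.log", "temp-098723.log", "temp-123197.log", "temp-734692.log",
    "test1.sh", "test2.sh", "test3.sh",
]

for prefix in ("te", "temp", "test", "aaa"):
    print(prefix, count_with_prefix(names, prefix))
```

With a threshold of three, te, temp and test all qualify on count alone, so some extra rule is needed to decide which prefixes win.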

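Here is a Python script that uses a simple trie data structure to search for the longest matching prefixes that satisfy some constraints (which can be supplied as arguments):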

#!/usr/bin/env python2
# -*- coding: ascii -*-
"""
trieganize.py

Use the trie data structure to look for prefixes of filenames in a given
directory and then reorganize those files into subdirectories based on
those prefixes.

In this script the trie data structure is just a dictionary of the
following form:

    trie = {
        "count":    integer,
        "children": dictionary,
        "leaf":     boolean
    }

Where the dictionary keys have the following semantics.

count:
    stores the number of total descendents of the given trie node

children:
    stores the child trie nodes of the given node

leaf:
    denotes whether this trie corresponds to the final character in a word
"""

import sys
import os
import string

def add_word_to_trie(trie, word):
    """Add a new word to the trie."""
    if word:
        trie["count"] += 1
        if word[0] not in trie["children"]:
            trie["children"][word[0]] = \
                {"count": 0, "children": {}, "leaf": False}
        add_word_to_trie(trie=trie["children"][word[0]], word=word[1:])
    else:
        trie["leaf"] = True
    return(trie)

def expand_trie(trie, prefix='', words=None):
    """Given a trie, return the list of words it encodes."""
    if words is None:
        words = list()
    if trie["leaf"]:
        words.append(prefix)
    for character, child in trie["children"].iteritems():
        if trie["children"]:
            expand_trie(trie=child, prefix=prefix+character, words=words)
    return(words)

def extract_groups_from_trie(
    trie, threshold=0, prefix='', groups=None,
    minimum_prefix_length=0,
    maximum_prefix_length=float("inf"),
    prefix_charset=string.ascii_letters,
):
    """Given a trie and some prefix constraints, return a dictionary which
    groups together the words in the trie based on shared prefixes which
    satisfy the specified constraints.
    """
    if groups is None:
        groups = dict()
    if trie["count"] >= threshold:
        children = {
            character: child
            for character, child in trie["children"].iteritems()
            if (
                child["count"] >= threshold and
                len(prefix) + 1 >= minimum_prefix_length and
                len(prefix) + 1 <= maximum_prefix_length and
                character in prefix_charset
            )
        }
        if not children:
            groups[prefix] = expand_trie(trie, prefix)
        else:
            for character, child in children.iteritems():
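                # Forward the prefix constraints so that non-default values
                # also apply below the top level of the trie.
                extract_groups_from_trie(
                    trie=child, threshold=threshold,
                    prefix=prefix+character, groups=groups,
                    minimum_prefix_length=minimum_prefix_length,
                    maximum_prefix_length=maximum_prefix_length,
                    prefix_charset=prefix_charset
                )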
    return(groups)

def reorganize_files(basedir, suffix_separator='.', threshold=3):
    """Takes a path to a directory and reorganizes the files in that
    directory into subdirectories based on the prefixes of their
    filenames."""

    # Get the list of file names
    filenames = os.listdir(basedir)

    # Group the filenames by suffix
    suffixes = {}
    for filename in filenames:
        basename, separator, suffix = filename.rpartition(suffix_separator)
        if suffix not in suffixes:
            suffixes[suffix] = []
        suffixes[suffix].append(basename)

    # For each suffix, search for prefixes
    for suffix, basenames in suffixes.iteritems():

        # Initialize a trie object
        trie = {"count":0, "children": {}, "leaf": False}

        # Add the filenames to the trie
        for basename in basenames:
            add_word_to_trie(trie, basename)

        # Break the filenames up into groups based on their prefixes
        groups = extract_groups_from_trie(trie, threshold)

        # Organize the groups of files into subdirectories
        for prefix, group in groups.iteritems():
            targetdir = os.path.join(basedir, prefix + suffix_separator + suffix)
            os.mkdir(targetdir)
            for basename in group:
                filename = basename + suffix_separator + suffix
                sourcefile = os.path.join(basedir, filename) 
                targetfile = os.path.join(targetdir, filename)
                os.rename(sourcefile, targetfile)

if __name__=="__main__":
    reorganize_files(basedir=sys.argv[1])
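
One detail of the script above that is easy to miss: count is only incremented while characters of the word remain, so a node's count is the number of inserted words that continue past that node. The Python 3 sketch below rebuilds just that part iteratively to show the effect. The names new_node, add_word and count_at are hypothetical and are not part of trieganize.py.

```python
def new_node():
    """Create an empty trie node with the same keys the script uses."""
    return {"count": 0, "children": {}, "leaf": False}


def add_word(trie, word):
    """Insert a word, bumping count on every node the word passes through."""
    node = trie
    for character in word:
        node["count"] += 1
        node = node["children"].setdefault(character, new_node())
    node["leaf"] = True


def count_at(trie, prefix):
    """Return the count stored at the node for `prefix` (0 if absent)."""
    node = trie
    for character in prefix:
        node = node["children"].get(character)
        if node is None:
            return 0
    return node["count"]


root = new_node()
for basename in ("temp-203981", "temp-098723", "temp-123197", "temp-734692"):
    add_word(root, basename)

print(count_at(root, "temp"))         # words continuing past "temp"
print(count_at(root, "temp-203981"))  # a complete word has nothing after it
```

This is why the search can keep descending while a child's count stays at or above the threshold: once the names diverge, the counts drop below it.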

To demonstrate this Python script, I wrote a small shell script that creates and populates a test directory:

#!/usr/bin/bash

# create-test-dir.sh

rm -rf /tmp/testdir
mkdir -p /tmp/testdir

files=(
aaa.txt
temp-203981.log
temp-098723.log
temp-123197.log
temp-734692.log
test1.sh
test2.sh
test3.sh
)

for file in "${files[@]}"; do touch "/tmp/testdir/${file}"; done

We can run the script:

bash create-test-dir.sh

Afterwards our test directory looks like this (run tree /tmp/testdir):

/tmp/testdir/
|-- aaa.txt
|-- temp-098723.log
|-- temp-123197.log
|-- temp-203981.log
|-- temp-734692.log
|-- test1.sh
|-- test2.sh
`-- test3.sh

0 directories, 8 files

Now we can run the Python script:

python trieganize.py /tmp/testdir

Afterwards the files are organized as follows:

/tmp/testdir/
|-- aaa.txt
|-- temp.log
|   |-- temp-098723.log
|   |-- temp-123197.log
|   |-- temp-203981.log
|   `-- temp-734692.log
`-- test.sh
    |-- test1.sh
    |-- test2.sh
    `-- test3.sh

2 directories, 8 files

Answer