How to manage a huge number of files in the shell?

2024-5-16

$ ls ./dir_with_huge_amount_of_files/errors/

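Suppose a directory is full of pictures with unix timestamps in their names, and I mean a lot: many GB or even more. Shell commands like ls produce overflow-style warnings because they are not designed to work with millions (or more) of pictures. How can I manage such a huge number of files? For example, if I want to find the picture in the middle (according to the timestamp in the name and the creation time), is there a file system that offers a built-in search feature? Which commands would you use? I tried the usual ls and find with the necessary flags, but they were either very slow or generated warnings, so I think I need a better file system, a database, or something similar to pre-index the pictures. I basically need one array in which the inodes of the photos are placed in chronological order. How can I do that? Later, metadata with unix timestamps could be added.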

[Update]

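The current answers have a serious flaw: people just post sort-of answers without testing them empirically. Had they tested their suggestions, they would probably have seen them fail. So I created a command-line tool for you that builds a sandbox with a huge number of files, so you can test your suggestions against, say, 1e7 files. Generating the files can take a long time, so be patient. If someone knows a faster way to do this, please edit the code. Type python code.py --help to get help. Have fun!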

Usage example: creating a lot of files spread over directories

$ ls ./data2
ls: ./data2: No such file or directory
$ python testFill.py -n 3 -d 7                                                 
$ tree data2/                                                                  
data2/
|-- 0
|   |-- 1302407302636973
|   |-- 1302407302638022
|   `-- 1302407302638829
|-- 1
|   |-- 1302407302639604
|   |-- 1302407302641652
|   `-- 1302407302642399
|-- 2
|   |-- 1302407302643158
|   |-- 1302407302645223
|   `-- 1302407302646026
|-- 3
|   |-- 1302407302646837
|   |-- 1302407302649110
|   `-- 1302407302649944
|-- 4
|   |-- 1302407302650771
|   |-- 1302407302652921
|   `-- 1302407302653685
|-- 5
|   |-- 1302407302654423
|   |-- 1302407302656352
|   `-- 1302407302656992
`-- 6
    |-- 1302407302657652
    |-- 1302407302659543
    `-- 1302407302660156

7 directories, 21 files

Code: testFill.py

# Author: hhh
# License: ISC license

import os, time, optparse

def createHugeAmountOfFiles(fileAmount, dirAmount):
   # Whenever a microsecond timestamp is an exact multiple of DENSITY,
   # the generator switches to a new directory.
   DENSITY = 1e7
   base = "./data/"

   # Start at the first numbered subdirectory that does not exist yet.
   counter = 0
   while os.path.exists(base + str(counter) + "/"):
      counter += 1

   for d in range(int(dirAmount)):
      for i in range(int(fileAmount)):
         timeIt = int(time.time() * 1e6)   # microseconds since the epoch
         if timeIt % DENSITY == 0:
            counter += 1

         do = base + str(counter) + "/"
         # makedirs also creates ./data itself on the first run.
         if not os.path.exists(do):
            os.makedirs(do)

         # The file name is the timestamp, so two files created within the
         # same microsecond would share one name.
         with open(do + str(timeIt), 'w') as out:
            out.write("Automatically created file to test Huge amount of files.")
      counter += 1


def ls(dir):
   for root, dirs, files in os.walk("./data/"+dir):
      print(files)

def rm(dir):
   # Join with root so that files in nested subdirectories are removed too.
   for root, dirs, files in os.walk("./data/"+dir):
      for f in files:
         os.remove(os.path.join(root, f))


def parseCli():
   parser = optparse.OptionParser()
   parser.add_option("-f", "--file", dest="filename",
                     help="Subdirectory of ./data whose files to list", metavar="FILE")
   parser.add_option("-n", "--number", dest="number",
                     help="Number of files to generate per directory", metavar="NUMBER")
   parser.add_option("-r", "--remove", dest="remove",
                     help="Subdirectory of ./data whose files to remove", metavar="NUMBER")
   parser.add_option("-d", "--dir", dest="dir",
                     help="Amount of dirs to generate", metavar="NUMBER")
   parser.add_option("-q", "--quiet",
                     action="store_false", dest="verbose", default=True,
                     help="don't print status messages to stdout")

   return parser.parse_args()

def main():
   (options, args) = parseCli()

   if (options.filename):
      ls(options.filename)
   if (options.number and options.dir):
      createHugeAmountOfFiles(options.number, options.dir)
   if (options.remove):
      rm(options.remove)


if __name__ == "__main__":
   main()

Answer 1

Try a different shell. I recommend trying zsh, for example, to see whether it allows more arguments.
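If the warnings come from the shell expanding a glob into a huge argument list (an assumption, since the question does not quote them), another way around the limit is to read the directory entry by entry instead of passing names on a command line. A minimal Python sketch of that idea, assuming the file names are plain integer timestamps as in the test data; `chronological_inodes` and `middle` are hypothetical helper names:

```python
import os

def chronological_inodes(directory):
    """Return (timestamp, inode) pairs sorted by the timestamp in the name.

    Assumes every file name is a plain integer timestamp; other names are skipped.
    """
    pairs = []
    # os.scandir yields entries lazily, so no shell glob expansion is involved.
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False) and entry.name.isdigit():
                pairs.append((int(entry.name), entry.inode()))
    pairs.sort()
    return pairs

def middle(pairs):
    """Return the middle element of a sorted list, or None if it is empty."""
    return pairs[len(pairs) // 2] if pairs else None
```

The sorted list still has to fit in memory, so for very large directories this is a starting point rather than a complete solution.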

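If I understand correctly, part of the file name is a UNIX timestamp. It may be advisable to divide the files into folders. If the date/time format is a UNIX epoch number, put each chunk of that number range, say every 10000, into a separate folder.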
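One way to read this suggestion is integer division: every timestamp maps to the folder named after `timestamp // 10000`. The sketch below assumes digit-only file names and moves the files into a separate destination directory; `bucket_name` and `move_into_buckets` are hypothetical names, and the chunk size of 10000 is just the figure mentioned above.

```python
import os
import shutil

def bucket_name(timestamp, chunk=10000):
    """Folder name for a timestamp: one folder per chunk-sized range."""
    return str(timestamp // chunk)

def move_into_buckets(src, dst, chunk=10000):
    """Move every digit-named file in `src` into `dst/<bucket>/`."""
    with os.scandir(src) as entries:
        # Collect the names first so the directory is not changed mid-scan.
        names = [e.name for e in entries
                 if e.is_file(follow_symlinks=False) and e.name.isdigit()]
    for name in names:
        target = os.path.join(dst, bucket_name(int(name), chunk))
        os.makedirs(target, exist_ok=True)
        shutil.move(os.path.join(src, name), os.path.join(target, name))
```

With microsecond timestamps a chunk of 10000 covers only 10 ms, so a much larger chunk is probably needed to keep the number of folders manageable.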

If an ISO 8601 timestamp is part of the file name, simply divide by year, month, or day.
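For the ISO 8601 case, the folder can be derived from the parsed date. This is only a sketch: it assumes names in the extended format such as `2011-04-10T03:48:22.jpg` (a made-up example), and `date_folder` is a hypothetical helper.

```python
from datetime import datetime
import os

def date_folder(filename, level="day"):
    """Map e.g. '2011-04-10T03:48:22.jpg' to '2011/04/10' (or coarser)."""
    stem = os.path.splitext(filename)[0]
    stamp = datetime.fromisoformat(stem)   # raises ValueError on other formats
    parts = [f"{stamp.year:04d}", f"{stamp.month:02d}", f"{stamp.day:02d}"]
    depth = {"year": 1, "month": 2, "day": 3}[level]
    return "/".join(parts[:depth])
```

Choosing `level` trades folder count against files per folder: by year gives few large folders, by day many small ones.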

Answer 2

Would locate (and of course updatedb) be of any help to you?
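locate answers queries from a database that updatedb builds ahead of time. The same pre-indexing idea can be sketched in Python with sqlite3. This is not how locate stores its data, just an illustration of the principle; the table layout and the names `build_index` and `middle_file` are assumptions, as is the convention that file names are integer timestamps.

```python
import os
import sqlite3

def build_index(root, db_path):
    """Walk `root` once and store (timestamp, path) for digit-named files."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS files (ts INTEGER, path TEXT PRIMARY KEY)")
    con.execute("CREATE INDEX IF NOT EXISTS files_ts ON files (ts)")
    rows = ((int(name), os.path.join(dirpath, name))
            for dirpath, _dirs, names in os.walk(root)
            for name in names if name.isdigit())
    with con:  # commit all inserts in one transaction
        con.executemany("INSERT OR REPLACE INTO files VALUES (?, ?)", rows)
    return con

def middle_file(con):
    """Return the path of the chronologically middle file, or None."""
    (count,) = con.execute("SELECT COUNT(*) FROM files").fetchone()
    row = con.execute(
        "SELECT path FROM files ORDER BY ts, path LIMIT 1 OFFSET ?",
        (count // 2,)).fetchone()
    return row[0] if row else None
```

Building the index still requires one full walk of the tree, but later lookups such as the middle picture only touch the database.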
