Python can't find files that os.walk() lists

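In short: when I run os.walk() I get an accurate list of files, but when I try to get information about those files (last-modified date, file size, or even just open() them), I get an error saying the file cannot be found. This happens for only some of the files, roughly 0.2%, and I can't work out why.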

Background

At work we have a server running Windows Server 2012 R2 (I know, I know...). We want to automate moving targeted shared folders to specific shared drives in Google Drive.

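The first thing I'm doing is collecting a list of files with their last-modified dates and file sizes for later use. The code I wrote works fine on my laptop running Windows 11, but when I point it at a few different shared folders on the server, it hits the same problem every time.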
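
For reference, here is a minimal sketch of the same listing step built on os.scandir() instead of os.walk() plus os.stat(). The function name list_files_with_metadata is hypothetical. The Python documentation says that on Windows DirEntry.stat() usually needs no extra system call, because the size and timestamps come back with the directory listing itself. Whether that sidesteps the error on this particular share is an untested assumption.

```python
import os
from datetime import datetime


def list_files_with_metadata(src_dir):
    """Yield (path, size_in_bytes, modified_datetime) for every file under src_dir.

    Hypothetical helper, not part of do_test.py.
    """
    stack = [src_dir]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    # Visit subdirectories later instead of recursing.
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    # Metadata is read through the directory entry,
                    # not by looking the full path up again.
                    info = entry.stat(follow_symlinks=False)
                    modified = datetime.fromtimestamp(info.st_mtime)
                    yield entry.path, info.st_size, modified
```

Usage would look like `for path, size_bytes, modified in list_files_with_metadata(src): ...`, with the same src value as in do_test.py.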

Troubleshooting

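I don't believe this is a code problem. I've reworked my code several times to make it simpler, but the end result is always the same: it works locally but cannot fully traverse the shared folders.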

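My first thought was that the path names might be too long (the 255-character limit on older systems), but it successfully finds files with path lengths > 300 characters.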
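
One way to probe the path-length hypothesis further is to retry a failing path in Windows extended-length form (`\\?\UNC\server\share\...` for UNC paths). This is a sketch only: to_extended_length is a hypothetical helper, and nothing here confirms that it resolves the failures. Per Microsoft's path documentation, this prefix lifts the classic MAX_PATH limit of 260 characters and also skips the usual Win32 path normalization.

```python
def to_extended_length(path):
    """Return an absolute Windows path in extended-length form.

    Hypothetical helper. Extended-length paths must use backslashes,
    so the forward slashes produced by do_test.py are converted back.
    """
    normalized = path.replace("/", "\\")
    if normalized.startswith("\\\\?\\"):
        # Already in extended-length form.
        return normalized
    if normalized.startswith("\\\\"):
        # UNC path: \\server\share\... becomes \\?\UNC\server\share\...
        return "\\\\?\\UNC\\" + normalized[2:]
    # Drive-letter path (assumed to be absolute): C:\... becomes \\?\C:\...
    return "\\\\?\\" + normalized
```

A failing path f could then be retried with `os.stat(to_extended_length(f))` and the outcome compared with the plain `os.stat(f)` call.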

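My next thought was that there might be an obvious pattern in which kinds of files cannot be found, but within a given folder it will find most of the PDFs and fail on one or a few others. That is just one observed example and is not specific to PDFs.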
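
A rough sketch of how the search for a pattern could be made systematic, using the files_not_found list that do_test.py already collects. The function summarize_failures and the traits it checks (a path component ending in a space or dot, non-ASCII characters, a length of 260 or more) are my own guesses at what might matter, not confirmed causes.

```python
from collections import Counter
import ntpath


def summarize_failures(paths):
    """Count failed paths by extension and by a few suspicious traits.

    Hypothetical helper; the traits are guesses, not known causes.
    """
    by_extension = Counter()
    traits = Counter()
    for path in paths:
        normalized = path.replace("/", "\\")
        by_extension[ntpath.splitext(normalized)[1].lower()] += 1
        parts = [part for part in normalized.split("\\") if part]
        # Names ending in a space or dot are easy to miss by eye.
        if any(part != part.rstrip(" .") for part in parts):
            traits["component ends with space or dot"] += 1
        if not normalized.isascii():
            traits["contains non-ASCII characters"] += 1
        if len(normalized) >= 260:
            traits["260 characters or longer"] += 1
    return by_extension, traits
```

If simple_file_check were changed to return files_not_found, calling `summarize_failures(files_not_found)` and printing both counters would show whether any trait is over-represented among the 384 failures.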

I've spent probably 6-8 hours in total troubleshooting and investigating this, and I'm still stumped.

Code

do_test.py - uses the hurry.filesize package to get an approximate file size

import os
import datetime
from hurry.filesize import size
from pprint import pprint

# Test directory
src = "//[DC]/PATH/TO/FOLDER"

def simple_file_check(src_dir):
    total_bytes = 0
    total_files = 0
    total_folders = 0
    total_not_found = 0
    files_not_found = []

    for (root, dirs, files) in os.walk(src_dir):
        # just count files and folders for now
        total_files += len(files)
        total_folders += len(dirs)
        # Get full-path file names
        fnames = [os.path.join(root, f).replace("\\","/") for f in files]

        # Get their sizes and sum it up
        fsizes = []
        for f in fnames:
            try:
                fsizes.append(os.stat(f).st_size)
            except Exception as e:
                files_not_found.append(f)
        total_bytes += sum(fsizes)

    total_size = size(total_bytes)
    total_not_found += len(files_not_found)
    # Percentage of listed files that could not be stat'ed
    pct_missing = total_not_found / total_files * 100

    data = {
        "ttl-size": total_size,
        "ttl-files": total_files,
        "ttl-folders": total_folders,
        "ttl-not-found": total_not_found,
        "pct-missing": "{}%".format(pct_missing)
    }
    pprint(data)

def time_it_pls(func, *arg):
    begin_dt = datetime.datetime.now()
    begin = str(begin_dt)[:19]
    print("beginning execution at: {}".format(begin))
    func(*arg)
    end_dt = datetime.datetime.now()
    end = str(end_dt)[:19]
    print("ending execution at: {}".format(end))
    print("time taken: {}".format(end_dt - begin_dt))

time_it_pls(simple_file_check, src)

Results

beginning execution at: 2023-06-21 14:50:06
{'pct-missing': '0.19806269922322284%',
 'ttl-files': 193878,
 'ttl-folders': 18150,
 'ttl-not-found': 384,
 'ttl-size': '210G'}
ending execution at: 2023-06-21 14:51:11
time taken: 0:01:05.302772

Specific error message without the try/except block

Traceback (most recent call last):
  File "C:\it_scripts\do_test.py", line 53, in <module>   
    time_it_pls(simple_file_check, src)
  File "C:\it_scripts\do_test.py", line 47, in time_it_pls
    func(*arg)
  File "C:\it_scripts\do_test.py", line 25, in simple_file_check
    fsizes.append(os.stat(f).st_size)
                  ^^^^^^^^^^
FileNotFoundError: [WinError 3] The system cannot find the path specified: '//DC/PATH/TO/FILE'

-- Edit --

I get a similar error when I try to use open() on just a single file in the interpreter.

>>> f = "//DC/PATH/TO/FILE" # actual path length is 267 characters long and copied from the exception in the previous example.
>>> d = open(f)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: '//DC/PATH/TO/FILE'

-- Edit 2 --

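We're getting closer! When I list the folder in PowerShell I can see that the file exists, but if I run ls against the individual file I get an error. So this is not specific to Python, and it suggests something odd is going on on the Windows side.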

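Here is a redacted version of the PowerShell output and error. Please understand that some redaction really is necessary given the sensitivity of these files.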

PS C:\Users\sani> ls "\\DC\#CONSOLIDATION of Checklists, Declarations, Examples, and Documents for [REDACTED], U, and I751 Applications\Application- U Visa\Closing Letters\
Rejections\"


    Directory: \\DC\#CONSOLIDATION of Checklists, Declarations, Examples, and Documents for [REDACTED], U, and I751 Applications\Application- U Visa\Closing
    Letters\Rejections


Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a----         5/31/2019   3:06 PM          13025 Samole closing Letter - No DV Simple assault.docx
-a----        11/21/2018   3:10 PM          16232 Sample Closing Letter-Not a qualifying crime (Sp).dotx
-a----         7/26/2018  11:32 AM          13581 Sample Closing Letter-RE PC does not qualify a indirect victim.dotx
-a----        11/21/2018   3:14 PM          12908 Sample Closing Letter-RE U Cert Request Denied.dotx
-a----          7/9/2018   7:25 PM          13500 Sample Closing Letter-Unqualifying crime.dotx
-a----         7/26/2018   6:19 PM          12769 Sample Closing Ltr w Copy of File (Sp), Over Income.dotx
-a----         7/26/2018   1:24 PM          16432 Sample Rejection Letter, unqualifying crime.dotx


PS C:\Users\sani> ls "\\DC\#CONSOLIDATION of Checklists, Declarations, Examples, and Documents for [REDACTED], U, and I751 Applications\Application- U Visa\Closing Letters\
Rejections\Sample Closing Letter-RE PC does not qualify a indirect victim.dotx"
ls : Cannot find path '\\DC\#CONSOLIDATION of Checklists, Declarations, Examples, and Documents for [REDACTED], U, and I751 Applications\Application- U Visa\Closing 
Letters\Rejections\Sample Closing Letter-RE PC does not qualify a indirect victim.dotx' because it does not exist.
At line:1 char:1
+ ls "\\DC\#CONSOLIDATION  ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (\\DC\...ect victim.dotx:String) [Get-ChildItem], ItemNotFoundException
    + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetChildItemCommand
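
Since a listing shows the file but a direct lookup of the same name fails, one thing worth checking is whether the stored name contains characters that look identical on screen to what was typed (a non-breaking space, a zero-width character, a trailing space or dot). The sketch below prints an escaped form of each name in a folder; describe_name and inspect_directory are hypothetical helpers, and the idea that such characters are involved is only a guess.

```python
import os
import unicodedata


def describe_name(name):
    """Return an ASCII-only rendering of name plus notes on unusual characters.

    Hypothetical helper for eyeballing file names.
    """
    notes = []
    for index, char in enumerate(name):
        # Flag control characters and anything outside printable ASCII.
        if ord(char) < 32 or ord(char) > 126:
            label = unicodedata.name(char, "UNNAMED")
            notes.append(f"index {index}: U+{ord(char):04X} {label}")
    if name != name.rstrip(" ."):
        notes.append("ends with a space or dot")
    return ascii(name), notes


def inspect_directory(directory):
    """Print every name in directory that has something unusual about it."""
    for name in os.listdir(directory):
        rendered, notes = describe_name(name)
        if notes:
            print(rendered, notes)
```

Running `inspect_directory(...)` on the Rejections folder above and comparing the escaped names with the ones passed to ls would show whether the two strings really are identical.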
