Input/output error without disk failure

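A month ago I started running a Python program on my Ubuntu server that downloads videos and cuts them into images. I use urlretrieve to download the files and then OpenCV to cut them into single frames: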

import os
import pickle
import shutil
import sys
from urllib.request import urlretrieve

import cv2


def reporthook(blocknum, blocksize, totalsize):
    readsofar = blocknum * blocksize
    if totalsize > 0:
        percent = readsofar * 1e2 / totalsize
        s = "\r%5.1f%% %*d / %d" % (
            percent, len(str(totalsize)), readsofar, totalsize)
        sys.stderr.write(s)
        if readsofar >= totalsize: # near the end
            sys.stderr.write("\n")
    else: # total size is unknown
        sys.stderr.write("read %d\n" % (readsofar,))


def vid_cut(base_path, video, video_id):
    print("\nstart cutting: " + str(video))
    path = base_path + "/" + video_id
    name_video = os.listdir(path)[0]
    video_path = os.path.join(path, name_video)
    # Path to video file
    vidcap = cv2.VideoCapture(video_path)
    fps = vidcap.get(cv2.CAP_PROP_FPS)
    frame_count = int(vidcap.get(cv2.CAP_PROP_FRAME_COUNT))
    if fps != 0:
        duration = frame_count/fps
        picture_folder = path + "/pictures"
        os.mkdir(picture_folder)
        # Used as counter variable
        count = 0

        # checks whether frames were extracted
        success = True
        while success:
            # vidObj object calls read
            # function extract frames
            vidcap.set(cv2.CAP_PROP_POS_MSEC, (count * 500))
            success, image = vidcap.read()

            # Saves the frames with frame-count
            if success:
                cv2.imwrite(picture_folder + "/frame%d.jpg" % count, image)

                count += 1
            if (count/2 > duration):
                success = False
        vidcap.release()
        os.remove(video_path)
        original = path
        target = base_path + "/fertig/" + video_id
        shutil.move(original, target)
        print("\nvideo done: " + str(video))


# Driver Code

if __name__ == '__main__':
    base_path = "/media/hdd/videos"  # where the downloads should be saved

    all_videos = pickle.load(open("videos.pkl", "rb"))

    for video in all_videos:
        video_id = str(video[0])
        download_link = video[1]
        print("starting: ", video)
        video_slug = video[1]
        path_video = base_path + "downloads" + "/" + video_id
        os.mkdir(path_video)
        urlretrieve(download_link, path_video + "/a.mp4", reporthook)
        print("\nDownloading abgeschlossen: ", video)

        vid_cut(base_path + "downloads", video, video_id)

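Suddenly the program exited because the os module hit an input/output error. I also could no longer access the file system with ls. After searching online, the most likely cause seemed to be a failing drive. I bought this drive only 2 months ago; nevertheless, I ran e2fsck with -c -c. It found no errors. I reformatted the drive and started the program again. Now I am getting the input/output error again. Any suggestions?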

Edit: stat -f . returns

File: "."
    ID: e94de2b03a229ef9 Namelen: 255     Type: ext2/ext3
Block size: 4096       Fundamental block size: 4096
Blocks: Total: 1211023539 Free: 622285068  Available: 561234266
Inodes: Total: 152616960  Free: 145572533

I guess the problem is not the inodes.

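Edit 2: I got Errno 5. I ran the Python program from the command line inside screen. I detached the screen session with Ctrl+a d and closed the SSH connection to the server. The problem occurred after the program had been running continuously for a few days.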

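Edit 3: I don't know how to read the kernel/system logs. The installed CPU is a Ryzen 7900X, though. I specifically wanted to monitor temperatures, so I upgraded the default Linux kernel to 6.6.1. I don't know whether that could cause any problems. I can confirm, however, that the CPU temperature never exceeded 80 °C over 5 minutes.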

smartctl output for the hard drive:

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-6.6.1Sensors_Kernel-sensorskernel] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD50NMZW-59BCBS0
Serial Number:    ***
LU WWN Device Id: 5 0014ee 26a11913b
Firmware Version: 01.01A01
User Capacity:    5,000,947,523,584 bytes [5.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    4800 rpm
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Dec 25 18:40:18 2023 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:        ( 3840) seconds.
Offline data collection
capabilities:            (0x1b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    ( 242) minutes.
SCT capabilities:          (0x30b5) SCT Status supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   253   253   021    Pre-fail  Always       -       4175
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       58
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       423
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       21
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       2
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       179
194 Temperature_Celsius     0x0022   119   109   000    Old_age   Always       -       33
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Selective Self-tests/Logging not supported

Answer 1

Check the file system status, including the available inodes:

$ stat -f . # where . stands for 'the current directory'.

More info: man stat
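
The numbers that stat -f prints can also be read from Python, which may be handy for checking free space and free inodes from inside the download script itself. The following is only a minimal sketch, assuming a Unix-like system where os.statvfs is available; the helper names fs_usage and percent_free are hypothetical and not part of the original program.

```python
import os


def fs_usage(path="."):
    """Return block and inode counts for the file system containing `path`.

    Hypothetical helper: it wraps os.statvfs, which reports the same kind
    of figures as `stat -f`.
    """
    st = os.statvfs(path)
    return {
        "block_size": st.f_frsize,        # fundamental block size in bytes
        "blocks_total": st.f_blocks,      # total blocks
        "blocks_free": st.f_bfree,        # free blocks (including reserved)
        "blocks_available": st.f_bavail,  # free blocks for unprivileged users
        "inodes_total": st.f_files,       # total inodes
        "inodes_free": st.f_ffree,        # free inodes
    }


def percent_free(free, total):
    """Return `free` as a percentage of `total`, or 0.0 if total is zero."""
    return 100.0 * free / total if total else 0.0


if __name__ == "__main__":
    usage = fs_usage(".")
    print("blocks free: %.1f%%" % percent_free(
        usage["blocks_free"], usage["blocks_total"]))
    print("inodes free: %.1f%%" % percent_free(
        usage["inodes_free"], usage["inodes_total"]))
```

With the figures from the question (145572533 free out of 152616960 inodes), percent_free gives roughly 95 % free inodes, which is consistent with the guess that the inodes are not the problem.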

More about inodes; references:
https://www.a2hosting.com/kb/developer-corner/linux/inode-count/
https://www.stackscale.com/blog/inodes-linux/