File system keeps getting corrupted

Preface, file system setup: Ubuntu uses two partitions:

  • / on my SSD (partition /dev/sda5)
  • /home/ on my HDD (partition /dev/sdb2)

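The problem (in chronological order)

  1. Read-only /home/: while I was using Ubuntu as usual, my /home/ (on my HDD) became read-only.
  2. Rescue mode: after a reboot, Ubuntu went straight into rescue mode.
  3. fsck: from a live USB, I repaired the HDD partition that holds /home/ with fsck.ext4. There was nothing to repair on the SSD partition.
  4. GUI, then read-only: Ubuntu booted to the GUI again, but after a few minutes /home/ became read-only again.
  5. Loop over 2-4: I have repeated steps 2-4 several times. Each time, fsck repairs my partition and I can run Ubuntu with the GUI, but after a random amount of time the same problem comes back. It happens even when I do nothing except touch a file in /home/ every few minutes.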
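The "touch a file every few minutes" check in step 5 could be automated roughly as follows. This is only a sketch: the probe file name `.rw_probe`, the polling interval, and the function names are made up for illustration, and it only records when writes start failing, not why.

```python
import errno
import os
import time


def is_writable(directory):
    """Return True if a small probe file can be created and removed in `directory`."""
    probe = os.path.join(directory, ".rw_probe")  # hypothetical probe file name
    try:
        with open(probe, "w") as handle:
            handle.write("probe\n")
        os.remove(probe)
        return True
    except OSError as exc:
        # EROFS is the error returned once the file system is mounted read-only.
        if exc.errno == errno.EROFS:
            return False
        raise  # anything else (permissions, missing directory) is a different problem


def watch(directory, interval_seconds=300, max_checks=None):
    """Poll `directory`; return the Unix time at which it stopped being writable.

    Returns None if `max_checks` probes all succeeded.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        if not is_writable(directory):
            return time.time()
        checks += 1
        time.sleep(interval_seconds)
    return None
```

Running `watch("/home")` after a repair would give a timestamp that can be lined up with the kernel messages from the same moment.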

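Possible causes

  • A hard reboot (I did one 3 hours before the problem appeared)
  • A package update (I ran one a few tens of minutes before the problem appeared)

The package update was the following: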

Start-Date: 2021-04-21  16:18:16
Commandline: apt-get dist-upgrade
Requested-By: xavier (1000)
Upgrade: libseccomp2:amd64 (2.4.3-1ubuntu3.18.04.3, 2.5.1-1ubuntu1~18.04.1), ruby2.5:amd64 (2.5.1-1ubuntu1.8, 2.5.1-1ubuntu1.9), libsystemd0:amd64 (237-3ubuntu10.45, 237-3ubuntu10.46), libsystemd0:i386 (237-3ubuntu10.45, 237-3ubuntu10.46), google-chrome-stable:amd64 (89.0.4389.128-1, 90.0.4430.85-1), skypeforlinux:amd64 (8.69.0.77, 8.71.0.36), udev:amd64 (237-3ubuntu10.45, 237-3ubuntu10.46), libudev1:amd64 (237-3ubuntu10.45, 237-3ubuntu10.46), libudev1:i386 (237-3ubuntu10.45, 237-3ubuntu10.46), libruby2.5:amd64 (2.5.1-1ubuntu1.8, 2.5.1-1ubuntu1.9), libcaca0:amd64 (0.99.beta19-2ubuntu0.18.04.1, 0.99.beta19-2ubuntu0.18.04.2), chromium-browser:amd64 (89.0.4389.90-0ubuntu0.18.04.2, 90.0.4430.72-0ubuntu0.18.04.1), systemd-sysv:amd64 (237-3ubuntu10.45, 237-3ubuntu10.46), chromium-codecs-ffmpeg-extra:amd64 (89.0.4389.90-0ubuntu0.18.04.2, 90.0.4430.72-0ubuntu0.18.04.1), zotero:amd64 (5.0.96, 5.0.96.2-1), libpam-systemd:amd64 (237-3ubuntu10.45, 237-3ubuntu10.46), systemd:amd64 (237-3ubuntu10.45, 237-3ubuntu10.46), libnss-systemd:amd64 (237-3ubuntu10.45, 237-3ubuntu10.46), chromium-browser-l10n:amd64 (89.0.4389.90-0ubuntu0.18.04.2, 90.0.4430.72-0ubuntu0.18.04.1)
End-Date: 2021-04-21  16:19:14

It contains no linux-headers update. I did do a kernel update (log here), but that was three days earlier, and I used my laptop heavily afterwards without any problem.
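The claim that no kernel package was part of this update can be checked mechanically against the "Upgrade:" field. The sketch below assumes the field always has the shape `name:arch (old, new)` seen in this entry, and that kernel packages can be recognised by a `linux-` prefix; both are assumptions of this example. The sample string is a shortened copy of the entry above.

```python
import re

# Shortened sample of the "Upgrade:" field from the apt history entry above.
UPGRADE_FIELD = (
    "libseccomp2:amd64 (2.4.3-1ubuntu3.18.04.3, 2.5.1-1ubuntu1~18.04.1), "
    "udev:amd64 (237-3ubuntu10.45, 237-3ubuntu10.46), "
    "systemd:amd64 (237-3ubuntu10.45, 237-3ubuntu10.46)"
)

# One entry looks like: name:arch (old_version, new_version)
ENTRY = re.compile(
    r"(?P<name>[^\s:,]+):(?P<arch>\w+) \((?P<old>[^,]+), (?P<new>[^)]+)\)"
)


def parse_upgrades(field):
    """Return one dict (name, arch, old, new) per upgraded package."""
    return [match.groupdict() for match in ENTRY.finditer(field)]


def kernel_packages(upgrades):
    """Return the names of upgraded packages that start with 'linux-'."""
    return [u["name"] for u in upgrades if u["name"].startswith("linux-")]


upgrades = parse_upgrades(UPGRADE_FIELD)
for u in upgrades:
    print(f'{u["name"]} ({u["arch"]}): {u["old"]} -> {u["new"]}')
print("kernel packages:", kernel_packages(upgrades))
```

Feeding it the full field instead of the shortened sample gives the complete package list, which is easier to scan than the single long line.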

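Unlikely cause

  • Hardware problem (?): the same HDD also holds an NTFS partition, next to the Linux partition that keeps getting corrupted. I use this NTFS partition from Windows to store some large files. However, I checked it with the Windows tool set and no problem was detected. This may not fully rule out a hardware problem, but my feeling is that my HDD is not damaged.

My guess: I do feel that something is wrong with one of the packages I updated. However, I cannot say what most of them are for, so I have no particular suspect. I may also be wrong about the update, but I do not know how to diagnose the problem further. If anyone knows this problem, or how to check further what is going on, that would be very helpful.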

Additional logs

  • journalctl output when Ubuntu boots in rescue mode.
  • dmesg output when Ubuntu boots in rescue mode.
  • fsck output when repairing /dev/sdb2 (where /home/ is) from a live USB Ubuntu.

General setup

  • Computer: ASUS ROG laptop
  • Linux: Ubuntu 18.04.5 LTS, MATE version 1.20.1, kernel 4.15.0-142-generic x86-64

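Update (2021-04-23)

Following @guiverc's comment, I ran a long disk self-test with smartctl from a live USB drive. The full smartctl log is below.

As far as I understand (mostly based on the explanations of the package's author), the overall conclusion is that I should be worried about my hard drive.

On the positive side, "SMART overall-health self-assessment test result: PASSED" is encouraging, and the values of the SMART attributes are all well above their thresholds.

On the less positive side, 60 errors occurred during (I think) or before the smartctl run. Such a high number seems worrying. In addition, the self-test log shows the status "Completed: read failure", which is also worrying.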
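To get a feel for what the error log contains, the summary lines of the five logged errors can be pulled apart as in the sketch below. The lines are copied from the log in this post; the regular expression and function names are this example's own, and the byte-offset conversion assumes the 512-byte logical sector size reported in the information section.

```python
import re

# Summary lines of the five logged errors, copied from the SMART error log in this post.
ERROR_LINES = """\
  40 41 10 78 dd dd 40  Error: UNC at LBA = 0x00dddd78 = 14540152
  40 41 d0 78 dd dd 40  Error: UNC at LBA = 0x00dddd78 = 14540152
  40 41 40 78 dd dd 40  Error: UNC at LBA = 0x00dddd78 = 14540152
  40 41 b8 78 dd dd 40  Error: UNC at LBA = 0x00dddd78 = 14540152
  40 41 80 78 dd dd 40  Error: WP at LBA = 0x00dddd78 = 14540152
"""

ERROR = re.compile(r"Error: (\w+) at LBA = 0x[0-9a-fA-F]+ = (\d+)")


def logged_errors(text):
    """Return a list of (error_kind, lba) tuples found in `text`."""
    return [(kind, int(lba)) for kind, lba in ERROR.findall(text)]


def lba_to_byte_offset(lba, logical_sector_size=512):
    """Byte offset of a sector from the start of the disk (not of a partition)."""
    return lba * logical_sector_size


errors = logged_errors(ERROR_LINES)
distinct_lbas = sorted({lba for _, lba in errors})
print("logged errors:", len(errors))
print("distinct LBAs:", distinct_lbas)
for lba in distinct_lbas:
    print(lba, "->", lba_to_byte_offset(lba), "bytes from the start of the disk")
```

All five logged errors point at the same LBA, 14540152, while the extended self-test reports its first error at a different LBA, 450747768. Mapping either address to a partition would need the partition start sectors, which are not part of this post.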

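Note that I had not run the fsck repair before running smartctl, but I do not know whether this matters.

From my understanding, reading this log calls for caution: back up my HDD (most of it is already backed up) and replace it with a new one.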

smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-42-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Toshiba 2.5" HDD MQ01ABD...
Device Model:     TOSHIBA MQ01ABD100
Serial Number:    17POPDQKT
LU WWN Device Id: 5 000039 782d0abaf
Firmware Version: AX0R5J
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Apr 22 11:53:17 2021 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 112) The previous self-test completed having
                    the read element of the test failed.
Total time to complete Offline 
data collection:        (  120) seconds.
Offline data collection
capabilities:            (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    ( 236) minutes.
SCT capabilities:          (0x003d) SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   050    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0027   100   100   001    Pre-fail  Always       -       1660
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       3639
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   050    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   050    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   084   084   000    Old_age   Always       -       6487
 10 Spin_Retry_Count        0x0033   172   100   030    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       2859
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       630
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       85
193 Load_Cycle_Count        0x0032   098   098   000    Old_age   Always       -       27933
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       32 (Min/Max 13/51)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
220 Disk_Shift              0x0002   100   100   000    Old_age   Always       -       0
222 Loaded_Hours            0x0032   088   088   000    Old_age   Always       -       5055
223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
224 Load_Friction           0x0022   100   100   000    Old_age   Always       -       0
226 Load-in_Time            0x0026   100   100   000    Old_age   Always       -       273
240 Head_Flying_Hours       0x0001   100   100   001    Pre-fail  Offline      -       0

SMART Error Log Version: 1
ATA Error Count: 60 (device log contains only the most recent five errors)
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 60 occurred at disk power-on lifetime: 6480 hours (270 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 10 78 dd dd 40  Error: UNC at LBA = 0x00dddd78 = 14540152

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 10 78 dd dd 40 00      00:29:14.662  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00      00:29:14.661  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 00      00:29:14.661  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 00      00:29:14.660  IDENTIFY DEVICE
  ef 03 45 00 00 00 a0 00      00:29:14.660  SET FEATURES [Set transfer mode]

Error 59 occurred at disk power-on lifetime: 6480 hours (270 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 d0 78 dd dd 40  Error: UNC at LBA = 0x00dddd78 = 14540152

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 d0 78 dd dd 40 00      00:29:10.778  READ FPDMA QUEUED
  60 08 c8 70 dd dd 40 00      00:29:10.777  READ FPDMA QUEUED
  60 08 c0 68 dd dd 40 00      00:29:10.765  READ FPDMA QUEUED
  60 08 b8 60 dd dd 40 00      00:29:10.764  READ FPDMA QUEUED
  60 08 b0 58 dd dd 40 00      00:29:10.764  READ FPDMA QUEUED

Error 58 occurred at disk power-on lifetime: 6480 hours (270 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 40 78 dd dd 40  Error: UNC at LBA = 0x00dddd78 = 14540152

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 b8 48 00 e6 dd 40 00      00:29:10.543  READ FPDMA QUEUED
  60 00 40 00 dc dd 40 00      00:29:06.765  READ FPDMA QUEUED
  60 00 38 00 d2 dd 40 00      00:29:06.754  READ FPDMA QUEUED
  60 00 30 00 c8 dd 40 00      00:29:06.728  READ FPDMA QUEUED
  60 e8 28 a0 49 29 40 00      00:29:06.716  READ FPDMA QUEUED

Error 57 occurred at disk power-on lifetime: 6480 hours (270 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 b8 78 dd dd 40  Error: UNC at LBA = 0x00dddd78 = 14540152

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 b8 78 dd dd 40 00      00:25:20.185  READ FPDMA QUEUED
  61 08 b0 a8 6b ad 40 00      00:25:20.184  WRITE FPDMA QUEUED
  ea 00 00 00 00 00 a0 00      00:25:20.184  FLUSH CACHE EXT
  ef 10 02 00 00 00 a0 00      00:25:20.184  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 00      00:25:20.184  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]

Error 56 occurred at disk power-on lifetime: 6480 hours (270 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 80 78 dd dd 40  Error: WP at LBA = 0x00dddd78 = 14540152

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 10 90 00 c8 27 40 00      00:25:19.960  WRITE FPDMA QUEUED
  61 00 88 a8 6a ad 40 00      00:25:19.960  WRITE FPDMA QUEUED
  60 08 80 78 dd dd 40 00      00:25:16.323  READ FPDMA QUEUED
  60 d8 78 38 bc 16 40 00      00:25:16.322  READ FPDMA QUEUED
  60 00 70 38 b2 16 40 00      00:25:16.310  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       00%      6485         450747768

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
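The attribute table above can also be read mechanically. The sketch below parses a few rows copied from it and reports, for three attributes, whether the raw counter is non-zero. Which attributes to watch is a choice made for this example, not something the log itself prescribes.

```python
# Rows copied from the attribute table above.
ATTRIBUTE_ROWS = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       32 (Min/Max 13/51)
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       1
"""

# Attributes inspected by this sketch (an assumption of the example).
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_attributes(text):
    """Map attribute name -> dict with normalized value, worst, threshold and raw text."""
    rows = {}
    for line in text.splitlines():
        # Ten columns; the last one (RAW_VALUE) may itself contain spaces.
        fields = line.split(None, 9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        rows[fields[1]] = {
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "raw": fields[9].strip(),
        }
    return rows


def nonzero_watched(rows):
    """Return {name: raw_count} for watched attributes whose raw count is above zero."""
    flagged = {}
    for name in WATCHED:
        if name in rows:
            raw = int(rows[name]["raw"].split()[0])
            if raw > 0:
                flagged[name] = raw
    return flagged


rows = parse_attributes(ATTRIBUTE_ROWS)
print(nonzero_watched(rows))
```

On these rows the normalized values all sit above their thresholds, yet the raw counts for Current_Pending_Sector and Offline_Uncorrectable are 8 and 1. The normalized column and the raw column therefore do not tell quite the same story.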

As requested by @paladin, here is my /etc/fstab:

# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
# / was on /dev/sda5 during installation
UUID=09c3311b-f37d-48ca-b0bf-1001574bf539 /               ext4    errors=remount-ro 0       1
# /boot/efi was on /dev/sda1 during installation
UUID=6896-40E1  /boot/efi       vfat    umask=0077      0       1
/swapfile                                 none            swap    sw              0       0
UUID=40d7f02e-01ff-4c43-80d9-4fcd8b0139a5   /home    ext4          nodev,nosuid       0       2
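
For readers less used to fstab, each non-comment line has up to six whitespace-separated fields: device, mount point, type, options, dump and pass. A minimal sketch that splits the entries above (the `FstabEntry` type and function name are made up for this example):

```python
from typing import List, NamedTuple

# Non-comment content copied from the fstab above.
FSTAB = """\
# / was on /dev/sda5 during installation
UUID=09c3311b-f37d-48ca-b0bf-1001574bf539 /               ext4    errors=remount-ro 0       1
# /boot/efi was on /dev/sda1 during installation
UUID=6896-40E1  /boot/efi       vfat    umask=0077      0       1
/swapfile                                 none            swap    sw              0       0
UUID=40d7f02e-01ff-4c43-80d9-4fcd8b0139a5   /home    ext4          nodev,nosuid       0       2
"""


class FstabEntry(NamedTuple):
    device: str
    mount_point: str
    fs_type: str
    options: List[str]
    dump: int
    passno: int


def parse_fstab(text):
    """Parse fstab text into FstabEntry tuples, skipping comments and blank lines."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete entry
        # The dump and pass fields are optional and default to 0.
        dump = int(fields[4]) if len(fields) > 4 else 0
        passno = int(fields[5]) if len(fields) > 5 else 0
        entries.append(
            FstabEntry(fields[0], fields[1], fields[2], fields[3].split(","), dump, passno)
        )
    return entries


entries = parse_fstab(FSTAB)
for entry in entries:
    print(entry.mount_point, entry.fs_type, entry.options, "pass", entry.passno)
```

Read this way, / is mounted with errors=remount-ro, while the /home line lists only nodev,nosuid and sets no errors= option. What /home does on an error then depends on the file system's own settings, which this sketch does not inspect.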
