Root filesystem on SSD repeatedly and randomly becomes read-only

I replaced the 1 TB system SSD with a larger 2 TB one and cloned the contents over with the CloneZilla utility. Afterwards the drive still showed up as 1 TB in the OS, but I was able to extend it to 2 TB. All data appeared to be fine.

After a while, the filesystem became read-only. Running fsck after a reboot did help, but only for a few days; it has kept happening ever since. Could the new SSD be faulty? I tried upgrading Ubuntu from 18.04 to 20.04, to no avail. The filesystem is ext4.
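For anyone hitting a similar read-only flip, a few generic diagnostics can confirm the state and make the next boot run a full check. This is only a sketch; the device name /dev/nvme0n1p2 is taken from the kernel error quoted later and must be adjusted to your system:

```shell
# Is / currently mounted read-only? ("ro" appears among the mount options)
findmnt -no OPTIONS / | tr ',' '\n' | grep -x ro && echo "/ is read-only"

# Kernel messages explaining why ext4 bailed out
dmesg | grep -i 'EXT4-fs error'

# Force a full fsck of the root filesystem on every boot until the cause
# is found (device name is an assumption; check lsblk first)
sudo tune2fs -c 1 /dev/nvme0n1p2
```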

Edit: smartctl reports:

=== START OF INFORMATION SECTION ===
Model Number:                       Samsung SSD 970 EVO Plus 2TB
Serial Number:                      S6P1NS0T501522T
Firmware Version:                   4B2QEXM7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 2 000 398 934 016 [2,00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      6
Number of Namespaces:               1
Namespace 1 Size/Capacity:          2 000 398 934 016 [2,00 TB]
Namespace 1 Utilization:            726 404 530 176 [726 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            002538 5521405351
Local Time is:                      Fri Sep  2 14:42:51 2022 CEST
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x0057):     Comp Wr_Unc DS_Mngmt Sav/Sel_Feat Timestmp
Maximum Data Transfer Size:         128 Pages
Warning  Comp. Temp. Threshold:     82 Celsius
Critical Comp. Temp. Threshold:     85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     7.59W       -        -    0  0  0  0        0       0
 1 +     7.59W       -        -    1  1  1  1        0     200
 2 +     7.59W       -        -    2  2  2  2        0    1000
 3 -   0.0500W       -        -    3  3  3  3     2000    1200
 4 -   0.0050W       -        -    4  4  4  4      500    9500

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        41 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    1 046 751 [535 GB]
Data Units Written:                 11 045 053 [5,65 TB]
Host Read Commands:                 21 511 754
Host Write Commands:                122 266 698
Controller Busy Time:               632
Power Cycles:                       20
Power On Hours:                     258
Unsafe Shutdowns:                   14
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               41 Celsius
Temperature Sensor 2:               46 Celsius

Error Information (NVMe Log 0x01, max 64 entries)
No Errors Logged

Tail of the system log:

Sep  1 07:35:45 containerd[1489]: time="2022-09-01T07:35:45.938574835+02:00" level=info msg="cleaning up dead shim"
Sep  1 07:35:45 dockerd[1609]: time="2022-09-01T07:35:45.938532925+02:00" level=info msg="ignoring event" container=c61acc2424d8b29cde658e65b0a12b21b7dd87a9c406532ee2fc75a68d565ab0 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Sep  1 07:35:45 containerd[1489]: time="2022-09-01T07:35:45.954480844+02:00" level=warning msg="cleanup warnings time=\"2022-09-01T07:35:45+02:00\" level=info msg=\"starting signal loop\" namespace=moby pid=3411558 runtime=io.containerd.runc.v2\n"
Sep  1 07:35:45 kernel: [598279.313677] veth0e65189: renamed from eth0
Sep  1 07:35:46 kernel: [598279.339095] br-9972a812410e: port 5(veth77e9014) entered disabled state
Sep  1 07:35:46 systemd-udevd[3408622]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Sep  1 07:35:46 NetworkManager[1276]: <info>  [1662010546.0671] manager: (veth0e65189): new Veth device (/org/freedesktop/NetworkManager/Devices/82537)
Sep  1 07:35:46 avahi-daemon[517479]: Interface veth77e9014.IPv6 no longer relevant for mDNS.
Sep  1 07:35:46 avahi-daemon[517479]: Leaving mDNS multicast group on interface veth77e9014.IPv6 with address fe80::b82c:fff:fe77:d9b4.
Sep  1 07:35:46 kernel: [598279.397005] br-9972a812410e: port 5(veth77e9014) entered disabled state
Sep  1 07:35:46 kernel: [598279.400491] device veth77e9014 left promiscuous mode
Sep  1 07:35:46 kernel: [598279.400494] br-9972a812410e: port 5(veth77e9014) entered disabled state
Sep  1 07:35:46 avahi-daemon[517479]: Withdrawing address record for fe80::b82c:fff:fe77:d9b4 on veth77e9014.
Sep  1 07:35:46 systemd-udevd[3408622]: veth0e65189: Failed to get link config: No such device
Sep  1 07:35:46 gnome-shell[1796]: Removing a network device that was not added
Sep  1 07:35:46 NetworkManager[1276]: <info>  [1662010546.1106] device (veth77e9014): released from master device br-9972a812410e
Sep  1 07:35:46 gnome-shell[1796]: Removing a network device that was not added
Sep  1 07:35:46 systemd[67738]: run-docker-netns-9bfc9b4bb9d2.mount: Succeeded.
Sep  1 07:35:46 systemd[67738]: var-lib-docker-containers-c61acc2424d8b29cde658e65b0a12b21b7dd87a9c406532ee2fc75a68d565ab0-mounts-shm.mount: Succeeded.
Sep  1 07:35:46 systemd[67738]: var-lib-docker-overlay2-a8794c75b463c71c93b29c9643accfb4e12fffe422f7060279f3d976db072b25-merged.mount: Succeeded.
Sep  1 07:35:46 systemd[361680]: run-docker-netns-9bfc9b4bb9d2.mount: Succeeded.
Sep  1 07:35:46 systemd[361680]: var-lib-docker-containers-c61acc2424d8b29cde658e65b0a12b21b7dd87a9c406532ee2fc75a68d565ab0-mounts-shm.mount: Succeeded.
Sep  1 07:35:46 systemd[361680]: var-lib-docker-overlay2-a8794c75b463c71c93b29c9643accfb4e12fffe422f7060279f3d976db072b25-merged.mount: Succeeded.
Sep  1 07:35:46 systemd[1712]: run-docker-netns-9bfc9b4bb9d2.mount: Succeeded.
Sep  1 07:35:46 systemd[1712]: var-lib-docker-containers-c61acc2424d8b29cde658e65b0a12b21b7dd87a9c406532ee2fc75a68d565ab0-mounts-shm.mount: Succeeded.
Sep  1 07:35:46 systemd[1712]: var-lib-docker-overlay2-a8794c75b463c71c93b29c9643accfb4e12fffe422f7060279f3d976db072b25-merged.mount: Succeeded.
Sep  1 07:35:46 systemd[960393]: run-docker-netns-9bfc9b4bb9d2.mount: Succeeded.
Sep  1 07:35:46 systemd[1]: run-docker-netns-9bfc9b4bb9d2.mount: Succeeded.
Sep  1 07:35:46 systemd[960393]: var-lib-docker-containers-c61acc2424d8b29cde658e65b0a12b21b7dd87a9c406532ee2fc75a68d565ab0-mounts-shm.mount: Succeeded.
Sep  1 07:35:46 systemd[960393]: var-lib-docker-overlay2-a8794c75b463c71c93b29c9643accfb4e12fffe422f7060279f3d976db072b25-merged.mount: Succeeded.
Sep  1 07:35:46 systemd[1]: var-lib-docker-containers-c61acc2424d8b29cde658e65b0a12b21b7dd87a9c406532ee2fc75a68d565ab0-mounts-shm.mount: Succeeded.
Sep  1 07:35:46 systemd[1]: var-lib-docker-overlay2-a8794c75b463c71c93b29c9643accfb4e12fffe422f7060279f3d976db072b25-merged.mount: Succeeded.
Sep  1 07:35:46 avahi-daemon[517479]: Joining mDNS multicast group on interface veth1b1eea5.IPv6 with address fe80::d05a:41ff:fe71:6d0f.
Sep  1 07:35:46 avahi-daemon[517479]: New relevant interface veth1b1eea5.IPv6 for mDNS.
Sep  1 07:35:46 avahi-daemon[517479]: Registering new address record for fe80::d05a:41ff:fe71:6d0f on veth1b1eea5.*.
Sep  1 07:35:46 kernel: [598279.963079] Route cache is full: consider increasing sysctl net.ipv[4|6].route.max_size.
Sep  1 07:35:46 kernel: [598279.963138] Route cache is full: consider increasing sysctl net.ipv[4|6].route.max_size.
Sep  1 07:35:46 kernel: [598279.974695] Route cache is full: consider increasing sysctl net.ipv[4|6].route.max_size.
Sep  1 07:35:46 kernel: [598280.063961] Route cache is full: consider increasing sysctl net.ipv[4|6].route.max_size.
Sep  1 07:35:46 kernel: [598280.114831] Route cache is full: consider increasing sysctl net.ipv[4|6].route.max_size.
Sep  1 07:35:46 kernel: [598280.182623] Route cache is full: consider increasing sysctl net.ipv[4|6].route.max_size.
Sep  1 07:35:46 kernel: [598280.241481] EXT4-fs error (device nvme0n1p2): __ext4_find_entry:1551: inode #40395175: comm updatedb.mlocat: checksumming directory block 0

The line "EXT4-fs error (device nvme0n1p2): __ext4_find_entry:1551: inode #40395175: comm updatedb.mlocat: checksumming directory block 0" seems notable.
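That checksum failure is ext4 metadata corruption, and with the errors=remount-ro mount option (the Ubuntu default for /) any such error immediately flips the filesystem to read-only. A quick sketch of how to verify that behaviour is what's configured, reusing the device name from the error above:

```shell
# The filesystem's own configured reaction to errors
# (typically "Continue"; the remount-ro comes from the mount options)
sudo tune2fs -l /dev/nvme0n1p2 | grep -i 'errors behavior'

# Confirm the mount options actually in effect for /
findmnt -no OPTIONS / | tr ',' '\n' | grep -i 'errors'
```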

Output of lsblk -f:

nvme0n1
├─nvme0n1p1 vfat                     F062-0FA7 505,8M     1% /boot/efi
└─nvme0n1p2 ext4                     83f2e983-979f-4303-a7f9-837b7a8d65f0    1,1T    35% /
nvme1n1     ext4     filesystem_home 7af8bdbe-5605-4957-af95-69a790a8f67a 1009,1G    40% /home

Answer 1

In case someone stumbles on a similar problem, here is what I learned. After the drive started acting up, in preparation for replacing it I moved the Docker volume of one high-traffic application (Sentry) to another drive. The application itself (docker-compose) lives on a different drive that works fine.

The problem has not occurred since. I suspect this is not a coincidence; having the application and its Docker volume on separate physical drives apparently created the conditions for the problem. Nothing else was changed (no updates etc.), since this was only meant as preparation for fully replacing the drive at the next failure.
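For reference, relocating a named Docker volume's data to another drive can be sketched roughly as below. This is not the author's exact procedure; the volume name sentry-data and the target path /mnt/data are hypothetical:

```shell
# Stop the stack so the volume data is quiescent
docker compose down

# Copy the volume contents to the other drive
# (trailing slashes matter to rsync: copy contents, not the directory itself)
sudo rsync -a /var/lib/docker/volumes/sentry-data/_data/ /mnt/data/sentry-data/

# Then, in docker-compose.yml, replace the named volume with a bind mount:
#   volumes:
#     - /mnt/data/sentry-data:/data
docker compose up -d
```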

Edit: it happened again. For context, there are two physical 2 TB SSD drives: one for the system, one for /home. It first happened on the system drive, which I replaced with a similarly specced WD SSD. Then the same lock-ups started appearing on the second Samsung drive, the one mounted at /home. So I replaced both. The firmware was the latest version and everything was up to date. It looks like a bad batch, or some common Ubuntu/firmware issue.
