I'm troubleshooting a Debian Buster system that occasionally becomes unresponsive. Looking at dmesg, I see some worrying messages popping up:
[Wed Apr 19 19:39:47 2023] ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action 0xe frozen
[Wed Apr 19 19:39:47 2023] ata1.00: irq_stat 0x00000040, connection status changed
[Wed Apr 19 19:39:47 2023] ata1: SError: { PHYRdyChg CommWake DevExch }
[Wed Apr 19 19:39:47 2023] ata1.00: failed command: WRITE DMA EXT
[Wed Apr 19 19:39:47 2023] ata1.00: cmd 35/00:18:68:02:96/00:00:1d:00:00/e0 tag 19 dma 12288 out
res 50/00:00:00:00:00/00:00:00:00:00/a0 Emask 0x10 (ATA bus error)
[Wed Apr 19 19:39:47 2023] ata1.00: status: { DRDY }
[Wed Apr 19 19:39:47 2023] ata1: hard resetting link
[Wed Apr 19 19:39:48 2023] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[Wed Apr 19 19:39:48 2023] ata1.00: supports DRM functions and may not be fully accessible
[Wed Apr 19 19:39:48 2023] ata1.00: supports DRM functions and may not be fully accessible
[Wed Apr 19 19:39:48 2023] ata1.00: configured for UDMA/33
[Wed Apr 19 19:39:48 2023] ata1: EH complete
[Wed Apr 19 19:39:48 2023] ata1.00: Enabling discard_zeroes_data
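The hexadecimal fields in the log above are register dumps, and they can be unpacked by hand. The Python sketch below assumes the standard SATA SError, SStatus and SControl bit layouts (as documented for PxSERR, PxSSTS and PxSCTL in the AHCI specification) and reuses the short labels libata prints. The function names are hypothetical helpers, not part of any tool. Decoded this way, SError 0x4050000 contains only the PHY-ready-change, COMWAKE and device-exchanged bits, with no CRC or 10b/8b decode bits set. SStatus 113 / SControl 310 show that the link came back up at Gen1 (1.5 Gbps) and that SControl is limiting it to that speed.

```python
"""Decode the SError, SStatus and SControl values that libata prints in dmesg."""

# SError bit number -> short label libata prints inside "SError: { ... }".
SERROR_BITS = {
    0: "RecovData",    # recovered data integrity error
    1: "RecovComm",    # recovered communication error
    8: "UnrecovData",  # non-recovered transient data integrity error
    9: "Persist",      # persistent communication or data integrity error
    10: "Proto",       # protocol error
    11: "HostInt",     # internal error in the host adapter
    16: "PHYRdyChg",   # PhyRdy signal changed state
    17: "PHYInt",      # PHY internal error
    18: "CommWake",    # COMWAKE OOB signal detected
    19: "10B8B",       # 10b-to-8b decode error
    20: "Dispar",      # running disparity error
    21: "BadCRC",      # CRC error on a received FIS
    22: "Handshk",     # handshake error
    23: "LinkSeq",     # link sequence error
    24: "TrStaTrns",   # transport state transition error
    25: "UnrecFIS",    # unrecognized FIS type
    26: "DevExch",     # device presence changed ("exchanged")
}

# Speed encodings shared by the SPD field of SStatus and SControl.
SPEEDS = {1: "1.5 Gbps", 2: "3.0 Gbps", 3: "6.0 Gbps"}


def decode_serror(value):
    """Return the labels of all bits set in an SError value, lowest bit first."""
    return [name for bit, name in sorted(SERROR_BITS.items()) if value & (1 << bit)]


def decode_sstatus(value):
    """Split SStatus into DET (bits 3:0), SPD (bits 7:4) and IPM (bits 11:8)."""
    spd = (value >> 4) & 0xF
    return {
        "DET": value & 0xF,         # 3 = device present, PHY communication up
        "SPD": SPEEDS.get(spd, "none negotiated"),
        "IPM": (value >> 8) & 0xF,  # 1 = interface in active state
    }


def decode_scontrol(value):
    """Split SControl into DET, the SPD speed *limit*, and IPM restrictions."""
    spd = (value >> 4) & 0xF
    return {
        "DET": value & 0xF,         # 0 = no reset requested
        "SPD_limit": SPEEDS.get(spd, "no limit"),
        "IPM": (value >> 8) & 0xF,  # 3 = Partial and Slumber both disallowed
    }


if __name__ == "__main__":
    # dmesg prints these three registers in hexadecimal.
    print(decode_serror(0x4050000))  # ['PHYRdyChg', 'CommWake', 'DevExch']
    print(decode_sstatus(0x113))     # {'DET': 3, 'SPD': '1.5 Gbps', 'IPM': 1}
    print(decode_scontrol(0x310))    # {'DET': 0, 'SPD_limit': '1.5 Gbps', 'IPM': 3}
```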
These messages keep recurring, and they seem to indicate that the SATA link has to be reset every few minutes.
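To put a number on "every few minutes", you can extract the reset lines from dmesg -T output and measure the gaps between them. This is a minimal Python sketch. It assumes the dmesg -T timestamp format shown above with English day and month names, and the function name is a hypothetical helper. The dmesg man page warns that -T timestamps can be inaccurate, especially after suspend/resume, so treat the gaps as approximate.

```python
import re
import statistics
import sys
from datetime import datetime

# Matches "dmesg -T" lines such as:
#   [Wed Apr 19 19:39:47 2023] ata1: hard resetting link
RESET_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<port>ata\d+): hard resetting link")
TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def reset_intervals(lines, port="ata1"):
    """Collect reset timestamps for one port and the gaps between them in seconds."""
    stamps = []
    for line in lines:
        m = RESET_RE.match(line)
        # Compare the whole port name so "ata10" is not counted as "ata1".
        if m and m.group("port") == port:
            stamps.append(datetime.strptime(m.group("ts"), TS_FORMAT))
    gaps = [(later - earlier).total_seconds() for earlier, later in zip(stamps, stamps[1:])]
    return stamps, gaps


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: dmesg -T > dmesg.txt && python3 resets.py dmesg.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        stamps, gaps = reset_intervals(fh)
    print(f"{len(stamps)} link resets on ata1")
    if gaps:
        print(f"gap min/median/max: {min(gaps):.0f}s / "
              f"{statistics.median(gaps):.0f}s / {max(gaps):.0f}s")
```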
I ran an extended SMART test on /dev/sda, and it did not detect any faults (full log):
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-5.9.0-0.bpo.5-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: Samsung SSD 860 PRO 512GB
Serial Number: S5HTNE0N107136V
LU WWN Device Id: 5 002538 e2014235a
Firmware Version: RVM02B6Q
User Capacity: 512,110,190,592 bytes [512 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Thu Apr 20 08:10:54 2023 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM feature is: Unavailable
Rd look-ahead is: Enabled
Write cache is: Enabled
DSN feature is: Unavailable
ATA Security is: Disabled, frozen [SEC2]
Wt Cache Reorder: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
[...]
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
  9 Power_On_Hours          -O--CK   096   096   000    -    16420
 12 Power_Cycle_Count       -O--CK   099   099   000    -    372
177 Wear_Leveling_Count     PO--C-   099   099   000    -    17
179 Used_Rsvd_Blk_Cnt_Tot   PO--C-   100   100   010    -    0
181 Program_Fail_Cnt_Total  -O--CK   100   100   010    -    0
182 Erase_Fail_Count_Total  -O--CK   100   100   010    -    0
183 Runtime_Bad_Block       PO--C-   100   100   010    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
190 Airflow_Temperature_Cel -O--CK   079   045   000    -    21
195 Hardware_ECC_Recovered  -O-RC-   200   200   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   100   100   000    -    0
235 Unknown_Attribute       -O--C-   099   099   000    -    230
241 Total_LBAs_Written      -O--CK   099   099   000    -    9965553603
[...]
SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%  16411            -
# 2  Short offline       Completed without error       00%  16406            -
# 3  Short offline       Completed without error       00%  16405            -
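The attribute table can also be checked mechanically. The sketch below makes several assumptions. It expects smartctl's brief attribute format as shown above (which looks like smartctl -x output), and it reads only the first token of RAW_VALUE. It also groups attributes by a rough convention, not anything smartctl defines. Reallocation, program/erase failure and uncorrectable counts serve as media indicators. UDMA_CRC_Error_Count (ID 199), which counts CRC errors the drive itself detected on the interface, serves as the link indicator. The function names are hypothetical.

```python
import re

# One attribute row in smartctl's "brief" attribute format, e.g.
#   199 UDMA_CRC_Error_Count -OSRCK 100 100 000 - 0
ROW_RE = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+(?P<flags>\S+)\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+(?P<fail>\S+)\s+(?P<raw>\S+)"
)

# Rough grouping (an assumption): media-health IDs vs. SATA-link-health IDs.
MEDIA_IDS = {5, 179, 181, 182, 183, 187}
LINK_IDS = {199}  # UDMA_CRC_Error_Count: CRC errors the drive saw on the link


def parse_attributes(text):
    """Map attribute ID -> fields for every row that looks like an attribute."""
    attrs = {}
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m:
            attrs[int(m.group("id"))] = {
                "name": m.group("name"),
                "value": int(m.group("value")),
                "thresh": int(m.group("thresh")),
                "fail": m.group("fail"),  # "-" means never failed
                "raw": m.group("raw"),    # first token only, e.g. "21" for temperature
            }
    return attrs


def suspicious(attrs):
    """Return (id, name, raw) for watched attributes that are nonzero, failed, or at threshold."""
    hits = []
    for aid in sorted(MEDIA_IDS | LINK_IDS):
        a = attrs.get(aid)
        if a is None:
            continue
        at_threshold = a["thresh"] > 0 and a["value"] <= a["thresh"]
        if a["raw"] != "0" or a["fail"] != "-" or at_threshold:
            hits.append((aid, a["name"], a["raw"]))
    return hits
```

Run over the table above, suspicious(parse_attributes(text)) returns an empty list. Every watched raw count, including UDMA_CRC_Error_Count, is 0, and no attribute has a FAIL entry.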
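I don't think this is a filesystem problem, but I tried the kernel command-line option fsck.mode=force anyway. It doesn't seem to actually check anything on the disk other than the EFI partition.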
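Whether a filesystem is checked at boot at all depends on its pass number. Per fstab(5), a sixth field of 0, or no sixth field, means fsck assumes the filesystem does not need to be checked. systemd's fstab generator also only adds fsck units for entries with a nonzero pass number. That this explains why only the EFI partition was checked is an assumption; the post does not show it. The hypothetical helper below lists fstab entries that would be skipped this way.

```python
def fsck_passes(fstab_text):
    """Return (mountpoint, fstype, passno) for every entry in an fstab."""
    entries = []
    for raw in fstab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split()
        if len(fields) < 3:
            continue  # malformed entry; skip rather than guess
        # fstab(5): a missing sixth field is treated as 0 ("do not check").
        passno = int(fields[5]) if len(fields) >= 6 else 0
        entries.append((fields[1], fields[2], passno))
    return entries


def never_checked(fstab_text):
    """Mount points skipped by boot-time fsck because their pass number is 0."""
    return [(mnt, fstype) for mnt, fstype, passno in fsck_passes(fstab_text)
            if passno == 0 and fstype not in ("swap", "none")]
```

For example, never_checked(open("/etc/fstab").read()) lists the entries that boot-time fsck would skip regardless of fsck.mode=force.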
Does this point to a particular failure mode, such as a failing disk, a bad connection, or filesystem corruption?