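I bought a new computer less than a year ago (4 CPUs, Intel i5, 32 GB RAM, 250 GB SSD) and did a fresh install of Debian 10. I installed very little, keeping bloat to a minimum so that the new OS would run fast.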
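Over the past few days I have noticed a strange pattern. I have some very large files (compressed with zst) that I sometimes need to decompress. They are about 1 GB compressed and about 15 GB decompressed (not a huge task, but certainly not negligible for my system). I decompress them with `zstd -cd 20201216.zst > 20201216.log`. While it runs, `zstd` prints its progress so far. I noticed that it sometimes stalls for 20-30 seconds and then resumes. At first I thought I had accidentally started several tasks and that some kind of contention was causing this. But `htop` shows that very little else is happening on the system (plenty of free RAM, all 4 CPUs at about 1%). I also checked `iotop`: while `zstd` reports that it is working, `iotop` shows very high read and write rates of around 100 MB/s; while `zstd` is making no progress, `iotop` shows 0 B/s read and write. So the problem is neither CPU contention nor disk contention.
Occasionally, though rarely, the whole system freezes during this. Most of the time I can use the system normally while `zstd` is stalled.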
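To put timestamps on the stalls instead of eyeballing the progress display, the growth of the output file can be sampled once per second. The following is only a rough sketch: the function names `sample_size` and `find_stalls` are made up for illustration, `stat -c %s` assumes GNU coreutils (as shipped with Debian), and the 5-second minimum is an arbitrary choice.

```shell
#!/bin/sh
# sample_size FILE PID [INTERVAL]: print "epoch_seconds size_bytes" for FILE
# every INTERVAL seconds (default 1) for as long as process PID is alive.
sample_size() {
    file=$1; pid=$2; interval=${3:-1}
    while kill -0 "$pid" 2>/dev/null; do
        # stat -c %s is GNU coreutils syntax; fall back to 0 if FILE is missing.
        printf '%s %s\n' "$(date +%s)" "$(stat -c %s "$file" 2>/dev/null || echo 0)"
        sleep "$interval"
    done
}

# find_stalls [MIN_SECONDS]: read "epoch size" samples on stdin and print one
# line for every run of samples whose size stayed unchanged for at least
# MIN_SECONDS (default 5), measured from the first to the last unchanged sample.
find_stalls() {
    awk -v min="${1:-5}" '
        NR == 1 { start = $1; last = $2; prev_t = $1; next }
        $2 != last {
            if (prev_t - start >= min)
                printf "stall from %s to %s (%d s)\n", start, prev_t, prev_t - start
            start = $1; last = $2
        }
        { prev_t = $1 }
        END {
            if (NR > 0 && prev_t - start >= min)
                printf "stall from %s to %s (%d s)\n", start, prev_t, prev_t - start
        }
    '
}

# Example use (not executed here):
#   zstd -cd 20201216.zst > 20201216.log &
#   sample_size 20201216.log $! | find_stalls 5
```

The reported epoch timestamps can then be compared against the kernel log or other monitoring output from the same moments.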
What else should I look at to debug this?
Edit: I have run smartctl, and the report is below. I don't yet know how to interpret it and am looking into it.
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.0-9-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: VENO SCORP SSD 240GB
Serial Number: GSDMC206010008
Firmware Version: XKR905
User Capacity: 240,057,409,536 bytes [240 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-3 T13/2161-D revision 4
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Dec 20 17:36:11 2020 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 120) seconds.
Offline data collection
capabilities: (0x11) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
No Selective Self-test supported.
SMART capabilities: (0x0002) Does not save SMART data before
entering power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 10) minutes.
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x0032 100 100 050 Old_age Always - 0
5 Reallocated_Sector_Ct 0x0032 100 100 050 Old_age Always - 0
9 Power_On_Hours 0x0032 100 100 050 Old_age Always - 816
12 Power_Cycle_Count 0x0032 100 100 050 Old_age Always - 138
160 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 0
161 Unknown_Attribute 0x0033 100 100 050 Pre-fail Always - 100
163 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 17
164 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 9546
165 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 30
166 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 3
167 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 13
168 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 1500
169 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 100
175 Program_Fail_Count_Chip 0x0032 100 100 050 Old_age Always - 0
176 Erase_Fail_Count_Chip 0x0032 100 100 050 Old_age Always - 0
177 Wear_Leveling_Count 0x0032 100 100 050 Old_age Always - 0
178 Used_Rsvd_Blk_Cnt_Chip 0x0032 100 100 050 Old_age Always - 0
181 Program_Fail_Cnt_Total 0x0032 100 100 050 Old_age Always - 0
182 Erase_Fail_Count_Total 0x0032 100 100 050 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 050 Old_age Always - 27
194 Temperature_Celsius 0x0022 100 100 050 Old_age Always - 40
195 Hardware_ECC_Recovered 0x0032 100 100 050 Old_age Always - 7896
196 Reallocated_Event_Count 0x0032 100 100 050 Old_age Always - 0
197 Current_Pending_Sector 0x0032 100 100 050 Old_age Always - 0
198 Offline_Uncorrectable 0x0032 100 100 050 Old_age Always - 0
232 Available_Reservd_Space 0x0032 100 100 050 Old_age Always - 100
241 Total_LBAs_Written 0x0030 100 100 050 Old_age Offline - 19968
242 Total_LBAs_Read 0x0030 100 100 050 Old_age Offline - 5880
245 Unknown_Attribute 0x0032 100 100 050 Old_age Always - 25733
SMART Error Log Version: 1
Warning: ATA error count 0 inconsistent with error log pointer 4
ATA Error Count: 0
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error -4 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
00 00 00 00 00 00 00
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
b0 d0 01 00 4f c2 00 08 00:00:00.000 SMART READ DATA
b0 d1 01 01 4f c2 00 08 00:00:00.000 SMART READ ATTRIBUTE THRESHOLDS [OBS-4]
b0 da 00 00 4f c2 00 08 00:00:00.000 SMART RETURN STATUS
b0 d5 01 00 4f c2 00 08 00:00:00.000 SMART READ LOG
b0 d5 01 01 4f c2 00 08 00:00:00.000 SMART READ LOG
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
Selective Self-tests/Logging not supported
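As a first pass at reading the attribute table above, the normalized VALUE column can be compared with THRESH; by the usual SMART convention an attribute counts as failed when VALUE is at or below THRESH. Below is a minimal sketch, assuming the ten-column layout shown above; the function name `smart_margin` is hypothetical. Since this drive is not in the smartctl database, the attribute names and raw values may not mean what their labels suggest.

```shell
#!/bin/sh
# smart_margin: read smartctl attribute rows on stdin and print, per attribute,
# its ID, name, normalized value, threshold, and whether value <= threshold.
# Only rows that start with a numeric ID and have at least 10 columns are used,
# so headers and the error-log register dumps are skipped.
smart_margin() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 {
        status = ($4 + 0 <= $6 + 0) ? "AT_OR_BELOW_THRESHOLD" : "ok"
        printf "%s %s value=%d thresh=%d %s\n", $1, $2, $4, $6, status
    }'
}

# Example use (not executed here):
#   sudo smartctl -A /dev/sda | smart_margin
```

In the report above every attribute shows VALUE 100 against THRESH 050, so this check would list all of them as ok.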
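Answer 1
If you want to check an HDD/SSD for errors from a Linux terminal, I suggest using HDSentinel for Linux rather than smartctl, because the results are easier to read...
Just download it, extract it to /usr/bin, chmod it to 755, and run it as root:
sudo hdsentinel -dev /dev/sda (or replace sda with your drive name)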