On my Debian/Sid home desktop computer (AMD Ryzen 2970WX, some MSI 399 motherboard), I have an M.2 NVMe SSD, and I am getting automatic emails to root such as:
The following warning/error was logged by the smartd daemon:
Device: /dev/nvme0, number of Error Log entries increased from 423 to 424
Device info: Samsung SSD 970 EVO 2TB, S/N:S464NB0KA03837J, FW:2B2QEXE7, 2.00 TB
For details see host's SYSLOG.
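That SSD holds the root partition (29% full according to df -h) and /home (5% full).
The desktop (powered 24 hours a day through a UPS, located near Paris, France) is mostly used for developing RefPerSys and for the usual software developer activities (building, debugging and testing software, plus mail and web browsing, LaTeX, emacs, running ./refpersys, etc.).
My understanding is that this is normal wear for an NVMe SSD.
After running smartctl --test=short /dev/nvme0 (as root), the command smartctl -a /dev/nvme0 gives: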
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-3-amd64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: Samsung SSD 970 EVO 2TB
Serial Number: S464NB0KA03837J
Firmware Version: 2B2QEXE7
PCI Vendor/Subsystem ID: 0x144d
IEEE OUI Identifier: 0x002538
Total NVM Capacity: 2,000,398,934,016 [2.00 TB]
Unallocated NVM Capacity: 0
Controller ID: 4
Number of Namespaces: 1
Namespace 1 Size/Capacity: 2,000,398,934,016 [2.00 TB]
Namespace 1 Utilization: 258,943,426,560 [258 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 002538 5a81b50e6f
Local Time is: Mon Feb 3 10:28:49 2020 MET
Firmware Updates (0x16): 3 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Maximum Data Transfer Size: 512 Pages
Warning Comp. Temp. Threshold: 82 Celsius
Critical Comp. Temp. Threshold: 82 Celsius
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 6.20W - - 0 0 0 0 0 0
1 + 4.30W - - 1 1 1 1 0 0
2 + 2.10W - - 2 2 2 2 0 0
3 - 0.0400W - - 3 3 3 3 210 1200
4 - 0.0050W - - 4 4 4 4 2000 8000
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 39 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 170,671,345 [87.3 TB]
Data Units Written: 6,787,146 [3.47 TB]
Host Read Commands: 1,072,794,583
Host Write Commands: 62,979,313
Controller Busy Time: 1,480
Power Cycles: 196
Power On Hours: 906
Unsafe Shutdowns: 136
Media and Data Integrity Errors: 0
Error Information Log Entries: 427
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 39 Celsius
Temperature Sensor 2: 43 Celsius
Error Information (NVMe Log 0x01, max 64 entries)
No Errors Logged
Here is the output of smartctl -x /dev/nvme0:
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-3-amd64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: Samsung SSD 970 EVO 2TB
Serial Number: S464NB0KA03837J
Firmware Version: 2B2QEXE7
PCI Vendor/Subsystem ID: 0x144d
IEEE OUI Identifier: 0x002538
Total NVM Capacity: 2,000,398,934,016 [2.00 TB]
Unallocated NVM Capacity: 0
Controller ID: 4
Number of Namespaces: 1
Namespace 1 Size/Capacity: 2,000,398,934,016 [2.00 TB]
Namespace 1 Utilization: 258,943,426,560 [258 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 002538 5a81b50e6f
Local Time is: Mon Feb 3 10:42:30 2020 MET
Firmware Updates (0x16): 3 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Maximum Data Transfer Size: 512 Pages
Warning Comp. Temp. Threshold: 82 Celsius
Critical Comp. Temp. Threshold: 82 Celsius
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 6.20W - - 0 0 0 0 0 0
1 + 4.30W - - 1 1 1 1 0 0
2 + 2.10W - - 2 2 2 2 0 0
3 - 0.0400W - - 3 3 3 3 210 1200
4 - 0.0050W - - 4 4 4 4 2000 8000
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 39 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 170,671,351 [87.3 TB]
Data Units Written: 6,787,156 [3.47 TB]
Host Read Commands: 1,072,794,690
Host Write Commands: 62,980,162
Controller Busy Time: 1,480
Power Cycles: 196
Power On Hours: 906
Unsafe Shutdowns: 136
Media and Data Integrity Errors: 0
Error Information Log Entries: 427
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 39 Celsius
Temperature Sensor 2: 42 Celsius
Error Information (NVMe Log 0x01, max 64 entries)
No Errors Logged
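I find such a test and the "No Errors Logged" message reassuring.
Questions:
When should I be worried?
Ideally, I would like to replace the SSD before it fails unexpectedly. I have heard rumors that SSDs fail all at once (rather than gradually, as spinning hard disks do). I am not a hardware expert at all.
What Linux command should I run (e.g. monthly) to assess the wear of an NVMe SSD?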
Answer 1
Try sudo nvme error-log /dev/nvme0 (this requires installing the nvme-cli package via apt, synaptic, or similar).
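Answer 2
The lifespan of an SSD is usually limited by the number of writes its memory cells can handle. Each cell wears a little on every write operation and tolerates only a finite number of rewrites before it fails. (That number generally gets smaller as more bits are stored per cell: SLC -> MLC -> TLC -> QLC.) Most SSDs report the overall health of their memory cells through various attributes.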
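To put the write limit in perspective, the Data Units Written counter that smartctl prints can be converted into bytes. The following is a minimal sketch, assuming the NVMe definition of one data unit as 1000 blocks of 512 bytes; the function names are hypothetical and the numbers come from the question's output.

```python
# One NVMe "data unit" is counted as 1000 blocks of 512 bytes
# (assumption based on the NVMe SMART/Health log definition).
BYTES_PER_DATA_UNIT = 1000 * 512


def data_units_to_bytes(units: int) -> int:
    """Convert an NVMe data-unit counter to a byte count."""
    return units * BYTES_PER_DATA_UNIT


def average_write_rate(units_written: int, power_on_hours: int) -> float:
    """Average bytes written per power-on hour (hypothetical helper)."""
    if power_on_hours <= 0:
        raise ValueError("power_on_hours must be positive")
    return data_units_to_bytes(units_written) / power_on_hours


if __name__ == "__main__":
    # "Data Units Written" and "Power On Hours" from the question.
    written = data_units_to_bytes(6_787_146)
    rate = average_write_rate(6_787_146, 906)
    print(f"{written / 1e12:.3f} TB written in total")
    print(f"{rate / 1e9:.2f} GB written per power-on hour")
```

Comparing the total against the endurance figure the manufacturer publishes for the drive gives a rough idea of how much write budget is left; that figure is not part of the smartctl output and has to be looked up separately.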
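You can use smartctl -a /dev/nvme0n1 to view health information about an NVMe disk.
percentage_used (printed by smartctl as Percentage Used) is the manufacturer's estimate of how much of the disk's rated life has been consumed. available_spare (Available Spare) indicates how much spare storage capacity remains. Unless "available_spare" drops to 1% and "percentage_used" rises to 100%, the SSD should still work.
Once these limits are reached, you should consider replacing the SSD. Another indicator is Critical Warning, which should always be 0x00; check the NVMe specification for the meaning of other values.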
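For a periodic (e.g. monthly) check, the fields discussed above can be pulled out of the text that smartctl -a prints. This is only a sketch under stated assumptions: the field labels are taken from the outputs in this thread, the names parse_health and needs_attention are hypothetical, and the rule used here (any non-zero Critical Warning, spare below the drive's own threshold, or Percentage Used at 100%) is a more cautious variant of the limits described above, not an official criterion.

```python
import re

# Labels as they appear in the smartctl outputs quoted in this thread.
FIELDS = {
    "critical_warning": r"Critical Warning:\s*(0x[0-9A-Fa-f]+)",
    "available_spare": r"Available Spare:\s*(\d+)%",
    "spare_threshold": r"Available Spare Threshold:\s*(\d+)%",
    "percentage_used": r"Percentage Used:\s*(\d+)%",
}


def parse_health(text: str) -> dict:
    """Extract the four health fields from smartctl's text output."""
    values = {}
    for name, pattern in FIELDS.items():
        match = re.search(pattern, text)
        if match is None:
            raise ValueError(f"field not found: {name}")
        raw = match.group(1)
        # Critical Warning is printed in hex, the other fields in decimal.
        values[name] = int(raw, 16) if raw.lower().startswith("0x") else int(raw)
    return values


def needs_attention(health: dict) -> bool:
    """Assumed rule of thumb; adjust the conditions to taste."""
    return (
        health["critical_warning"] != 0
        or health["available_spare"] < health["spare_threshold"]
        or health["percentage_used"] >= 100
    )


if __name__ == "__main__":
    # Sample lines copied from the question's output.
    sample = """
    Critical Warning: 0x00
    Available Spare: 100%
    Available Spare Threshold: 10%
    Percentage Used: 0%
    """
    health = parse_health(sample)
    print(health, "-> needs attention:", needs_attention(health))
```

In practice the text would come from running smartctl -a /dev/nvme0 as root (for example from a cron job) instead of the embedded sample.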
Truncated example output from a failed/worn-out NVMe:
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
- available spare has fallen below threshold
- media has been placed in read only mode
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x09
Temperature: 54 Celsius
Available Spare: 0%
Available Spare Threshold: 10%
Percentage Used: 3%
Data Units Read: 152,049,475 [77.8 TB]
Data Units Written: 123,071,212 [63.0 TB]
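The Critical Warning value 0x09 in the output above is a bit field, and the two messages printed under FAILED! correspond to the bits that are set. A small sketch of the decoding follows, assuming the bit assignments of the NVMe SMART/Health log for bits 0 to 4; the function name is hypothetical and higher bits are reported without interpretation.

```python
from typing import List

# Assumed meanings of bits 0..4 of the Critical Warning byte.
CRITICAL_WARNING_BITS = {
    0: "available spare has fallen below threshold",
    1: "temperature is above or below a threshold",
    2: "NVM subsystem reliability has been degraded",
    3: "media has been placed in read only mode",
    4: "volatile memory backup device has failed",
}


def decode_critical_warning(value: int) -> List[str]:
    """Return one message per bit set in the Critical Warning byte."""
    messages = []
    for bit in range(8):
        if value & (1 << bit):
            # Bits without an entry above are reported generically.
            messages.append(
                CRITICAL_WARNING_BITS.get(bit, f"bit {bit} set (not decoded here)")
            )
    return messages


if __name__ == "__main__":
    for message in decode_critical_warning(0x09):
        print("-", message)
```

A value of 0x00 yields an empty list, which matches the "should always be 0x00" advice.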
Example of a healthy NVMe:
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 38 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 1%
Data Units Read: 8,602,182 [4.40 TB]
Data Units Written: 13,527,143 [6.92 TB]
Example of an NVMe that has used some of its spare capacity but is still OK:
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 45 Celsius
Available Spare: 74%
Available Spare Threshold: 10%
Percentage Used: 3%
Data Units Read: 435,391,613 [222 TB]
Data Units Written: 47,171,668 [24.1 TB]
Good read: Lifespan of NVMe drives