My understanding is that this is normal wear of the NVMe SSD.

On my Debian/Sid home desktop computer (AMD Ryzen 2970WX, some MSI 399 motherboard), I have an M.2 NVMe SSD, and I receive automatic emails to root such as:

 The following warning/error was logged by the smartd daemon:

 Device: /dev/nvme0, number of Error Log entries increased from 423 to
 424

 Device info: Samsung SSD 970 EVO 2TB, S/N:S464NB0KA03837J,
 FW:2B2QEXE7, 2.00 TB

 For details see host's SYSLOG.

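That SSD holds both the root partition (29% full according to df -h) and /home (5% full).

The desktop (powered 24 hours a day through a UPS, located near Paris, France) is mostly used to develop RefPerSys and for the usual software developer activities (building, debugging, and testing software, plus mail and web browsing, LaTeX, emacs, running ./refpersys, and so on).

My understanding is that this is normal wear of the NVMe SSD.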

After running smartctl --test=short /dev/nvme0 (as root), the command smartctl -a /dev/nvme0 gives:

smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-3-amd64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Samsung SSD 970 EVO 2TB
Serial Number:                      S464NB0KA03837J
Firmware Version:                   2B2QEXE7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 2,000,398,934,016 [2.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          2,000,398,934,016 [2.00 TB]
Namespace 1 Utilization:            258,943,426,560 [258 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            002538 5a81b50e6f
Local Time is:                      Mon Feb  3 10:28:49 2020 MET
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     82 Celsius
Critical Comp. Temp. Threshold:     82 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     6.20W       -        -    0  0  0  0        0       0
 1 +     4.30W       -        -    1  1  1  1        0       0
 2 +     2.10W       -        -    2  2  2  2        0       0
 3 -   0.0400W       -        -    3  3  3  3      210    1200
 4 -   0.0050W       -        -    4  4  4  4     2000    8000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        39 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    170,671,345 [87.3 TB]
Data Units Written:                 6,787,146 [3.47 TB]
Host Read Commands:                 1,072,794,583
Host Write Commands:                62,979,313
Controller Busy Time:               1,480
Power Cycles:                       196
Power On Hours:                     906
Unsafe Shutdowns:                   136
Media and Data Integrity Errors:    0
Error Information Log Entries:      427
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               39 Celsius
Temperature Sensor 2:               43 Celsius

Error Information (NVMe Log 0x01, max 64 entries)
No Errors Logged

Here is the output of smartctl -x /dev/nvme0:

smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-3-amd64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Samsung SSD 970 EVO 2TB
Serial Number:                      S464NB0KA03837J
Firmware Version:                   2B2QEXE7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 2,000,398,934,016 [2.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          2,000,398,934,016 [2.00 TB]
Namespace 1 Utilization:            258,943,426,560 [258 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            002538 5a81b50e6f
Local Time is:                      Mon Feb  3 10:42:30 2020 MET
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     82 Celsius
Critical Comp. Temp. Threshold:     82 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     6.20W       -        -    0  0  0  0        0       0
 1 +     4.30W       -        -    1  1  1  1        0       0
 2 +     2.10W       -        -    2  2  2  2        0       0
 3 -   0.0400W       -        -    3  3  3  3      210    1200
 4 -   0.0050W       -        -    4  4  4  4     2000    8000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        39 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    170,671,351 [87.3 TB]
Data Units Written:                 6,787,156 [3.47 TB]
Host Read Commands:                 1,072,794,690
Host Write Commands:                62,980,162
Controller Busy Time:               1,480
Power Cycles:                       196
Power On Hours:                     906
Unsafe Shutdowns:                   136
Media and Data Integrity Errors:    0
Error Information Log Entries:      427
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               39 Celsius
Temperature Sensor 2:               42 Celsius

Error Information (NVMe Log 0x01, max 64 entries)
No Errors Logged

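I find this test result and the No Errors Logged message reassuring.

Questions:

When should I start to worry?

Ideally, I would like to replace the SSD before it fails unexpectedly. I have heard rumors that SSDs fail all at once (rather than gradually, as spinning hard disks do). I am not a hardware expert at all.

What Linux command should I run (for example, monthly) to assess the wear of the NVMe SSD?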

Answer 1

Try sudo nvme error-log /dev/nvme0 (this requires installing the nvme-cli package via apt, synaptic, or similar).

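Answer 2

The lifespan of an SSD is usually limited by the number of writes its memory cells can handle before they fail.

Memory cells in solid-state devices wear a little with every write operation, and each cell tolerates only a finite number of rewrites. (That number generally gets smaller as more bits are stored per cell: SLC -> MLC -> TLC -> QLC.) Most solid-state disks report the overall health of their memory cells through various attributes.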
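To put that write limit in perspective, the Data Units Written counter from smartctl can be turned into bytes. The following is a minimal Python sketch, assuming (per my reading of the NVMe specification) that one data unit is 1000 blocks of 512 bytes; the function names are made up for illustration, and the figures are the ones from the question's output.

```python
# One NVMe "data unit" is assumed to be 1000 * 512 bytes = 512,000 bytes.
BYTES_PER_DATA_UNIT = 512 * 1000


def data_units_to_tb(units: int) -> float:
    """Convert an NVMe 'Data Units' counter to decimal terabytes."""
    return units * BYTES_PER_DATA_UNIT / 1e12


def tb_written_per_day(units_written: int, power_on_hours: int) -> float:
    """Average terabytes written per 24 hours of power-on time."""
    return data_units_to_tb(units_written) / (power_on_hours / 24)


if __name__ == "__main__":
    # Values taken from the smartctl output in the question.
    written_units = 6_787_146
    power_on_hours = 906
    print(f"{data_units_to_tb(written_units):.3f} TB written in total")
    print(f"{tb_written_per_day(written_units, power_on_hours):.3f} TB per powered-on day")
```

For the drive in the question this comes to about 3.475 TB in total, in line with the bracketed 3.47 TB that smartctl prints, and roughly 0.09 TB per powered-on day. That daily figure can then be compared with the endurance rating (TBW) in the drive's datasheet.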

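You can use smartctl -a /dev/nvme0n1 to view health information about an NVMe disk.

The percentage_used value is the manufacturer's own estimate of how much of the drive's life has been consumed. The available_spare value indicates what share of the spare memory cells is still available. As a rule of thumb, the SSD is still fine as long as "available_spare" has not dropped to about 1% and "percentage_used" has not risen to 100%.

Once either limit is reached, you should consider replacing the SSD. Another indicator is Critical Warning, which should always be 0x00; check the NVMe specification for the meaning of other values.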
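For a monthly check, those three indicators can be pulled out of the smartctl text and tested automatically. The following is only a sketch under a few assumptions: it parses the plain-text layout shown in this thread (which may differ between smartctl versions), the function names are hypothetical, and it compares Available Spare against the drive's own reported threshold rather than a fixed 1%. The sample text is taken from the failed-drive output quoted in this answer.

```python
import re

# Sample lines copied from the failed-drive example in this answer.
SAMPLE = """\
SMART overall-health self-assessment test result: FAILED!
Critical Warning:                   0x09
Available Spare:                    0%
Available Spare Threshold:          10%
Percentage Used:                    3%
"""

# Regular expressions for the text layout shown in this thread.
PATTERNS = {
    "critical_warning": r"^\s*Critical Warning:\s+(0x[0-9a-fA-F]+)",
    "available_spare": r"^\s*Available Spare:\s+(\d+)%",
    "spare_threshold": r"^\s*Available Spare Threshold:\s+(\d+)%",
    "percentage_used": r"^\s*Percentage Used:\s+(\d+)%",
}


def parse_health(text: str) -> dict:
    """Extract the wear-related fields from smartctl text output as integers."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text, re.MULTILINE)
        if match:
            value = match.group(1)
            # Critical Warning is printed in hex, the other fields in decimal.
            fields[name] = int(value, 16) if value.lower().startswith("0x") else int(value)
    return fields


def health_problems(fields: dict) -> list:
    """Return human-readable reasons to worry; an empty list means none found."""
    problems = []
    if fields.get("critical_warning", 0) != 0:
        problems.append("critical warning is set")
    spare = fields.get("available_spare")
    threshold = fields.get("spare_threshold")
    if spare is not None and threshold is not None and spare < threshold:
        problems.append("available spare is below the drive's threshold")
    if fields.get("percentage_used", 0) >= 100:
        problems.append("percentage used has reached 100%")
    return problems


if __name__ == "__main__":
    # In real use the text would come from running `smartctl -a /dev/nvme0` as root.
    for problem in health_problems(parse_health(SAMPLE)):
        print("WARNING:", problem)
```

Fields that are missing from the text are simply skipped, so an empty result only means that none of the parsed indicators looked bad, not that the drive is guaranteed healthy.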

Truncated example output from a failed/worn-out NVMe drive:

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
- available spare has fallen below threshold
- media has been placed in read only mode

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x09
Temperature:                        54 Celsius
Available Spare:                    0%
Available Spare Threshold:          10%
Percentage Used:                    3%
Data Units Read:                    152,049,475 [77.8 TB]
Data Units Written:                 123,071,212 [63.0 TB]
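The Critical Warning value in this output is a bit field, which is why 0x09 corresponds to the two messages printed under FAILED!. Here is a small Python sketch of the decoding; the bit meanings are my reading of the SMART / Health Information log page in the NVMe specification and should be confirmed there, and the function name is made up for illustration.

```python
# Assumed bit assignments of the NVMe Critical Warning byte (check the spec).
CRITICAL_WARNING_BITS = {
    0: "available spare has fallen below threshold",
    1: "temperature is above or below a threshold",
    2: "NVM subsystem reliability has been degraded",
    3: "media has been placed in read only mode",
    4: "volatile memory backup device has failed",
}


def decode_critical_warning(value: int) -> list:
    """List the warnings whose bits are set in a Critical Warning value."""
    return [
        text
        for bit, text in CRITICAL_WARNING_BITS.items()
        if value & (1 << bit)
    ]


if __name__ == "__main__":
    # 0x09 is the value reported by the failed drive above.
    for warning in decode_critical_warning(0x09):
        print("-", warning)
```

With 0x09, bits 0 and 3 are set, which matches the "available spare has fallen below threshold" and "media has been placed in read only mode" lines in the output above.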

Example of a healthy NVMe drive:

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        38 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    1%
Data Units Read:                    8,602,182 [4.40 TB]
Data Units Written:                 13,527,143 [6.92 TB]

Example of an NVMe drive that has used some of its spare cells but is still fine:

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        45 Celsius
Available Spare:                    74%
Available Spare Threshold:          10%
Percentage Used:                    3%
Data Units Read:                    435,391,613 [222 TB]
Data Units Written:                 47,171,668 [24.1 TB]

Good read: the lifespan of NVMe drives