Diagnosing NVMe errors

I would like to understand why I am receiving the following mail about SMART on my new NVMe drive.

DMESG

$ dmesg --ctime | grep -i nvm

[Mon Aug  8 10:48:31 2022] nvme nvme0: pci function 0000:3d:00.0
[Mon Aug  8 10:48:31 2022] nvme nvme0: missing or invalid SUBNQN field.
[Mon Aug  8 10:48:31 2022] nvme nvme0: Shutdown timeout set to 8 seconds
[Mon Aug  8 10:48:31 2022] nvme nvme0: 8/0/0 default/read/poll queues
[Mon Aug  8 10:48:31 2022]  nvme0n1: p1 p2
[Mon Aug  8 10:48:37 2022] EXT4-fs (nvme0n1p2): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[Mon Aug  8 10:48:37 2022] EXT4-fs (nvme0n1p2): re-mounted. Opts: errors=remount-ro. Quota mode: none.

NVMe error log

$ sudo nvme error-log /dev/nvme0

...
 Entry[63]   
.................
error_count     : 0
sqid            : 0
cmdid           : 0
status_field    : 0(SUCCESS: The command completed successfully)
phase_tag       : 0
parm_err_loc    : 0
lba             : 0
nsid            : 0
vs              : 0
trtype          : The transport type is not indicated or the error is not transport related.
cs              : 0
trtype_spec_info: 0
.................
...

Can anyone explain why I keep getting new mails like this one:

Mail

# mail

Message 44:
From root@dell-laptop-CENSORED  Sun Aug  7 08:13:07 2022
X-Original-To: root
To: root@dell-laptop-CENSORED
Subject: SMART error (ErrorCount) detected on host: dell-inspiron-15
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
Date: Sun,  7 Aug 2022 08:12:59 +0200 (CEST)
From: root <root@dell-laptop-CENSORED>


This message was generated by the smartd daemon running on:

   host name:  dell-inspiron-15
   DNS domain: [Empty]

The following warning/error was logged by the smartd daemon:

Device: /dev/nvme0, number of Error Log entries increased from 485 to 486

Device info:
Samsung SSD 970 EVO Plus 2TB, S/N:<!--CENSORED-->, FW:2B2QEXM7, 2.00 TB
                                    
For details see host's SYSLOG.

You can also use the smartctl utility for further investigation.
The original message about this issue was sent at Fri Apr 22 09:53:56 2022 CEST
Another message will be sent in 24 hours if the problem persists.

SMART

# smartctl -a /dev/nvme0n1

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-43-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Samsung SSD 970 EVO Plus 2TB
Serial Number:                      <CENSORED>
Firmware Version:                   2B2QEXM7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 2,000,398,934,016 [2.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      4
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          2,000,398,934,016 [2.00 TB]
Namespace 1 Utilization:            544,784,187,392 [544 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            002538 5221904ad7
Local Time is:                      Mon Aug  8 11:13:10 2022 CEST
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x03):         S/H_per_NS Cmd_Eff_Lg
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     85 Celsius
Critical Comp. Temp. Threshold:     85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     7.50W       -        -    0  0  0  0        0       0
 1 +     5.90W       -        -    1  1  1  1        0       0
 2 +     3.60W       -        -    2  2  2  2        0       0
 3 -   0.0700W       -        -    3  3  3  3      210    1200
 4 -   0.0050W       -        -    4  4  4  4     2000    8000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        44 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    5,565,230 [2.84 TB]
Data Units Written:                 2,658,490 [1.36 TB]
Host Read Commands:                 29,877,415
Host Write Commands:                18,211,598
Controller Busy Time:               112
Power Cycles:                       240
Power On Hours:                     215
Unsafe Shutdowns:                   5
Media and Data Integrity Errors:    0
Error Information Log Entries:      502
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               44 Celsius
Temperature Sensor 2:               39 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0        502     0  0x1005  0x4004      -            0     0     -
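
For reference, the Status value 0x4004 in the entry above can be unpacked into its sub-fields. This is only a sketch: it assumes smartctl prints the raw 16-bit Status Field of the error log entry, and that the field follows the layout in the NVMe 1.3 base specification (bit 0 phase tag, bits 8:1 status code, bits 11:9 status code type, bit 14 More, bit 15 Do Not Retry). The names `decode_nvme_status` and `GENERIC_STATUS` are made up for illustration, and the lookup table holds only three of the generic status codes.

```python
def decode_nvme_status(raw: int) -> dict:
    """Split a raw 16-bit NVMe error-log Status Field into its sub-fields.

    Assumed layout (NVMe 1.3): bit 0 = phase tag, bits 8:1 = status code,
    bits 11:9 = status code type, bit 14 = More, bit 15 = Do Not Retry.
    """
    return {
        "phase_tag": raw & 0x1,
        "status_code": (raw >> 1) & 0xFF,
        "status_code_type": (raw >> 9) & 0x7,
        "more": (raw >> 14) & 0x1,
        "do_not_retry": (raw >> 15) & 0x1,
    }


# A small, incomplete subset of Generic Command Status codes
# (status code type 0) from the NVMe specification.
GENERIC_STATUS = {
    0x00: "Successful Completion",
    0x01: "Invalid Command Opcode",
    0x02: "Invalid Field in Command",
}

fields = decode_nvme_status(0x4004)
print(fields)
if fields["status_code_type"] == 0:
    print(GENERIC_STATUS.get(fields["status_code"], "not in this subset"))
```

Under these assumptions the value decodes to a generic status with status code 0x02, "Invalid Field in Command", which would mean the drive rejected a command it was sent rather than reporting a media fault. That reading is consistent with "Media and Data Integrity Errors: 0" above, but it does not show which command was rejected.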

Syslog

# cat /var/log/syslog | grep -i smart | grep -i nvm

Aug  7 16:08:27 dell-inspiron-15 smartd[1001]: Device: /dev/nvme0, opened
Aug  7 16:08:27 dell-inspiron-15 smartd[1001]: Device: /dev/nvme0, Samsung SSD 970 EVO Plus 2TB, S/N:S4J4NM0T201785H, FW:2B2QEXM7, 2.00 TB
Aug  7 16:08:27 dell-inspiron-15 smartd[1001]: Device: /dev/nvme0, is SMART capable. Adding to "monitor" list.
Aug  7 16:08:27 dell-inspiron-15 smartd[1001]: Device: /dev/nvme0, state read from /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
Aug  7 16:08:27 dell-inspiron-15 smartd[1001]: Monitoring 1 ATA/SATA, 0 SCSI/SAS and 1 NVMe devices
Aug  7 16:08:28 dell-inspiron-15 smartd[1001]: Device: /dev/nvme0, number of Error Log entries increased from 486 to 487
Aug  7 16:08:28 dell-inspiron-15 smartd[1001]: Device: /dev/nvme0, state written to /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
Aug  8 07:17:38 dell-inspiron-15 smartd[973]: Device: /dev/nvme0, opened
Aug  8 07:17:38 dell-inspiron-15 smartd[973]: Device: /dev/nvme0, Samsung SSD 970 EVO Plus 2TB, S/N:S4J4NM0T201785H, FW:2B2QEXM7, 2.00 TB
Aug  8 07:17:38 dell-inspiron-15 smartd[973]: Device: /dev/nvme0, is SMART capable. Adding to "monitor" list.
Aug  8 07:17:38 dell-inspiron-15 smartd[973]: Device: /dev/nvme0, state read from /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
Aug  8 07:17:38 dell-inspiron-15 smartd[973]: Monitoring 1 ATA/SATA, 0 SCSI/SAS and 1 NVMe devices
Aug  8 07:17:38 dell-inspiron-15 smartd[973]: Device: /dev/nvme0, number of Error Log entries increased from 487 to 488
Aug  8 07:17:38 dell-inspiron-15 smartd[973]: Device: /dev/nvme0, state written to /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
Aug  8 08:21:16 dell-inspiron-15 smartd[973]: Device: /dev/nvme0, state written to /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
Aug  8 11:14:00 dell-inspiron-15 smartd[971]: Device: /dev/nvme0, opened
Aug  8 11:14:00 dell-inspiron-15 smartd[971]: Device: /dev/nvme0, Samsung SSD 970 EVO Plus 2TB, S/N:S4J4NM0T201785H, FW:2B2QEXM7, 2.00 TB
Aug  8 11:14:00 dell-inspiron-15 smartd[971]: Device: /dev/nvme0, is SMART capable. Adding to "monitor" list.
Aug  8 11:14:00 dell-inspiron-15 smartd[971]: Device: /dev/nvme0, state read from /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
Aug  8 11:14:00 dell-inspiron-15 smartd[971]: Monitoring 1 ATA/SATA, 0 SCSI/SAS and 1 NVMe devices
Aug  8 11:14:00 dell-inspiron-15 smartd[971]: Device: /dev/nvme0, number of Error Log entries increased from 488 to 494
Aug  8 11:14:01 dell-inspiron-15 smartd[971]: Device: /dev/nvme0, state written to /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
Aug  8 12:48:40 dell-inspiron-15 smartd[1024]: Device: /dev/nvme0, opened
Aug  8 12:48:40 dell-inspiron-15 smartd[1024]: Device: /dev/nvme0, Samsung SSD 970 EVO Plus 2TB, S/N:S4J4NM0T201785H, FW:2B2QEXM7, 2.00 TB
Aug  8 12:48:40 dell-inspiron-15 smartd[1024]: Device: /dev/nvme0, is SMART capable. Adding to "monitor" list.
Aug  8 12:48:40 dell-inspiron-15 smartd[1024]: Device: /dev/nvme0, state read from /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
Aug  8 12:48:40 dell-inspiron-15 smartd[1024]: Monitoring 1 ATA/SATA, 0 SCSI/SAS and 1 NVMe devices
Aug  8 12:48:40 dell-inspiron-15 smartd[1024]: Device: /dev/nvme0, number of Error Log entries increased from 494 to 502
Aug  8 12:48:40 dell-inspiron-15 smartd[1024]: Device: /dev/nvme0, state written to /var/lib/smartmontools/smartd.Samsung_SSD_970_EVO_Plus_2TB-S4J4NM0T201785H.nvme.state
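
To see how the counter grows over time, the smartd lines above can be reduced to a small table of increases. This is a minimal sketch that assumes the syslog line format shown in this excerpt; the regular expression and the function name `error_log_increases` are illustrative and not part of smartmontools.

```python
import re

# Matches smartd lines of the form seen in the excerpt above.
PATTERN = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8}) .*smartd\[(?P<pid>\d+)\]: "
    r"Device: (?P<dev>\S+), number of Error Log entries increased "
    r"from (?P<old>\d+) to (?P<new>\d+)"
)


def error_log_increases(lines):
    """Return (timestamp, smartd pid, old count, new count, delta) per matching line."""
    rows = []
    for line in lines:
        m = PATTERN.search(line)
        if m:
            old, new = int(m["old"]), int(m["new"])
            rows.append((m["stamp"], int(m["pid"]), old, new, new - old))
    return rows


# Three lines copied from the syslog excerpt above.
sample = [
    "Aug  7 16:08:28 dell-inspiron-15 smartd[1001]: Device: /dev/nvme0, number of Error Log entries increased from 486 to 487",
    "Aug  8 11:14:00 dell-inspiron-15 smartd[971]: Device: /dev/nvme0, number of Error Log entries increased from 488 to 494",
    "Aug  8 12:48:40 dell-inspiron-15 smartd[1024]: Device: /dev/nvme0, opened",
]

for row in error_log_increases(sample):
    print(row)
```

In the excerpt, every increase is logged within about a second of smartd opening the device under a new PID. This suggests, without proving it, that the entries are added around system startup rather than during normal use.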
