When I looked at the SMART report for the system disk of one of our servers, I noticed that its Power On Hours count was lower than I expected.
#> sudo smartctl -a /dev/disk/by-id/nvme-Samsung_SSD_970_EVO_500GB_S466NX0KB88026N
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-4.15.0-44-generic] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: Samsung SSD 970 EVO 500GB
Serial Number: S466NX0KB88026N
Firmware Version: 2B2QEXE7
PCI Vendor/Subsystem ID: 0x144d
IEEE OUI Identifier: 0x002538
Total NVM Capacity: 500,107,862,016 [500 GB]
Unallocated NVM Capacity: 0
Controller ID: 4
Number of Namespaces: 1
Namespace 1 Size/Capacity: 500,107,862,016 [500 GB]
Namespace 1 Utilization: 287,812,485,120 [287 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 002538 5b81b2c6fe
Local Time is: Fri May 31 10:50:46 2019 CEST
Firmware Updates (0x16): 3 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Maximum Data Transfer Size: 512 Pages
Warning Comp. Temp. Threshold: 85 Celsius
Critical Comp. Temp. Threshold: 85 Celsius
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 6.20W - - 0 0 0 0 0 0
1 + 4.30W - - 1 1 1 1 0 0
2 + 2.10W - - 2 2 2 2 0 0
3 - 0.0400W - - 3 3 3 3 210 1200
4 - 0.0050W - - 4 4 4 4 2000 8000
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 37 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 2,078,942 [1.06 TB]
Data Units Written: 7,888,803 [4.03 TB]
Host Read Commands: 8,436,608
Host Write Commands: 252,956,650
Controller Busy Time: 241
Power Cycles: 54
Power On Hours: 775
Unsafe Shutdowns: 24
Media and Data Integrity Errors: 0
Error Information Log Entries: 3
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 37 Celsius
Temperature Sensor 2: 40 Celsius
Error Information (NVMe Log 0x01, max 64 entries)
No Errors Logged
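If you want to pull a counter such as Power On Hours out of a report like the one above, for example to log it over time, a small Python sketch could look like this. The helper name `read_counter` is hypothetical. It assumes the `Key: value` layout that smartctl 7.0 prints for NVMe devices, and other versions or drive types may format these lines differently.

```python
import re


def read_counter(report: str, name: str) -> int:
    """Return the integer value of a 'Name: 1,234' line in smartctl text output.

    Hypothetical helper: assumes the 'Key: value' layout of smartctl 7.0's
    NVMe SMART/Health section, as in the report above.
    """
    # Anchor at the start of a line so "Power On Hours" cannot match inside another key.
    match = re.search(rf"^{re.escape(name)}:\s+([\d,]+)", report, re.MULTILINE)
    if match is None:
        raise KeyError(name)
    # Drop thousands separators such as "7,888,803".
    return int(match.group(1).replace(",", ""))


# A few lines copied from the report above.
SAMPLE = """\
Power Cycles: 54
Power On Hours: 775
Unsafe Shutdowns: 24
Data Units Written: 7,888,803 [4.03 TB]
"""

print(read_counter(SAMPLE, "Power On Hours"))      # 775
print(read_counter(SAMPLE, "Data Units Written"))  # 7888803
```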
The server has been running 24/7 for months (Ubuntu 18.04 amd64, smartmontools version 7.0-0ubuntu1~ubuntu18.04.1), but if I do the math (775/24), it looks like the disk has only been running for 32 days.
In fact, according to uptime, the server has been up for 3 months.
#> uptime
10:33:44 up 100 days, 11:56, 1 user, load average: 0.35, 0.14, 0.08
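The comparison in the question can be checked with a few lines of Python. The values are copied from the two outputs above. Note that uptime only counts the time since the last boot, so it is a lower bound on how long the server (and therefore the disk) has been powered on.

```python
from datetime import timedelta

# Values copied from the question: the SMART "Power On Hours" counter
# and the "up 100 days, 11:56" reported by uptime.
power_on_hours = 775
uptime = timedelta(days=100, hours=11, minutes=56)

smart_days = power_on_hours / 24                 # what the SMART counter implies
uptime_hours = uptime.total_seconds() / 3600     # time since the last boot, in hours
ratio = uptime_hours / power_on_hours            # how far the counter lags behind

print(f"SMART: {smart_days:.1f} days")           # SMART: 32.3 days
print(f"uptime: {uptime.days} days")             # uptime: 100 days
print(f"uptime / SMART hours: {ratio:.2f}")      # uptime / SMART hours: 3.11
```

Since uptime is only a lower bound, the counter is behind real power-on time by a factor of at least about 3.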
Is this counter broken, am I reading it wrong, or is there a third explanation?
(Besides their lower reliability, another reason I no longer buy Seagate drives is that I can't read their SMART reports "by eye".)
Answer 1
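As far as I can tell, the 970 EVO's Power On Hours counter runs at one-eighth of real time, so multiply the reported value by 8 to get the actual power-on time. That has held for me with both smartmontools and CrystalDiskInfo on Windows.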
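I took readings from a 1 TB 970 EVO Plus. It reported 585 hours at timestamp 1674634233 and 599 hours at timestamp 1675031309. Multiplying by 8 gives a power-on time of about 200 days, which matches when I installed it.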
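The two readings quoted above also let you estimate the factor directly: compare the wall-clock time elapsed between them with how far the counter moved. This minimal Python sketch assumes the drive stayed powered on for the whole interval between the two readings.

```python
# Two readings from the answer: (Unix timestamp, reported Power On Hours).
# Assumption: the drive was powered on continuously between them.
first = (1674634233, 585)
second = (1675031309, 599)

elapsed_hours = (second[0] - first[0]) / 3600   # wall-clock hours between readings
counted_hours = second[1] - first[1]            # how far the SMART counter moved
ratio = elapsed_hours / counted_hours

print(f"wall clock: {elapsed_hours:.1f} h, counter: {counted_hours} h")  # wall clock: 110.3 h, counter: 14 h
print(f"ratio: {ratio:.2f}")                                             # ratio: 7.88

# Apply the factor of 8 to the later reading and convert hours to days.
corrected_days = second[1] * 8 / 24
print(f"corrected power-on time: {corrected_days:.0f} days")             # corrected power-on time: 200 days
```

A ratio slightly under 8 is consistent with a counter that only advances in whole steps, so a partial 8-hour block at either end of the interval is not counted.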