Diagnosing HDD performance

2024-8-1

I have a 2.5-inch hard drive, bought new about a year ago, connected to a Raspberry Pi 4 that I use as a NAS.

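The HDD (WD Blue WD10SPZX, 1TB, 5400rpm, SATA 3) sits in a USB 2.0 aluminum enclosure (which I thought might help with cooling). The local network is fully wired (except for the laptop) with gigabit CAT6 cables. NAS throughput peaks at about 30MB/s for both reads and writes.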

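The whole system is housed in an actively cooled box, and I believe the airflow provides adequate cooling, since the Pi's CPU usually hovers around 50°C at idle and around 68-70°C at 100% load.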

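Since March, this particular HDD seems to have a performance problem, which I can describe as follows: when copying a large amount of data to the HDD, the transfer rate drops to about 3-4MB/s after roughly 15-20 GB.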

This has other side effects too: while the transfer is running I cannot SSH into the Pi, and connections already established to the Pi are dropped.

My findings:

The partitions are aligned.
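For reference, a minimal sketch of how such an alignment check could be done by hand. The helper name `aligned_to` is hypothetical; the sketch assumes the partition's start offset is read from sysfs, where it is reported in 512-byte units, and it uses 1 MiB as the alignment target by convention.

```shell
# aligned_to START_SECTOR ALIGN_BYTES
# Succeeds (exit status 0) if the partition start, given in 512-byte
# sectors, falls on a multiple of ALIGN_BYTES.
aligned_to() {
    [ $(( $1 * 512 % $2 )) -eq 0 ]
}

# Possible use on the drive from this post (device names are assumptions):
#   aligned_to "$(cat /sys/block/sda/sda3/start)" 1048576 && echo aligned
```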

It is not running out of inodes:

$ df -i .
Filesystem       Inodes IUsed    IFree IUse% Mounted on
/dev/sda3      59809792 13848 59795944    1% /mnt/media

smartctl seems to indicate that everything is in order (full output here), or else I don't know how to read the output. Partial output:


ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   194   193   021    Pre-fail  Always       -       1283
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       28
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       7161
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       21
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       18
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       24
194 Temperature_Celsius     0x0022   095   093   000    Old_age   Always       -       48
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

hdparm also seems to indicate that everything is in order:

$ sudo hdparm -Tt /dev/sda3
/dev/sda3:
 Timing cached reads:   1708 MB in  2.00 seconds = 854.45 MB/sec
 Timing buffered disk reads:  92 MB in  3.04 seconds =  30.25 MB/sec

The problem only shows up in write tests:

$ dd if=/dev/zero of=out.bin bs=1G count=5
5+0 records in
5+0 records out
5368709120 bytes (5.4 GB, 5.0 GiB) copied, 168.331 s, 31.9 MB/s

$ dd if=/dev/zero of=out.bin bs=1G count=15
15+0 records in
15+0 records out
16106127360 bytes (16 GB, 15 GiB) copied, 514.209 s, 31.3 MB/s

$ dd if=/dev/zero of=out.bin bs=1G count=40
40+0 records in
40+0 records out
42949672960 bytes (43 GB, 40 GiB) copied, 12551.7 s, 3.4 MB/s

The three tests were run one after another, writing about 60GB of data in total.

To summarize:

Data | Time (min) | Time (h) | Write average
-----|------------|----------|--------------
 5GB |    2.8 min |          |     31.9 MB/s
15GB |    8.6 min |          |     31.3 MB/s
40GB |  209.2 min |    3.5h  |      3.4 MB/s
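The averages follow directly from the byte counts and elapsed times that dd printed. A small sketch to recompute them, where `avg_mbps` is a hypothetical helper name and MB means 10^6 bytes, as in dd's own summary line:

```shell
# avg_mbps BYTES SECONDS
# Prints the average throughput in MB/s (1 MB = 1,000,000 bytes).
avg_mbps() {
    awk -v bytes="$1" -v secs="$2" \
        'BEGIN { printf "%.1f\n", bytes / secs / 1000000 }'
}

avg_mbps 5368709120 168.331     # 5 GiB run  -> 31.9
avg_mbps 16106127360 514.209    # 15 GiB run -> 31.3
avg_mbps 42949672960 12551.7    # 40 GiB run -> 3.4
```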

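I did not monitor the actual write speed in real time during these tests, so I cannot say whether the final 40GB test ran at 3.4 MB/s throughout, or whether it started at 30 MB/s and then dropped below 3.4 MB/s to end up at that average.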
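One way to see where the slowdown begins would be to write the data in fixed-size chunks and time each chunk separately. The following is a minimal sketch that was not run on this setup: the function name `timed_chunks` and the chunk sizes are illustrative, and it assumes GNU `dd` and GNU `date` (for the `%N` nanoseconds format).

```shell
# timed_chunks FILE COUNT CHUNK_MB
# Appends COUNT chunks of CHUNK_MB MiB of zeros to FILE and prints the
# throughput of each chunk. conv=fsync makes dd flush every chunk to the
# disk before it exits, so the page cache cannot hide a slowdown.
timed_chunks() {
    file=$1; count=$2; chunk_mb=$3
    : > "$file"                          # start from an empty file
    i=1
    while [ "$i" -le "$count" ]; do
        start=$(date +%s.%N)
        dd if=/dev/zero of="$file" bs=1M count="$chunk_mb" \
           oflag=append conv=notrunc,fsync 2>/dev/null
        end=$(date +%s.%N)
        awk -v i="$i" -v mb="$chunk_mb" -v s="$start" -v e="$end" 'BEGIN {
            d = e - s
            if (d > 0) printf "chunk %d: %.1f MiB/s\n", i, mb / d
            else       printf "chunk %d: n/a\n", i
        }'
        i=$((i + 1))
    done
}

# Possible use, mirroring the 40GB test (path is an assumption):
#   timed_chunks /mnt/media/out.bin 40 1024
```

With per-chunk figures, a sharp drop after a certain number of chunks would be distinguishable from a uniformly slow run.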

Since these tests read their data directly from /dev/zero, any problem caused by the network is ruled out.

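I think the only thing left for me to do is take the drive out of the aluminum enclosure and check the performance again, on the assumption that it degrades because of overheating.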

My question is what else I can do to pin down the exact cause.

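Also, since the drive is still under warranty, I am considering returning it for a replacement, but I am not sure what reason to give for the return or how to prove that it is underperforming.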

Answer 1

Yes, you can.

Fragmentation could be an issue. Try to trigger the faulty behavior on another, freshly formatted drive.

Run the procedure from another Linux machine to rule out the Pi as the culprit.

Use the drive without the alloy enclosure to rule out or confirm a thermal problem.
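To compare the two setups, the drive temperature (SMART attribute 194 in the question's smartctl output) could be logged while a write test runs. A minimal sketch: `smart_temp` and `log_temp` are hypothetical helper names, the device path is an assumption, and some USB bridges only return SMART data when smartctl is given `-d sat`.

```shell
# smart_temp
# Reads `smartctl -A` output on stdin and prints the raw value of
# attribute 194 (Temperature_Celsius), which is the tenth column.
smart_temp() {
    awk '$1 == 194 { print $10 }'
}

# log_temp DEVICE [INTERVAL_SECONDS]
# Prints a timestamp and the drive temperature every INTERVAL seconds
# (default 60) until interrupted with Ctrl-C.
log_temp() {
    dev=$1; interval=${2:-60}
    while :; do
        printf '%s %s\n' "$(date +%T)" "$(sudo smartctl -A "$dev" | smart_temp)"
        sleep "$interval"
    done
}

# Possible use in a second terminal during the write test:
#   log_temp /dev/sda 30
```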
