Is the system really limited by disk I/O? Peak rate is far below what I expected

I have a system in which two 2 TByte SATA disks are configured as a RAID1 array.

At times the CPU spends more than 20% of its time waiting for I/O (output from sar), for example:

09:25:01        CPU     %user     %nice   %system   %iowait    %steal     %idle
09:35:01        all     57,65      0,00      6,53     25,54      0,05     10,23
15:45:01        all      0,90      0,00      1,47     54,90      0,06     42,68
15:55:04        all      1,74      0,00      1,58     88,52      0,10      8,06
16:25:03        all      0,59      0,00      0,38     24,14      0,05     74,84
23:45:05        all      2,45      0,00      1,43     31,56      0,05     64,50

I collected additional information with atop, which shows that disk I/O on one of the RAID disks has hit its limit (disk sda is 90% busy), for example:

MDD | md1 | busy 0% | | read 10174 | write 425 | | KiB/r 6 | KiB/w 7 | MBr/s 1.2 | | MBw/s 0.1 | avq 0.00 | | avio 0.00 ms |
DSK | sda | busy 90% | | read 9091 | write 507 | | KiB/r 6 | KiB/w 7 | MBr/s 0.9 | | MBw/s 0.1 | avq 1.14 | | avio 5.65 ms |
DSK | sdb | busy 18% | | read 1082 | write 507 | | KiB/r 11 | KiB/w 7 | MBr/s 0.2 | | MBw/s 0.1 | avq 1.39 | | avio 6.82 ms |

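The atop man page states:

This line shows the name (e.g. VolGroup00-lvtmp for a logical volume or sda for a hard disk), the busy percentage, i.e. the portion of time that the unit was busy handling requests (busy), the number of read requests issued (read), the number of write requests issued (write), the number of KiBytes per read (KiB/r), the number of KiBytes per write (KiB/w), the number of MiBytes per second throughput for reads (MBr/s), the number of MiBytes per second throughput for writes (MBw/s), the average queue depth (avq) and the average number of milliseconds needed by a request (avio) for seek, latency and data transfer.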
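To compare these fields across devices it can help to turn such a line into labelled values. The following is only a minimal sketch: the function name `parse_atop_disk_line` is made up for illustration, and it assumes the exact column layout pasted above (other atop versions or unit suffixes may need extra handling).

```python
def parse_atop_disk_line(line: str) -> dict:
    """Split one atop DSK/MDD line into a {label: value} dictionary."""
    cells = [c.strip() for c in line.split("|")]
    cells = [c for c in cells if c]  # drop the empty spacer columns
    result = {"type": cells[0], "name": cells[1]}
    for cell in cells[2:]:
        # Each remaining cell looks like "<label> <value>[ unit]", e.g. "avio 5.65 ms"
        parts = cell.split()
        label, value = parts[0], parts[1].rstrip("%")
        result[label] = float(value)
    return result


# The sda line from the atop output above
sda = parse_atop_disk_line(
    "DSK | sda | busy 90% | | read 9091 | write 507 | | KiB/r 6 | KiB/w 7 "
    "| MBr/s 0.9 | | MBw/s 0.1 | avq 1.14 | | avio 5.65 ms |"
)
print(sda["name"], sda["busy"], sda["MBr/s"], sda["MBw/s"], sda["avio"])
```

The busy percentage and the two throughput fields are the ones the rest of the question relies on.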

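With RAID1, data could in principle be read from both disks in parallel, but the behaviour described in the md man page explains why the second disk is not fully used.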

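Looking at the MBr/s and MBw/s entries for sda, the disk appears to be 90% busy at a rate of

0.9 + 0.1 MiBytes/s = 1 MiByte/s = 8 MiBit/s

However, the expected rate of a current disk is about 1000 Mbit/s, which is roughly 100 times higher (ignoring the conversion from MiBit to Mbit).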
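For reference, the same comparison can be done with the MiBit-to-Mbit conversion included. This is just the arithmetic from the question written out; the 1000 Mbit/s figure is the rough expectation quoted above, not a measured value.

```python
MIB = 1024 * 1024

# MBr/s + MBw/s for sda, as reported by atop (MiBytes per second)
observed_bytes_per_s = (0.9 + 0.1) * MIB
observed_bits_per_s = observed_bytes_per_s * 8

# Rough expectation from the question: about 1000 Mbit/s (decimal megabits)
expected_bits_per_s = 1000 * 1000 * 1000

ratio = expected_bits_per_s / observed_bits_per_s
print(f"observed: {observed_bits_per_s / 1e6:.2f} Mbit/s")
print(f"expected / observed: {ratio:.0f}x")
```

With the conversion the gap comes out slightly above 100x, so the "about 100 times" estimate holds.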

The disk is (output of hdparm -I /dev/sda):

/dev/sda:

ATA device, with non-removable media
Model Number: TOSHIBA DT01ACA200
Serial Number: 54A8UH4GS
Firmware Revision: MX4OABB0
Transport: Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0; Revision: ATA8-AST T13 Project D1697 Revision 0b
Standards:
Used: unknown (minor revision code 0x0029)
Supported: 8 7 6 5
Likely used: 8
Configuration:
Logical max current
cylinders 16383 16383
heads 16 16
sectors/track 63 63
--
CHS current addressable sectors: 16514064
LBA user addressable sectors: 268435455
LBA48 user addressable sectors: 3907029168
Logical Sector size: 512 bytes
Physical Sector size: 4096 bytes
Logical Sector-0 offset: 0 bytes
device size with M = 1024*1024: 1907729 MBytes
device size with M = 1000*1000: 2000398 MBytes (2000 GB)
cache/buffer size = unknown
Form Factor: 3.5 inch
Nominal Media Rotation Rate: 7200
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Standard, no device specific minimum
R/W multiple sector transfer: Max = 16 Current = 16
Advanced power management level: disabled
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* SMART feature set
Security Mode feature set
* Power Management feature set
* Write cache
* Look-ahead
* Host Protected Area feature set
* WRITE_BUFFER command
* READ_BUFFER command
* NOP cmd
* DOWNLOAD_MICROCODE
Advanced Power Management feature set
Power-Up In Standby feature set
* SET_FEATURES required to spinup after power up
SET_MAX security extension
* 48-bit Address feature set
* Device Configuration Overlay feature set
* Mandatory FLUSH_CACHE
* FLUSH_CACHE_EXT
* SMART error logging
* SMART self-test
Media Card Pass-Through
* General Purpose Logging feature set
* WRITE_{DMA|MULTIPLE}_FUA_EXT
* 64-bit World wide name
* URG for READ_STREAM[_DMA]_EXT
* URG for WRITE_STREAM[_DMA]_EXT
* WRITE_UNCORRECTABLE_EXT command
* {READ,WRITE}_DMA_EXT_GPL commands
* Segmented DOWNLOAD_MICROCODE
* unknown 119[7]
* Gen1 signaling speed (1.5Gb/s)
* Gen2 signaling speed (3.0Gb/s)
* Gen3 signaling speed (6.0Gb/s)
* Native Command Queueing (NCQ)
* Host-initiated interface power management
* Phy event counters
* NCQ priority information
Non-Zero buffer offsets in DMA Setup FIS
* DMA Setup Auto-Activate optimization
Device-initiated interface power management
In-order data delivery
* Software settings preservation
* SMART Command Transport (SCT) feature set
* SCT Write Same (AC2)
* SCT Error Recovery Control (AC3)
* SCT Features Control (AC4)
* SCT Data Tables (AC5)
Security:
Master password revision code = 65534
supported
not enabled
not locked
frozen
not expired: security count
not supported: enhanced erase
320min for SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 5000039ffac402a6
NAA : 5
IEEE OUI : 000039
Unique ID : ffac402a6
Checksum: correct

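Is the atop output or its man page wrong, is the disk performing far worse than it should, or am I misunderstanding something?

Or, as a broader question: is my system really limited by disk I/O capacity?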

Answer 1

The expected sequential rate of current disks is about 1000 Mbit/s, but that changes nothing for random I/O.

A 7200 RPM disk will do about 120 random IOPS. So in the worst case, where every request writes only 1 byte, you end up with a throughput of 120 bytes/s.
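The 120 IOPS figure is a rule of thumb. A simple way to see where it comes from is to add the average rotational latency (half a revolution) to an average seek time. The sketch below does that; the seek times are assumptions chosen for illustration, not values from this drive's datasheet, and the model ignores transfer time, caching and NCQ reordering.

```python
def random_iops(rpm: float, avg_seek_ms: float) -> float:
    """Estimate random IOPS of a spinning disk from RPM and average seek time."""
    ms_per_revolution = 60_000 / rpm
    avg_rotational_latency_ms = ms_per_revolution / 2  # on average half a turn
    service_time_ms = avg_seek_ms + avg_rotational_latency_ms
    return 1000 / service_time_ms


# Assumed average seek times in milliseconds (illustrative only)
for seek_ms in (4.0, 6.0, 8.5):
    iops = random_iops(7200, seek_ms)
    # Worst case from the answer: 1 byte per request
    print(f"seek {seek_ms} ms: {iops:.0f} IOPS, {iops:.0f} bytes/s at 1 byte per request")
```

Short seeks land near the 120 IOPS quoted above; longer full-stroke seeks give fewer.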

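Yes, this means there are about 6 decimal orders of magnitude between the best case (purely sequential) and the worst case. Your result of about 1 MiByte/s, made up of requests of only 6 to 7 KiB each, lies in between, and its access pattern is much closer to the worst case.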
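The atop figures for sda can be turned into a rough IOPS estimate in two independent ways, which shows how close the disk is to the random I/O limit. This is only a back-of-the-envelope sketch: atop rounds its values, so the results are approximate, and the variable names are chosen for illustration.

```python
import math

KIB = 1024
MIB = 1024 * KIB

# Figures for sda taken from the atop output in the question
read_bytes_per_s = 0.9 * MIB    # MBr/s
write_bytes_per_s = 0.1 * MIB   # MBw/s
read_request_bytes = 6 * KIB    # KiB/r
write_request_bytes = 7 * KIB   # KiB/w
avio_ms = 5.65                  # average time per request
busy_fraction = 0.90            # busy 90%

# Estimate 1: throughput divided by request size
iops_from_throughput = (read_bytes_per_s / read_request_bytes
                        + write_bytes_per_s / write_request_bytes)

# Estimate 2: busy time divided by the time each request takes
iops_from_service_time = busy_fraction * 1000 / avio_ms

# Span between ~1000 Mbit/s sequential and 120 bytes/s worst case
sequential_bytes_per_s = 1000e6 / 8
orders_of_magnitude = math.log10(sequential_bytes_per_s / 120)

print(f"IOPS from throughput:   {iops_from_throughput:.0f}")
print(f"IOPS from service time: {iops_from_service_time:.0f}")
print(f"best/worst span: 10^{orders_of_magnitude:.1f}")
```

Both estimates come out in the same range as the rule-of-thumb random IOPS of a 7200 RPM disk, which is consistent with sda being saturated by small random requests rather than by bandwidth.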

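Using an SSD will help you a lot: even a small SSD, when used as bcache or as an LVM cache, can raise your random IOPS (especially for those painful small writes) by several orders of magnitude.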
