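I have a system in which two 2 TByte SATA disks are configured as a RAID1 array.
At times the CPU spends more than 20% of its time waiting for I/O (output from sar), for example: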
09:25:01 CPU %user %nice %system %iowait %steal %idle
09:35:01 all 57,65 0,00 6,53 25,54 0,05 10,23
15:45:01 all 0,90 0,00 1,47 54,90 0,06 42,68
15:55:04 all 1,74 0,00 1,58 88,52 0,10 8,06
16:25:03 all 0,59 0,00 0,38 24,14 0,05 74,84
23:45:05 all 2,45 0,00 1,43 31,56 0,05 64,50
I collected additional information with atop, which shows that disk I/O on one of the RAID disks is reaching its limit (disk sda, 90% busy), for example:
MDD | md1 | busy 0% | | read 10174 | write 425 | | KiB/r 6 | KiB/w 7 | MBr/s 1.2 | | MBw/s 0.1 | avq 0.00 | | avio 0.00 ms |
DSK | sda | busy 90% | | read 9091 | write 507 | | KiB/r 6 | KiB/w 7 | MBr/s 0.9 | | MBw/s 0.1 | avq 1.14 | | avio 5.65 ms |
DSK | sdb | busy 18% | | read 1082 | write 507 | | KiB/r 11 | KiB/w 7 | MBr/s 0.2 | | MBw/s 0.1 | avq 1.39 | | avio 6.82 ms |
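The atop man page states:
This line shows the name (e.g. VolGroup00-lvtmp for a logical volume or sda for a hard disk), the busy percentage, i.e. the portion of time that the unit was busy handling requests (busy), the number of read requests issued (read), the number of write requests issued (write), the number of KiBytes per read (KiB/r), the number of KiBytes per write (KiB/w), the number of MiBytes per second throughput for reads (MBr/s), the number of MiBytes per second throughput for writes (MBw/s), the average queue depth (avq) and the average number of milliseconds needed by a request (avio) for seek, latency and data transfer.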
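The busy, read, write and avio fields can be cross-checked against each other. The following is a minimal sketch, assuming that busy is the time spent servicing requests divided by the sampling interval and that avio is that same time divided by the number of requests; the sampling interval is not given in the question, so it is estimated here rather than known.

```python
# Figures copied from the atop DSK line for sda in the question.
reads = 9091
writes = 507
avio_ms = 5.65         # average milliseconds per request
busy_fraction = 0.90   # "busy 90%"

requests = reads + writes

# Total time the disk spent handling requests during the sample.
busy_seconds = requests * avio_ms / 1000.0

# Assumption: busy% = busy time / sampling interval. The interval itself
# is not shown in the question, so it is derived from the other fields.
interval_seconds = busy_seconds / busy_fraction
iops = requests / interval_seconds

print(f"time spent servicing requests: {busy_seconds:.1f} s")
print(f"implied sampling interval:     {interval_seconds:.1f} s")
print(f"implied request rate:          {iops:.0f} requests/s")
```

Under these assumptions the disk handles roughly 160 requests per second, and about 5.65 ms per request is enough to keep it busy for about 90% of the interval even though each request moves only a few KiB.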
With RAID1, data can in principle be read from both disks in parallel; the md man page explains why the second disk is not fully used.
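Looking at the MBr/s and MBw/s entries for sda, the disk appears to be 90% busy at
0.9 + 0.1 MiByte/s = 1 MiByte/s = 8 MiBit/s
However, the expected rate for a current disk is roughly 1000 Mbit/s, about 100 times higher (ignoring the conversion from MiBit to Mbit).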
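For reference, the same comparison with the MiBit to Mbit conversion included can be sketched as follows. The 1000 Mbit/s figure is the expectation stated above, not a measured value for this drive.

```python
MIB = 1024 * 1024  # bytes per MiByte

# MBr/s + MBw/s for sda, as reported by atop.
observed_mib_per_s = 0.9 + 0.1

# Convert MiByte/s to decimal Mbit/s (10**6 bits per second).
observed_mbit_per_s = observed_mib_per_s * MIB * 8 / 1e6

# Expected sequential rate assumed in the question.
expected_mbit_per_s = 1000.0

ratio = expected_mbit_per_s / observed_mbit_per_s
print(f"observed: {observed_mbit_per_s:.2f} Mbit/s")
print(f"expected / observed: {ratio:.0f}x")
```

The exact conversion does not change the picture: the gap stays at roughly two orders of magnitude.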
The disks are (output of hdparm -I /dev/sda):
/dev/sda:
ATA device, with non-removable media
Model Number: TOSHIBA DT01ACA200
Serial Number: 54A8UH4GS
Firmware Revision: MX4OABB0
Transport: Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0; Revision: ATA8-AST T13 Project D1697 Revision 0b
Standards:
Used: unknown (minor revision code 0x0029)
Supported: 8 7 6 5
Likely used: 8
Configuration:
Logical max current
cylinders 16383 16383
heads 16 16
sectors/track 63 63
--
CHS current addressable sectors: 16514064
LBA user addressable sectors: 268435455
LBA48 user addressable sectors: 3907029168
Logical Sector size: 512 bytes
Physical Sector size: 4096 bytes
Logical Sector-0 offset: 0 bytes
device size with M = 1024*1024: 1907729 MBytes
device size with M = 1000*1000: 2000398 MBytes (2000 GB)
cache/buffer size = unknown
Form Factor: 3.5 inch
Nominal Media Rotation Rate: 7200
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Standard, no device specific minimum
R/W multiple sector transfer: Max = 16 Current = 16
Advanced power management level: disabled
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* SMART feature set
Security Mode feature set
* Power Management feature set
* Write cache
* Look-ahead
* Host Protected Area feature set
* WRITE_BUFFER command
* READ_BUFFER command
* NOP cmd
* DOWNLOAD_MICROCODE
Advanced Power Management feature set
Power-Up In Standby feature set
* SET_FEATURES required to spinup after power up
SET_MAX security extension
* 48-bit Address feature set
* Device Configuration Overlay feature set
* Mandatory FLUSH_CACHE
* FLUSH_CACHE_EXT
* SMART error logging
* SMART self-test
Media Card Pass-Through
* General Purpose Logging feature set
* WRITE_{DMA|MULTIPLE}_FUA_EXT
* 64-bit World wide name
* URG for READ_STREAM[_DMA]_EXT
* URG for WRITE_STREAM[_DMA]_EXT
* WRITE_UNCORRECTABLE_EXT command
* {READ,WRITE}_DMA_EXT_GPL commands
* Segmented DOWNLOAD_MICROCODE
* unknown 119[7]
* Gen1 signaling speed (1.5Gb/s)
* Gen2 signaling speed (3.0Gb/s)
* Gen3 signaling speed (6.0Gb/s)
* Native Command Queueing (NCQ)
* Host-initiated interface power management
* Phy event counters
* NCQ priority information
Non-Zero buffer offsets in DMA Setup FIS
* DMA Setup Auto-Activate optimization
Device-initiated interface power management
In-order data delivery
* Software settings preservation
* SMART Command Transport (SCT) feature set
* SCT Write Same (AC2)
* SCT Error Recovery Control (AC3)
* SCT Features Control (AC4)
* SCT Data Tables (AC5)
Security:
Master password revision code = 65534
supported
not enabled
not locked
frozen
not expired: security count
not supported: enhanced erase
320min for SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 5000039ffac402a6
NAA : 5
IEEE OUI : 000039
Unique ID : ffac402a6
Checksum: correct
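Is the atop output or its man page wrong, is the disk performing far below what should be expected, or am I misunderstanding something?
Or, as a broader question: is my system really limited by disk I/O capacity?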
Answer 1
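The expected sequential rate of a current disk is roughly 1000 Mbit/s, but that changes nothing for random I/O.
A 7200 RPM disk will do around 120 random IOPS. So in the worst case, where you only ever write 1 byte at a time, you end up with a throughput of 120 bytes/s.
Yes, this means there are around 6 decimal orders of magnitude between the best case (purely sequential) and the worst case. Your result of roughly 1 MiByte/s falls in between, in the regime dominated by random I/O rather than sequential transfer.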
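The arithmetic behind these figures can be sketched as below. This is a rough model, assuming throughput under purely random I/O is simply IOPS times request size; the 120 IOPS and 1000 Mbit/s values are the estimates given above, and the 6 KiB request size is the KiB/r value from the atop output in the question.

```python
import math

def random_throughput_bytes_per_s(iops, request_bytes):
    """Throughput when every request costs one random I/O operation."""
    return iops * request_bytes

RANDOM_IOPS = 120                       # estimate for a 7200 RPM disk
SEQUENTIAL_BYTES_PER_S = 1000e6 / 8     # 1000 Mbit/s expressed in bytes/s

# Worst case: each random I/O transfers a single byte.
worst = random_throughput_bytes_per_s(RANDOM_IOPS, 1)

# Request size seen in the question: about 6 KiB per read.
typical = random_throughput_bytes_per_s(RANDOM_IOPS, 6 * 1024)

# Gap between purely sequential and worst-case random throughput.
orders = math.log10(SEQUENTIAL_BYTES_PER_S / worst)

print(f"worst case:      {worst} bytes/s")
print(f"6 KiB requests:  {typical / (1024 * 1024):.2f} MiByte/s")
print(f"best/worst gap:  {orders:.1f} orders of magnitude")
```

With 6 KiB requests this model gives a bit under 1 MiByte/s, the same order as the throughput atop reports for sda.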
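Using an SSD will buy you a lot: even a small SSD, when used as bcache or LV cache, can raise your random IOPS (especially those painful small writes) by several orders of magnitude.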