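I'm running Ubuntu Server 12.04 LTS on an AMD E-350 based machine. Under heavy I/O (for example transferring files over NFS or HTTP, extracting archives, running backups), the CPU load goes crazy. I see load averages well above 8 on this dual-core machine... and responsiveness drops considerably.
I think the problem must be on the kernel side, but please take a look yourself: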
$ sudo hdparm -I /dev/sdb
/dev/sdb:
ATA device, with non-removable media
Model Number: SAMSUNG HD501LJ
Firmware Revision: CR100-10
Transport: Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5
Standards:
Used: ATA-8-ACS revision 3b
Supported: 8 7 6 5
Configuration:
Logical max current
cylinders 16383 16383
heads 16 16
sectors/track 63 63
--
CHS current addressable sectors: 16514064
LBA user addressable sectors: 268435455
LBA48 user addressable sectors: 976771055
Logical/Physical Sector size: 512 bytes
device size with M = 1024*1024: 476938 MBytes
device size with M = 1000*1000: 500106 MBytes (500 GB)
cache/buffer size = 16384 KBytes (type=DualPortCache)
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Standard, no device specific minimum
R/W multiple sector transfer: Max = 16 Current = 16
Recommended acoustic management value: 254, current value: 128
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6 udma7
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* SMART feature set
Security Mode feature set
* Power Management feature set
* Write cache
* Look-ahead
* Host Protected Area feature set
* WRITE_BUFFER command
* READ_BUFFER command
* NOP cmd
* DOWNLOAD_MICROCODE
SET_MAX security extension
* Automatic Acoustic Management feature set
* 48-bit Address feature set
* Device Configuration Overlay feature set
* Mandatory FLUSH_CACHE
* FLUSH_CACHE_EXT
* SMART error logging
* SMART self-test
* General Purpose Logging feature set
* 64-bit World wide name
* Segmented DOWNLOAD_MICROCODE
* Gen1 signaling speed (1.5Gb/s)
* Gen2 signaling speed (3.0Gb/s)
* Native Command Queueing (NCQ)
* Host-initiated interface power management
* Phy event counters
* DMA Setup Auto-Activate optimization
Device-initiated interface power management
* Software settings preservation
* SMART Command Transport (SCT) feature set
* SCT Long Sector Access (AC1)
* SCT LBA Segment Access (AC2)
* SCT Error Recovery Control (AC3)
* SCT Features Control (AC4)
* SCT Data Tables (AC5)
Security:
Master password revision code = 65534
supported
not enabled
not locked
frozen
not expired: security count
supported: enhanced erase
168min for SECURITY ERASE UNIT. 168min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 50000f001b301090
NAA : 5
IEEE OUI : 0000f0
Unique ID : 01b301090
Checksum: correct
$ iostat 1 # a snippet of the output below
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdb 355.00 60544.00 0.00 60544 0
$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
1 9 0 152864 12 3302740 0 0 61952 0 18115 1999 1 24 12 63
0 8 0 153316 12 3302060 0 0 59648 0 20060 2393 1 33 9 57
0 10 0 153432 12 3302060 0 0 54784 0 18430 2205 1 24 11 65
1 8 0 154848 12 3301216 0 0 59392 0 19011 2291 1 31 8 60
0 9 0 149676 12 3306324 0 0 59392 0 21149 2417 2 29 6 64
0 9 0 150460 12 3305268 0 0 61952 0 18664 2117 1 28 11 60
1 8 0 152084 12 3304028 0 0 59392 0 20045 2245 2 31 6 62
1 8 0 152548 12 3303452 0 0 60160 0 20105 2426 2 29 9 60
What can I do?
Answer 1
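This is expected behavior. You need to look at the difference between load and CPU usage (for example with top). Chances are that CPU usage is quite low while CPU load is high. A high load figure on its own is, most of the time, completely harmless.
From the uptime man page: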
System load averages is the average number of processes that are either in a runnable or uninterruptable
state. A process in a runnable state is either using the CPU or waiting to use the CPU. A
process in uninterruptable state is waiting for some I/O access, eg waiting for disk. The averages are
taken over the three time intervals. Load averages are not normalized for the number of CPUs in a system,
so a load average of 1 means a single CPU system is loaded all the time while on a 4 CPU system it
means it was idle 75% of the time.
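In other words, it is the average number of processes waiting to be served. But since all of those processes are waiting for data from the disk, the number can get large when a lot of disk I/O is scheduled.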
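To make this concrete with the numbers from the question: in vmstat output, r is the number of runnable processes and b is the number in uninterruptible sleep, which are the two states the load average counts. Below is a minimal sketch, assuming the eight vmstat rows quoted in the question are representative. The helper name is made up for illustration.

```python
# The eight "vmstat 1" sample rows quoted in the question (header lines omitted).
VMSTAT_ROWS = """\
1 9 0 152864 12 3302740 0 0 61952 0 18115 1999 1 24 12 63
0 8 0 153316 12 3302060 0 0 59648 0 20060 2393 1 33 9 57
0 10 0 153432 12 3302060 0 0 54784 0 18430 2205 1 24 11 65
1 8 0 154848 12 3301216 0 0 59392 0 19011 2291 1 31 8 60
0 9 0 149676 12 3306324 0 0 59392 0 21149 2417 2 29 6 64
0 9 0 150460 12 3305268 0 0 61952 0 18664 2117 1 28 11 60
1 8 0 152084 12 3304028 0 0 59392 0 20045 2245 2 31 6 62
1 8 0 152548 12 3303452 0 0 60160 0 20105 2426 2 29 9 60
"""

# Column names taken from the vmstat header line in the question.
COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa".split()


def summarize_vmstat(text):
    """Average each vmstat column over all sample rows (hypothetical helper)."""
    rows = [dict(zip(COLUMNS, map(int, line.split())))
            for line in text.splitlines() if line.strip()]
    return {name: sum(row[name] for row in rows) / len(rows) for name in COLUMNS}


avg = summarize_vmstat(VMSTAT_ROWS)
print(f"runnable (r):       {avg['r']:.2f}")
print(f"blocked on I/O (b): {avg['b']:.2f}")
print(f"r + b:              {avg['r'] + avg['b']:.2f}")
print(f"us + sy (CPU busy): {avg['us'] + avg['sy']:.1f}%")
print(f"wa (I/O wait):      {avg['wa']:.1f}%")
```

For these samples r + b averages a little over 9, nearly all of it from b, which lines up with the load average above 8 reported in the question, while user plus system CPU time stays around 30%.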
Solution: don't worry about it, or buy faster disks (or a proper RAID, SAN, etc.).
Personally I like dstat for tracking down these problems.
Answer 2
High load during heavy disk activity is normal. You should check top and look at iowait; in the sample line below, "0.7%wa" is the iowait time. I suspect yours will be high.
Cpu0 : 17.4%us, 3.0%sy, 0.0%ni, 78.9%id, 0.7%wa, 0.0%hi, 0.0%si, 0.0%st
In your example I'm not sure what units those measurements are in, but if wa is a percentage, then it's quite high.
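If you want to pull the iowait figure out of a top CPU line programmatically, something like the following works on the format shown above. This is only a rough sketch: the function name is made up, and it assumes the older "NN.N%xx" layout of top's CPU summary line, which newer versions of top format differently. For what it's worth, the us, sy, id and wa columns in vmstat are percentages of total CPU time as well.

```python
import re


def parse_top_cpu_line(line):
    """Return a dict such as {'us': 17.4, 'sy': 3.0, ...} from a top CPU line.

    Hypothetical helper; assumes fields of the form "17.4%us".
    """
    # Each field is a number, a percent sign, then a two-letter label.
    return {label: float(value)
            for value, label in re.findall(r"(\d+(?:\.\d+)?)%(\w\w)", line)}


sample = "Cpu0 : 17.4%us, 3.0%sy, 0.0%ni, 78.9%id, 0.7%wa, 0.0%hi, 0.0%si, 0.0%st"
fields = parse_top_cpu_line(sample)
print("iowait:", fields["wa"])
```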
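The problem: I found that your Samsung disk performs particularly slowly in benchmarks: http://usb.userbenchmark.com/SpeedTest/4007/SAMSUNG-HD501LJ The poor performance you are seeing is definitely in line with those benchmark results. In fact, I have never seen a slower 7200 RPM drive!
Solution: 1.) Replace your disk with a higher-performing model (I highly recommend any of the newer Toshiba SATA3 7200 RPM 3.5" models).
For example, I see about 389-400 MB/s of I/O with 2x 2TB Toshiba 7200 RPM drives in an mdadm RAID 10 (effectively twice the speed of a single drive).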