High CPU load during I/O

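I am running Ubuntu Server 12.04 LTS on an AMD E-350 based machine. Under heavy I/O load (for example transferring files over NFS or HTTP, extracting archives, running backups), the CPU load goes crazy. I see load averages well above 8 on this dual-core machine... and it becomes far less responsive.

I think the problem must be on the kernel side, but see for yourself: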

$ sudo hdparm -I /dev/sdb

/dev/sdb:

ATA device, with non-removable media
    Model Number:       SAMSUNG HD501LJ                             
    Firmware Revision:  CR100-10
    Transport:          Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5
Standards:
    Used: ATA-8-ACS revision 3b 
    Supported: 8 7 6 5 
Configuration:
    Logical        max    current
    cylinders    16383    16383
    heads        16    16
    sectors/track    63    63
    --
    CHS current addressable sectors:   16514064
    LBA    user addressable sectors:  268435455
    LBA48  user addressable sectors:  976771055
    Logical/Physical Sector size:           512 bytes
    device size with M = 1024*1024:      476938 MBytes
    device size with M = 1000*1000:      500106 MBytes (500 GB)
    cache/buffer size  = 16384 KBytes (type=DualPortCache)
Capabilities:
    LBA, IORDY(can be disabled)
    Queue depth: 32
    Standby timer values: spec'd by Standard, no device specific minimum
    R/W multiple sector transfer: Max = 16    Current = 16
    Recommended acoustic management value: 254, current value: 128
    DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6 udma7 
         Cycle time: min=120ns recommended=120ns
    PIO: pio0 pio1 pio2 pio3 pio4 
         Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
    Enabled    Supported:
       *    SMART feature set
            Security Mode feature set
       *    Power Management feature set
       *    Write cache
       *    Look-ahead
       *    Host Protected Area feature set
       *    WRITE_BUFFER command
       *    READ_BUFFER command
       *    NOP cmd
       *    DOWNLOAD_MICROCODE
            SET_MAX security extension
       *    Automatic Acoustic Management feature set
       *    48-bit Address feature set
       *    Device Configuration Overlay feature set
       *    Mandatory FLUSH_CACHE
       *    FLUSH_CACHE_EXT
       *    SMART error logging
       *    SMART self-test
       *    General Purpose Logging feature set
       *    64-bit World wide name
       *    Segmented DOWNLOAD_MICROCODE
       *    Gen1 signaling speed (1.5Gb/s)
       *    Gen2 signaling speed (3.0Gb/s)
       *    Native Command Queueing (NCQ)
       *    Host-initiated interface power management
       *    Phy event counters
       *    DMA Setup Auto-Activate optimization
            Device-initiated interface power management
       *    Software settings preservation
       *    SMART Command Transport (SCT) feature set
       *    SCT Long Sector Access (AC1)
       *    SCT LBA Segment Access (AC2)
       *    SCT Error Recovery Control (AC3)
       *    SCT Features Control (AC4)
       *    SCT Data Tables (AC5)
Security: 
    Master password revision code = 65534
        supported
    not    enabled
    not    locked
        frozen
    not    expired: security count
        supported: enhanced erase
    168min for SECURITY ERASE UNIT. 168min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 50000f001b301090
    NAA        : 5
    IEEE OUI    : 0000f0
    Unique ID    : 01b301090
Checksum: correct

$ iostat 1 # one slice below

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               0.00         0.00         0.00          0          0
sdb             355.00     60544.00         0.00      60544          0

$ vmstat 1

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  9      0 152864     12 3302740    0    0 61952     0 18115 1999  1 24 12 63
 0  8      0 153316     12 3302060    0    0 59648     0 20060 2393  1 33  9 57
 0 10      0 153432     12 3302060    0    0 54784     0 18430 2205  1 24 11 65
 1  8      0 154848     12 3301216    0    0 59392     0 19011 2291  1 31  8 60
 0  9      0 149676     12 3306324    0    0 59392     0 21149 2417  2 29  6 64
 0  9      0 150460     12 3305268    0    0 61952     0 18664 2117  1 28 11 60
 1  8      0 152084     12 3304028    0    0 59392     0 20045 2245  2 31  6 62
 1  8      0 152548     12 3303452    0    0 60160     0 20105 2426  2 29  9 60

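What can I do about it?

Answer 1

This is expected behavior. You need to look at the difference between load and CPU usage (for example with top). Most likely CPU usage is quite low while CPU load is high. That is simply what the load figure indicates, and in most cases it is completely harmless.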
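
To make the usage-versus-load distinction concrete, here is a minimal Python sketch that splits CPU time into states from two samples of the aggregate "cpu" line in /proc/stat. The field order (user, nice, system, idle, iowait, irq, softirq, steal) is the one documented for Linux; the function names and the two sample lines are made up for illustration and were not measured on the machine in the question.

```python
# Field order of the aggregate "cpu" line in /proc/stat on Linux.
STATES = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]

# Hypothetical samples taken one interval apart (illustrative values only).
SAMPLE_BEFORE = "cpu  100 0 200 1000 500 0 0 0 0 0"
SAMPLE_AFTER = "cpu  101 0 225 1012 562 0 0 0 0 0"


def read_cpu_line(path="/proc/stat"):
    """Return the aggregate 'cpu' line from /proc/stat (Linux only)."""
    with open(path) as handle:
        return handle.readline().strip()


def cpu_time_shares(before, after):
    """Percentage of CPU time spent in each state between two samples."""
    first = [int(value) for value in before.split()[1:9]]
    second = [int(value) for value in after.split()[1:9]]
    deltas = [b - a for a, b in zip(first, second)]
    total = sum(deltas)
    if total == 0:
        # No ticks elapsed between the samples; report zeros instead of dividing by zero.
        return {name: 0.0 for name in STATES}
    return {name: 100.0 * delta / total for name, delta in zip(STATES, deltas)}


if __name__ == "__main__":
    for name, share in cpu_time_shares(SAMPLE_BEFORE, SAMPLE_AFTER).items():
        print(f"{name:8s} {share:5.1f}%")
```

With the made-up samples, most of the interval lands in iowait and very little in user, which is the "high load, low usage" pattern described above. On a live system you would call read_cpu_line() twice, a second or so apart, and pass both lines to cpu_time_shares().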

From the uptime man page:

   System  load  averages is the average number of processes that are either in a runnable or uninterrupt‐
   able state.  A process in a runnable state is either using the CPU  or  waiting  to  use  the  CPU.   A
   process in uninterruptable state is waiting for some I/O access, eg waiting for disk.  The averages are
   taken over the three time intervals.  Load averages are not normalized for the number of CPUs in a sys‐
   tem, so a load average of 1 means a single CPU system is loaded all the time while on a 4 CPU system it
   means it was idle 75% of the time.

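In other words, it is the average number of processes waiting to be served. Because all of those processes are waiting for data from the disk, the number can get large when a lot of disk I/O is queued.

Solution: don't worry, or buy faster disks (or a proper RAID, SAN, etc.).

Personally, I like dstat for these kinds of problems.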
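
As a rough check against the numbers in the question, the sketch below adds the r (runnable) and b (blocked in uninterruptible sleep) columns of the posted vmstat rows. This is only a snapshot count, not the kernel's own damped load average, and the helper names are invented for this example.

```python
# The vmstat data rows exactly as posted in the question.
VMSTAT_ROWS = """\
 1  9      0 152864     12 3302740    0    0 61952     0 18115 1999  1 24 12 63
 0  8      0 153316     12 3302060    0    0 59648     0 20060 2393  1 33  9 57
 0 10      0 153432     12 3302060    0    0 54784     0 18430 2205  1 24 11 65
 1  8      0 154848     12 3301216    0    0 59392     0 19011 2291  1 31  8 60
 0  9      0 149676     12 3306324    0    0 59392     0 21149 2417  2 29  6 64
 0  9      0 150460     12 3305268    0    0 61952     0 18664 2117  1 28 11 60
 1  8      0 152084     12 3304028    0    0 59392     0 20045 2245  2 31  6 62
 1  8      0 152548     12 3303452    0    0 60160     0 20105 2426  2 29  9 60
"""


def runnable_and_blocked(row):
    """Return the (r, b) pair, i.e. the first two columns of a vmstat row."""
    fields = row.split()
    return int(fields[0]), int(fields[1])


def mean_waiting_tasks(rows_text):
    """Average of r + b over all non-empty rows."""
    pairs = [runnable_and_blocked(line) for line in rows_text.splitlines() if line.strip()]
    return sum(r + b for r, b in pairs) / len(pairs)


def load_per_core(load, cores):
    """Load divided by core count; the load average itself is not normalized."""
    return load / cores


if __name__ == "__main__":
    print("mean r + b:", mean_waiting_tasks(VMSTAT_ROWS))
    print("load 8 on 2 cores, per core:", load_per_core(8, 2))
```

Nearly all of the counted tasks sit in the b column rather than r, so the sample is in line with a load average above 8 that comes from processes waiting on the disk rather than from processes competing for the two cores.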

Answer 2

High load during heavy disk activity is normal. You should check top for iowait; in the sample line below, "0.7%wa" is the iowait time. I suspect yours will be high.

Cpu0  : 17.4%us,  3.0%sy,  0.0%ni, 78.9%id,  0.7%wa,  0.0%hi,  0.0%si,  0.0%st

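In your example I am not sure what units those measurements are in, but if wa is a percentage, then it is quite high.

Problem: I found that your Samsung disk performs particularly slowly in benchmarks: http://usb.userbenchmark.com/SpeedTest/4007/SAMSUNG-HD501LJ The poor performance you are seeing is certainly consistent with those benchmark results. In fact, I have never seen a slower 7200 RPM drive!

Solution: 1.) Replace your disk with a higher-performing model (any of the newer Toshiba SATA3 7200 RPM 3.5-inch models are highly recommended).

For example, I get IO speeds of about 389-400 MB/s with 2x2TB Toshiba 7200 RPM drives in mdadm RAID 10 (effectively twice the speed of a single drive).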
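
For what it is worth, the vmstat man page describes the cpu columns (us, sy, id, wa) as percentages of total CPU time. Under that reading, a small Python sketch can average any column of the output posted in the question; the function name column_mean is made up for this example, and the column is looked up by its header label so the code does not depend on its position.

```python
# Header and data rows copied from the vmstat output in the question.
VMSTAT_OUTPUT = """\
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  9      0 152864     12 3302740    0    0 61952     0 18115 1999  1 24 12 63
 0  8      0 153316     12 3302060    0    0 59648     0 20060 2393  1 33  9 57
 0 10      0 153432     12 3302060    0    0 54784     0 18430 2205  1 24 11 65
 1  8      0 154848     12 3301216    0    0 59392     0 19011 2291  1 31  8 60
 0  9      0 149676     12 3306324    0    0 59392     0 21149 2417  2 29  6 64
 0  9      0 150460     12 3305268    0    0 61952     0 18664 2117  1 28 11 60
 1  8      0 152084     12 3304028    0    0 59392     0 20045 2245  2 31  6 62
 1  8      0 152548     12 3303452    0    0 60160     0 20105 2426  2 29  9 60
"""


def column_mean(vmstat_text, column):
    """Mean of one vmstat column, located by its label in the header row."""
    lines = [line for line in vmstat_text.splitlines() if line.strip()]
    index = lines[0].split().index(column)
    values = [int(line.split()[index]) for line in lines[1:]]
    return sum(values) / len(values)


if __name__ == "__main__":
    for label in ("us", "sy", "id", "wa"):
        print(label, column_mean(VMSTAT_OUTPUT, label))
```

On the posted sample, wa averages a little over 60, so the CPUs spend most of the interval waiting for the disk rather than running user code.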
