High iowait for an unknown reason


I'm having a hard time finding out what is causing the high iowait on my server.

Here is the output of iostat -xm 5 5:

Linux 2.6.32-358.6.1.el6.x86_64 (prod-1.localdomain)    09/28/2013  _x86_64_    (16 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           6.98    0.05    3.72    3.54    0.00   85.71

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.08   120.88   30.27    1.72     0.96     0.48    92.20     0.34   10.67   3.79  12.13
sda               7.63    37.19    8.96    4.89     0.35     0.16    76.40     0.16   11.63   2.19   3.04

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.41    0.00    6.20   37.65    0.00   50.74

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00   109.80  186.20    1.40     3.75     0.43    45.66    98.21  519.80   5.33 100.00
sda              33.20     3.40   18.00    2.00     0.37     0.02    40.32     0.07    3.41   3.17   6.34

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.55    0.00    7.42   30.06    0.00   56.97

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00     0.00  196.00    0.00     3.91     0.00    40.85   100.41  506.01   5.10 100.00
sda               0.00     2.40    1.80    2.60     0.05     0.02    30.91     0.01    2.95   2.73   1.20

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.71    0.00    7.04   31.76    0.00   55.49

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00   100.00  189.00    1.20     3.72     0.40    44.33    95.32  514.88   5.26 100.00
sda              33.20     4.20   19.20    5.20     0.39     0.04    35.80     0.02    1.01   0.79   1.92

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          61.93    0.00   10.08   14.99    0.00   12.99

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00     0.40  185.40    1.40     3.76     0.01    41.31    83.22  431.16   5.28  98.62
sda              33.20     5.40    9.60    4.00     0.21     0.04    37.65     0.02    1.24   1.04   1.42

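As you can see, every metric looks normal except await and %util, which are very high on sdb. So I suspect there may be a problem with /dev/sdb.

smartctl does not report anything useful: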
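As a rough sanity check, the sdb line can be cross-checked against itself. The sketch below assumes the column order of the iostat header shown above and that avgrq-sz is reported in 512-byte sectors (as sysstat's iostat does); the function and field names are made up for illustration. It estimates the queue length with Little's law (IOPS times await) and the utilization as IOPS times svctm.

```python
# Column order follows the `iostat -xm` header shown above.
FIELDS = ["rrqm_s", "wrqm_s", "r_s", "w_s", "rMB_s", "wMB_s",
          "avgrq_sz", "avgqu_sz", "await", "svctm", "util"]


def parse_device_line(line):
    """Split one iostat device line into (device name, dict of floats)."""
    name, *values = line.split()
    return name, dict(zip(FIELDS, map(float, values)))


def summarize(line):
    """Derive IOPS, estimated queue depth, utilization and request size."""
    name, s = parse_device_line(line)
    iops = s["r_s"] + s["w_s"]
    return {
        "device": name,
        "iops": iops,
        # Little's law: requests in flight = arrival rate * time in system.
        "est_queue": iops * s["await"] / 1000.0,
        # Busy time per second: IOPS * service time (ms), as a percentage.
        "est_util_pct": min(100.0, iops * s["svctm"] / 10.0),
        # avgrq-sz is assumed to be in 512-byte sectors.
        "avg_request_kb": s["avgrq_sz"] * 512 / 1024.0,
    }


# Third sample for sdb, copied from the iostat output above.
sample = ("sdb               0.00     0.00  196.00    0.00     3.91     0.00"
          "    40.85   100.41  506.01   5.10 100.00")
print(summarize(sample))
```

For this sample the estimates land close to what iostat itself reports (a queue of about 100 and about 100% utilization), with an average request of roughly 20 KB. That pattern suggests the disk is saturated by many small reads rather than by raw throughput, which would explain the high await without the drive being faulty.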

smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-358.6.1.el6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital RE4 Serial ATA
Device Model:     WDC WD2003FYYS-02W0B1
Serial Number:    WD-WMAY04093732
LU WWN Device Id: 5 0014ee 05877b196
Firmware Version: 01.01D02
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sat Sep 28 09:05:30 2013 ICT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                    was suspended by an interrupting command from host.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:        (29160) seconds.
Offline data collection
capabilities:            (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    ( 283) minutes.
Conveyance self-test routine
recommended polling time:    (   5) minutes.
SCT capabilities:          (0x303f) SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   253   253   021    Pre-fail  Always       -       9100
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       42
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       7373
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       40
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       31
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       10
194 Temperature_Celsius     0x0022   123   107   000    Old_age   Always       -       29
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

I'm stuck here and don't know what to do next to troubleshoot this.

Any help would be greatly appreciated!

Update:

@MichaelHampton

My self-test log shows nothing of interest. Output of smartctl -l selftest /dev/sdb:

smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-358.6.1.el6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      7380       

@Mife

My pidstat -d 1 30 result:

Linux 2.6.32-358.6.1.el6.x86_64 (cass-23_120.localdomain)   09/28/2013  _x86_64_    (16 CPU)

05:57:43 PM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command

05:57:44 PM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command

05:57:45 PM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
05:57:46 PM      1555    736.00      0.00      0.00  java
05:57:46 PM     16698      0.00      4.00      0.00  java

05:57:46 PM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
05:57:47 PM       552      0.00     68.00      0.00  jbd2/sda3-8
05:57:47 PM      1555    352.00      0.00      0.00  java
05:57:47 PM     16698      0.00     12.00      0.00  java
05:57:47 PM     18074      0.00      4.00      0.00  java
05:57:47 PM     19295   1564.00      0.00      0.00  java

05:57:47 PM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
05:57:48 PM      1554   3128.00      8.00      4.00  xinetd
05:57:48 PM      1570    840.00      0.00      0.00  gmond
05:57:48 PM      2183      0.00      4.00      0.00  java
05:57:48 PM      2394     64.00      0.00      0.00  rsync
05:57:48 PM      2395    324.00      0.00      0.00  ssh
05:57:48 PM     13280     28.00      0.00      0.00  downloadm_new.s
05:57:48 PM     19295   1724.00      0.00      0.00  java

05:57:48 PM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
05:57:49 PM     19295   1744.00      0.00      0.00  java

05:57:49 PM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
05:57:50 PM      1083      0.00      8.00      0.00  flush-8:0
05:57:50 PM      1086      0.00      8.00      0.00  java
05:57:50 PM      2183      0.00     12.00      0.00  java
05:57:50 PM     13280    388.00      0.00      0.00  downloadm_new.s
05:57:50 PM     18074      0.00      4.00      0.00  java
05:57:50 PM     19295   1728.00      0.00      0.00  java

05:57:50 PM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
05:57:51 PM      2183      0.00      4.00      0.00  java
05:57:51 PM      2400      8.00      0.00      0.00  sleep
05:57:51 PM     18074      0.00      4.00      0.00  java
05:57:51 PM     19295   1680.00      0.00      0.00  java

05:57:51 PM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
05:57:52 PM       552      0.00     28.00      0.00  jbd2/sda3-8
05:57:52 PM      1112      0.00      4.00      0.00  jbd2/sda4-8
05:57:52 PM      2183      0.00      8.00      0.00  java
05:57:52 PM     16698      0.00      4.00      0.00  java
05:57:52 PM     18074      0.00      4.00      0.00  java
05:57:52 PM     19295   1672.00      0.00      0.00  java

05:57:52 PM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
05:57:53 PM      1555    376.00     20.00      0.00  java
05:57:53 PM      1570    792.00      0.00      0.00  gmond
05:57:53 PM     19295   1568.00      8.00      0.00  java

05:57:53 PM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
05:57:54 PM      3734    844.00    188.00      0.00  java
05:57:54 PM     19295   1672.00      0.00      0.00  java

05:57:54 PM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
05:57:55 PM      1083      0.00     20.00      0.00  flush-8:0
05:57:55 PM      3734   2620.00   1156.00      0.00  java
05:57:55 PM      4327      0.00      8.00      0.00  java
05:57:55 PM      9677      0.00      8.00      0.00  java
05:57:55 PM     16613      0.00      8.00      0.00  java
05:57:55 PM     19295   1272.00      8.00      0.00  java
05:57:55 PM     19426      0.00      8.00      0.00  java

05:57:55 PM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
05:57:56 PM      3734   3592.00   1200.00      0.00  java
05:57:56 PM     19295    332.00      0.00      0.00  java
05:57:56 PM     19426      0.00      4.00      0.00  java

05:57:56 PM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
05:57:57 PM       552      0.00     36.00      0.00  jbd2/sda3-8
05:57:57 PM      2405   1068.00     32.00      0.00  java
05:57:57 PM      3734   2972.00    828.00      0.00  java
05:57:57 PM      5457      0.00      8.00      0.00  java
05:57:57 PM      9677  28424.00    144.00     20.00  java
05:57:57 PM     16698      0.00      8.00      0.00  java
05:57:57 PM     18074      0.00      4.00      0.00  java
05:57:57 PM     19295      0.00      4.00      0.00  java

05:57:57 PM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
05:57:58 PM      2183      0.00      4.00      0.00  java
05:57:58 PM      2222      0.00      4.00      0.00  pidstat
05:57:58 PM      2405    500.00      0.00      0.00  java
05:57:58 PM      3734   4016.00    720.00      0.00  java
05:57:58 PM      5457      0.00      8.00      0.00  java
05:57:58 PM     16698      0.00      4.00      0.00  java

05:57:58 PM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
05:57:59 PM      1112      0.00      8.00      0.00  jbd2/sda4-8
05:57:59 PM      3734   4572.00    372.00      0.00  java

05:57:59 PM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
05:58:00 PM      1083      0.00     32.00      0.00  flush-8:0
05:58:00 PM      2405    496.00      0.00      0.00  java
05:58:00 PM      3734   5412.00      4.00      0.00  java
05:58:00 PM      5457      0.00     16.00      0.00  java
05:58:00 PM     11681      0.00      8.00      0.00  java
05:58:00 PM     14824      0.00      8.00      0.00  java
05:58:00 PM     16698      0.00     12.00      0.00  java
05:58:00 PM     17694      0.00      8.00      0.00  java
05:58:00 PM     18074      0.00     12.00      0.00  java
05:58:00 PM     18129      0.00      8.00      0.00  java
05:58:00 PM     19542      0.00      8.00      0.00  java

05:58:00 PM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
05:58:01 PM      3734   3888.00      0.00      0.00  java
05:58:01 PM      3813      8.00     12.00      0.00  java
05:58:01 PM     13280     28.00      0.00      0.00  downloadm_new.s
05:58:01 PM     18074      0.00      8.00      0.00  java

05:58:01 PM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
05:58:02 PM       552      0.00     44.00      0.00  jbd2/sda3-8
05:58:02 PM      1129      0.00     16.00      0.00  jbd2/sdb1-8
05:58:02 PM      2405    256.00      0.00      0.00  java
05:58:02 PM      3734   1200.00   1128.00      0.00  java
05:58:02 PM     16698      0.00      4.00      0.00  java
05:58:02 PM     18074      0.00      8.00      0.00  java

05:58:02 PM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
05:58:03 PM      1570   1172.00      0.00      0.00  gmond
05:58:03 PM      2183      0.00      4.00      0.00  java
05:58:03 PM      2405    256.00      0.00      0.00  java
05:58:03 PM      2442     92.00      0.00      0.00  rsync
05:58:03 PM      2443    916.00      0.00      0.00  ssh
05:58:03 PM      3734    576.00      0.00      0.00  java
05:58:03 PM      9677      0.00      4.00      0.00  java
05:58:03 PM     13280     56.00   8996.00      0.00  downloadm_new.s

05:58:03 PM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
05:58:04 PM      2183      0.00      4.00      0.00  java
05:58:04 PM      2405    256.00      0.00      0.00  java
05:58:04 PM      2443      8.00      0.00      0.00  ssh
05:58:04 PM      3734   2032.00     16.00      0.00  java

05:58:04 PM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
05:58:05 PM      1083      0.00      4.00      0.00  flush-8:0
05:58:05 PM      2405    224.00      0.00      0.00  java
05:58:05 PM      2446    160.00      0.00      0.00  sleep
05:58:05 PM      3734   5344.00    648.00      0.00  java
05:58:05 PM      3813      0.00      8.00      0.00  java
05:58:05 PM     13280   1016.00      0.00      0.00  downloadm_new.s

05:58:05 PM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
05:58:06 PM      2405     16.00      0.00      0.00  java
05:58:06 PM      3734   6196.00    344.00      0.00  java

05:58:06 PM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
05:58:07 PM       552      0.00      8.00      0.00  jbd2/sda3-8
05:58:07 PM      2405    112.00      0.00      0.00  java
05:58:07 PM      3734   3532.00      0.00      0.00  java
05:58:07 PM     16698      0.00      4.00      0.00  java
05:58:07 PM     18074      0.00      4.00      0.00  java

05:58:07 PM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
05:58:08 PM      1570   1172.00      0.00      0.00  gmond
05:58:08 PM      2183      0.00      4.00      0.00  java
05:58:08 PM      2405    352.00      0.00      0.00  java
05:58:08 PM      3734   4588.00      0.00      0.00  java
05:58:08 PM     16698      0.00      8.00      0.00  java

05:58:08 PM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
05:58:09 PM      2222      0.00      4.00      0.00  pidstat
05:58:09 PM      2405    368.00      0.00      0.00  java
05:58:09 PM      3734   1720.00      0.00      0.00  java
05:58:09 PM     16698      0.00      4.00      0.00  java

05:58:09 PM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
05:58:10 PM      1083      0.00      8.00      0.00  flush-8:0
05:58:10 PM      2405    480.00      0.00      0.00  java
05:58:10 PM      3734     40.00     16.00      0.00  java
05:58:10 PM     17768      0.00      8.00      0.00  java

05:58:10 PM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
05:58:11 PM      2405    608.00      0.00      0.00  java
05:58:11 PM      3734    264.00      0.00      0.00  java
05:58:11 PM     19426      0.00      4.00      0.00  java

05:58:11 PM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
05:58:12 PM      1129      0.00     24.00      0.00  jbd2/sdb1-8
05:58:12 PM      2405    240.00      0.00      0.00  java
05:58:12 PM     18074      0.00      8.00      0.00  java

05:58:12 PM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
05:58:13 PM      1570   1172.00      0.00      0.00  gmond
05:58:13 PM      2183      0.00      4.00      0.00  java
05:58:13 PM      2405    128.00      0.00      0.00  java
05:58:13 PM     18074      0.00      4.00      0.00  java

Average:          PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
Average:          552      0.00      6.13      0.00  jbd2/sda3-8
Average:         1083      0.00      2.40      0.00  flush-8:0
Average:         1086      0.00      0.27      0.00  java
Average:         1112      0.00      0.40      0.00  jbd2/sda4-8
Average:         1129      0.00      1.33      0.00  jbd2/sdb1-8
Average:         1554    104.16      0.27      0.13  xinetd
Average:         1570    171.43      0.00      0.00  gmond
Average:         2183      0.00      1.60      0.00  java
Average:         2222      0.00      0.27      0.00  pidstat
Average:         2405    178.49      1.07      0.00  java
Average:         2446      5.33      0.00      0.00  sleep
Average:         3734   1778.49    220.45      0.00  java
Average:         3813      0.27      0.67      0.00  java
Average:         4327      0.00      0.27      0.00  java
Average:         5457      0.00      1.07      0.00  java
Average:         9677    946.52      5.19      0.67  java
Average:        11681      0.00      0.27      0.00  java
Average:        13280     50.48    299.57      0.00  downloadm_new.s
Average:        14824      0.00      0.27      0.00  java
Average:        16613      0.00      0.27      0.00  java
Average:        16698      0.00      2.13      0.00  java
Average:        17694      0.00      0.27      0.00  java
Average:        17768      0.00      0.27      0.00  java
Average:        18074      0.00      2.13      0.00  java
Average:        18129      0.00      0.27      0.00  java
Average:        19295    498.04      0.67      0.00  java
Average:        19426      0.00      0.53      0.00  java
Average:        19542      0.00      0.27      0.00  java

@kworr

Here are my mount options for /dev/sdb1:

% mount | grep sdb
/dev/sdb1 on /backup type ext4 (rw,noatime,commit=100)

Update 2: how many IOPS to expect from a hard drive.

7,200 rpm SATA drives    HDD    ~75-100 IOPS[2]     SATA 3 Gbit/s
10,000 rpm SATA drives   HDD    ~125-150 IOPS[2]    SATA 3 Gbit/s
10,000 rpm SAS drives    HDD    ~140 IOPS[2]        SAS
15,000 rpm SAS drives    HDD    ~175-210 IOPS[2]    SAS
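To put sdb's numbers next to this table, here is a small sketch. It assumes the WD RE4 in question is a 7,200 rpm SATA drive (the post does not state the spindle speed) and takes r/s + w/s from the second iostat sample as the measured IOPS. The dictionary keys and function name are hypothetical.

```python
# Rough IOPS ranges copied from the table above: (low, high).
EXPECTED_IOPS = {
    "7200_sata": (75, 100),
    "10000_sata": (125, 150),
    "10000_sas": (140, 140),
    "15000_sas": (175, 210),
}


def demand_ratio(measured_iops, drive_class):
    """Return measured IOPS divided by the upper end of the expected range."""
    _low, high = EXPECTED_IOPS[drive_class]
    return measured_iops / high


# r/s + w/s for sdb in the second iostat sample (186.20 + 1.40).
measured = 186.20 + 1.40
ratio = demand_ratio(measured, "7200_sata")
print(f"sdb is serving {measured:.1f} IOPS, {ratio:.2f}x the upper estimate")
```

The table figures are rough estimates for random access, and a drive with a deep queue can exceed them somewhat. Even so, a ratio well above 1 is consistent with the disk simply being asked for more small random reads than it can serve.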

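Answer 1

There is a lot going on here, but PID 3734, a java process, appears to be the culprit. You should find out what it is doing, what arguments were passed to it, what its parent PID is, and what it is meant to do.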
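One way to collect those details is to read them from /proc. The sketch below is a minimal example under stated assumptions: it assumes a Linux /proc filesystem, the helper name describe_process is invented here, and the io file is only readable for your own processes or as root, so it is treated as optional.

```python
import os


def describe_process(pid):
    """Collect command line, parent PID and cumulative I/O counters from /proc."""
    base = f"/proc/{pid}"

    with open(f"{base}/cmdline", "rb") as f:
        # Arguments are NUL-separated in /proc/<pid>/cmdline.
        args = [a.decode(errors="replace") for a in f.read().split(b"\0") if a]

    ppid = None
    with open(f"{base}/status") as f:
        for line in f:
            if line.startswith("PPid:"):
                ppid = int(line.split()[1])
                break

    io = {}
    try:
        # Lines look like "read_bytes: 12345"; needs root for other users' PIDs.
        with open(f"{base}/io") as f:
            for line in f:
                key, value = line.split(":")
                io[key] = int(value)
    except OSError:
        pass  # Not permitted, or the kernel lacks per-task I/O accounting.

    return {"pid": pid, "ppid": ppid, "args": args, "io": io}


if __name__ == "__main__":
    # Demonstrated on the current process so the example runs anywhere.
    print(describe_process(os.getpid()))
```

On the affected server you would call describe_process(3734) as root; the args list shows which Java application it is, and ppid shows what started it.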

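Across the 30 one-second samples, that java process averaged 1778.49 kB/s of reads. Two other java processes, PIDs 9677 and 19295, averaged 946.52 and 498.04 kB/s of reads respectively.

I'm not in a position to tell you whether what they are doing is right or wrong, but your high I/O is mostly caused by those java processes.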
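The ranking can be reproduced from the "Average:" lines of the pidstat output. This is a small sketch, assuming the six-column pidstat -d layout shown in the question and single-word command names; top_readers is a hypothetical helper, and the sample contains only a subset of the lines from the post.

```python
def top_readers(pidstat_text, n=3):
    """Return the n processes with the highest average kB_rd/s."""
    rows = []
    for line in pidstat_text.splitlines():
        parts = line.split()
        # Keep only "Average:" data rows; the header row has "PID" in column 2.
        if len(parts) != 6 or parts[0] != "Average:" or not parts[1].isdigit():
            continue
        pid, rd, wr, _ccwr, cmd = parts[1:]
        rows.append((int(pid), float(rd), float(wr), cmd))
    return sorted(rows, key=lambda r: r[1], reverse=True)[:n]


# A subset of the "Average:" lines from the pidstat output in the question.
sample = """\
Average:          PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
Average:         1570    171.43      0.00      0.00  gmond
Average:         2405    178.49      1.07      0.00  java
Average:         3734   1778.49    220.45      0.00  java
Average:         9677    946.52      5.19      0.67  java
Average:        13280     50.48    299.57      0.00  downloadm_new.s
Average:        19295    498.04      0.67      0.00  java
"""

for pid, rd, wr, cmd in top_readers(sample):
    print(pid, cmd, rd, wr)
```

Note that pidstat -d does not say which device each process is reading from, so it is still worth confirming that these processes have files open on /backup (sdb1).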
