我很难找出导致我的服务器 iowait 过高的原因。
这是iostat -xm 5 5
Linux 2.6.32-358.6.1.el6.x86_64 (prod-1.localdomain) 09/28/2013 _x86_64_ (16 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
6.98 0.05 3.72 3.54 0.00 85.71
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdb 0.08 120.88 30.27 1.72 0.96 0.48 92.20 0.34 10.67 3.79 12.13
sda 7.63 37.19 8.96 4.89 0.35 0.16 76.40 0.16 11.63 2.19 3.04
avg-cpu: %user %nice %system %iowait %steal %idle
5.41 0.00 6.20 37.65 0.00 50.74
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdb 0.00 109.80 186.20 1.40 3.75 0.43 45.66 98.21 519.80 5.33 100.00
sda 33.20 3.40 18.00 2.00 0.37 0.02 40.32 0.07 3.41 3.17 6.34
avg-cpu: %user %nice %system %iowait %steal %idle
5.55 0.00 7.42 30.06 0.00 56.97
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdb 0.00 0.00 196.00 0.00 3.91 0.00 40.85 100.41 506.01 5.10 100.00
sda 0.00 2.40 1.80 2.60 0.05 0.02 30.91 0.01 2.95 2.73 1.20
avg-cpu: %user %nice %system %iowait %steal %idle
5.71 0.00 7.04 31.76 0.00 55.49
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdb 0.00 100.00 189.00 1.20 3.72 0.40 44.33 95.32 514.88 5.26 100.00
sda 33.20 4.20 19.20 5.20 0.39 0.04 35.80 0.02 1.01 0.79 1.92
avg-cpu: %user %nice %system %iowait %steal %idle
61.93 0.00 10.08 14.99 0.00 12.99
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdb 0.00 0.40 185.40 1.40 3.76 0.01 41.31 83.22 431.16 5.28 98.62
sda 33.20 5.40 9.60 4.00 0.21 0.04 37.65 0.02 1.24 1.04 1.42
如您所见,除 await 和 %util 非常高之外,所有指标均正常。所以我认为 /dev/sdb 可能有问题。
但smartctl
没有报告任何有用的信息。
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-358.6.1.el6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Model Family: Western Digital RE4 Serial ATA
Device Model: WDC WD2003FYYS-02W0B1
Serial Number: WD-WMAY04093732
LU WWN Device Id: 5 0014ee 05877b196
Firmware Version: 01.01D02
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Sat Sep 28 09:05:30 2013 ICT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x84) Offline data collection activity
was suspended by an interrupting command from host.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (29160) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 283) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x303f) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 253 253 021 Pre-fail Always - 9100
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 42
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 090 090 000 Old_age Always - 7373
10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 40
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 31
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 10
194 Temperature_Celsius 0x0022 123 107 000 Old_age Always - 29
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
我被困在这里,不知道下一步该如何解决这个问题。
任何帮助将不胜感激!
更新:
@MichaelHampton
我的自测日志,没有感兴趣的信息。smartctl -l selftest /dev/sdb
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-358.6.1.el6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 7380
@Mife
我的pidstat -d 1 30
结果。
Linux 2.6.32-358.6.1.el6.x86_64 (cass-23_120.localdomain) 09/28/2013 _x86_64_ (16 CPU)
05:57:43 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command
05:57:44 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command
05:57:45 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command
05:57:46 PM 1555 736.00 0.00 0.00 java
05:57:46 PM 16698 0.00 4.00 0.00 java
05:57:46 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command
05:57:47 PM 552 0.00 68.00 0.00 jbd2/sda3-8
05:57:47 PM 1555 352.00 0.00 0.00 java
05:57:47 PM 16698 0.00 12.00 0.00 java
05:57:47 PM 18074 0.00 4.00 0.00 java
05:57:47 PM 19295 1564.00 0.00 0.00 java
05:57:47 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command
05:57:48 PM 1554 3128.00 8.00 4.00 xinetd
05:57:48 PM 1570 840.00 0.00 0.00 gmond
05:57:48 PM 2183 0.00 4.00 0.00 java
05:57:48 PM 2394 64.00 0.00 0.00 rsync
05:57:48 PM 2395 324.00 0.00 0.00 ssh
05:57:48 PM 13280 28.00 0.00 0.00 downloadm_new.s
05:57:48 PM 19295 1724.00 0.00 0.00 java
05:57:48 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command
05:57:49 PM 19295 1744.00 0.00 0.00 java
05:57:49 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command
05:57:50 PM 1083 0.00 8.00 0.00 flush-8:0
05:57:50 PM 1086 0.00 8.00 0.00 java
05:57:50 PM 2183 0.00 12.00 0.00 java
05:57:50 PM 13280 388.00 0.00 0.00 downloadm_new.s
05:57:50 PM 18074 0.00 4.00 0.00 java
05:57:50 PM 19295 1728.00 0.00 0.00 java
05:57:50 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command
05:57:51 PM 2183 0.00 4.00 0.00 java
05:57:51 PM 2400 8.00 0.00 0.00 sleep
05:57:51 PM 18074 0.00 4.00 0.00 java
05:57:51 PM 19295 1680.00 0.00 0.00 java
05:57:51 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command
05:57:52 PM 552 0.00 28.00 0.00 jbd2/sda3-8
05:57:52 PM 1112 0.00 4.00 0.00 jbd2/sda4-8
05:57:52 PM 2183 0.00 8.00 0.00 java
05:57:52 PM 16698 0.00 4.00 0.00 java
05:57:52 PM 18074 0.00 4.00 0.00 java
05:57:52 PM 19295 1672.00 0.00 0.00 java
05:57:52 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command
05:57:53 PM 1555 376.00 20.00 0.00 java
05:57:53 PM 1570 792.00 0.00 0.00 gmond
05:57:53 PM 19295 1568.00 8.00 0.00 java
05:57:53 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command
05:57:54 PM 3734 844.00 188.00 0.00 java
05:57:54 PM 19295 1672.00 0.00 0.00 java
05:57:54 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command
05:57:55 PM 1083 0.00 20.00 0.00 flush-8:0
05:57:55 PM 3734 2620.00 1156.00 0.00 java
05:57:55 PM 4327 0.00 8.00 0.00 java
05:57:55 PM 9677 0.00 8.00 0.00 java
05:57:55 PM 16613 0.00 8.00 0.00 java
05:57:55 PM 19295 1272.00 8.00 0.00 java
05:57:55 PM 19426 0.00 8.00 0.00 java
05:57:55 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command
05:57:56 PM 3734 3592.00 1200.00 0.00 java
05:57:56 PM 19295 332.00 0.00 0.00 java
05:57:56 PM 19426 0.00 4.00 0.00 java
05:57:56 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command
05:57:57 PM 552 0.00 36.00 0.00 jbd2/sda3-8
05:57:57 PM 2405 1068.00 32.00 0.00 java
05:57:57 PM 3734 2972.00 828.00 0.00 java
05:57:57 PM 5457 0.00 8.00 0.00 java
05:57:57 PM 9677 28424.00 144.00 20.00 java
05:57:57 PM 16698 0.00 8.00 0.00 java
05:57:57 PM 18074 0.00 4.00 0.00 java
05:57:57 PM 19295 0.00 4.00 0.00 java
05:57:57 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command
05:57:58 PM 2183 0.00 4.00 0.00 java
05:57:58 PM 2222 0.00 4.00 0.00 pidstat
05:57:58 PM 2405 500.00 0.00 0.00 java
05:57:58 PM 3734 4016.00 720.00 0.00 java
05:57:58 PM 5457 0.00 8.00 0.00 java
05:57:58 PM 16698 0.00 4.00 0.00 java
05:57:58 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command
05:57:59 PM 1112 0.00 8.00 0.00 jbd2/sda4-8
05:57:59 PM 3734 4572.00 372.00 0.00 java
05:57:59 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command
05:58:00 PM 1083 0.00 32.00 0.00 flush-8:0
05:58:00 PM 2405 496.00 0.00 0.00 java
05:58:00 PM 3734 5412.00 4.00 0.00 java
05:58:00 PM 5457 0.00 16.00 0.00 java
05:58:00 PM 11681 0.00 8.00 0.00 java
05:58:00 PM 14824 0.00 8.00 0.00 java
05:58:00 PM 16698 0.00 12.00 0.00 java
05:58:00 PM 17694 0.00 8.00 0.00 java
05:58:00 PM 18074 0.00 12.00 0.00 java
05:58:00 PM 18129 0.00 8.00 0.00 java
05:58:00 PM 19542 0.00 8.00 0.00 java
05:58:00 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command
05:58:01 PM 3734 3888.00 0.00 0.00 java
05:58:01 PM 3813 8.00 12.00 0.00 java
05:58:01 PM 13280 28.00 0.00 0.00 downloadm_new.s
05:58:01 PM 18074 0.00 8.00 0.00 java
05:58:01 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command
05:58:02 PM 552 0.00 44.00 0.00 jbd2/sda3-8
05:58:02 PM 1129 0.00 16.00 0.00 jbd2/sdb1-8
05:58:02 PM 2405 256.00 0.00 0.00 java
05:58:02 PM 3734 1200.00 1128.00 0.00 java
05:58:02 PM 16698 0.00 4.00 0.00 java
05:58:02 PM 18074 0.00 8.00 0.00 java
05:58:02 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command
05:58:03 PM 1570 1172.00 0.00 0.00 gmond
05:58:03 PM 2183 0.00 4.00 0.00 java
05:58:03 PM 2405 256.00 0.00 0.00 java
05:58:03 PM 2442 92.00 0.00 0.00 rsync
05:58:03 PM 2443 916.00 0.00 0.00 ssh
05:58:03 PM 3734 576.00 0.00 0.00 java
05:58:03 PM 9677 0.00 4.00 0.00 java
05:58:03 PM 13280 56.00 8996.00 0.00 downloadm_new.s
05:58:03 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command
05:58:04 PM 2183 0.00 4.00 0.00 java
05:58:04 PM 2405 256.00 0.00 0.00 java
05:58:04 PM 2443 8.00 0.00 0.00 ssh
05:58:04 PM 3734 2032.00 16.00 0.00 java
05:58:04 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command
05:58:05 PM 1083 0.00 4.00 0.00 flush-8:0
05:58:05 PM 2405 224.00 0.00 0.00 java
05:58:05 PM 2446 160.00 0.00 0.00 sleep
05:58:05 PM 3734 5344.00 648.00 0.00 java
05:58:05 PM 3813 0.00 8.00 0.00 java
05:58:05 PM 13280 1016.00 0.00 0.00 downloadm_new.s
05:58:05 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command
05:58:06 PM 2405 16.00 0.00 0.00 java
05:58:06 PM 3734 6196.00 344.00 0.00 java
05:58:06 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command
05:58:07 PM 552 0.00 8.00 0.00 jbd2/sda3-8
05:58:07 PM 2405 112.00 0.00 0.00 java
05:58:07 PM 3734 3532.00 0.00 0.00 java
05:58:07 PM 16698 0.00 4.00 0.00 java
05:58:07 PM 18074 0.00 4.00 0.00 java
05:58:07 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command
05:58:08 PM 1570 1172.00 0.00 0.00 gmond
05:58:08 PM 2183 0.00 4.00 0.00 java
05:58:08 PM 2405 352.00 0.00 0.00 java
05:58:08 PM 3734 4588.00 0.00 0.00 java
05:58:08 PM 16698 0.00 8.00 0.00 java
05:58:08 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command
05:58:09 PM 2222 0.00 4.00 0.00 pidstat
05:58:09 PM 2405 368.00 0.00 0.00 java
05:58:09 PM 3734 1720.00 0.00 0.00 java
05:58:09 PM 16698 0.00 4.00 0.00 java
05:58:09 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command
05:58:10 PM 1083 0.00 8.00 0.00 flush-8:0
05:58:10 PM 2405 480.00 0.00 0.00 java
05:58:10 PM 3734 40.00 16.00 0.00 java
05:58:10 PM 17768 0.00 8.00 0.00 java
05:58:10 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command
05:58:11 PM 2405 608.00 0.00 0.00 java
05:58:11 PM 3734 264.00 0.00 0.00 java
05:58:11 PM 19426 0.00 4.00 0.00 java
05:58:11 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command
05:58:12 PM 1129 0.00 24.00 0.00 jbd2/sdb1-8
05:58:12 PM 2405 240.00 0.00 0.00 java
05:58:12 PM 18074 0.00 8.00 0.00 java
05:58:12 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command
05:58:13 PM 1570 1172.00 0.00 0.00 gmond
05:58:13 PM 2183 0.00 4.00 0.00 java
05:58:13 PM 2405 128.00 0.00 0.00 java
05:58:13 PM 18074 0.00 4.00 0.00 java
Average: PID kB_rd/s kB_wr/s kB_ccwr/s Command
Average: 552 0.00 6.13 0.00 jbd2/sda3-8
Average: 1083 0.00 2.40 0.00 flush-8:0
Average: 1086 0.00 0.27 0.00 java
Average: 1112 0.00 0.40 0.00 jbd2/sda4-8
Average: 1129 0.00 1.33 0.00 jbd2/sdb1-8
Average: 1554 104.16 0.27 0.13 xinetd
Average: 1570 171.43 0.00 0.00 gmond
Average: 2183 0.00 1.60 0.00 java
Average: 2222 0.00 0.27 0.00 pidstat
Average: 2405 178.49 1.07 0.00 java
Average: 2446 5.33 0.00 0.00 sleep
Average: 3734 1778.49 220.45 0.00 java
Average: 3813 0.27 0.67 0.00 java
Average: 4327 0.00 0.27 0.00 java
Average: 5457 0.00 1.07 0.00 java
Average: 9677 946.52 5.19 0.67 java
Average: 11681 0.00 0.27 0.00 java
Average: 13280 50.48 299.57 0.00 downloadm_new.s
Average: 14824 0.00 0.27 0.00 java
Average: 16613 0.00 0.27 0.00 java
Average: 16698 0.00 2.13 0.00 java
Average: 17694 0.00 0.27 0.00 java
Average: 17768 0.00 0.27 0.00 java
Average: 18074 0.00 2.13 0.00 java
Average: 18129 0.00 0.27 0.00 java
Average: 19295 498.04 0.67 0.00 java
Average: 19426 0.00 0.53 0.00 java
Average: 19542 0.00 0.27 0.00 java
@kworr
这是我的安装选项/dev/sdb1
。
% mount | grep sdb
/dev/sdb1 on /backup type ext4 (rw,noatime,commit=100)
更新2 您的硬盘预期有多少 IOPS。
7,200 rpm SATA drives HDD ~75-100 IOPS[2] SATA 3 Gb/s
10,000 rpm SATA drives HDD ~125-150 IOPS[2] SATA 3 Gbit/s
10,000 rpm SAS drives HDD ~140 IOPS[2] SAS
15,000 rpm SAS drives HDD ~175-210 IOPS[2] SAS
答案1
这里发生了很多事情,但 pid 3734,一个 java 进程似乎是罪魁祸首。你应该找出它在做什么,传递给它的参数是什么,它的父 pid 是什么,以及它是什么意味着去做。
在 1 秒的 30 秒时间内,样本 java 使用 1778.49 读取 kb/秒,还有其他 java 进程,pid 9677 和 19295 分别使用 946.52 和 498.04 读取 kb/秒。
我没有资格告诉你他们所做的是对是错,但你的高 I/O 主要是由于那些 java 进程造成的。