Finding the root cause of 99.99% iowait and 0% SWAPIN

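Users and DBAs are complaining that "Oracle is running slow" on our OEL server. From the operating system side, the only problem I can find is some odd IOWAIT statistics in iotop.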

Output from iotop

Total DISK READ: 27.24 M/s | Total DISK WRITE: 2.32 M/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
10374 be/4 root      190.28 K/s    0.00 B/s  0.00 % 99.99 % clBackup -child 22862 -j ~jt 202777:7:1 -cn xxxxxx
12844 be/4 xxxxxx     0.00 B/s  303.15 K/s  0.00 % 99.99 % ora_dbw0_oaprod
14460 be/4 oracleuser   251.55 K/s    0.00 B/s  0.00 % 99.99 % oracleoaprod (LOCAL=NO)
 6795 be/4 oracleuser  1012.65 K/s    0.00 B/s  0.00 % 99.99 % oracleoaprod (LOCAL=NO)
 4336 be/4 oracleuser   812.70 K/s    0.00 B/s  0.00 % 99.99 % oracleoaprod (LOCAL=NO)
17725 be/4 oracleuser   193.50 K/s    0.00 B/s  0.00 % 99.99 % oracleoaprod (LOCAL=NO)
14456 be/4 oracleuser   109.65 K/s    0.00 B/s  0.00 % 99.99 % oracleoaprod (LOCAL=NO)
12831 be/4 oracleuser    51.60 K/s    0.00 B/s  0.00 % 99.99 % oracleoaprod (LOCAL=NO)
 9756 be/4 oracleuser    83.85 K/s    0.00 B/s  0.00 % 99.99 % oracleoaprod (LOCAL=NO)
24916 be/4 oracleuser  1128.75 K/s    0.00 B/s  0.00 % 99.99 % oracleoaprod (LOCAL=NO)
19701 be/4 oracleuser   361.20 K/s    0.00 B/s  0.00 % 99.99 % oracleoaprod (LOCAL=NO)
27920 be/4 oracleuser   432.15 K/s    0.00 B/s  0.00 % 99.99 % oracleoaprod (LOCAL=NO)
16132 be/4 oracleuser    90.30 K/s    0.00 B/s  0.00 % 99.99 % oracleoaprod (LOCAL=NO)
27967 be/4 oracleuser    64.50 K/s    0.00 B/s  0.00 % 97.87 % oracleoaprod (LOCAL=NO)
16615 be/4 oracleuser    64.50 K/s    0.00 B/s  0.00 % 97.17 % oracleoaprod (LOCAL=NO)
 4465 be/4 oracleuser     7.46 M/s    0.00 B/s  0.00 % 97.15 % oracleoaprod (LOCAL=NO)
28044 be/4 oracleuser    14.51 M/s    0.00 B/s  0.00 % 97.02 % oracleoaprod (DESCRIPTION~(ADDRESS=(PROTOCOL=beq)))
32283 be/4 oracleuser    77.40 K/s    0.00 B/s  0.00 % 95.48 % oracleoaprod (LOCAL=NO)
12851 be/4 oracleuser    19.35 K/s  590.18 K/s  0.00 % 91.77 % ora_lgwr_oaprod
12846 be/4 oracleuser     0.00 B/s 1077.15 K/s  0.00 % 91.41 % ora_dbw1_oaprod
23153 be/4 oracleuser    96.75 K/s    0.00 B/s  0.00 % 72.37 % oracleoaprod (LOCAL=NO)
27710 be/4 oracleuser    19.35 K/s    0.00 B/s  0.00 % 41.50 % oracleoaprod (LOCAL=NO)
25775 be/4 oracleuser    51.60 K/s    0.00 B/s  0.00 % 30.11 % oracleoaprod (LOCAL=NO)
13323 be/4 oracleuser    19.35 K/s   51.60 K/s  0.00 % 21.98 % oracleoaprod (LOCAL=NO)
24345 be/4 oracleuser    12.90 K/s    0.00 B/s  0.00 % 19.34 % oracleoaprod (LOCAL=NO)
12853 be/4 oracleuser     0.00 B/s   38.70 K/s  0.00 % 11.72 % ora_ckpt_oaprod
 7234 be/4 oracleuser     6.45 K/s    0.00 B/s  0.00 %  7.52 % oracleoaprod (LOCAL=NO)
17820 be/4 apps     0.00 B/s    9.68 K/s  0.00 %  0.00 % rwrun P_CONC_REQUEST_ID=8~2211170.out desformat=XML
20562 be/4 apps     0.00 B/s    3.23 K/s  0.00 %  0.00 % java -DCLIENT_PROCESSID=2~.GSMSvcComponentContainer
 5849 be/4 apps     3.23 K/s    0.00 B/s  0.00 %  0.00 % FNDLIBR
 7232 be/4 apps     0.00 B/s    3.23 K/s  0.00 %  0.00 % RVCTP
    1 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % init [5]
    2 rt/3 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/0]
    3 be/7 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/0]
    4 rt/3 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [watchdog/0]
    5 rt/3 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/1]
    6 be/7 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/1]
    7 rt/3 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [watchdog/1]
    8 rt/3 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/2]
    9 be/7 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/2]
   10 rt/3 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [watchdog/2]
   11 rt/3 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/3]
   12 be/7 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/3]
   13 rt/3 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [watchdog/3]
   14 rt/3 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/4]

Output from sar

# sar 1 7
    Linux 2.6.18-371.3.1.0.1.el5    01/16/2014

    10:13:41 AM       CPU     %user     %nice   %system   %iowait    %steal     %idle
    10:13:42 AM       all     65.32      0.00      2.56     22.08      0.00     10.04
    10:13:43 AM       all     65.94      0.00      2.50     23.02      0.00      8.55
    10:13:44 AM       all     65.15      0.00      2.06     24.17      0.00      8.62
    10:13:45 AM       all     62.16      0.00      2.06     26.06      0.00      9.73
    10:13:46 AM       all     54.00      0.00      1.81     31.96      0.00     12.23
    10:13:47 AM       all     51.03      0.00      1.62     35.17      0.00     12.18
    10:13:48 AM       all     51.97      0.00      1.25     27.61      0.00     19.18
    Average:          all     59.37      0.00      1.98     27.15      0.00     11.50

All disks are on NetApp except LogVol00, shown below.

Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup01-LogVol00
                       97G   76G   17G  83% /
/dev/cciss/c0d0p1      99M   32M   63M  34% /boot
tmpfs                 127G  500M  126G   1% /dev/shm
/dev/mapper/mpath4p1  5.4T  3.2T  2.0T  62% /oracle/x1
/dev/mapper/mpath6p1  6.3T  4.3T  1.7T  72% /oracle/x2
/dev/mapper/mpath1p1  184G  188M  174G   1% /oracle/x1/db/apps_st/redo
/dev/mapper/mpath2p1  184G  188M  174G   1% /oracle/x1/db/apps_st/redo02

Answer 1

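My guess is that you are simply short of available IOPS. For large database servers I always recommend SSD storage or a larger SAS array (preferably local storage).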
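
One way to check the IOPS guess, rather than take it on faith, is to sample `/proc/diskstats` twice and see how many I/Os per second each device completes and how long each one takes. The following is only a minimal Python sketch of that idea. The function names are made up for illustration, and the per-I/O time is a rough approximation of what `iostat -x` reports as await, not an exact match.

```python
import time


def parse_diskstats(text):
    """Map device name -> (reads completed, writes completed, ms spent on I/O).

    Expects the /proc/diskstats layout: major, minor, name, then at least
    11 counters. Shorter lines (old partition format) are skipped.
    """
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue
        reads, ms_reading = int(f[3]), int(f[6])
        writes, ms_writing = int(f[7]), int(f[10])
        stats[f[2]] = (reads, writes, ms_reading + ms_writing)
    return stats


def io_rates(before, after, seconds):
    """Return device -> (IOPS, average ms per I/O) between two samples."""
    rates = {}
    for dev, (r1, w1, ms1) in after.items():
        if dev not in before:
            continue
        r0, w0, ms0 = before[dev]
        ios = (r1 - r0) + (w1 - w0)
        avg_ms = (ms1 - ms0) / ios if ios else 0.0
        rates[dev] = (ios / seconds, avg_ms)
    return rates


def sample_rates(interval=5.0, path="/proc/diskstats"):
    """Take two samples `interval` seconds apart (Linux only)."""
    with open(path) as fh:
        before = parse_diskstats(fh.read())
    time.sleep(interval)
    with open(path) as fh:
        after = parse_diskstats(fh.read())
    return io_rates(before, after, interval)
```

Calling `sample_rates()` on the database host while the slowdown is happening gives one `(IOPS, ms per I/O)` pair per device. If the multipath devices sit at a flat IOPS ceiling while the time per I/O climbs, that would support the "out of IOPS" explanation. If latency stays low, the cause is probably elsewhere.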
