Tracking down high Linux load - HDD failure or too many interrupts? (ksoftirqd TIME 437:44.13)

Server stats:

cat /proc/version output:

Linux version 2.6.18-308.24.1.el5 ([email protected]) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-52)) #1 SMP Tue Dec 4 17:43:34 EST 2012

ethtool eth0 output:

Settings for eth0:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: pumbg
        Wake-on: g
        Current message level: 0x00000001 (1)
        Link detected: yes

cat /proc/cpuinfo | grep MHz output:

cpu MHz         : 3201.000
cpu MHz         : 3201.000
cpu MHz         : 3201.000
cpu MHz         : 3201.000
cpu MHz         : 3201.000
cpu MHz         : 3201.000
cpu MHz         : 3201.000
cpu MHz         : 3201.000

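I am not very experienced with Linux and have been trying to work out why the load on this server is so high. I think either the hard disk is not fast enough or there are too many interrupts, because the "ksoftirqd" process sometimes uses a lot of CPU and has accumulated a very long run time.

I have been researching online how to diagnose this properly, and I believe I have worked out how to gather useful information, but unfortunately the results still confuse me.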

top output:

top - 08:40:31 up 132 days,  2:06,  2 users,  load average: 84.25, 63.29, 63.02
Tasks: 3214 total,   8 running, 3206 sleeping,   0 stopped,   0 zombie
Cpu(s): 18.6%us,  3.2%sy,  0.0%ni, 41.1%id, 26.8%wa,  0.3%hi,  9.9%si,  0.0%st
Mem:  32934596k total, 25811556k used,  7123040k free,   329988k buffers
Swap:  4194296k total,      128k used,  4194168k free, 10888060k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1846 nobody    16   0  125m  12m 4944 S  3.9  0.0   0:00.47 httpd
13490 nobody    15   0  126m  13m 5064 S  2.9  0.0   0:02.37 httpd
20137 everprox  16   0  127m  13m 4908 D  2.6  0.0   0:00.76 httpd
 1827 everprox  15   0  127m  13m 4924 S  2.0  0.0   0:00.50 httpd
16574 root      15   0 15120 3480  812 R  2.0  0.0   0:00.15 top
16894 nobody    15   0  126m  84m  944 S  2.0  0.3 946:10.13 nginx
 6347 root      16   0 15112 3552  816 S  1.6  0.0   3:16.46 top
 7115 named     25   0  422m  52m 2084 S  1.6  0.2   4089:16 named
16575 everprox  16   0  126m  11m 3992 D  1.6  0.0   0:00.05 httpd
16891 nobody    15   0  149m  89m  944 S  1.6  0.3 939:49.39 nginx
16892 nobody    15   0  126m  84m  944 S  1.6  0.3 940:41.47 nginx
26041 everprox  15   0  126m  13m 5076 S  1.6  0.0   0:01.55 httpd
26113 nobody    15   0  126m  13m 5024 S  1.6  0.0   0:02.46 httpd
 4345 everprox  15   0  126m  13m 5040 S  1.3  0.0   0:01.82 httpd
13131 everprox  15   0  125m  12m 5072 S  1.3  0.0   0:01.82 httpd
14058 everprox  15   0  127m  13m 5132 D  1.3  0.0   0:01.57 httpd
14554 nobody    15   0  126m  13m 4896 S  1.3  0.0   0:00.74 httpd
26209 everprox  15   0  126m  13m 5044 S  1.3  0.0   0:03.08 httpd
26283 everprox  16   0  125m  12m 5108 D  1.3  0.0   0:02.06 httpd
 4360 everprox  15   0  126m  13m 5088 S  1.0  0.0   0:01.93 httpd
12997 everprox  15   0  126m  13m 5052 S  1.0  0.0   0:03.33 httpd
13351 nobody    15   0  127m  13m 5168 S  1.0  0.0   0:02.43 httpd
13705 everprox  15   0  126m  13m 5076 D  1.0  0.0   0:01.55 httpd
13870 nobody    16   0  126m  13m 5088 S  1.0  0.0   0:02.73 httpd
13931 nobody    15   0  126m  13m 5064 S  1.0  0.0   0:02.57 httpd
14008 everprox  15   0  127m  13m 5156 D  1.0  0.0   0:03.39 httpd
14009 everprox  15   0  126m  13m 5064 D  1.0  0.0   0:01.94 httpd
14215 everprox  15   0  126m  13m 5044 S  1.0  0.0   0:01.68 httpd
14550 everprox  16   0  126m  12m 5088 D  1.0  0.0   0:02.73 httpd
14556 nobody    15   0  126m  13m 5096 S  1.0  0.0   0:03.57 httpd
14587 everprox  15   0  126m  12m 5072 S  1.0  0.0   0:03.74 httpd
14625 nobody    15   0  126m  13m 5108 S  1.0  0.0   0:02.93 httpd
14671 everprox  15   0  126m  13m 5048 S  1.0  0.0   0:02.92 httpd
16893 nobody    15   0  125m  81m  944 R  1.0  0.3 936:15.00 nginx
16896 nobody    15   0  127m  87m  944 S  1.0  0.3 939:30.33 nginx
16897 nobody    15   0  122m  84m  944 R  1.0  0.3 939:11.18 nginx
20121 nobody    16   0  125m  11m 4752 S  1.0  0.0   0:00.63 httpd
20122 everprox  16   0  126m  13m 5036 D  1.0  0.0   0:00.60 httpd
25391 everprox  16   0  126m  13m 5108 D  1.0  0.0   0:02.74 httpd
25463 everprox  15   0  126m  13m 5036 D  1.0  0.0   0:02.45 httpd
25514 everprox  16   0  126m  13m 5096 D  1.0  0.0   0:01.03 httpd
26130 everprox  15   0  126m  13m 5048 D  1.0  0.0   0:01.42 httpd
26220 nobody    15   0  126m  13m 5068 S  1.0  0.0   0:03.15 httpd
 1833 nobody    16   0  126m  12m 4976 S  0.7  0.0   0:00.40 httpd
 4364 everprox  15   0  125m  12m 5020 S  0.7  0.0   0:02.01 httpd
 4370 nobody    16   0  126m  13m 5076 S  0.7  0.0   0:02.02 httpd
 5499 everprox  15   0  126m  12m 4972 S  0.7  0.0   0:00.54 httpd
 5507 everprox  16   0  126m  13m 5004 D  0.7  0.0   0:00.50 httpd
12984 everprox  16   0  127m  13m 5064 D  0.7  0.0   0:01.84 httpd
13004 everprox  15   0  126m  13m 5056 S  0.7  0.0   0:02.81 httpd
13029 everprox  16   0  126m  13m 5048 D  0.7  0.0   0:02.65 httpd

free -mt output:

root@echo [~]# free -mt
             total       used       free     shared    buffers     cached
Mem:         32162      25219       6943          0        322      10690
-/+ buffers/cache:      14206      17956
Swap:         4095          0       4095
Total:       36258      25219      11039

iostat:

root@echo [~]# iostat
Linux 2.6.18-308.24.1.el5 (echo.uk7.org)        10/17/2013

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          26.95    0.08   12.17    3.42    0.00   57.38

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda             111.64        19.88      2038.06  226836250 23259888204
sda1              0.00         0.00         0.00       2688       1076
sda2            111.64        19.88      2038.06  226833282 23259887128
dm-0            255.26        19.88      2038.06  226831554 23259887880
dm-1              0.00         0.00         0.00       1160        344

sar -I SUM output:

Linux 2.6.18-308.24.1.el5 (echo.uk7.org)        10/17/2013

12:00:01 AM      INTR    intr/s
12:10:01 AM       sum  17315.21
12:20:01 AM       sum  23640.63
12:30:05 AM       sum  26005.42
12:40:05 AM       sum  27051.29
12:50:01 AM       sum  25887.09
01:00:01 AM       sum  25915.91
01:10:02 AM       sum  25643.99
01:20:01 AM       sum  25590.73
01:30:01 AM       sum  25843.38
01:40:01 AM       sum  25817.66
01:50:01 AM       sum  25937.93
02:00:03 AM       sum  25836.42
02:10:01 AM       sum  25850.17
02:20:01 AM       sum  25788.77
02:30:01 AM       sum  25680.55
02:40:01 AM       sum  25871.60
02:50:01 AM       sum  27089.20
03:00:01 AM       sum  26069.86
03:10:01 AM       sum  26368.91
03:20:01 AM       sum  25977.64
03:30:04 AM       sum  26038.12
03:40:05 AM       sum  26278.10
03:50:02 AM       sum  25988.70
04:00:04 AM       sum  26723.36
04:10:05 AM       sum  26150.12
04:20:03 AM       sum  25904.27
04:30:01 AM       sum  26030.90
04:40:09 AM       sum  25714.96
04:50:10 AM       sum  25732.73
05:00:01 AM       sum  24374.81
05:10:01 AM       sum  21990.37
05:20:01 AM       sum  22917.79
05:30:03 AM       sum  22847.98
05:40:03 AM       sum  24926.45
05:50:01 AM       sum  24986.11
06:00:01 AM       sum  24935.01
06:10:04 AM       sum  25438.65
06:20:01 AM       sum  25430.91
06:30:03 AM       sum  26959.88
06:40:01 AM       sum  26723.60
06:50:01 AM       sum  26422.57
07:00:01 AM       sum  26052.94
07:10:07 AM       sum  27915.00
07:20:01 AM       sum  25868.20
07:30:06 AM       sum  25811.18
07:40:05 AM       sum  25843.82
07:50:01 AM       sum  25814.03
08:00:01 AM       sum  25554.51
08:10:01 AM       sum  24948.75
08:20:01 AM       sum  25413.89
08:30:06 AM       sum  25860.78
08:40:01 AM       sum  25819.49
Average:          sum  25512.26

sar -w output:

Linux 2.6.18-308.24.1.el5 (echo.uk7.org)        10/17/2013

12:00:01 AM   cswch/s
12:10:01 AM 150959.09
12:20:01 AM 108496.38
12:30:05 AM  32508.30
12:40:05 AM  17555.99
12:50:01 AM  21667.90
01:00:01 AM  89007.13
01:10:02 AM  95902.66
01:20:01 AM  83193.93
01:30:01 AM  76984.23
01:40:01 AM  82111.94
01:50:01 AM  77520.72
02:00:03 AM  39197.94
02:10:01 AM  22047.28
02:20:01 AM  21469.65
02:30:01 AM  26522.87
02:40:01 AM  63104.71
02:50:01 AM  85472.19
03:00:01 AM  40869.59
03:10:01 AM  34278.48
03:20:01 AM  15844.37
03:30:04 AM  16504.44
03:40:05 AM  25177.02
03:50:02 AM  18018.24
04:00:04 AM  27187.20
04:10:05 AM  29010.02
04:20:03 AM  40022.62
04:30:01 AM  69535.67
04:40:09 AM  96043.34
04:50:10 AM  82239.90
05:00:01 AM 128834.10
05:10:01 AM 167916.98
05:20:01 AM 130773.27
05:30:03 AM 125977.75
05:40:03 AM 112561.88
05:50:01 AM  94872.38
06:00:01 AM  98417.10
06:10:04 AM  91611.66
06:20:01 AM  94804.15
06:30:03 AM  75834.69
06:40:01 AM  54488.51
06:50:01 AM  24460.81
07:00:01 AM  16950.60
07:10:07 AM  24471.96
07:20:01 AM  16379.81
07:30:06 AM  15711.76
07:40:05 AM  15708.03
07:50:01 AM  16305.04
08:00:01 AM  18454.64
08:10:01 AM  73621.10
08:20:01 AM  57868.75
08:30:06 AM  15440.36
08:40:01 AM  14954.61
08:50:01 AM  14906.57
Average:     58290.70

sar -d 5 0 output:

root@echo [~]# sar -d 5 0
Linux 2.6.18-308.24.1.el5 (echo.uk7.org)        10/17/2013

08:52:50 AM       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
08:52:55 AM    dev8-0    104.40      0.00   1760.00     16.86     19.86    190.26      1.64     17.12
08:52:55 AM    dev8-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
08:52:55 AM    dev8-2    104.40      0.00   1760.00     16.86     19.86    190.26      1.64     17.12
08:52:55 AM  dev253-0    220.00      0.00   1760.00      8.00     40.12    182.36      0.78     17.12
08:52:55 AM  dev253-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00

08:52:55 AM       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
08:53:00 AM    dev8-0     98.40      0.00   1771.20     18.00     17.44    177.22      1.62     15.92
08:53:00 AM    dev8-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
08:53:00 AM    dev8-2     98.40      0.00   1771.20     18.00     17.44    177.22      1.62     15.92
08:53:00 AM  dev253-0    221.40      0.00   1771.20      8.00     36.61    165.36      0.72     15.92
08:53:00 AM  dev253-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00

08:53:00 AM       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
08:53:05 AM    dev8-0    109.20      0.00   1916.80     17.55     18.26    167.25      1.75     19.14
08:53:05 AM    dev8-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
08:53:05 AM    dev8-2    109.20      0.00   1916.80     17.55     18.26    167.25      1.75     19.14
08:53:05 AM  dev253-0    239.60      0.00   1916.80      8.00     26.30    109.78      0.80     19.14
08:53:05 AM  dev253-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00

08:53:05 AM       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
08:53:10 AM    dev8-0    104.79      0.00   2000.80     19.09     18.60    177.46      1.68     17.62
08:53:10 AM    dev8-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
08:53:10 AM    dev8-2    104.79      0.00   2000.80     19.09     18.60    177.46      1.68     17.62
08:53:10 AM  dev253-0    250.10      0.00   2000.80      8.00     38.19    152.70      0.70     17.62
08:53:10 AM  dev253-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00

08:53:10 AM       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
08:53:15 AM    dev8-0    174.35      0.00   3148.70     18.06     21.08    120.73      1.63     28.36
08:53:15 AM    dev8-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
08:53:15 AM    dev8-2    174.35      0.00   3148.70     18.06     21.08    120.73      1.63     28.36
08:53:15 AM  dev253-0    393.59      0.00   3148.70      8.00     39.29     99.81      0.72     28.36
08:53:15 AM  dev253-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00

sar -W output:

12:00:01 AM  pswpin/s pswpout/s
12:10:01 AM      0.00      0.00
12:20:01 AM      0.00      0.00
12:30:05 AM      0.00      0.00
12:40:05 AM      0.00      0.00
12:50:01 AM      0.00      0.00
01:00:01 AM      0.00      0.00
01:10:02 AM      0.00      0.00
01:20:01 AM      0.00      0.00
01:30:01 AM      0.00      0.00
01:40:01 AM      0.00      0.00
01:50:01 AM      0.00      0.00
02:00:03 AM      0.00      0.00
02:10:01 AM      0.00      0.00
02:20:01 AM      0.00      0.00
02:30:01 AM      0.00      0.00
02:40:01 AM      0.00      0.00
02:50:01 AM      0.00      0.00
03:00:01 AM      0.00      0.00
03:10:01 AM      0.00      0.00
03:20:01 AM      0.00      0.00
03:30:04 AM      0.00      0.00
03:40:05 AM      0.00      0.00
03:50:02 AM      0.00      0.00
04:00:04 AM      0.00      0.00
04:10:05 AM      0.00      0.00
04:20:03 AM      0.00      0.00
04:30:01 AM      0.00      0.00
04:40:09 AM      0.00      0.00
04:50:10 AM      0.00      0.00
05:00:01 AM      0.00      0.00
05:10:01 AM      0.00      0.00
05:20:01 AM      0.00      0.00
05:30:03 AM      0.00      0.00
05:40:03 AM      0.00      0.00
05:50:01 AM      0.00      0.00
06:00:01 AM      0.00      0.00
06:10:04 AM      0.00      0.00
06:20:01 AM      0.00      0.00
06:30:03 AM      0.00      0.00
06:40:01 AM      0.00      0.00
06:50:01 AM      0.00      0.00
07:00:01 AM      0.00      0.00
07:10:07 AM      0.00      0.00
07:20:01 AM      0.00      0.00
07:30:06 AM      0.01      0.00
07:40:05 AM      0.00      0.00
07:50:01 AM      0.00      0.00
08:00:01 AM      0.00      0.00
08:10:01 AM      0.00      0.00
08:20:01 AM      0.00      0.00
08:30:06 AM      0.00      0.00
08:40:01 AM      0.00      0.00
08:50:01 AM      0.00      0.00
Average:         0.00      0.00

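I am just wondering whether anything stands out to someone who knows more than I do. As I said above, I think it is either a slow HDD, in which case an SSD might do better, or too many interrupts.

The server is mainly a web hosting server that hosts web-based proxies. It runs Apache 2.2.23 with mod_ruid2 and nginxcp (a cPanel plugin).

Thanks.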

Answer 1

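It looks to me like you are I/O-bound. In the top output you can see many tasks in state D (uninterruptible sleep). That means they are blocked on I/O, waiting for the disk to respond. On Linux the load average counts tasks that are either runnable or in state D, averaged over the last 1, 5 and 15 minutes, so tasks stuck waiting on the disk push the load up even while the CPUs are partly idle.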
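
To put the load figure in perspective, here is a minimal Python sketch that pulls the three load averages out of the first line of the top output and divides the 1-minute value by the number of CPUs. The function names are made up for illustration, and the CPU count of 8 is an assumption based on the eight "cpu MHz" lines in the question.

```python
import re

def parse_load_average(top_header):
    """Return the 1-, 5- and 15-minute load averages from top's first line."""
    match = re.search(
        r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", top_header
    )
    if match is None:
        raise ValueError("no load average found in line")
    one, five, fifteen = (float(value) for value in match.groups())
    return one, five, fifteen

def load_per_cpu(load, cpu_count):
    """Average number of runnable or D-state tasks per CPU."""
    return load / cpu_count

# First line of the top output quoted in the question.
header = "top - 08:40:31 up 132 days,  2:06,  2 users,  load average: 84.25, 63.29, 63.02"
CPU_COUNT = 8  # assumption: one logical CPU per "cpu MHz" line in the question

one, five, fifteen = parse_load_average(header)
print(f"1-min load per CPU: {load_per_cpu(one, CPU_COUNT):.2f}")  # prints 10.53
```

A value well above 1 per CPU means tasks are queueing. Together with 26.8%wa and 41.1%id in the Cpu(s) line, that points at tasks waiting on something other than CPU time.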

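If they are all Apache, then you also have a large number of worker processes, probably too many. Consider tuning your server a bit or buying faster hardware.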
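
One way to check whether the blocked tasks really are all Apache is to group the D-state rows of the top output by command. The following is a rough sketch, not a definitive tool: the function name is hypothetical, it assumes top's default column order as shown in the question, and the sample rows are a handful copied from that listing.

```python
from collections import Counter

def count_blocked_by_command(top_rows):
    """Count tasks in state D (uninterruptible sleep) per command."""
    blocked = Counter()
    for row in top_rows:
        fields = row.split()
        # Assumed default columns:
        # PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        if len(fields) < 12 or not fields[0].isdigit():
            continue  # skip headers and blank lines
        state, command = fields[7], fields[11]
        if state == "D":
            blocked[command] += 1
    return blocked

# A few process rows copied from the top output in the question.
sample = [
    " 1846 nobody    16   0  125m  12m 4944 S  3.9  0.0   0:00.47 httpd",
    "20137 everprox  16   0  127m  13m 4908 D  2.6  0.0   0:00.76 httpd",
    "16894 nobody    15   0  126m  84m  944 S  2.0  0.3 946:10.13 nginx",
    "16575 everprox  16   0  126m  11m 3992 D  1.6  0.0   0:00.05 httpd",
    "16893 nobody    15   0  125m  81m  944 R  1.0  0.3 936:15.00 nginx",
]

print(count_blocked_by_command(sample))  # prints Counter({'httpd': 2})
```

In the listing from the question, every row in state D is an httpd process, which is consistent with Apache workers piling up behind the disk.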
