Analyzing high server load

We have been seeing some very high server loads:

1:49pm  up 2 days  1:51,  1 user,  load average: 79.05, 101.35, 111.53

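And then the site goes down. By that I mean web pages won't load, I can't get in over ssh or ftp, and we have to do a hard reset.

It happens at random.

The error log at the time of the crash (lines like these, repeated hundreds of times):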

[Sat Jul 09 13:02:54 2011] [notice] child pid 1966 exit signal Segmentation fault (11)
[Sat Jul 09 13:02:57 2011] [notice] child pid 1967 exit signal Segmentation fault (11)
[Sat Jul 09 13:02:57 2011] [notice] child pid 1969 exit signal Segmentation fault (11)
[Sat Jul 09 13:02:57 2011] [notice] child pid 1970 exit signal Segmentation fault (11)
[Sat Jul 09 13:02:57 2011] [notice] child pid 1971 exit signal Segmentation fault (11)
[Sat Jul 09 13:02:57 2011] [notice] child pid 1972 exit signal Segmentation fault (11)
[Sat Jul 09 13:02:57 2011] [notice] child pid 1973 exit signal Segmentation fault (11)
[Sat Jul 09 13:02:57 2011] [notice] child pid 1974 exit signal Segmentation fault (11)
[Sat Jul 09 13:02:58 2011] [notice] child pid 1976 exit signal Segmentation fault (11)
[Sat Jul 09 13:02:58 2011] [notice] child pid 1977 exit signal Segmentation fault (11)
[Sat Jul 09 13:02:58 2011] [notice] child pid 1978 exit signal Segmentation fault (11)
[Sat Jul 09 13:02:58 2011] [notice] child pid 1979 exit signal Segmentation fault (11)
[Sat Jul 09 13:02:58 2011] [notice] child pid 1980 exit signal Segmentation fault (11)
[Sat Jul 09 13:02:58 2011] [notice] child pid 1981 exit signal Segmentation fault (11)
[Sat Jul 09 13:02:58 2011] [notice] child pid 1982 exit signal Segmentation fault (11)
[Sat Jul 09 13:02:58 2011] [notice] child pid 1983 exit signal Segmentation fault (11)
[Sat Jul 09 13:02:58 2011] [notice] child pid 1984 exit signal Segmentation fault (11)
[Sat Jul 09 13:02:58 2011] [notice] child pid 1985 exit signal Segmentation fault (11)
[Sat Jul 09 13:02:58 2011] [notice] child pid 1986 exit signal Segmentation fault (11)
[Sat Jul 09 13:02:58 2011] [notice] child pid 1987 exit signal Segmentation fault (11)
[Sat Jul 09 13:02:58 2011] [notice] child pid 1988 exit signal Segmentation fault (11)
[Sat Jul 09 13:02:58 2011] [notice] child pid 1989 exit signal Segmentation fault (11)
[Sat Jul 09 13:02:58 2011] [notice] child pid 1990 exit signal Segmentation fault (11)
[Sat Jul 09 13:02:58 2011] [notice] child pid 1991 exit signal Segmentation fault (11)
[Sat Jul 09 13:03:16 2011] [notice] child pid 1992 exit signal Segmentation fault (11)
[Sat Jul 09 13:03:17 2011] [notice] child pid 1993 exit signal Segmentation fault (11)
[Sat Jul 09 13:03:21 2011] [notice] child pid 1994 exit signal Segmentation fault (11)
[Sat Jul 09 13:03:21 2011] [notice] child pid 1995 exit signal Segmentation fault (11)

The access log around the time of the crash:

::1 - - [09/Jul/2011:12:54:07 +0200] "OPTIONS * HTTP/1.0" 200 - "-" "Apache/2.2.10 (Linux/SUSE) (internal dummy connection)"
::1 - - [09/Jul/2011:13:38:48 +0200] "OPTIONS * HTTP/1.0" 200 - "-" "Apache/2.2.10 (Linux/SUSE) (internal dummy connection)"

We don't have many users, so this is odd. Where can I find information about the crash? We run a LAMP stack.

Thanks in advance

Answer 1

Configure Apache to write a core dump when a child segfaults. After that, try debugging the dump with gdb to see what caused it.
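Before waiting for the next segfault, it may be worth confirming that the running Apache processes are actually allowed to write core files. Apache has a `CoreDumpDirectory` directive for choosing where dumps go, but the kernel still refuses to write one if the process's core size limit is 0. The sketch below is only an illustration, not part of the original answer: it assumes a Linux system that exposes `/proc/<pid>/limits`, and the helper names `core_limit` and `cores_enabled` are made up for this example.

```python
import sys


def core_limit(limits_text):
    """Return (soft, hard) for 'Max core file size' from /proc/<pid>/limits text.

    Returns None if the line is not present.
    """
    label = "Max core file size"
    for line in limits_text.splitlines():
        if line.startswith(label):
            # Remaining columns are: soft limit, hard limit, units
            fields = line[len(label):].split()
            return fields[0], fields[1]
    return None


def cores_enabled(limits_text):
    """True if the soft core size limit is anything other than 0."""
    limit = core_limit(limits_text)
    return limit is not None and limit[0] != "0"


# Only runs when a PID is given on the command line (hypothetical usage).
if __name__ == "__main__" and len(sys.argv) > 1:
    pid = sys.argv[1]
    with open(f"/proc/{pid}/limits") as fh:
        text = fh.read()
    print(f"pid {pid}: core limit (soft, hard) = {core_limit(text)}")
    print("core dumps possible" if cores_enabled(text) else "core dumps disabled")
```

Run it with the PID of one Apache child as the only argument. If the soft limit is reported as 0, it has to be raised in the environment that starts Apache (for example with `ulimit -c unlimited` in the init script) before any core file will appear.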

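Answer 2

Does it happen at a fixed time, or at random?

Have you checked the web server's access and error logs?

Have you looked at iostat to see whether the load is coming from disk access or from network access?

Does top show a particular process eating up resources?

Does syslog tell you anything?

What do you mean by a crash? A kernel panic? An application failing? The server no longer responding? The console works but the network doesn't?

We need more information before we can even guess. You need to narrow down when it happens, whether there is a pattern to the timing, and which subsystem it is happening in.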
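One way to start narrowing down the timing is to count the segfault notices in the Apache error log per minute and see whether the bursts line up with anything else (cron jobs, backups, traffic spikes). The following is a rough sketch that assumes the error log uses the same line format as the excerpt in the question; the function name `segfaults_per_minute` and the log path passed on the command line are illustrative, not something from the original post.

```python
import re
import sys
from collections import Counter
from datetime import datetime

# Matches lines such as:
# [Sat Jul 09 13:02:54 2011] [notice] child pid 1966 exit signal Segmentation fault (11)
SEGFAULT = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[notice\] child pid (?P<pid>\d+) "
    r"exit signal Segmentation fault \(11\)"
)


def segfaults_per_minute(lines):
    """Count segfault notices, keyed by timestamp truncated to the minute."""
    counts = Counter()
    for line in lines:
        match = SEGFAULT.match(line.strip())
        if not match:
            continue
        # Assumes English day and month names, as in the log excerpt.
        stamp = datetime.strptime(match.group("ts"), "%a %b %d %H:%M:%S %Y")
        counts[stamp.replace(second=0)] += 1
    return counts


# Only runs when a log file path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as fh:
        per_minute = segfaults_per_minute(fh)
    for minute in sorted(per_minute):
        print(minute.strftime("%Y-%m-%d %H:%M"), per_minute[minute])
```

Comparing the minutes with the highest counts against syslog and the cron schedule should show whether the crashes really are random or follow a pattern.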