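I recently upgraded to a new web server. All data and configuration were migrated from the old server, which worked fine but had recently been running out of disk space.
At first I noticed something strange: random load spikes, while htop/iotop showed nothing (1-2 running processes, CPU/RAM/IO usage < 10 %, all other processes in state "S"). An excerpt from my uptime log: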
02:25:01 up 221 days, 4:45, 3 users, load average: 0,20, 2,53, 3,18
02:26:01 up 221 days, 4:46, 3 users, load average: 0,68, 2,27, 3,05
02:27:01 up 221 days, 4:47, 3 users, load average: 0,74, 2,01, 2,91
02:28:01 up 221 days, 4:48, 3 users, load average: 0,53, 1,71, 2,75
02:29:01 up 221 days, 4:49, 3 users, load average: 0,30, 1,44, 2,59
02:30:01 up 221 days, 4:50, 3 users, load average: 0,31, 1,24, 2,44
02:31:01 up 221 days, 4:51, 3 users, load average: 0,80, 1,23, 2,37
02:32:01 up 221 days, 4:52, 3 users, load average: 0,50, 1,07, 2,24
02:33:01 up 221 days, 4:53, 3 users, load average: 0,52, 0,98, 2,13
02:34:01 up 221 days, 4:54, 3 users, load average: 0,92, 1,05, 2,08
02:35:01 up 221 days, 4:55, 3 users, load average: 0,51, 0,91, 1,97
02:36:01 up 221 days, 4:56, 3 users, load average: 48,24, 13,44, 6,13
02:37:01 up 221 days, 4:57, 3 users, load average: 45,14, 18,40, 8,25
02:38:01 up 221 days, 4:58, 3 users, load average: 16,65, 15,08, 7,74
02:39:01 up 221 days, 4:59, 3 users, load average: 6,15, 12,34, 7,26
02:40:01 up 221 days, 5:00, 3 users, load average: 2,38, 10,14, 6,82
02:41:01 up 221 days, 5:01, 3 users, load average: 1,78, 8,57, 6,49
02:42:01 up 221 days, 5:02, 3 users, load average: 0,70, 7,03, 6,08
02:43:01 up 221 days, 5:03, 3 users, load average: 0,40, 5,79, 5,71
02:44:01 up 221 days, 5:04, 3 users, load average: 0,23, 4,76, 5,36
02:45:01 up 221 days, 5:05, 3 users, load average: 0,17, 3,92, 5,04
02:46:01 up 221 days, 5:06, 3 users, load average: 0,73, 3,43, 4,80
02:47:01 up 221 days, 5:07, 3 users, load average: 0,58, 2,89, 4,52
02:48:01 up 221 days, 5:08, 3 users, load average: 0,36, 2,41, 4,25
02:49:01 up 221 days, 5:09, 3 users, load average: 39,40, 14,55, 8,37
02:50:01 up 221 days, 5:10, 3 users, load average: 14,81, 11,99, 7,87
02:51:01 up 221 days, 5:11, 3 users, load average: 6,20, 10,05, 7,46
02:52:01 up 221 days, 5:12, 3 users, load average: 2,41, 8,26, 7,01
02:53:01 up 221 days, 5:13, 3 users, load average: 1,20, 6,83, 6,59
02:54:01 up 221 days, 5:14, 3 users, load average: 0,62, 5,64, 6,20
02:55:01 up 221 days, 5:15, 3 users, load average: 0,42, 4,69, 5,83
02:56:01 up 221 days, 5:16, 3 users, load average: 0,71, 4,01, 5,53
02:57:01 up 221 days, 5:17, 3 users, load average: 0,37, 3,31, 5,19
02:58:01 up 221 days, 5:18, 3 users, load average: 0,25, 2,75, 4,88
02:59:01 up 221 days, 5:19, 3 users, load average: 0,52, 2,40, 4,63
03:00:01 up 221 days, 5:20, 3 users, load average: 0,34, 2,01, 4,35
03:01:01 up 221 days, 5:21, 3 users, load average: 1,66, 2,06, 4,22
03:02:01 up 221 days, 5:22, 3 users, load average: 1,39, 1,91, 4,03
03:03:01 up 221 days, 5:23, 3 users, load average: 1,26, 1,76, 3,84
03:04:01 up 221 days, 5:24, 3 users, load average: 0,74, 1,53, 3,63
03:05:01 up 221 days, 5:25, 3 users, load average: 0,60, 1,35, 3,43
03:06:01 up 221 days, 5:26, 3 users, load average: 1,27, 1,42, 3,33
03:07:01 up 221 days, 5:27, 4 users, load average: 1,13, 1,37, 3,19
03:08:01 up 221 days, 5:28, 4 users, load average: 0,81, 1,21, 3,02
03:09:01 up 221 days, 5:29, 4 users, load average: 16,35, 6,29, 4,68
03:10:01 up 221 days, 5:30, 4 users, load average: 12,01, 7,55, 5,26
03:11:01 up 221 days, 5:31, 4 users, load average: 20,01, 10,72, 6,48
03:12:01 up 221 days, 5:32, 4 users, load average: 8,81, 9,19, 6,22
04:25:01 up 221 days, 6:45, 5 users, load average: 0,20, 0,36, 0,66
04:26:01 up 221 days, 6:46, 5 users, load average: 0,64, 0,47, 0,68
04:27:01 up 221 days, 6:47, 5 users, load average: 0,47, 0,45, 0,66
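To pick the spikes out of a log like the one above without reading it line by line, something along these lines could be used. It is only a minimal sketch: the function names and the threshold of 10 are illustrative assumptions, and it expects lines in exactly the format shown, with either a comma or a dot as the decimal mark.

```python
import re

# One line of `uptime` output; load values may use ',' or '.' as the decimal mark.
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}) up .*load average: "
    r"(?P<l1>\d+[.,]\d+), (?P<l5>\d+[.,]\d+), (?P<l15>\d+[.,]\d+)\s*$"
)


def parse_uptime_line(line):
    """Return (time, (load1, load5, load15)), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    loads = tuple(float(m.group(k).replace(",", ".")) for k in ("l1", "l5", "l15"))
    return m.group("time"), loads


def find_spikes(lines, threshold=10.0):
    """Return (time, load1) for every line whose 1-minute load is >= threshold.

    The threshold is an arbitrary, hypothetical cut-off, not a recommended value.
    """
    spikes = []
    for line in lines:
        parsed = parse_uptime_line(line)
        if parsed is not None and parsed[1][0] >= threshold:
            spikes.append((parsed[0], parsed[1][0]))
    return spikes


# Five lines copied from the log excerpt above.
SAMPLE = """\
02:35:01 up 221 days, 4:55, 3 users, load average: 0,51, 0,91, 1,97
02:36:01 up 221 days, 4:56, 3 users, load average: 48,24, 13,44, 6,13
02:37:01 up 221 days, 4:57, 3 users, load average: 45,14, 18,40, 8,25
02:38:01 up 221 days, 4:58, 3 users, load average: 16,65, 15,08, 7,74
02:39:01 up 221 days, 4:59, 3 users, load average: 6,15, 12,34, 7,26
"""

if __name__ == "__main__":
    for time, load in find_spikes(SAMPLE.splitlines()):
        print(time, load)
```

Only the 1-minute value is compared, because the 5- and 15-minute averages stay elevated long after a spike has passed, as the log shows.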
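This seems to happen completely independently of actual traffic or any running cronjob. However, I was able to attribute it to Apache2, so I ended up lowering the numbers for the prefork module a bit: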
<IfModule mpm_prefork_module>
StartServers 2
MinSpareServers 25
MaxSpareServers 75
MaxClients 150
MaxRequestsPerChild 500
</IfModule>
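In the end those load spikes disappeared, but the load was still slightly higher than on the old server (even during low-traffic periods, e.g. at night, the load went up to 3/4).
This morning, as traffic kept increasing, I noticed that Apache became unresponsive (lots of timeouts; even HTTP connections from localhost to localhost failed). When I checked the server, everything looked normal (load below 2, MySQL and Apache processes running with low CPU/IO %).
So, for lack of a better idea, I switched everything back to the old server, which is currently running fine (load peaks at about 1).
I find this whole thing very strange, since all scripts and databases were simply mirrored to the new server. It is also a brand-new Xeon server with more RAM, whereas the previous server was an older Opteron (both have SSDs).
I did just upgrade from Debian 7 to Debian 8, but basically kept the default settings for all services (except the ones mentioned above).
Any hints/help welcome!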
Answer 1
Finally found the problem. After commenting out this line in apache2.conf
Mutex file:${APACHE_LOCK_DIR} default
those load spikes finally disappeared.
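To check whether a configuration file still contains an active (uncommented) Mutex directive, a small helper like the following could be used. This is only a sketch: the function name and the sample configuration text are hypothetical, and it does simple line matching without following Include directives or handling line continuations.

```python
def active_mutex_directives(config_text):
    """Return (line_number, line) for each uncommented Mutex directive."""
    found = []
    for number, raw in enumerate(config_text.splitlines(), start=1):
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        # Apache directive names are case-insensitive.
        if line.split(None, 1)[0].lower() == "mutex":
            found.append((number, line))
    return found


# Illustrative config text, not a real apache2.conf.
SAMPLE_CONFIG = """\
# Global configuration
Mutex file:${APACHE_LOCK_DIR} default
PidFile ${APACHE_PID_FILE}
#Mutex file:/tmp default
"""

if __name__ == "__main__":
    for number, line in active_mutex_directives(SAMPLE_CONFIG):
        print(f"line {number}: {line}")
```

On a real system the text would be read from the file in question, for example with `open(path).read()`, where `path` is wherever apache2.conf lives on that installation.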