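My FreeBSD server had been running flawlessly for more than 2 years, with no major changes to the system. Recently I installed an SSL certificate using Apache's mod_ssl, and 10 days later the server suddenly started crashing.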
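When the server crashes:
- HTTPS and SSH stop responding immediately
- ping times climb to thousands of milliseconds, then pings stop getting answers altogether
After 15 to 60 minutes of being unreachable:
- the server suddenly recovers and runs at full speed, as if nothing had happened
- then, within another 15 to 60 minutes, it crashes again and the cycle repeats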
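What I have checked:
- Rebooting the server changes nothing: it is still unreachable
- CPU / RAM / HDD usage is normal (< 50%, including at peak times)
- Traffic makes no difference: it happens at any time of day, including 4 AM
- Disabling the firewall doesn't help
In httpd-error.log I found: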
[notice] Digest: generating secret for digest authentication ...
[notice] Digest: done
[notice] Apache/2.2.23 (FreeBSD) mod_ssl/2.2.23 OpenSSL/0.9.8q DAV/2 configured -- resuming normal operations
[error] server reached MaxClients setting, consider raising the MaxClients setting
I tried enabling KeepAlive and greatly increasing MaxClients (by 4x), but that didn't solve the problem:
Timeout 120
KeepAlive On
KeepAliveTimeout 5
MaxKeepAliveRequests 1000
<IfModule mpm_prefork_module>
StartServers 50
MinSpareServers 128
MaxSpareServers 1024
ServerLimit 1024
MaxClients 1024
MaxRequestsPerChild 1000
</IfModule>
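One rough way to sanity-check a prefork MaxClients value is to compare MaxClients multiplied by the memory of one httpd child against the RAM left over for Apache, since under prefork every simultaneous client ties up a whole child process. This Python sketch assumes you have measured an average per-child resident size yourself (for example with `ps` or `top`); the function name and the figures in the usage note are hypothetical.

```python
def max_safe_clients(ram_mb: float, reserved_mb: float, child_rss_mb: float) -> int:
    """Largest MaxClients whose children all fit in the RAM left for Apache.

    ram_mb:       total physical memory
    reserved_mb:  memory to keep free for the OS, database, caches, etc.
    child_rss_mb: measured average resident size of one httpd child
    """
    if child_rss_mb <= 0:
        raise ValueError("child_rss_mb must be positive")
    # Never go negative if the reservation exceeds physical RAM.
    available = max(ram_mb - reserved_mb, 0.0)
    # Whole children only: floor division, then convert to int.
    return int(available // child_rss_mb)
```

With, say, 8 GB of RAM, 1 GB reserved for everything else and 25 MB per child, `max_safe_clients(8192, 1024, 25)` returns 286, far below the 1024 configured above. These are illustrative numbers, not measurements from this server; the point is that raising MaxClients past what memory can hold tends to turn overload into swapping.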
Before the first crash, I found this in /var/log/messages:
kernel: mfi0: 228755 (454057919s/0x0008/FATAL) - Battery needs replacement - SOH Bad
kernel: mfi0: 228756 (454057984s/0x0008/FATAL) - Battery needs replacement - SOH Bad
kernel: mfi0: 228757 (454058049s/0x0008/FATAL) - Battery needs replacement - SOH Bad
kernel: arp: 176.31.237.254 moved from 00:07:b4:00:00:01 to 00:07:b4:00:00:03 on ix0
kernel: arp: 176.31.237.251 moved from 00:25:90:02:08:fc to 00:07:b4:00:00:01 on ix0
kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:01 to 00:07:b4:00:00:03 on ix0
kernel: mfi0: 228758 (454058114s/0x0008/FATAL) - Battery needs replacement - SOH Bad
kernel: mfi0: 228759 (454058179s/0x0008/FATAL) - Battery needs replacement - SOH Bad
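After the first reboot the "Battery needs replacement" warnings disappeared, but the arp messages keep appearing in the log at roughly the same intervals at which the server crashes: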
May 23 05:00:00 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:03 to 00:07:b4:00:00:01 on ix0
May 23 05:00:02 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:01 to 00:25:90:02:08:fc on ix0
May 23 05:20:00 ns228407 kernel: arp: 176.31.237.251 moved from 00:25:90:02:08:fc to 00:07:b4:00:00:01 on ix0
May 23 05:20:00 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:01 to 00:07:b4:00:00:03 on ix0
May 23 05:32:44 ns228407 kernel: arp: 176.31.237.254 moved from 00:07:b4:00:00:03 to 00:07:b4:00:00:01 on ix0
May 23 05:40:01 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:03 to 00:25:90:02:08:fc on ix0
May 23 05:40:01 ns228407 kernel: arp: 176.31.237.251 moved from 00:25:90:02:08:fc to 00:07:b4:00:00:01 on ix0
May 23 05:40:01 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:01 to 00:07:b4:00:00:03 on ix0
May 23 05:52:40 ns228407 kernel: arp: 176.31.237.254 moved from 00:07:b4:00:00:01 to 00:07:b4:00:00:03 on ix0
May 23 06:00:00 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:03 to 00:25:90:02:08:fc on ix0
May 23 06:00:00 ns228407 kernel: arp: 176.31.237.251 moved from 00:25:90:02:08:fc to 00:07:b4:00:00:01 on ix0
May 23 06:00:00 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:01 to 00:07:b4:00:00:03 on ix0
May 23 06:00:02 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:03 to 00:25:90:02:08:fc on ix0
May 23 06:20:01 ns228407 kernel: arp: 176.31.237.251 moved from 00:25:90:02:08:fc to 00:07:b4:00:00:03 on ix0
May 23 06:20:01 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:03 to 00:07:b4:00:00:01 on ix0
May 23 06:30:02 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:01 to 00:25:90:02:08:fc on ix0
May 23 06:32:36 ns228407 kernel: arp: 176.31.237.254 moved from 00:07:b4:00:00:03 to 00:07:b4:00:00:01 on ix0
May 23 06:50:01 ns228407 kernel: arp: 176.31.237.251 moved from 00:25:90:02:08:fc to 00:07:b4:00:00:01 on ix0
May 23 06:50:01 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:01 to 00:07:b4:00:00:03 on ix0
May 23 07:00:02 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:03 to 00:25:90:02:08:fc on ix0
May 23 07:12:28 ns228407 kernel: arp: 176.31.237.254 moved from 00:07:b4:00:00:01 to 00:07:b4:00:00:03 on ix0
May 23 07:20:00 ns228407 kernel: arp: 176.31.237.251 moved from 00:25:90:02:08:fc to 00:07:b4:00:00:01 on ix0
May 23 07:20:00 ns228407 kernel: arp: 176.31.237.251 moved from 00:07:b4:00:00:01 to 00:07:b4:00:00:03 on ix0
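To line up these ARP moves against the outage windows, it helps to group the messages by IP address and look at the gaps between them. This Python sketch assumes the standard syslog prefix (`May 23 05:00:00 host kernel: ...`) seen in the excerpt; the regex and the helper names are my own, not part of any FreeBSD tool.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches FreeBSD kernel lines of the form
# "<Mon DD HH:MM:SS> <host> kernel: arp: <ip> moved from <mac> to <mac> on <iface>".
ARP_MOVE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) \S+ kernel: arp: (?P<ip>\S+) "
    r"moved from (?P<old>\S+) to (?P<new>\S+) on (?P<iface>\S+)"
)


def arp_moves(lines):
    """Group ARP 'moved' events by IP as (timestamp, old_mac, new_mac) tuples."""
    events = defaultdict(list)
    for line in lines:
        m = ARP_MOVE.match(line)
        if not m:
            continue  # skip unrelated messages, e.g. the mfi0 battery warnings
        # Syslog omits the year, so a fixed placeholder year is prepended;
        # it cancels out when computing intervals.
        ts = datetime.strptime("2000 " + m["ts"], "%Y %b %d %H:%M:%S")
        events[m["ip"]].append((ts, m["old"], m["new"]))
    return events


def gaps_minutes(events):
    """Minutes between consecutive move events for one IP, rounded to 0.1."""
    times = [t for t, _, _ in events]
    return [round((b - a).total_seconds() / 60, 1) for a, b in zip(times, times[1:])]
```

Feeding it the lines of /var/log/messages and printing `gaps_minutes(events[ip])` for each IP shows whether the moves follow a fixed schedule, which can then be compared with the times the server stopped answering pings.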
What should I do next to find and fix the problem?
Answer 1
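The last thing you should be doing right now is increasing MaxClients.
It's hard to say. The slowdowns and the MaxClients warning suggest that more demand is being placed on the server than it can cope with. Unless you are running a lot of AJAX/COMET stuff on the server, you really should reduce the keepalive timeout (e.g. to 2 to start with).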
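To see why a long keep-alive hurts a prefork server: each open connection pins one child for its whole lifetime, including the idle wait after its last request. Little's law (children in use = connection arrival rate × time each connection holds a child) gives a rough estimate. A minimal Python sketch, assuming a pessimistic model in which every connection waits out the full KeepAliveTimeout; the traffic figures are hypothetical.

```python
def busy_children(conn_per_sec, requests_per_conn, service_sec, keepalive_timeout_sec):
    """Little's law estimate (L = lambda * W) of prefork children in use.

    W is modelled as the time spent serving every request on the connection
    plus the full KeepAliveTimeout idle wait after the last one; gaps between
    requests on the same connection are ignored for simplicity.
    """
    hold_time = requests_per_conn * service_sec + keepalive_timeout_sec
    return conn_per_sec * hold_time


# Hypothetical load: 20 new connections/s, 5 requests each, 0.2 s per request.
for timeout in (15, 5, 2):
    print(f"KeepAliveTimeout {timeout:>2}: ~{busy_children(20, 5, 0.2, timeout):.0f} children busy")
```

Under these made-up numbers the idle wait dominates: about 320, 120 and 60 children for timeouts of 15, 5 and 2 seconds, while the actual request work only accounts for 20 of them.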
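"Battery needs replacement" is not just a reminder to do some maintenance: on a BBWC (battery-backed write cache) it means the controller no longer attempts to cache writes, and if your system is set up correctly, your OS and disks won't cache writes either.
Both of these point to really bad performance on your system, yet the first thing you report is that it appears to be unavailable; in fact you don't mention performance at all. Working out how to measure performance and capture data should be your top priority.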
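One low-effort way to start capturing data is to poll Apache's mod_status page at a fixed interval and log how many workers are busy, so the next outage leaves a record. This Python sketch assumes mod_status is enabled and that `http://127.0.0.1/server-status?auto` is allowed from the machine running it; the URL and the 60-second interval are assumptions to adjust, and `BusyWorkers`/`IdleWorkers` are, as far as I know, the field names Apache 2.2 uses in its `?auto` output. A second copy pointed at the server's public address from another host would also show when it stops answering over the network.

```python
import time
import urllib.request

# Assumption: mod_status is loaded and permits requests from this host.
STATUS_URL = "http://127.0.0.1/server-status?auto"


def parse_status(text):
    """Turn the 'Key: value' lines of server-status?auto into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore lines without a colon
            fields[key.strip()] = value.strip()
    return fields


def sample(url=STATUS_URL, timeout=5):
    """Fetch and parse one status snapshot; raises OSError on network failure."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_status(resp.read().decode("ascii", "replace"))


def main(interval=60, url=STATUS_URL):
    """Print one timestamped line per interval; runs until interrupted."""
    while True:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        try:
            f = sample(url)
            print(stamp, "busy=%s idle=%s" % (f.get("BusyWorkers"), f.get("IdleWorkers")), flush=True)
        except OSError as exc:
            # Timeouts and refused connections are data points too: they mark the outage.
            print(stamp, "fetch failed:", exc, flush=True)
        time.sleep(interval)
```

Calling `main()` and redirecting its output to a file gives a timeline of worker usage that can be compared with the arp timestamps and the crash times.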
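I'm not sure why the addresses keep moving (I'm assuming these are local interfaces); it may be a consequence of load elsewhere.
This is a sick puppy: you need to start fixing problems one at a time until you have a clearer picture of what is going wrong.
Start by replacing the battery, tuning your Apache installation, and recording performance metrics.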