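I have been fighting this problem all day and it is driving me crazy. Every Google result and every search on this site has been a dead end. I'm hoping someone can work through it with me and come up with a solution, for my sake and for future victims. Here we go.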
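I run a very popular website that gets over 3 million page views a day. That averages out to about 34 page views per second, but more realistically it is over 300 page views per second during peak hours. Treat these as requests.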
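As a quick sanity check on those figures, the arithmetic can be sketched as below. The values 3,000,000 and 300 are the round numbers quoted above; the real traffic is "over" both, so treat the results as rough lower bounds rather than measurements.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

daily_page_views = 3_000_000  # "over 3 million" per day, rounded down
peak_per_second = 300         # "over 300" per second at peak, rounded down

# Average request rate if traffic were spread evenly over the day.
average_per_second = daily_page_views / SECONDS_PER_DAY

# How much busier the peak is than that even spread.
peak_to_average = peak_per_second / average_per_second

print(f"average: {average_per_second:.1f} requests/s")
print(f"peak is about {peak_to_average:.1f}x the average")
```

The point is that the server has to be sized for the peak, which is several times the daily average.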
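The server runs Ubuntu 10.04 64-bit with 2x E5620 CPUs, 12GB of RAM, and a Micron P300 6Gb/s SSD. During peak hours, CPU and memory load are moderate (20-30% CPU and about half of the memory in use).
The software behind this site is NGINX, MySQL, PHP5-FPM, PHP-APC, and Memcached. Now, finally, the point of this post: here are my error logs. Errors like these are logged in large numbers.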
/var/log/php5-fpm
Jul 20 14:49:47.289895 [NOTICE] fpm is running, pid 29373
Jul 20 14:49:47.337092 [NOTICE] ready to handle connections
Jul 20 14:51:23.957504 [ERROR] [pool www] unable to retrieve process activity of one or more child(ren). Will try again later.
Jul 20 14:51:41.846439 [WARNING] [pool www] child 29534 exited with code 1 after 114.518174 seconds from start
Jul 20 14:51:41.846797 [NOTICE] [pool www] child 29597 started
Jul 20 14:51:41.896653 [WARNING] [pool www] child 29408 exited on signal 11 SIGSEGV after 114.596706 seconds from start
Jul 20 14:51:41.897178 [NOTICE] [pool www] child 29598 started
Jul 20 14:51:41.903286 [WARNING] [pool www] child 29398 exited with code 1 after 114.605761 seconds from start
Jul 20 14:51:41.903719 [NOTICE] [pool www] child 29600 started
Jul 20 14:51:41.907816 [WARNING] [pool www] child 29437 exited with code 1 after 114.601417 seconds from start
Jul 20 14:51:41.908253 [NOTICE] [pool www] child 29601 started
Jul 20 14:51:41.916002 [WARNING] [pool www] child 29513 exited with code 1 after 114.592514 seconds from start
Jul 20 14:51:41.916501 [NOTICE] [pool www] child 29602 started
Jul 20 14:51:41.916558 [WARNING] [pool www] child 29494 exited on signal 11 SIGSEGV after 114.597355 seconds from start
Jul 20 14:51:41.916873 [NOTICE] [pool www] child 29603 started
Jul 20 14:51:41.921389 [WARNING] [pool www] child 29502 exited with code 1 after 114.600405 seconds from start
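To see how the child exits split between segfaults and plain `exited with code 1`, a log like the one above can be tallied with a short script. This is only a sketch: the function name `tally_child_exits` and the regular expression are my own, written against the line format shown in this excerpt, and other PHP-FPM versions may word these messages differently.

```python
import re
from collections import Counter

# Matches the "child N exited ..." warnings in the log format shown above.
EXIT_RE = re.compile(
    r"child (?P<pid>\d+) exited "
    r"(?:with code (?P<code>\d+)|on signal (?P<signum>\d+) (?P<signame>\w+))"
)


def tally_child_exits(lines):
    """Count php-fpm child exits by cause (exit code or signal name)."""
    counts = Counter()
    for line in lines:
        match = EXIT_RE.search(line)
        if match is None:
            continue  # not a child-exit line (e.g. "child N started")
        if match.group("code") is not None:
            counts["exit code " + match.group("code")] += 1
        else:
            counts["signal " + match.group("signame")] += 1
    return counts


# A few lines copied from the excerpt above.
sample = [
    "Jul 20 14:51:41.846439 [WARNING] [pool www] child 29534 exited with code 1 after 114.518174 seconds from start",
    "Jul 20 14:51:41.846797 [NOTICE] [pool www] child 29597 started",
    "Jul 20 14:51:41.896653 [WARNING] [pool www] child 29408 exited on signal 11 SIGSEGV after 114.596706 seconds from start",
    "Jul 20 14:51:41.903286 [WARNING] [pool www] child 29398 exited with code 1 after 114.605761 seconds from start",
]

print(dict(tally_child_exits(sample)))
```

To run it against the real file, pass an open file handle instead of `sample`, for example `tally_child_exits(open("/var/log/php5-fpm"))`.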
/var/log/nginx/error.log
2011/07/20 15:48:42 [error] 29583#0: *569743 readv() failed (104: Connection reset by peer) while reading upstream, client: 77.223.197.193, server: domain.com, request: "GET /favicon.ico HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com"
2011/07/20 15:48:42 [error] 29578#0: *571695 readv() failed (104: Connection reset by peer) while reading upstream, client: 150.70.64.196, server: domain.com, request: "GET /page HTTP/1.0", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com"
2011/07/20 15:48:42 [error] 29581#0: *571050 readv() failed (104: Connection reset by peer) while reading upstream, client: 110.136.157.66, server: domain.com, request: "GET /page HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com"
2011/07/20 15:48:42 [error] 29581#0: *564892 readv() failed (104: Connection reset by peer) while reading upstream, client: 110.136.161.214, server: domain.com, request: "GET /page HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com"
2011/07/20 15:48:42 [error] 29585#0: *456171 readv() failed (104: Connection reset by peer) while reading upstream, client: 93.223.33.135, server: domain.com, request: "GET /favicon.ico HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com"
2011/07/20 15:48:42 [error] 29585#0: *471192 readv() failed (104: Connection reset by peer) while reading upstream, client: 74.90.33.142, server: domain.com, request: "GET /page HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com"
2011/07/20 15:48:42 [error] 29580#0: *570132 readv() failed (104: Connection reset by peer) while reading upstream, client: 180.246.182.191, server: domain.com, request: "GET /page HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.domain.com"
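Finally, I'd like to point out that I did try disabling PHP-APC to see whether the opcode cache was the culprit, but the segfaults persisted. I also had PHP5-SUHOSIN installed and disabled it, but the errors keep occurring.
I would greatly appreciate any help. Thanks.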
Answer 1
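Install the debug symbols for PHP and all PHP modules (if Ubuntu provides them; otherwise you will need to rebuild with debugging enabled), then enable core dumps as described in my answer to this question from a few hours ago. Then fire up GDB and get cracking.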
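Before hunting for a core file, it can help to check whether the system would write one at all. The sketch below is not the procedure from the linked answer. It is a small assumption-laden check that reads the core-size limit of the process running the script and the kernel's core pattern on Linux. The helper name `core_dump_settings` is made up for illustration, and the limits it reports belong to the Python process itself, not to the PHP-FPM workers, whose limits depend on how the service is started.

```python
import resource


def core_dump_settings(core_pattern_path="/proc/sys/kernel/core_pattern"):
    """Report this process's core-size limits and the kernel core pattern."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)

    def describe(limit):
        # RLIM_INFINITY means no cap on core file size.
        return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)

    try:
        with open(core_pattern_path) as f:
            pattern = f.read().strip()
    except OSError:
        pattern = None  # not Linux, or /proc is unavailable

    return {
        "soft": describe(soft),
        "hard": describe(hard),
        "core_pattern": pattern,
    }


print(core_dump_settings())
```

A soft limit of `0` means no core file will be written for a crashing process that inherits it, so that is the first thing to raise before reproducing the segfault.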