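I'm currently running a few low-traffic websites on one server, but CPU usage is high. One of the sites is still in development and will go live soon. However, this site is very, very slow... While browsing its pages I can see httpd's CPU usage climb from 30% to 100% (see the top output below).
I've already tuned httpd, MySQL, Apache Solr and Tomcat for high performance, and I'm using APC.
I'm not sure what to do next or how to find the culprit, because there are a lot of messages in the httpd logs and I've been chasing dead ends for a while... Any help would be greatly appreciated.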
Server: AuthenticAMD, Quad-Core AMD Opteron(tm) Processor 2352, 16 GB RAM
Linux 2.6.27 64-bit, CentOS 5.5
Plesk 9.5.4, MySQL 5.1.48, PHP 5.2.17
Apache/2.2.3 (CentOS) DAV/2 mod_jk/1.2.15 mod_ssl/2.2.3 OpenSSL/0.9.8e-fips-rhel5 PHP/5.2.17 mod_perl/2.0.4 Perl/v5.8.8
Tomcat6-6.0.29-1.jpp5, Tomcat-native-1.1.20-1.el5, Apache Solr
top output
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
17595 apache 20 0 1825m 507m 10m R 100.4 3.2 0:17.50 httpd
17596 apache 20 0 1565m 247m 9936 R 83.1 1.5 0:10.86 httpd
17598 apache 20 0 1430m 110m 6472 S 54.5 0.7 0:08.66 httpd
17599 apache 20 0 1438m 124m 12m S 37.2 0.8 0:11.20 httpd
16197 mysql 20 0 13.0g 2.0g 5440 S 9.6 12.6 297:12.79 mysqld
17617 root 20 0 12748 1172 812 R 0.7 0.0 0:00.88 top
8169 tomcat 20 0 4613m 268m 6056 S 0.3 1.7 6:40.56 java
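To decide which httpd child to investigate (for example with strace -p), the top rows above can be filtered by command name and sorted by %CPU. This is a minimal sketch, assuming the default 12-column top layout shown above (as produced by, say, top -b -n 1); the function name busiest_pids is hypothetical.

```python
def busiest_pids(top_lines, command="httpd"):
    """Return (pid, cpu_percent) pairs for `command`, highest %CPU first.

    Assumes top's default column order:
    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    """
    rows = []
    for line in top_lines:
        fields = line.split()
        # Skip blank lines, headers and anything that is not a process row.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        if fields[11] != command:
            continue
        rows.append((int(fields[0]), float(fields[8])))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Rows copied from the top output above.
sample = """\
17595 apache 20 0 1825m 507m 10m R 100.4 3.2 0:17.50 httpd
17596 apache 20 0 1565m 247m 9936 R 83.1 1.5 0:10.86 httpd
17598 apache 20 0 1430m 110m 6472 S 54.5 0.7 0:08.66 httpd
17599 apache 20 0 1438m 124m 12m S 37.2 0.8 0:11.20 httpd
16197 mysql 20 0 13.0g 2.0g 5440 S 9.6 12.6 297:12.79 mysqld
""".splitlines()

print(busiest_pids(sample))
```

The first PID in the result is the natural candidate to attach strace to.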
httpd error log
[debug] prefork.c(991): AcceptMutex: sysvsem (default: sysvsem)
[info] mod_fcgid: Process manager 17593 started
[debug] proxy_util.c(1854): proxy: grabbed scoreboard slot 0 in child 17594 for worker proxy:reverse
[debug] proxy_util.c(1967): proxy: initialized single connection worker 0 in child 17594 for (*)
[debug] proxy_util.c(1854): proxy: grabbed scoreboard slot 0 in child 17595 for worker proxy:reverse
[debug] proxy_util.c(1873): proxy: worker proxy:reverse already initialized
[notice] child pid 22782 exit signal Segmentation fault (11)
[error] (43)Identifier removed: apr_global_mutex_lock(jk_log_lock) failed
[debug] util_ldap.c(2021): LDAP merging Shared Cache conf: shm=0x7fd29a5478c0 rmm=0x7fd29a547918 for VHOST: example.com
[info] APR LDAP: Built with OpenLDAP LDAP SDK
[info] LDAP: SSL support available
[info] Init: Seeding PRNG with 256 bytes of entropy
[info] Init: Generating temporary RSA private keys (512/1024 bits)
[info] Init: Generating temporary DH parameters (512/1024 bits)
[debug] ssl_scache_shmcb.c(374): shmcb_init allocated 512000 bytes of shared memory
[debug] ssl_scache_shmcb.c(554): entered shmcb_init_memory()
[debug] ssl_scache_shmcb.c(576): for 512000 bytes, recommending 4265 indexes
[debug] ssl_scache_shmcb.c(619): shmcb_init_memory choices follow
[debug] ssl_scache_shmcb.c(621): division_mask = 0x1F
[debug] ssl_scache_shmcb.c(623): division_offset = 96
[debug] ssl_scache_shmcb.c(625): division_size = 15997
[debug] ssl_scache_shmcb.c(627): queue_size = 2136
[debug] ssl_scache_shmcb.c(629): index_num = 133
[debug] ssl_scache_shmcb.c(631): index_offset = 8
[debug] ssl_scache_shmcb.c(633): index_size = 16
[debug] ssl_scache_shmcb.c(635): cache_data_offset = 8
[debug] ssl_scache_shmcb.c(637): cache_data_size = 13853
[debug] ssl_scache_shmcb.c(650): leaving shmcb_init_memory()
Answer 1
Try adding %P (and %D) to your access log format. You should then be able to correlate what you see in top with the entries in your access log.
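As a sketch of that correlation: in Apache's mod_log_config, %P is the PID of the child that served the request and %D is the time taken to serve it, in microseconds. Assuming a hypothetical LogFormat that appends both to the end of each line, e.g. "%h %l %u %t \"%r\" %>s %b %P %D", the log can be summed per PID and compared with the PIDs top reports. The format string, field positions and the function name time_per_pid are all assumptions; adjust the parsing to whatever format is actually configured.

```python
from collections import defaultdict


def time_per_pid(log_lines):
    """Aggregate request time per Apache child PID.

    Assumes every line ends with "%P %D": the PID of the child that served
    the request, then the time taken in microseconds (hypothetical format).
    Returns {pid: (request_count, total_seconds, slowest_request_prefix)}.
    """
    stats = defaultdict(lambda: {"count": 0, "total_us": 0, "max_us": -1, "slowest": ""})
    for line in log_lines:
        parts = line.rsplit(None, 2)  # split off only the last two fields
        if len(parts) != 3 or not (parts[1].isdigit() and parts[2].isdigit()):
            continue  # line does not match the assumed format
        prefix, pid, micros = parts[0], int(parts[1]), int(parts[2])
        s = stats[pid]
        s["count"] += 1
        s["total_us"] += micros
        if micros > s["max_us"]:
            s["max_us"], s["slowest"] = micros, prefix
    return {
        pid: (s["count"], s["total_us"] / 1_000_000, s["slowest"])
        for pid, s in stats.items()
    }


# Made-up log lines in the assumed format, for illustration only.
sample = [
    '192.0.2.10 - - [01/Jan/2011:00:00:00 +0000] "GET /node/1 HTTP/1.1" 200 5120 17595 9500000',
    '192.0.2.10 - - [01/Jan/2011:00:00:01 +0000] "GET /node/2 HTTP/1.1" 200 4096 17595 500000',
    '192.0.2.11 - - [01/Jan/2011:00:00:02 +0000] "GET /favicon.ico HTTP/1.1" 200 900 17596 1200',
]

# Busiest children first, by total time spent serving requests.
for pid, (count, seconds, slowest) in sorted(
    time_per_pid(sample).items(), key=lambda kv: kv[1][1], reverse=True
):
    print(pid, count, seconds, slowest)
```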
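Answer 2
[notice] child pid 22782 exit signal Segmentation fault (11)
Something is definitely wrong here. You should add ulimit -c unlimited near the top of /etc/init.d/httpd so that you get a core dump the next time a child segfaults. mod_jk may be the source of the problem, since the log also contains a mod_jk-related error.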
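For reference, ulimit -c sets the RLIMIT_CORE resource limit, which processes started afterwards (here, httpd and its children) inherit. A minimal Python sketch of the same idea, using the standard resource module, is below. It only raises the soft limit up to the existing hard limit, which needs no privileges; a truly unlimited hard limit requires root, as in the init script. The function name raise_core_limit is illustrative.

```python
import resource


def raise_core_limit():
    """Raise this process's core-file size soft limit to its hard limit.

    This is the RLIMIT_CORE setting that `ulimit -c` changes in a shell.
    Raising the soft limit up to the hard limit needs no privileges; going
    beyond the hard limit requires root. Processes started afterwards
    inherit the new limit.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


print(raise_core_limit())
```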
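Answer 3
I see mod_perl in the list. Is this site an application written in Perl? If so, poorly written Perl code would be the source of the problem.
The same goes for PHP. PHP applications are not known for their performance, and CMS applications are notorious for being resource-hungry. If you are a hosting provider, you would do best to ban this CMS package or charge more to cover the extra resources.
However, if you are running this CMS for your own use, then since it is open source you should post a separate question on Stack Overflow, naming the package and asking how to track down and fix the poorly written code.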
Answer 4
I no longer see the segmentation faults, but httpd's CPU usage is still high. I was able to run strace on one of the CPU-hungry httpd processes and got the following:
# strace -c -p 28964
Process 28964 attached - interrupt to quit
^CProcess 28964 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
88.94 0.006093 0 98299 4562 lstat
3.01 0.000206 0 2740 getcwd
2.28 0.000156 0 2158 2 read
2.26 0.000155 0 541 37 open
1.68 0.000115 0 1321 1321 readlink
1.52 0.000104 0 1678 822 access
0.32 0.000022 0 502 fstat
0.00 0.000000 0 25 write
0.00 0.000000 0 507 close
0.00 0.000000 0 547 478 stat
0.00 0.000000 0 23 poll
0.00 0.000000 0 2 rt_sigaction
0.00 0.000000 0 2 rt_sigprocmask
0.00 0.000000 0 2 writev
0.00 0.000000 0 3 setitimer
0.00 0.000000 0 1 sendfile
...
------ ----------- ----------- --------- --------- ----------------
100.00 0.006851 108381 7224 total
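The summary can be parsed and ranked to make the imbalance explicit. Using the full table above, lstat accounts for 98,299 of 108,381 calls (about 90.7%), and 4,562 of those calls (about 4.6%) fail. This is a minimal sketch, assuming the column layout shown (% time, seconds, usecs/call, calls, an optional errors column, syscall); it skips the header, separator and total lines. The function names are hypothetical.

```python
def parse_strace_summary(lines):
    """Parse `strace -c` summary rows into {syscall: (calls, errors)}.

    Assumes the layout: % time, seconds, usecs/call, calls, [errors], syscall.
    The errors column is blank for syscalls that never failed.
    """
    result = {}
    for line in lines:
        fields = line.split()
        if len(fields) not in (5, 6) or fields[-1] == "total":
            continue
        try:
            float(fields[0])
        except ValueError:
            continue  # header or separator line
        calls = int(fields[3])
        errors = int(fields[4]) if len(fields) == 6 else 0
        result[fields[-1]] = (calls, errors)
    return result


def rank_by_calls(summary):
    """Return [(syscall, calls, % of all calls, % of its calls that failed)]."""
    total = sum(calls for calls, _ in summary.values())
    if total == 0:
        return []
    ranked = sorted(summary.items(), key=lambda kv: kv[1][0], reverse=True)
    return [
        (name, calls, round(100 * calls / total, 1), round(100 * errors / calls, 1))
        for name, (calls, errors) in ranked
    ]


# First rows of the summary above (percentages are relative to these rows only).
sample = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 88.94    0.006093           0     98299      4562 lstat
  3.01    0.000206           0      2740           getcwd
  2.28    0.000156           0      2158         2 read
  2.26    0.000155           0       541        37 open
  1.68    0.000115           0      1321      1321 readlink
  1.52    0.000104           0      1678       822 access
------ ----------- ----------- --------- --------- ----------------
100.00    0.006851                108381      7224 total
""".splitlines()

for row in rank_by_calls(parse_strace_summary(sample)):
    print(row)
```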
The 4562 lstat errors are all of the same kind and show up in the trace file like this:
# strace -f -t -o /var/log/strace.output -p 28964
strace.output
28964 07:10:38 lstat("/var", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
28964 07:10:38 lstat("/var/www", {st_mode=S_IFDIR|0755, st_size=94, ...}) = 0
28964 07:10:38 lstat("/var/www/vhosts", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
28964 07:10:38 lstat("/var/www/vhosts/example.com", {st_mode=S_IFDIR|0777, st_size=4096, ...}) = 0
28964 07:10:38 lstat("/var/www/vhosts/example.com/httpdocs", {st_mode=S_IFDIR|0777, st_size=4096, ...}) = 0
28964 07:10:38 lstat("/var/www/vhosts/example.com/httpdocs/sites", {st_mode=S_IFDIR|0755, st_size=30, ...}) = 0
28964 07:10:38 lstat("/var/www/vhosts/example.com/httpdocs/sites/all", {st_mode=S_IFDIR|0755, st_size=66, ...}) = 0
28964 07:10:38 lstat("/var/www/vhosts/example.com/httpdocs/sites/all/modules", {st_mode=S_IFDIR|0755, st_size=12288, ...}) = 0
28964 07:10:38 lstat("/var/www/vhosts/example.com/httpdocs/sites/all/modules/views", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
28964 07:10:38 lstat("/var/www/vhosts/example.com/httpdocs/sites/all/modules/views/includes", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
28964 07:10:38 lstat("/var/www/vhosts/example.com/httpdocs/sites/all/modules/views/includes/sites", 0x7fff1e627370) = -1 ENOENT (No such file or directory)
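All of the directories listed above are inside this site's directory and are part of the Drupal CMS. But the last one listed,
/var/www/vhosts/example.com/httpdocs/sites/all/modules/views/includes/sites
does not exist; it should really be
/var/www/vhosts/example.com/httpdocs/sites
which does exist. So it looks like lstat is trying to read a directory that doesn't exist and getting -1 ENOENT (No such file or directory)...?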
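The pattern in the trace, one lstat() per leading directory followed by a failure on the last component, is consistent with a userspace realpath-style resolution that checks each path component in turn. The sketch below imitates that walk with os.lstat so the behaviour can be reproduced on a test path; it illustrates the pattern only and is not PHP's or Drupal's actual code. The function name walk_prefixes is hypothetical.

```python
import os


def walk_prefixes(path):
    """lstat() each leading component of an absolute path, stopping at the
    first one that is missing, and return [(prefix, exists), ...].

    Mirrors the sequence in the strace output: one lstat per directory level,
    with the final call failing (ENOENT) when the composed path is wrong.
    """
    results = []
    prefix = ""
    for part in path.strip("/").split("/"):
        prefix = prefix + "/" + part
        try:
            os.lstat(prefix)
            results.append((prefix, True))
        except (FileNotFoundError, NotADirectoryError):
            results.append((prefix, False))
            break  # nothing below a missing component can exist
    return results


for prefix, exists in walk_prefixes(
    "/var/www/vhosts/example.com/httpdocs/sites/all/modules/views/includes/sites"
):
    print("ok     " if exists else "ENOENT ", prefix)
```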
What is the best way to fix this and track down where the missing-directory errors come from?
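One way to narrow down the source is to tally which paths fail most often across the whole trace: a path that is probed thousands of times points at the code that builds it. This is a minimal sketch, assuming the line format produced by strace -f -t -o as shown above (PID, timestamp, then the syscall); it only counts lstat, stat, open and access calls that return ENOENT. The function name missing_paths is hypothetical.

```python
import re
from collections import Counter

# PID, timestamp (from -t), syscall name, quoted path argument, ENOENT result.
ENOENT_RE = re.compile(
    r'^\d+\s+\S+\s+(?:lstat|stat|open|access)\("([^"]*)".*=\s*-1 ENOENT\b'
)


def missing_paths(lines, top=10):
    """Return the `top` most frequently probed paths that failed with ENOENT."""
    counts = Counter()
    for line in lines:
        match = ENOENT_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common(top)
```

Usage: with open("/var/log/strace.output") as trace: print(missing_paths(trace)). The paths at the top of the list are the ones worth grepping for in the site's code and configuration.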