General description of the problem

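We are currently running an application on a PaaS-type solution for PHP. Their solution is based on the AWS cloud, and since their plans don't fit our scaling needs, we decided to move to AWS directly. In production the application performs "OK": about 400 requests per minute with an application response time of around 100 ms. With my setup on AWS, however, it takes a very long time to respond. Keep in mind that every request does a database insert plus some expensive selects that compute statistics.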
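To put the production figures in perspective, here is a rough back-of-the-envelope sketch using Little's law (average requests in flight = arrival rate × time per request). The only inputs are the two numbers quoted above; the function name is just illustrative.

```python
def average_concurrency(requests_per_minute: float, response_ms: float) -> float:
    """Little's law: mean number of requests being processed at once."""
    arrival_rate_per_s = requests_per_minute / 60.0
    return arrival_rate_per_s * (response_ms / 1000.0)


# Figures from the description: ~400 requests/minute at ~100 ms each.
normal_load = average_concurrency(400, 100)
print(f"{normal_load:.2f} requests in flight on average")
```

Under these assumptions the normal load averages less than one request in flight, so a benchmark run at a much higher concurrency level stresses the server far beyond typical traffic.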

Current AWS setup attempt

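  • 1 medium RDS server (this is not the bottleneck; I have checked)
  • 1 medium r3 EC2 server running nginx + PHP-FPM on Ubuntu x64 14.04

I have been running some benchmarks, trying to simulate our normal traffic load as closely as possible, but the setup starts to fail under constant load.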

Configuration currently in use

Nginx

user www-data;
worker_processes 2;
pid /run/nginx.pid;
worker_rlimit_nofile 30000;

events {
    worker_connections 8192;
    #multi_accept on;
    use epoll;
}

http {

    ##
    # Basic Settings
    ##

    sendfile on;
    tcp_nopush on;
    tcp_nodelay off;
    keepalive_timeout 30;
    types_hash_max_size 2048;
    server_tokens off;

    # increase buffer and timeouts
    fastcgi_connect_timeout 60;
    fastcgi_send_timeout 180;
    fastcgi_read_timeout 180;
    fastcgi_buffer_size 128k;
    fastcgi_buffers 4 256k;
    fastcgi_busy_buffers_size 256k;
    fastcgi_temp_file_write_size 256k;
    fastcgi_intercept_errors on;
}

Nginx sites-available

server {
  listen          80;
  server_name     project.example.com;

  root            /var/www/project/public/;
  access_log      off;

  charset utf-8;

  index index.html index.htm index.php;


  location / {
      try_files $uri $uri/ /index.php?q=$uri&$args;
  } 

  # catch all
  error_page 404 /index.php;


  location ~ \.php$ {
      # Pass the PHP files to PHP FastCGI for processing

      fastcgi_split_path_info ^(.+\.php)(/.+)$;
      include /etc/nginx/fastcgi_params;
      #fastcgi_pass 127.0.0.1:9000;
      fastcgi_pass unix:/var/run/php5-fpm.sock;
      fastcgi_index index.php;
  }
}

php-fpm.conf

emergency_restart_threshold = 3
emergency_restart_interval = 1m
process_control_timeout = 5s

php-fpm pool

pm = dynamic
pm.max_children = 48
pm.start_servers = 18
pm.min_spare_servers = 16
pm.max_spare_servers = 24
pm.max_requests = 50
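
As a sanity check on the pool sizing, the sketch below verifies the ordering that pm = dynamic expects (min_spare_servers ≤ start_servers ≤ max_spare_servers ≤ max_children) and estimates worst-case worker memory. The per-worker size is an assumption: it is the largest RES value in the sample top output further down, and since RES includes shared pages the result is an over-estimate.

```python
# Values copied from the pool configuration above.
pool = {
    "max_children": 48,
    "start_servers": 18,
    "min_spare_servers": 16,
    "max_spare_servers": 24,
}

# Assumptions taken from the sample top output (both in KiB).
WORKER_RES_KIB = 26616   # largest RES among the php5-fpm workers
TOTAL_MEM_KIB = 3838876  # "KiB Mem: ... total"


def pool_is_consistent(p: dict) -> bool:
    """Check the ordering required for a dynamic pool."""
    return (p["min_spare_servers"] <= p["start_servers"]
            <= p["max_spare_servers"] <= p["max_children"])


def worst_case_fraction(max_children: int, worker_kib: int, total_kib: int) -> float:
    """Share of RAM used if every allowed worker is resident at once."""
    return max_children * worker_kib / total_kib


print("pool ordering ok:", pool_is_consistent(pool))
fraction = worst_case_fraction(pool["max_children"], WORKER_RES_KIB, TOTAL_MEM_KIB)
print(f"worst-case worker memory: {fraction:.0%} of RAM")
```

With these inputs the pool stays well inside the available memory, which is consistent with swap being untouched during the tests.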

/etc/sysctl.conf

net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 16384 16777216
net.core.somaxconn = 4096
net.core.netdev_max_backlog = 16384
net.ipv4.tcp_max_syn_backlog = 8192
net.ipv4.tcp_syncookies = 1
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_congestion_control = cubic

Opcache settings

php -i | grep opcache

Additional .ini files parsed => /etc/php5/cli/conf.d/05-opcache.ini,
opcache.blacklist_filename => no value => no value
opcache.consistency_checks => 0 => 0
opcache.dups_fix => Off => Off
opcache.enable => On => On
opcache.enable_cli => Off => Off
opcache.enable_file_override => Off => Off
opcache.error_log => no value => no value
opcache.fast_shutdown => 0 => 0
opcache.file_update_protection => 2 => 2
opcache.force_restart_timeout => 180 => 180
opcache.inherited_hack => On => On
opcache.interned_strings_buffer => 4 => 4
opcache.load_comments => 1 => 1
opcache.log_verbosity_level => 1 => 1
opcache.max_accelerated_files => 50000 => 50000
opcache.max_file_size => 0 => 0
opcache.max_wasted_percentage => 5 => 5
opcache.memory_consumption => 128 => 128
opcache.optimization_level => 0xFFFFFFFF => 0xFFFFFFFF
opcache.preferred_memory_model => no value => no value
opcache.protect_memory => 0 => 0
opcache.restrict_api => no value => no value
opcache.revalidate_freq => 2 => 2
opcache.revalidate_path => Off => Off
opcache.save_comments => 1 => 1
opcache.use_cwd => On => On
opcache.validate_timestamps => On => On

php -i | grep apc

/etc/php5/cli/conf.d/20-apcu.ini,
apc
apcu
apc.coredump_unmap => Off => Off
apc.enable_cli => Off => Off
apc.enabled => On => On
apc.entries_hint => 4096 => 4096
apc.gc_ttl => 3600 => 3600
apc.mmap_file_mask => no value => no value
apc.preload_path => no value => no value
apc.rfc1867 => Off => Off
apc.rfc1867_freq => 0 => 0
apc.rfc1867_name => APC_UPLOAD_PROGRESS => APC_UPLOAD_PROGRESS
apc.rfc1867_prefix => upload_ => upload_
apc.rfc1867_ttl => 3600 => 3600
apc.serializer => php => php
apc.shm_segments => 1 => 1
apc.shm_size => 32M => 32M
apc.slam_defense => On => On
apc.smart => 0 => 0
apc.ttl => 0 => 0
apc.use_request_time => On => On
apc.writable => /tmp => /tmp

Benchmark results

AWS setup

Concurrency Level:      75
Time taken for tests:   18.836 seconds
Complete requests:      111
Failed requests:        0
Total transferred:      238269 bytes
HTML transferred:       36630 bytes
Requests per second:    5.89 [#/sec] (mean)
Time per request:       12726.963 [ms] (mean)
Time per request:       169.693 [ms] (mean, across all concurrent requests)
Transfer rate:          12.35 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       58  226 113.0    276     390
Processing:  2099 8960 3593.2   9524   16784
Waiting:     2087 8947 3591.8   9517   16783
Total:       2377 9186 3585.0   9593   17164

Percentage of the requests served within a certain time (ms)
  50%   9512
  66%  11085
  75%  11747
  80%  12323
  90%  12954
  95%  14459
  98%  15792
  99%  16201
 100%  17164 (longest request)

Production setup

 Document Length:        331 bytes

 Concurrency Level:      75
 Time taken for tests:   7.544 seconds
 Complete requests:      595
 Failed requests:        0
 Total transferred:      1220905 bytes
 HTML transferred:       196945 bytes
 Requests per second:    78.87 [#/sec] (mean)
 Time per request:       950.922 [ms] (mean)
 Time per request:       12.679 [ms] (mean, across all concurrent requests)
 Transfer rate:          158.05 [Kbytes/sec] received

 Connection Times (ms)
               min  mean[+/-sd] median   max
 Connect:       58  105  78.4     76     384
 Processing:   265  787 263.1    725    1382
 Waiting:      265  785 262.9    723    1381
 Total:        419  891 267.7    836    1742

 Percentage of the requests served within a certain time (ms)
   50%    836
   66%   1002
   75%   1071
   80%   1129
   90%   1263
   95%   1376
   98%   1662
   99%   1672
  100%   1742 (longest request)     
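
For anyone comparing the two runs, the derived lines in the ab output follow from three inputs: concurrency level, time taken, and completed requests. The sketch below recomputes them from the numbers above; the function name is illustrative, and small differences from ab's own figures come from the rounding of "Time taken for tests".

```python
def ab_derived(concurrency: int, seconds: float, completed: int):
    """Recompute ab's summary figures from its raw inputs."""
    requests_per_second = completed / seconds
    # Mean time a single client waits for one request.
    per_request_ms = concurrency * seconds * 1000.0 / completed
    # Mean time per request across all concurrent clients.
    across_all_ms = seconds * 1000.0 / completed
    return requests_per_second, per_request_ms, across_all_ms


aws = ab_derived(75, 18.836, 111)
production = ab_derived(75, 7.544, 595)

for label, (rps, per_req, across) in (("AWS", aws), ("production", production)):
    print(f"{label}: {rps:.2f} req/s, {per_req:.0f} ms per request, "
          f"{across:.2f} ms across all clients")

print(f"throughput ratio production/AWS: {production[0] / aws[0]:.1f}x")
```

At the same concurrency level of 75, production completes roughly 13 times as many requests per second as the AWS setup.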

Sample top output:

top - 12:58:24 up 4 min,  1 user,  load average: 41.69, 16.15, 5.95
Tasks: 121 total,  51 running,  70 sleeping,   0 stopped,   0 zombie
%Cpu(s): 17.7 us, 21.0 sy,  0.0 ni, 40.2 id,  0.9 wa,  0.2 hi,  0.0 si, 20.0 st
KiB Mem:   3838876 total,   643628 used,  3195248 free,    18028 buffers
KiB Swap:  1048572 total,        0 used,  1048572 free.   169920 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 1035 www-data  20   0  354540  24272  15224 R 11.8  0.6   0:02.61 php5-fpm
 1037 www-data  20   0  356116  26092  15804 R 11.8  0.7   0:02.66 php5-fpm
 1038 www-data  20   0  355344  25036  15552 R 11.8  0.7   0:02.64 php5-fpm
 1042 www-data  20   0  355588  25392  15660 R 11.8  0.7   0:02.59 php5-fpm
 1044 www-data  20   0  354548  24820  15760 R 11.8  0.6   0:02.63 php5-fpm
 1047 www-data  20   0  356364  26416  15792 R 11.8  0.7   0:02.63 php5-fpm
 1538 www-data  20   0  356300  25092  14624 R 11.8  0.7   0:02.39 php5-fpm
 1046 www-data  20   0  356628  26616  15740 R  5.9  0.7   0:02.61 php5-fpm
 1051 www-data  20   0  356360  26572  15960 R  5.9  0.7   0:02.63 php5-fpm
 1052 www-data  20   0  354544  24780  15988 R  5.9  0.6   0:02.63 php5-fpm
 1512 www-data  20   0  353124  21904  14620 R  5.9  0.6   0:02.55 php5-fpm
 1514 www-data  20   0  355856  24540  14620 R  5.9  0.6   0:02.49 php5-fpm
 1517 www-data  20   0  355272  24028  14620 R  5.9  0.6   0:02.48 php5-fpm
 1518 www-data  20   0  355048  24176  14620 R  5.9  0.6   0:02.44 php5-fpm
 1520 www-data  20   0  355600  24264  14620 R  5.9  0.6   0:02.44 php5-fpm
 1525 www-data  20   0  355344  24460  14620 R  5.9  0.6   0:02.41 php5-fpm
 1527 www-data  20   0  355344  24436  14620 R  5.9  0.6   0:02.41 php5-fpm
 1528 www-data  20   0  354760  23848  14620 R  5.9  0.6   0:02.41 php5-fpm
 1539 www-data  20   0  356072  25200  14620 R  5.9  0.7   0:02.38 php5-fpm
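
To make the %Cpu(s) line easier to read, here is a small sketch that splits it into named fields and adds up the non-idle share. The field meanings follow the top man page (us = user, sy = system, id = idle, wa = I/O wait, st = time stolen from this VM by the hypervisor); the regular expression is an assumption that happens to fit this particular line format.

```python
import re

# The summary line copied from the sample output above.
cpu_line = ("%Cpu(s): 17.7 us, 21.0 sy,  0.0 ni, 40.2 id,  "
            "0.9 wa,  0.2 hi,  0.0 si, 20.0 st")

# Each field is a number followed by a two-letter label.
fields = {name: float(value)
          for value, name in re.findall(r"([\d.]+)\s+(\w\w)", cpu_line)}

non_idle = 100.0 - fields["id"]
print(f"user+system: {fields['us'] + fields['sy']:.1f}%")
print(f"steal:       {fields['st']:.1f}%")
print(f"non-idle:    {non_idle:.1f}%")
```

Read literally, this sample shows about 60% of CPU time as non-idle, a third of which is steal, alongside a one-minute load average of 41.69 with 51 runnable tasks. It is a single snapshot taken four minutes after boot, so it may not match an average measured over a longer run.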

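My conclusions

  • I tried to see whether nginx was the culprit by serving a .txt file at the required load, and it ran fine.
  • I then tried to see whether php-fpm was the problem by serving a very simple .php file containing echo "ok"; on both production and AWS. AWS actually performed slightly better under higher load.
  • The MySQL RDS database shows only 11 active connections, even when I load the server with ab -c 100 -n 10000.
  • The hardware doesn't look like the problem either: during the tests it showed 30% CPU load and plenty of free RAM, and swap was untouched.
  • I am not getting any errors in the nginx or php-fpm logs, just slow responses.
  • The codebase itself can't be the problem; otherwise, why would it perform well on the original production server?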
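
The layer-by-layer checks above (static .txt file, trivial .php file, full application) can be repeated with a small script instead of ab. This is only a sketch using the Python standard library: the URLs in the usage comment are hypothetical paths under the server name from the nginx config, and the percentile uses a simple nearest-rank rule that may differ slightly from ab's.

```python
import math
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def timed_get(url: str, timeout: float = 30.0) -> float:
    """Fetch a URL once and return the elapsed time in milliseconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
    return (time.perf_counter() - start) * 1000.0


def nearest_rank(sorted_ms: list, pct: float) -> float:
    """Nearest-rank percentile of an already sorted list."""
    rank = max(1, math.ceil(pct / 100.0 * len(sorted_ms)))
    return sorted_ms[rank - 1]


def run_load(url: str, concurrency: int, total: int) -> dict:
    """Issue `total` requests with `concurrency` threads; return percentiles."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed_get, [url] * total))
    return {pct: nearest_rank(latencies, pct) for pct in (50, 90, 99)}


# Usage (hypothetical URLs, one per layer being isolated):
#   run_load("http://project.example.com/test.txt", 75, 500)
#   run_load("http://project.example.com/ok.php", 75, 500)
#   run_load("http://project.example.com/", 75, 500)
```

Running the same load against each layer in turn shows at which layer the latency jumps. Note that urlopen raises an exception on HTTP error responses, so failed requests abort the run rather than being counted.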
