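General description of the problem
We are currently running an application on a PaaS-type solution for PHP. Their solution is built on the AWS cloud, and because their plans do not fit our scaling needs, we decided to move directly to AWS. In production the application performs "well" at about 400 requests per minute, with application response times of around 100 ms, but with my setup on AWS it takes a very long time to respond. Keep in mind that every request performs a database insert plus some expensive selects that compute statistics.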
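To put those production figures on a per-second footing, here is a back-of-the-envelope sketch. It assumes the 400 requests per minute arrive evenly and that 100 ms is the mean response time, then applies Little's law (average requests in flight = arrival rate × mean response time). The variable names are mine and purely illustrative.

```python
# Production figures quoted above; an even arrival pattern is an assumption.
REQUESTS_PER_MINUTE = 400
MEAN_RESPONSE_S = 0.100  # ~100 ms application response time

# Convert the per-minute rate to requests per second.
arrival_rate = REQUESTS_PER_MINUTE / 60

# Little's law: average number of requests being processed at any instant.
in_flight = arrival_rate * MEAN_RESPONSE_S

print(f"arrival rate: {arrival_rate:.2f} req/s")
print(f"average requests in flight: {in_flight:.2f}")
```

Under those assumptions the normal load is under 7 requests per second, with less than one request in flight on average.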
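Current AWS setup attempt
- 1 medium RDS server (this is not the bottleneck; I have checked)
- 1 medium r3 EC2 server running nginx + PHP-FPM on Ubuntu 14.04 x64
I have been running benchmarks that simulate our normal traffic load as closely as possible, but the setup starts to break down under constant load.
Configuration currently in use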
Nginx
user www-data;
worker_processes 2;
pid /run/nginx.pid;
worker_rlimit_nofile 30000;
events {
    worker_connections 8192;
    #multi_accept on;
    use epoll;
}
http {
    ##
    # Basic Settings
    ##
    sendfile on;
    tcp_nopush on;
    tcp_nodelay off;
    keepalive_timeout 30;
    types_hash_max_size 2048;
    server_tokens off;
    # increase buffer and timeouts
    fastcgi_connect_timeout 60;
    fastcgi_send_timeout 180;
    fastcgi_read_timeout 180;
    fastcgi_buffer_size 128k;
    fastcgi_buffers 4 256k;
    fastcgi_busy_buffers_size 256k;
    fastcgi_temp_file_write_size 256k;
    fastcgi_intercept_errors on;
}
Nginx sites-available
server {
    listen 80;
    server_name project.example.com;
    root /var/www/project/public/;
    access_log off;
    charset utf-8;
    index index.html index.htm index.php;
    location / {
        try_files $uri $uri/ /index.php?q=$uri&$args;
    }
    # catch all
    error_page 404 /index.php;
    location ~ \.php$ {
        # Pass the PHP files to PHP FastCGI for processing
        fastcgi_split_path_info ^(.+\.php)(/.+)$;
        include /etc/nginx/fastcgi_params;
        #fastcgi_pass 127.0.0.1:9000;
        fastcgi_pass unix:/var/run/php5-fpm.sock;
        fastcgi_index index.php;
    }
}
php-fpm.conf
emergency_restart_threshold = 3
emergency_restart_interval = 1m
process_control_timeout = 5s
php-fpm pool
pm = dynamic
pm.max_children = 48
pm.start_servers = 18
pm.min_spare_servers = 16
pm.max_spare_servers = 24
pm.max_requests = 50
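As a rough check on whether this pool could exhaust RAM, the sketch below multiplies pm.max_children by an assumed per-worker resident size. The 25,000 KiB figure is my approximation of the RES column in the top sample later in this post, and the total RAM is the "KiB Mem" total from the same sample. RES includes shared pages, so this overestimates the real footprint.

```python
# Pool limit from the php-fpm pool config above.
MAX_CHILDREN = 48

# Taken from the top sample in this post; the per-worker size is an
# approximation of the RES column, not a measured average.
TOTAL_RAM_KIB = 3838876
ASSUMED_WORKER_RES_KIB = 25000


def pool_memory_kib(children, res_kib):
    """Upper-bound memory for the pool if every child is resident at res_kib."""
    return children * res_kib


worst_case_kib = pool_memory_kib(MAX_CHILDREN, ASSUMED_WORKER_RES_KIB)
share_of_ram = worst_case_kib / TOTAL_RAM_KIB

print(f"worst-case pool memory: {worst_case_kib} KiB")
print(f"share of total RAM: {share_of_ram:.0%}")
```

With these assumptions a full pool of 48 workers stays at roughly a third of the instance's memory, which is consistent with swap being untouched during the tests.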
/etc/sysctl.conf
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 16384 16777216
net.core.somaxconn = 4096
net.core.netdev_max_backlog = 16384
net.ipv4.tcp_max_syn_backlog = 8192
net.ipv4.tcp_syncookies = 1
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_congestion_control = cubic
Opcache settings
php -i | grep opcache
Additional .ini files parsed => /etc/php5/cli/conf.d/05-opcache.ini,
opcache.blacklist_filename => no value => no value
opcache.consistency_checks => 0 => 0
opcache.dups_fix => Off => Off
opcache.enable => On => On
opcache.enable_cli => Off => Off
opcache.enable_file_override => Off => Off
opcache.error_log => no value => no value
opcache.fast_shutdown => 0 => 0
opcache.file_update_protection => 2 => 2
opcache.force_restart_timeout => 180 => 180
opcache.inherited_hack => On => On
opcache.interned_strings_buffer => 4 => 4
opcache.load_comments => 1 => 1
opcache.log_verbosity_level => 1 => 1
opcache.max_accelerated_files => 50000 => 50000
opcache.max_file_size => 0 => 0
opcache.max_wasted_percentage => 5 => 5
opcache.memory_consumption => 128 => 128
opcache.optimization_level => 0xFFFFFFFF => 0xFFFFFFFF
opcache.preferred_memory_model => no value => no value
opcache.protect_memory => 0 => 0
opcache.restrict_api => no value => no value
opcache.revalidate_freq => 2 => 2
opcache.revalidate_path => Off => Off
opcache.save_comments => 1 => 1
opcache.use_cwd => On => On
opcache.validate_timestamps => On => On
php -i | grep apc
/etc/php5/cli/conf.d/20-apcu.ini,
apc
apcu
apc.coredump_unmap => Off => Off
apc.enable_cli => Off => Off
apc.enabled => On => On
apc.entries_hint => 4096 => 4096
apc.gc_ttl => 3600 => 3600
apc.mmap_file_mask => no value => no value
apc.preload_path => no value => no value
apc.rfc1867 => Off => Off
apc.rfc1867_freq => 0 => 0
apc.rfc1867_name => APC_UPLOAD_PROGRESS => APC_UPLOAD_PROGRESS
apc.rfc1867_prefix => upload_ => upload_
apc.rfc1867_ttl => 3600 => 3600
apc.serializer => php => php
apc.shm_segments => 1 => 1
apc.shm_size => 32M => 32M
apc.slam_defense => On => On
apc.smart => 0 => 0
apc.ttl => 0 => 0
apc.use_request_time => On => On
apc.writable => /tmp => /tmp
Benchmark results
AWS setup
Concurrency Level: 75
Time taken for tests: 18.836 seconds
Complete requests: 111
Failed requests: 0
Total transferred: 238269 bytes
HTML transferred: 36630 bytes
Requests per second: 5.89 [#/sec] (mean)
Time per request: 12726.963 [ms] (mean)
Time per request: 169.693 [ms] (mean, across all concurrent requests)
Transfer rate: 12.35 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       58  226  113.0    276    390
Processing:  2099 8960 3593.2   9524  16784
Waiting:     2087 8947 3591.8   9517  16783
Total:       2377 9186 3585.0   9593  17164
Percentage of the requests served within a certain time (ms)
50% 9512
66% 11085
75% 11747
80% 12323
90% 12954
95% 14459
98% 15792
99% 16201
100% 17164 (longest request)
Production setup
Document Length: 331 bytes
Concurrency Level: 75
Time taken for tests: 7.544 seconds
Complete requests: 595
Failed requests: 0
Total transferred: 1220905 bytes
HTML transferred: 196945 bytes
Requests per second: 78.87 [#/sec] (mean)
Time per request: 950.922 [ms] (mean)
Time per request: 12.679 [ms] (mean, across all concurrent requests)
Transfer rate: 158.05 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       58  105   78.4     76    384
Processing:   265  787  263.1    725   1382
Waiting:      265  785  262.9    723   1381
Total:        419  891  267.7    836   1742
Percentage of the requests served within a certain time (ms)
50% 836
66% 1002
75% 1071
80% 1129
90% 1263
95% 1376
98% 1662
99% 1672
100% 1742 (longest request)
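The two ab runs above can be cross-checked with a small sketch: at a fixed concurrency, throughput should equal the concurrency divided by the mean time per request. The function name is my own; the numbers are copied from the two reports.

```python
def throughput_rps(concurrency, mean_time_per_request_ms):
    """Requests per second implied by a concurrency level and mean latency."""
    return concurrency / (mean_time_per_request_ms / 1000.0)


# "Concurrency Level" and "Time per request (mean)" from each ab report.
aws_rps = throughput_rps(75, 12726.963)
prod_rps = throughput_rps(75, 950.922)

print(f"AWS:        {aws_rps:.2f} req/s")
print(f"production: {prod_rps:.2f} req/s")
print(f"production is about {prod_rps / aws_rps:.1f}x faster")
```

Both results match the "Requests per second" lines that ab printed (5.89 and 78.87), so the reports are internally consistent. The gap comes entirely from the per-request time, which is about 13 times longer on the AWS setup.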
Sample top output:
top - 12:58:24 up 4 min, 1 user, load average: 41.69, 16.15, 5.95
Tasks: 121 total, 51 running, 70 sleeping, 0 stopped, 0 zombie
%Cpu(s): 17.7 us, 21.0 sy, 0.0 ni, 40.2 id, 0.9 wa, 0.2 hi, 0.0 si, 20.0 st
KiB Mem: 3838876 total, 643628 used, 3195248 free, 18028 buffers
KiB Swap: 1048572 total, 0 used, 1048572 free. 169920 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1035 www-data 20 0 354540 24272 15224 R 11.8 0.6 0:02.61 php5-fpm
1037 www-data 20 0 356116 26092 15804 R 11.8 0.7 0:02.66 php5-fpm
1038 www-data 20 0 355344 25036 15552 R 11.8 0.7 0:02.64 php5-fpm
1042 www-data 20 0 355588 25392 15660 R 11.8 0.7 0:02.59 php5-fpm
1044 www-data 20 0 354548 24820 15760 R 11.8 0.6 0:02.63 php5-fpm
1047 www-data 20 0 356364 26416 15792 R 11.8 0.7 0:02.63 php5-fpm
1538 www-data 20 0 356300 25092 14624 R 11.8 0.7 0:02.39 php5-fpm
1046 www-data 20 0 356628 26616 15740 R 5.9 0.7 0:02.61 php5-fpm
1051 www-data 20 0 356360 26572 15960 R 5.9 0.7 0:02.63 php5-fpm
1052 www-data 20 0 354544 24780 15988 R 5.9 0.6 0:02.63 php5-fpm
1512 www-data 20 0 353124 21904 14620 R 5.9 0.6 0:02.55 php5-fpm
1514 www-data 20 0 355856 24540 14620 R 5.9 0.6 0:02.49 php5-fpm
1517 www-data 20 0 355272 24028 14620 R 5.9 0.6 0:02.48 php5-fpm
1518 www-data 20 0 355048 24176 14620 R 5.9 0.6 0:02.44 php5-fpm
1520 www-data 20 0 355600 24264 14620 R 5.9 0.6 0:02.44 php5-fpm
1525 www-data 20 0 355344 24460 14620 R 5.9 0.6 0:02.41 php5-fpm
1527 www-data 20 0 355344 24436 14620 R 5.9 0.6 0:02.41 php5-fpm
1528 www-data 20 0 354760 23848 14620 R 5.9 0.6 0:02.41 php5-fpm
1539 www-data 20 0 356072 25200 14620 R 5.9 0.7 0:02.38 php5-fpm
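To make the CPU summary line of this sample easier to read, here is a small sketch that splits it into top's two-letter states. The parsing helper is my own; the line is copied verbatim from the output above. In top, "id" is idle time and "st" is time stolen from this VM by the hypervisor.

```python
import re

# CPU summary line copied verbatim from the top sample above.
CPU_LINE = "%Cpu(s): 17.7 us, 21.0 sy, 0.0 ni, 40.2 id, 0.9 wa, 0.2 hi, 0.0 si, 20.0 st"


def parse_cpu_line(line):
    """Return a dict mapping top's two-letter CPU state codes to percentages."""
    pairs = re.findall(r"(\d+(?:\.\d+)?)\s+([a-z]{2})\b", line)
    return {state: float(value) for value, state in pairs}


cpu = parse_cpu_line(CPU_LINE)
non_idle = 100.0 - cpu["id"]

print(f"user + system: {cpu['us'] + cpu['sy']:.1f}%")
print(f"steal:         {cpu['st']:.1f}%")
print(f"non-idle:      {non_idle:.1f}%")
```

Reading the line this way separates the time the instance spent running my processes (us + sy) from the time it was waiting on the hypervisor (st), which a single "CPU load" figure hides.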
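My conclusions
- I checked whether nginx was the culprit by serving a .txt file at the required load, and it ran fine.
- I then checked whether php-fpm was the problem by serving a very simple .php file containing echo "ok"; on both production and AWS. AWS actually performed slightly better under higher load.
- The MySQL RDS database shows 11 active connections, even when I load the server with
ab -c 100 -n 10000
- The hardware does not look like the problem either: during the tests it showed 30% CPU load, plenty of free RAM, and untouched swap.
- I get no errors in the nginx or php-fpm logs, only slow responses.
- The codebase itself should not be the problem, since the same code performs well on the original production server.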