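We run two servers: one for the application, and one for the database and some other software, to take load off the application server. We are running CentOS with the latest versions of PHP (7.3), Nginx (1.17.9), Percona MySQL (5.7), Redis, and ElasticSearch.
We have tried everything we know, but nothing has worked so far. It would be great if someone could point us in the right direction.
CPU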
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
Stepping: 2
CPU MHz: 2781.445
CPU max MHz: 3200.0000
CPU min MHz: 1200.0000
BogoMIPS: 4788.97
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 20480K
NUMA node0 CPU(s): 0-7,16-23
NUMA node1 CPU(s): 8-15,24-31
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb invpcid_single intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d
Application server RAM
total used free shared buff/cache available
Mem: 23G 9.6G 10G 1.2G 3.3G 8.6G
Swap: 0B 0B 0B
MySQL server RAM
total used free shared buff/cache available
Mem: 31G 10G 13G 1.5G 6.6G 18G
Swap: 0B 0B 0B
Nginx main config
user nginx;
worker_processes auto;
worker_rlimit_nofile 100000;
## Load Dynamic Modules ##
#load_module modules/ngx_pagespeed.so;
load_module modules/ngx_http_geoip_module.so;
#load_module modules/ngx_http_perl_module.so;
#load_module modules/ngx_http_brotli_filter_module.so;
#load_module modules/ngx_http_brotli_static_module.so;
pid /var/run/nginx.pid;
events {
worker_connections 10524;
multi_accept on;
accept_mutex off;
}
http {
index index.html index.php;
include mime.types;
default_type application/octet-stream;
#geoip_country /usr/share/GeoIP/GeoIP.dat;
log_format main '$remote_addr - $remote_user [$time_local] "$request" $status $bytes_sent "$http_referer" "$http_user_agent" - "$request_id"';
#log_format error403 '$remote_addr - [$time_local] "$request" "$http_user_agent" - "$request_id" - "$geoip_country_code"';
## Nginx amplify metrics
log_format main_ext '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for" '
'"$host" sn="$server_name" '
'rt=$request_time '
'ua="$upstream_addr" us="$upstream_status" '
'ut="$upstream_response_time" ul="$upstream_response_length" '
'cs=$upstream_cache_status' ;
## Enable POST logging in admin and mask passwords
# log_format adminpost '$remote_addr - "$http_x_forwarded_for" $remote_user [$time_local] "$request" "$http_referer" "$http_user_agent" "$masked_post_pwd_data"';
# perl_set $masked_post_pwd_data '
# sub {
# my $r = shift;
# my $req = $r->request_body;
### test either one line below or create different regex
## $req =~ s/password(%5D|_.+?)?\=\w+/PASSWORD_REMOVED/g;
## $req =~ s/password.+/PASSWORD_REMOVED/g;
# return $req;
# } ';
access_log /var/log/nginx/access.log main_ext;
error_log /var/log/nginx/error.log warn;
keepalive_timeout 5;
autoindex off;
server_tokens off;
port_in_redirect off;
sendfile on;
tcp_nopush on;
tcp_nodelay on;
aio threads=default;
#sendfile_max_chunk 512k;
client_max_body_size 64m;
client_body_buffer_size 128k;
client_header_buffer_size 16k;
large_client_header_buffers 4 16k;
fastcgi_buffer_size 16k;
fastcgi_buffers 4 16k;
# Microcache
#proxy_cache_path /tmp/nginx levels=1:2 keys_zone=microcache:100M max_size=500M inactive=2h;
## Flood protection example (see conf_m2/extra_protect.conf)
limit_req_zone $binary_remote_addr zone=zone1:35m rate=1r/s;
limit_req_zone $binary_remote_addr zone=zone2:35m rate=1r/s;
limit_req_zone $binary_remote_addr zone=zone3:35m rate=1r/s;
## Cache open FD
open_file_cache max=35000 inactive=30s;
open_file_cache_valid 30s;
open_file_cache_min_uses 2;
## SSL global settings
#ssl_session_cache shared:SSL:45m;
#ssl_session_timeout 30m;
#ssl_protocols TLSv1.2 TLSv1.3;
#ssl_ciphers "ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:!3DES:!aNULL:!MD5";
#ssl_prefer_server_ciphers on;
#ssl_dhparam /etc/ssl/certs/dhparams.pem;
#ssl_ecdh_curve secp384r1;
#ssl_buffer_size 4k;
#ssl_stapling on;
#ssl_trusted_certificate /etc/letsencrypt/live/example.com/chain.pem;
#resolver 8.8.8.8 8.8.4.4 valid=3600s;
#resolver_timeout 5s;
## Get real ip from proxy
#set_real_ip_from 127.0.0.1;
include /etc/nginx/conf.d/*.conf;
## Main domain configuration
include /etc/nginx/sites-enabled/*.conf;
}
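As a rough sanity check on the events block in this config, the theoretical connection ceiling is worker_processes multiplied by worker_connections. The sketch below assumes that `worker_processes auto` resolves to the 32 logical CPUs shown in the lscpu output, and that each FastCGI request may hold two descriptors (client plus upstream); both are assumptions, not measurements.

```python
# Values copied from the nginx config above; the CPU count is an assumption.
worker_processes = 32           # assumed: "auto" resolving to 32 logical CPUs
worker_connections = 10524      # events { worker_connections }
worker_rlimit_nofile = 100000   # per-worker open-file limit

# Upper bound on simultaneous connections across all workers.
max_connections = worker_processes * worker_connections

# Assumed worst case: one client socket plus one upstream socket per connection.
fds_per_worker_worst_case = worker_connections * 2

print("max connections:", max_connections)
print("fits in worker_rlimit_nofile:", fds_per_worker_worst_case <= worker_rlimit_nofile)
```

Under these assumptions the descriptor limit is not the constraint; the ceiling is far above what 200 PHP-FPM workers could serve at once.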
Nginx config
## Maps config file
include conf_m2/maps.conf;
## certbot-auto renew webroot
# server {
# listen 80;
# server_name example.com;
#
# location ~ /\.well-known/acme-challenge {
# root $MAGE_ROOT/pub;
# }
#
# location / { return 301 https://example.com$request_uri; }
# }
## Proxy server to terminate ssl before varnish
#server {
#listen 80;
#listen 443 ssl http2;
#server_name domain.com;
## Gzipping is an easy way to reduce page weight
#gzip on;
#gzip_vary on;
#gzip_proxied any;
#gzip_types application/javascript application/x-javascript application/rss+xml text/javascript text/css text/plain image/x-icon image/svg+xml;
#gzip_buffers 4 16k;
#gzip_comp_level 6;
# Brotli compression alternative to Gzip
#brotli on;
#brotli_types text/xml image/svg+xml application/x-font-ttf image/vnd.microsoft.icon application/x-font-opentype application/json font/eot application/vnd.ms-fontobject application/javascript font/otf application/xml application/xhtml+xml text/javascript application/x-javascript text/plain application/x-font-truetype application/xml+rss image/x-icon font/opentype text/css image/x-win-bitmap;
#brotli_comp_level 8;
#if ($api_access) {
# return 403;
#}
#if ($bad_client) {
# return 403;
#}
## Server maintenance block.
#include conf_m2/maintenance.conf;
## SSL key and cert location
#ssl_certificate /etc/letsencrypt/live/domain.com/fullchain.pem; # managed by Certbot
#ssl_certificate_key /etc/letsencrypt/live/domain.com/privkey.pem; # managed by Certbot
#include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
#ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot
## Proxy-pass to Varnish
#location / {
# include /etc/nginx/conf_m2/varnish_proxy.conf;
#}
#}
server {
listen 80 reuseport;
server_name domain.com;
location / {
return 301 https://$host$request_uri;
}
}
server {
#listen 80 reuseport;
listen 443 reuseport ssl http2;
server_name domain.com;
## Set Magento root folder
set $MAGE_ROOT /home/domain.com/m2/public;
## Set main public directory /pub
root $MAGE_ROOT/pub;
## Gzipping is an easy way to reduce page weight
gzip on;
gzip_vary on;
gzip_proxied any;
gzip_types application/javascript application/x-javascript application/rss+xml text/javascript text/css text/plain image/x-icon image/svg+xml;
gzip_buffers 4 16k;
gzip_comp_level 6;
# Brotli compression alternative to Gzip
#brotli on;
#brotli_types text/xml image/svg+xml application/x-font-ttf image/vnd.microsoft.icon application/x-font-opentype application/json font/eot application/vnd.ms-fontobject application/javascript font/otf application/xml application/xhtml+xml text/javascript application/x-javascript text/plain application/x-font-truetype application/xml+rss image/x-icon font/opentype text/css image/x-win-bitmap;
#brotli_comp_level 8;
if ($api_access) {
return 403;
}
if ($bad_client) {
return 403;
}
## Server maintenance block.
include conf_m2/maintenance.conf;
## SSL key and cert location
ssl_certificate /etc/letsencrypt/live/domain.com/fullchain.pem; # managed by Certbot
ssl_certificate_key /etc/letsencrypt/live/domain.com/privkey.pem; # managed by Certbot
include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot
# Improve HTTPS performance
http2_max_field_size 16k;
http2_max_header_size 32k;
ssl_buffer_size 1369;
ssl_session_tickets on;
location ^~ /ms-tool/ {
root $MAGE_ROOT;
index index.php index.html index.htm;
try_files $uri $uri/ /index.php?q=$uri&$args;
location ~ \.php$ {
try_files $uri =404;
fastcgi_split_path_info ^(.+\.php)(/.+)$;
fastcgi_pass 127.0.0.1:9000;
fastcgi_index index.php;
include fastcgi_params;
}
}
location ~ bridge_gWtFceci.php$ {
root $MAGE_ROOT;
try_files $uri =404;
fastcgi_split_path_info ^(.+\.php)(/.+)$;
fastcgi_pass 127.0.0.1:9000;
fastcgi_index index.php;
include fastcgi_params;
}
## phpMyAdmin configuration
include conf_m2/phpmyadmin.conf;
## Nginx and php-fpm status
include conf_m2/status.conf;
## Magento Setup Tool
include conf_m2/setup.conf;
## Deny all internal locations
location ~ ^/(app|generated|lib|bin|var|tmp|phpserver|vendor)/ {
deny all;
}
location / {
try_files $uri $uri/ /index.php$is_args$args;
}
## Error log/page
# include conf_m2/error_page.conf;
## Static location
include conf_m2/assets.conf;
## Protect extra directories
include conf_m2/extra_protect.conf;
## Process php files (strict rule, define files to be executed)
location ~ ^/(index|health_check|get|static|errors/(report|404|503))\.php$ {
try_files $uri =404;
# fastcgi_intercept_errors on;
include conf_m2/php_backend.conf;
## Enable Magento profiler
# fastcgi_param MAGE_PROFILER html;
## Store code with multi domain
fastcgi_param MAGE_RUN_CODE $MAGE_RUN_CODE;
fastcgi_param MAGE_RUN_TYPE $MAGE_RUN_TYPE;
## Enable POST logging in admin
# if ($request_method = POST) {set $adminpost A;}
# if ($request_uri ~* "/ADMIN_PLACEHOLDER/") {set $adminpost "${adminpost}B";}
# if ($adminpost = AB) { access_log /var/log/nginx/admin_post.log adminpost;}
}
## Block other undefined php files, possible injections and random malware hooks.
location ~* \.php$ {
return 404;
}
}
PHP-FPM
[www]
user = domain
group = domain
listen= 127.0.0.1:9000
access.format = "%{mega}MMb %{mili}dms pid=%p %C%% %R - %u %t \"%m %r%Q%q\" %s %f"
access.log = /var/log/php-fpm/$pool.access.log
listen.owner = nginx
listen.group = nginx
listen.mode = 0600
listen.allowed_clients = 127.0.0.1
pm = static
pm.max_children = 200
pm.start_servers = 5
pm.min_spare_servers = 5
pm.max_spare_servers = 5
pm.max_requests = 0
request_slowlog_timeout = 5s
slowlog = /var/log/php-fpm/$pool-slow.log
php_admin_value[error_log] = /var/log/php-fpm/$pool-error.log
php_admin_flag[log_errors] = on
php_admin_value[memory_limit] = 2048M
php_admin_value[upload_max_filesize] = 11M
php_admin_value[post_max_size] = 12M
php_admin_value[max_input_vars] = 7000
php_admin_value[opcache.enable] = 1
php_admin_value[opcache.memory_consumption] = 256
php_admin_value[opcache.max_accelerated_files] = 65406
php_admin_value[opcache.blacklist_filename] = /home/domain.com/m2/public/.opcache-exclude.conf
php_admin_value[opcache.error_log] = /var/log/php-fpm/$pool-opcache-error.log
;php_value[session.save_handler] = files
;php_value[session.save_path] = /var/lib/php/session
;php_value[soap.wsdl_cache_dir] = /var/lib/php/wsdlcache
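A quick way to see how this pool relates to the 23G of RAM on the application server is to multiply pm.max_children by memory_limit. This is only a sketch: memory_limit is a per-request ceiling rather than typical usage, and the 100 MB average worker size below is a hypothetical placeholder that should be replaced with a measured value.

```python
def max_children_for(available_mb: float, avg_worker_mb: float) -> int:
    """Number of PHP-FPM workers that fit in the given memory budget."""
    return int(available_mb // avg_worker_mb)

max_children = 200          # pm.max_children from the pool config
memory_limit_mb = 2048      # php_admin_value[memory_limit]
total_ram_mb = 23 * 1024    # "Mem: total 23G" on the application server

# Theoretical worst case: every worker reaches memory_limit at the same time.
worst_case_mb = max_children * memory_limit_mb
print("worst case GiB:", worst_case_mb / 1024)
print("exceeds total RAM:", worst_case_mb > total_ram_mb)

# Hypothetical average resident size per worker; measure the real figure.
assumed_avg_worker_mb = 100
print("workers that fit:", max_children_for(total_ram_mb, assumed_avg_worker_mb))
```

Note that with `pm = static`, PHP-FPM keeps exactly pm.max_children workers running, so the pm.start_servers and spare-server settings are not used.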
MySQL
# Percona Server template configuration
[mysqld]
#
# Remove leading # and set to the amount of RAM for the most important data
# cache in MySQL. Start at 70% of total RAM for dedicated server, else 10%.
# innodb_buffer_pool_size = 128M
#
# Remove leading # to turn on a very important data integrity option: logging
# changes to the binary log between backups.
# log_bin
#
# Remove leading # to set options mainly useful for reporting servers.
# The server defaults are faster for transactions and fast SELECTs.
# Adjust sizes as needed, experiment to find the optimal values.
# join_buffer_size = 128M
# sort_buffer_size = 2M
# read_rnd_buffer_size = 2M
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
skip-name-resolve
skip-secure-auth
# SAFETY #
max_allowed_packet = 16M
max_connect_errors = 1000000
innodb = FORCE
# BINARY LOGGING #
log_bin = /var/lib/mysql/mysql-bin
expire_logs_days = 14
sync_binlog = 1
# CACHES AND LIMITS #
query_cache_type = 0
query_cache_size = 0
query_cache_limit = 1M
tmp_table_size = 16M
max_heap_table_size = 16M
join_buffer_size = 256.0K
table_open_cache = 2000
table_definition_cache = 1400
key_buffer_size = 271.5M
# Thread Pools #
thread_handling = pool-of-threads
thread_pool_size = 32
thread_pool_high_prio_mode = none
# INNODB #
innodb_flush_method = O_DIRECT
innodb_log_files_in_group = 2
innodb_log_file_size = 512M
innodb_flush_log_at_trx_commit = 1
innodb_file_per_table = 1
innodb_buffer_pool_size = 23G
innodb_lock_wait_timeout = 120
innodb_numa_interleave = 1
# LOGGING #
general_log = 1
general_log_file = /var/log/mysql/mysql.log
log_error = /var/log/mysql/mysql_error.log
log_warnings = 2
log_queries_not_using_indexes = 1
slow_query_log = 1
slow_query_log_file = /var/log/mysql/mysql-slow.log
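The template comment in this my.cnf suggests starting the buffer pool at about 70% of RAM on a dedicated server. The arithmetic below compares that guideline with the configured 23G on a 31G machine; it is only a sketch and ignores per-connection buffers and the other software the question says runs on this server.

```python
total_ram_gb = 31            # "Mem: total 31G" on the MySQL server
buffer_pool_gb = 23          # innodb_buffer_pool_size
guideline_fraction = 0.70    # from the template comment in my.cnf

fraction = buffer_pool_gb / total_ram_gb
headroom_gb = total_ram_gb - buffer_pool_gb

print(f"buffer pool is {fraction:.0%} of RAM")
print(f"{headroom_gb} GB left for the OS and everything else")
print(f"70% guideline would be about {guideline_fraction * total_ram_gb:.1f} GB")
```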
EDITED
MySQL server SSD info
[root@db ~]# sudo smartctl -a /dev/sdb
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1062.9.1.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Samsung based SSDs
Device Model: Samsung SSD 860 EVO 250GB
Serial Number: S4BFNF0M906289Y
LU WWN Device Id: 5 002538 e49912e2f
Firmware Version: RVT03B6Q
User Capacity: 250,059,350,016 bytes [250 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Tue Mar 24 09:50:06 2020 PKT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x53) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 85) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
9 Power_On_Hours 0x0032 099 099 000 Old_age Always - 3965
12 Power_Cycle_Count 0x0032 099 099 000 Old_age Always - 10
177 Wear_Leveling_Count 0x0013 045 045 000 Pre-fail Always - 994
179 Used_Rsvd_Blk_Cnt_Tot 0x0013 100 100 010 Pre-fail Always - 0
181 Program_Fail_Cnt_Total 0x0032 100 100 010 Old_age Always - 0
182 Erase_Fail_Count_Total 0x0032 100 100 010 Old_age Always - 0
183 Runtime_Bad_Block 0x0013 100 100 010 Pre-fail Always - 0
187 Uncorrectable_Error_Cnt 0x0032 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0032 067 062 000 Old_age Always - 33
195 ECC_Error_Rate 0x001a 200 200 000 Old_age Always - 0
199 CRC_Error_Count 0x003e 100 100 000 Old_age Always - 0
235 POR_Recovery_Count 0x0012 099 099 000 Old_age Always - 7
241 Total_LBAs_Written 0x0032 099 099 000 Old_age Always - 63100745018
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 3608 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
256 0 65535 Read_scanning was never started
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
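The SMART attributes above can be turned into an approximate write volume. This sketch assumes that Total_LBAs_Written counts 512-byte units, matching the reported logical sector size, and it averages over power-on hours, so it says nothing about peaks.

```python
SECTOR_BYTES = 512                    # "Sector Size: 512 bytes logical/physical"
total_lbas_written = 63_100_745_018   # attribute 241 raw value
power_on_hours = 3965                 # attribute 9 raw value

# Assumption: one LBA in attribute 241 equals one 512-byte logical sector.
bytes_written = total_lbas_written * SECTOR_BYTES
tb_written = bytes_written / 1e12
gb_per_day = (bytes_written / 1e9) / (power_on_hours / 24)

print(f"{tb_written:.1f} TB written in total")
print(f"about {gb_per_day:.0f} GB per power-on day")
```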
MySQL server CPU info: https://justpaste.it/2bnp9
As for the MySQL config, we are running Percona MySQL 5.7, and this is the only thing we have on the server: https://justpaste.it/3snvb
SHOW GLOBAL STATUS: https://justpaste.it/1lc9s
SHOW GLOBAL VARIABLES: https://justpaste.it/2kfhn
SHOW FULL PROCESSLIST: https://justpaste.it/23a2m
MySQLTuner report: https://justpaste.it/5bwka
ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 127845
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 127845
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
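The `open files` value of 1024 above is the limit of the shell that ran `ulimit`, which is not necessarily the limit of the running nginx, php-fpm, or mysqld processes. On Linux the effective limits of a running process can be read from /proc/<pid>/limits. The sketch below parses that format; the sample text is illustrative (the hard limit of 4096 is a hypothetical value, not taken from the post).

```python
from typing import Optional, Tuple

# Illustrative excerpt in the /proc/<pid>/limits layout; hard limit is hypothetical.
SAMPLE_LIMITS = """\
Limit                     Soft Limit           Hard Limit           Units
Max processes             127845               127845               processes
Max open files            1024                 4096                 files
"""

def open_file_limits(limits_text: str) -> Optional[Tuple[str, str]]:
    """Return (soft, hard) for 'Max open files', or None if the line is missing.

    Values are returned as strings because either one may be 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            return fields[3], fields[4]
    return None

print(open_file_limits(SAMPLE_LIMITS))
```

To check a real process, pass the contents of /proc/<pid>/limits for the PID in question instead of the sample string.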
iostat -xm
Linux 3.10.0-1062.9.1.el7.x86_64 03/26/2020 _x86_64_ (24 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
5.57 0.00 0.33 0.58 0.00 93.52
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 2.83 34.59 4.77 75.50 0.49 2.02 64.07 0.14 1.73 2.14 1.70 1.43 11.51
sdb 2.81 34.59 4.17 75.49 0.46 2.02 63.60 0.17 2.10 2.27 2.09 1.62 12.93
md127 0.00 0.00 3.50 91.78 0.26 2.01 48.85 0.00 0.00 0.00 0.00 0.00 0.00
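The iostat table above is easier to compare across devices once it is parsed into named columns. The sketch below does that for the three device rows as posted. Keep in mind that `iostat` run without an interval reports averages since boot, so these figures may hide short bursts.

```python
IOSTAT = """\
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 2.83 34.59 4.77 75.50 0.49 2.02 64.07 0.14 1.73 2.14 1.70 1.43 11.51
sdb 2.81 34.59 4.17 75.49 0.46 2.02 63.60 0.17 2.10 2.27 2.09 1.62 12.93
md127 0.00 0.00 3.50 91.78 0.26 2.01 48.85 0.00 0.00 0.00 0.00 0.00 0.00
"""

def parse_iostat(text: str) -> dict:
    """Map each device name to a dict of column name -> float value."""
    lines = text.strip().splitlines()
    header = lines[0].split()[1:]          # drop the "Device:" label
    rows = {}
    for line in lines[1:]:
        name, *values = line.split()
        rows[name] = dict(zip(header, map(float, values)))
    return rows

stats = parse_iostat(IOSTAT)
for device, columns in stats.items():
    print(device, "await:", columns["await"], "%util:", columns["%util"])
```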
Answer 1
Rate Per Second = RPS
Suggestions to consider for your my.cnf [mysqld] section
innodb_flush_neighbors=2 # from 1 to reduce innodb_buffer_pool_pages_dirty quicker
innodb_buffer_pool_size=16G # from 128M to support your 12.2 GB of innodb data in RAM
innodb_io_capacity=1900 # from 200 to use more of your SSD io capacity
read_rnd_buffer_size=192K # from 256K to reduce handler_read_rnd_next RPS of 12,291
read_buffer_size=256K # from 128K to reduce handler_read_next RPS of 45,723
thread_cache_size=100 # from 9 to reduce threads_created
max_connections=300 # from 151 to support additional concurrent connections
connect_timeout=20 # from 10 to tolerate slow connects up to 20 seconds
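You will find that these configuration changes significantly reduce CPU usage.
This configuration will likely improve CPU performance only when using php-fpm rather than traditional apache+php.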
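The RPS figures quoted in the answer are cumulative status counters divided by server uptime. A minimal sketch of that calculation follows, assuming the counter and `Uptime` both come from the same SHOW GLOBAL STATUS snapshot. The sample numbers are illustrative only, chosen so the result matches the 45,723 figure quoted for handler_read_next; they are not taken from the linked output.

```python
def rate_per_second(counter: int, uptime_seconds: int) -> float:
    """Average rate of a cumulative status counter over the server's uptime."""
    if uptime_seconds <= 0:
        raise ValueError("uptime_seconds must be positive")
    return counter / uptime_seconds

# Illustrative values: one day of uptime and a counter picked to give 45,723/s.
sample_uptime = 86_400
sample_handler_read_next = 3_950_467_200

print(rate_per_second(sample_handler_read_next, sample_uptime))
```

Because this is an average since startup, comparing two snapshots taken some minutes apart gives a better picture of the current rate.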