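Our Linode server's load has increased over the past month while running CentOS 7. I have upgraded to MariaDB 10.3 and PHP 7.2, and the server is now on CentOS 7.5 with 16GB of memory and 6 cores. According to the apache2buddy perl script, MariaDB is also using 5372.81 MB on the server. I'm using the default MaxRequestWorkers, which the script says is too high, but I have experimented with values within its recommended range and it made no difference. We recently put the whole site under HTTPS, but much of the site was already under HTTPS before the problem started. The server used to run at a load of around 1 most of the time; it now averages 3-4 with spikes of 8+.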
top - 12:39:15 up 10:26, 2 users, load average: 3.27, 3.57, 4.08
Tasks: 181 total, 2 running, 117 sleeping, 0 stopped, 0 zombie
%Cpu(s): 9.3 us, 6.0 sy, 0.0 ni, 54.2 id, 4.2 wa, 0.0 hi, 0.9 si, 25.4 st
KiB Mem : 16419324 total, 462908 free, 7237284 used, 8719132 buff/cache
KiB Swap: 524284 total, 524284 free, 0 used. 8808064 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3614 mysql 20 0 10.1g 5.2g 21372 S 45.2 33.5 259:48.57 mysqld
15613 wmnf_ad+ 20 0 842992 178644 95772 S 20.9 1.1 0:03.67 httpd
15650 wmnf_ad+ 20 0 792196 131184 97048 S 16.6 0.8 0:10.31 httpd
15636 wmnf_ad+ 20 0 837444 179280 101916 R 15.9 1.1 0:15.06 httpd
15634 wmnf_ad+ 20 0 870480 136836 100236 S 7.0 0.8 0:16.07 httpd
15632 wmnf_ad+ 20 0 794060 125052 89772 S 5.6 0.8 0:12.00 httpd
1937 root 20 0 0 0 0 D 2.3 0.0 7:03.28 jbd2/sda-8
1 root 20 0 191432 5732 3856 S 1.0 0.0 1:01.34 systemd
15654 wmnf_ad+ 20 0 795988 123584 88628 S 1.0 0.8 0:05.54 httpd
8 root 20 0 0 0 0 I 0.7 0.0 3:45.27 rcu_sched
34 root 20 0 0 0 0 S 0.3 0.0 1:13.51 ksoftirqd/5
3207 root 20 0 492880 15488 12152 S 0.3 0.1 0:17.17 NetworkManager
15254 root 20 0 161992 4632 3856 R 0.3 0.0 0:03.46 top
15604 wmnf_ad+ 20 0 799320 141524 104940 S 0.3 0.9 0:17.32 httpd
15628 wmnf_ad+ 20 0 794284 128708 95872 S 0.3 0.8 0:17.75 httpd
15951 wmnf_ad+ 20 0 796216 124368 89456 S 0.3 0.8 0:09.05 httpd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.12 kthreadd
4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/0:0H
When Linode previously upgraded us from 8GB to 16GB, I doubled most of these numbers in the httpd conf:
StartServers 4
MinSpareServers 20
MaxSpareServers 40
MaxClients 200
MaxRequestsPerChild 4500
About an hour after the last restart, Apache's memory usage looks like this:
[root@archives conf.d]# ps -ylC httpd | awk '{x += $8;y += 1} END {print "Apache Memory Usage (MB): "x/1024; print "Average Process Size (MB): "x/((y- 1)*1024)}'
Apache Memory Usage (MB): 6136.5
Average Process Size (MB): 109.58
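As a rough cross-check of these numbers, here is a minimal Python sketch of a common prefork rule of thumb: divide the RAM left over for Apache by the average httpd process size. The function name and the 1024 MB reserved for the OS are my own assumptions for illustration, not values taken from apache2buddy or from this server.

```python
def estimate_max_request_workers(total_mb, other_services_mb, avg_httpd_mb,
                                 reserve_mb=1024):
    """Rule-of-thumb ceiling for prefork workers.

    total_mb          -- physical RAM in MB
    other_services_mb -- RAM used by other services (here: MariaDB)
    avg_httpd_mb      -- average resident size of one httpd process
    reserve_mb        -- headroom kept for the OS (assumed value)
    """
    if avg_httpd_mb <= 0:
        raise ValueError("avg_httpd_mb must be positive")
    available_mb = total_mb - other_services_mb - reserve_mb
    # Never return a negative worker count when memory is already oversubscribed.
    return max(int(available_mb // avg_httpd_mb), 0)


# Figures quoted in this post: "KiB Mem" total from top, MariaDB usage from
# apache2buddy, average process size from the ps/awk one-liner.
total_mb = 16419324 / 1024
workers = estimate_max_request_workers(total_mb, 5372.81, 109.58)
print("Estimated MaxRequestWorkers ceiling:", workers)
```

With the figures above this lands well below the configured MaxClients of 200, which fits the script's warning that the value is too high. It only bounds memory use, though, and says nothing about where the load itself comes from.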
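Looking at the Apache server-status page, there doesn't seem to be a large number of requests, and it is certainly not using all the resources I have allowed: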
Server Version: Apache/2.4.6 (CentOS) OpenSSL/1.0.2k-fips mod_fcgid/2.3.9 PHP/7.2.6
Server MPM: prefork
Server Built: Apr 20 2018 18:10:38
Current Time: Monday, 28-May-2018 12:27:12 EDT
Restart Time: Monday, 28-May-2018 12:26:14 EDT
Parent Server Config. Generation: 1
Parent Server MPM Generation: 0
Server uptime: 57 seconds
Server load: 4.54 4.76 4.67
Total accesses: 324 - Total Traffic: 127.9 MB
CPU Usage: u47.7 s18.95 cu0 cs0 - 117% CPU load
5.68 requests/sec - 2.2 MB/second - 404.3 kB/request
30 requests currently being processed, 30 idle workers
WKWR__W._W__WWW_RWK_W______RK._.K_R_K_R___K_RW__.R_..___W__KWRRW
.W_.............................................................
................................................................
........
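I ran pmap on a few of the processes and saw a lot of modules, many of which I probably don't need. Are all of these loaded by default? Of course, I did install php7, fcgid, status and a few others...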
[root@archives conf.d]# httpd -M
Loaded Modules:
core_module (static)
so_module (static)
http_module (static)
access_compat_module (shared)
actions_module (shared)
alias_module (shared)
allowmethods_module (shared)
auth_basic_module (shared)
auth_digest_module (shared)
authn_anon_module (shared)
authn_core_module (shared)
authn_dbd_module (shared)
authn_dbm_module (shared)
authn_file_module (shared)
authn_socache_module (shared)
authz_core_module (shared)
authz_dbd_module (shared)
authz_dbm_module (shared)
authz_groupfile_module (shared)
authz_host_module (shared)
authz_owner_module (shared)
authz_user_module (shared)
autoindex_module (shared)
cache_module (shared)
cache_disk_module (shared)
data_module (shared)
dbd_module (shared)
deflate_module (shared)
dir_module (shared)
dumpio_module (shared)
echo_module (shared)
env_module (shared)
expires_module (shared)
ext_filter_module (shared)
filter_module (shared)
headers_module (shared)
include_module (shared)
info_module (shared)
log_config_module (shared)
logio_module (shared)
mime_magic_module (shared)
mime_module (shared)
negotiation_module (shared)
remoteip_module (shared)
reqtimeout_module (shared)
rewrite_module (shared)
setenvif_module (shared)
slotmem_plain_module (shared)
slotmem_shm_module (shared)
socache_dbm_module (shared)
socache_memcache_module (shared)
socache_shmcb_module (shared)
status_module (shared)
substitute_module (shared)
suexec_module (shared)
unique_id_module (shared)
unixd_module (shared)
userdir_module (shared)
version_module (shared)
vhost_alias_module (shared)
dav_module (shared)
dav_fs_module (shared)
dav_lock_module (shared)
lua_module (shared)
mpm_prefork_module (shared)
proxy_module (shared)
lbmethod_bybusyness_module (shared)
lbmethod_byrequests_module (shared)
lbmethod_bytraffic_module (shared)
proxy_ajp_module (shared)
proxy_balancer_module (shared)
proxy_connect_module (shared)
proxy_express_module (shared)
proxy_fcgi_module (shared)
proxy_fdpass_module (shared)
proxy_ftp_module (shared)
proxy_http_module (shared)
proxy_scgi_module (shared)
proxy_wstunnel_module (shared)
ssl_module (shared)
systemd_module (shared)
cgi_module (shared)
fcgid_module (shared)
php7_module (shared)
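I also checked the PHP modules and found that apc was loaded, which shouldn't be the case with the newer opcache running? It was probably left over from an earlier version, but overall it makes no difference. What else can I do, or how can I determine the cause of this high load? The load does drop when Apache is stopped.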
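When running iotop, I see this [jbd2/sda-8] process consistently at the top, with IO between 10% and 60%. If this is a journaling-related process, could there be a problem with the disk? Perhaps the disk needs to be cleaned up in single-user mode?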
Total DISK READ : 0.00 B/s | Total DISK WRITE : 181.41 K/s
Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 272.12 K/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
1937 be/3 root 0.00 B/s 0.00 B/s 0.00 % 27.02 % [jbd2/sda-8]
3648 be/4 mysql 0.00 B/s 151.18 K/s 0.00 % 4.66 % mysqld
3645 be/4 mysql 0.00 B/s 0.00 B/s 0.00 % 1.40 % mysqld
19502 be/4 mysql 0.00 B/s 18.14 K/s 0.00 % 0.53 % mysqld
1 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % init
2 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kthreadd]
4 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kworker/0:0H]
6 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [mm_percpu_wq]
7 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/0]
8 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [rcu_sched]
<snip>
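Answer 1
I also think MySQL is causing the load. As you guessed, the jbd2 process is a kernel thread that updates the filesystem journal. It looks like MySQL is writing to disk heavily, and that is what puts the load on jbd2.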
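MySQL sometimes needs to create temporary tables to process queries, especially queries with a GROUP BY clause. If these temporary tables are created on disk, that would explain your load. This command shows how many temporary tables have been created on disk and in memory:
SHOW GLOBAL STATUS LIKE 'created_tmp%tables';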
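Also from that link, there are two reasons MySQL creates temporary tables on disk rather than in memory:
The result is larger than the smaller of the MySQL variables max_heap_table_size and tmp_table_size.
The result contains columns of type BLOB or TEXT.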
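To make the counters easier to read, here is a small Python sketch that turns the Created_tmp_tables and Created_tmp_disk_tables status values into a ratio, and works out the in-memory size limit described in the first reason. The function names and the sample numbers are made up for illustration; substitute the values your own server reports.

```python
def tmp_disk_ratio(status):
    """Fraction of internal temporary tables that were created on disk.

    `status` maps status variable names to their values, as returned by
    SHOW GLOBAL STATUS (values may be strings).
    """
    total = int(status["Created_tmp_tables"])
    on_disk = int(status["Created_tmp_disk_tables"])
    return on_disk / total if total else 0.0


def in_memory_limit_bytes(tmp_table_size, max_heap_table_size):
    """An in-memory temporary table is capped by the smaller of the two settings."""
    return min(int(tmp_table_size), int(max_heap_table_size))


# Hypothetical sample values, not taken from the server in this question.
sample_status = {"Created_tmp_tables": "200", "Created_tmp_disk_tables": "50"}
print("Share of temp tables on disk: {:.0%}".format(tmp_disk_ratio(sample_status)))
print("In-memory limit (bytes):", in_memory_limit_bytes(16777216, 33554432))
```

A persistently high share points at queries that spill to disk. Raising only one of the two size variables does not help, because the smaller one still applies.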
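Answer 2
Yes, the mysqld process is where the problem shows up, but the cause is that this virtual server is waiting on its host. The high "st" (steal) number in my top output indicates that the host is busy with other VMs...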
top - 12:39:15 up 10:26, 2 users, load average: 3.27, 3.57, 4.08
Tasks: 181 total, 2 running, 117 sleeping, 0 stopped, 0 zombie
%Cpu(s): 9.3 us, 6.0 sy, 0.0 ni, 54.2 id, 4.2 wa, 0.0 hi, 0.9 si, 25.4 st
^^^^^^^
After I pointed this out to the hosting provider, our VM was migrated to a new host. The problem is solved, and the load is now low again as usual.
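For anyone who wants to watch steal time without staring at top, here is a minimal Python sketch that computes the steal percentage between two samples of the aggregate "cpu" line in /proc/stat on Linux. The function names are my own and the sample lines are invented for illustration. It assumes a kernel that reports the steal field (the eighth value on that line).

```python
def parse_cpu_line(line):
    """Return the numeric counters from the aggregate 'cpu' line of /proc/stat."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:]]
    if len(values) < 8:
        raise ValueError("this kernel does not report a steal field")
    return values


def steal_percent(before, after):
    """Percentage of CPU time stolen by the hypervisor between two samples."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    deltas = [y - x for x, y in zip(a, b)]
    # Fields: user nice system idle iowait irq softirq steal [guest guest_nice].
    # Guest time is already counted in user/nice, so sum only the first eight.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as f:
        return f.readline()


# Invented sample lines, standing in for two reads taken a few seconds apart.
before = "cpu 100 0 50 800 20 0 5 25 0 0"
after = "cpu 200 0 110 1340 60 0 15 275 0 0"
print("steal: {:.1f}%".format(steal_percent(before, after)))
```

On a real system, call read_cpu_line() twice with a short sleep in between and pass both results to steal_percent(). A value that stays in the double digits, like the 25.4 st above, is worth raising with the hosting provider.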