为什么我的 Linux 托管服务器的平均负载一直在 6 到 33 之间波动？

2024-6-1 • tag-icon

linux ubuntu nginx php php-fpm

为什么我的 Linux 托管服务器的平均负载一直在 6 到 33 之间波动？

最近，我注意到我的生产服务器上一个流量适中的 WordPress 网站出现了奇怪的行为。（并发实时流量约为 1500）

该服务器具有以下功能：

DigitalOcean 48GB Memory
16 Core Processor
480GB SSD Disk

问题是平均负载 (LA) 保持在 6 约 30 分钟，然后稳步上升到 33 左右，然后保持在 33 约 30 分钟。然后回到 6（正常）并持续下去。

当 LA 超过 20 时，网站打开速度会变得非常慢。显然，我的访问者会感到沮丧并离开网站，因为它打开需要很长时间。我因此失去了访客。 :(

调整了一些 nginx、php-frm、sysctl.conf、mysql（my.cnf）设置，但似乎没有任何效果。

nginx.conf

user www-data;
worker_processes 16;
worker_rlimit_nofile 100000;
pid /run/nginx.pid;

events {
        worker_connections 4096;
        multi_accept on;
}

但我的ulimit -n回报只有1024

php.ini

max_execution_time = 30
max_input_time = 60
max_input_vars = 5000
memory_limit = 128M
post_max_size = 100M
default_socket_timeout = 60
pdo_mysql.cache_size = 2000

閣下網站

listen = 127.0.0.1:9090
listen.backlog = 65536
pm = static
pm.max_children = 500
pm.start_servers = 60
pm.min_spare_servers = 45
pm.max_spare_servers = 75
pm.max_requests = 5000
request_terminate_timeout = 300

系统配置参数

fs.file-max = 2097152

# Do less swapping
vm.swappiness = 10
vm.dirty_ratio = 60
vm.dirty_background_ratio = 2

# Decrease the time default value for tcp_fin_timeout connection
net.ipv4.tcp_fin_timeout = 15

# Decrease the time default value for connections to keep alive
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.tcp_keepalive_intvl = 15

### TUNING NETWORK PERFORMANCE ###

# Default Socket Receive Buffer
net.core.rmem_default = 31457280

# Maximum Socket Receive Buffer
net.core.rmem_max = 12582912

# Default Socket Send Buffer
net.core.wmem_default = 31457280

# Maximum Socket Send Buffer
net.core.wmem_max = 12582912

# Increase number of incoming connections
net.core.somaxconn = 65536
# Increase number of incoming connections backlog
net.core.netdev_max_backlog = 65536

# Increase the maximum amount of option memory buffers
net.core.optmem_max = 25165824

# Increase the maximum total buffer-space allocatable
# This is measured in units of pages (4096 bytes)
net.ipv4.tcp_mem = 65536 131072 262144
net.ipv4.udp_mem = 65536 131072 262144

# Increase the read-buffer space allocatable
net.ipv4.tcp_rmem = 8192 87380 16777216
net.ipv4.udp_rmem_min = 16384

# Increase the write-buffer-space allocatable
net.ipv4.tcp_wmem = 8192 65536 16777216
net.ipv4.udp_wmem_min = 16384

# Increase the tcp-time-wait buckets pool size to prevent simple DOS attacks
net.ipv4.tcp_max_tw_buckets = 1440000
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1

我的cnf

    #
    # * Fine Tuning
    #
    max_connections         = 1000
    connect_timeout         = 5
    wait_timeout            = 600
    max_allowed_packet      = 16M
    thread_cache_size       = 128
    sort_buffer_size        = 4M
    bulk_insert_buffer_size = 16M
    tmp_table_size          = 32M
    max_heap_table_size     = 32M
query_cache_limit               = 128K
query_cache_size                = 64M
innodb_buffer_pool_size = 1024M
innodb_log_buffer_size  = 8M
innodb_file_per_table   = 1
innodb_open_files       = 400
innodb_io_capacity      = 400
innodb_flush_method     = O_DIRECT

有人可以给我指明正确的方向吗？

编辑

平均负载为 30+ 时的 TOP 截图

答案1

从您的屏幕截图可以看出，CPU 占用率为 599% mysqld，数据库是问题的根源。

我会将你“调整”过的所有文件恢复为默认设置。你很可能在不知道实际问题是什么的情况下，随意谷歌搜索后更改了内容。

之后你应该运行MySQL 调谐器获得关于您应该在何处调整my.cnf设置的建议（例如，innodb 缓存可能太小）。之后，您还需要检查慢查询日志。

进行这些更改后，如果您仍然遇到问题，请创建一个新问题，详细说明您的数据库问题（假设myslqd仍然是问题的根源）。

答案2

您能否检查一下 CPU 负载、网络连接数和传出流量之间的关联？

分析 I/O 统计数据也很有用。worker_process如果 I/O 不堪重负，您可能需要增加计数。

另一个好的做法是运行chkrootkit一下以确保万无一失，并查看auth.log是否有任何异常活动。

我希望你有 wordpressREST API有保障。

相关内容