php-fpm 在重置 opcache 时产生大量负载，导致服务器无响应

2024-6-1 • tag-icon

最近几周，我们在部署代码时遇到了一个负面现象：服务器有时会几分钟无响应。

以下是发生这种情况时服务器负载的示例：

我能找到的唯一相关日志来自/var/log/php7.2-fpm.log，有时（但并非总是）我会看到这样的条目（注意：这来自与上面显示的图像不同的事件，尽管如此，还是发生了相同的事情）：

[22-Mar-2019 15:33:50] WARNING: [pool api] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 0 idle, and 231 total children
[22-Mar-2019 15:33:52] WARNING: [pool api] server reached pm.max_children setting (250), consider raising it
[22-Mar-2019 15:34:05] WARNING: [pool app] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 47 idle, and 104 total children

发生的事情是，我们对此服务器进行了部署：

git status --porcelain检查修改
git pull origin master更新文件
重置 opcache，即我们调用执行opcache_reset()
清除本地缓存文件

经过一些实验，我可以将负载问题简化为：opcache_reset()

一旦我执行此调用（独立于任何之前或之后的部署步骤，当我仅调用此端点时也会单独发生这种情况）就有可能出现系统负载突然飙升的情况。

如果发生这种情况并且负载“过高”（根据我的经验，我会说> 200 左右），系统就会变得无响应，直到几秒或几分钟，具体取决于一切平静下来需要多长时间。

眼镜：

在 VMWare 上运行的 VM（不是我们自己托管的，我们有一个合作伙伴）
4 个 vCPU
8GB 内存
8GB 交换空间
Ubuntu 18.04 TS
nginx 1.14.0（Ubuntu 默认）
PHP 7.2（通过https://launchpad.net/~ondrej/+archive/ubuntu/php）

PHP-FPM 配置：

我们使用了 6 个具有不同虚拟主机的池
直接start_servers总计320php-fpm 进程（也通过确认ps auxw|grep -i fpm|grep -v grep |wc -l）
所有池的总数max_children约为870

也许这里的总数太高了，目的是应对个别虚拟主机的峰值，而这些虚拟主机恰好有有时。

使用htop，系统通常看起来像这样：

和通常除非出现这个峰值，否则负载很低，这些都与 opcache 重置有关（我最近才发现）：

我知道重置缓存并且现在所有进程都必须重新填充它会消耗 CPU。

但我不明白的是：

这种情况最近才开始出现，比如大概 1-2 个月，但直到最近两周，无响应才变得明显
它并不总是发生，有时重置缓存时，什么也不会发生

opcache_get_status(false)这是部署之前的输出：

{
  "opcache_enabled": true,
  "cache_full": false,
  "restart_pending": false,
  "restart_in_progress": false,
  "memory_usage": {
    "used_memory": 67353640,
    "free_memory": 66864088,
    "wasted_memory": 0,
    "current_wasted_percentage": 0
  },
  "interned_strings_usage": {
    "buffer_size": 8388608,
    "used_memory": 5215176,
    "free_memory": 3173432,
    "number_of_strings": 89109
  },
  "opcache_statistics": {
    "num_cached_scripts": 2873,
    "num_cached_keys": 5063,
    "max_cached_keys": 7963,
    "hits": 633581523,
    "start_time": 1553172771,
    "last_restart_time": 1553248200,
    "oom_restarts": 0,
    "hash_restarts": 0,
    "manual_restarts": 6,
    "misses": 9512,
    "blacklist_misses": 0,
    "blacklist_miss_ratio": 0,
    "opcache_hit_rate": 99.9984987161316
  }
}

接下来是：

{
  "opcache_enabled": true,
  "cache_full": false,
  "restart_pending": false,
  "restart_in_progress": false,
  "memory_usage": {
    "used_memory": 57745856,
    "free_memory": 76471872,
    "wasted_memory": 0,
    "current_wasted_percentage": 0
  },
  "interned_strings_usage": {
    "buffer_size": 8388608,
    "used_memory": 4337168,
    "free_memory": 4051440,
    "number_of_strings": 75163
  },
  "opcache_statistics": {
    "num_cached_scripts": 2244,
    "num_cached_keys": 3925,
    "max_cached_keys": 7963,
    "hits": 5893926,
    "start_time": 1553172771,
    "last_restart_time": 1553265235,
    "oom_restarts": 0,
    "hash_restarts": 0,
    "manual_restarts": 7,
    "misses": 4962,
    "blacklist_misses": 0,
    "blacklist_miss_ratio": 0,
    "opcache_hit_rate": 99.91588245106536
  }
}

我观察到的其他情况：

php-fpm 很快就停止响应
nginx 仍然有效，除非负载真的很高；我证实了这一点，因为当 php-fpm 基本上无法访问时，nginx 会提供配置的 500 页

导致这些负载峰值的原因真的? 我怎样才能避免它们？

接受答案后更新：

基本上只是不是呼叫opcache_reset 和将我的大多数 opcache 设置自定义恢复为默认值（即不强加它们）就解决了这个问题。

这一步是我多年来部署流程的一部分。我试图找出最初的原因，据我所知，这与类引用尚未加载/刷新的新代码时的部署问题有关。

现在回想起来，我甚至不确定这是否是真正的问题，但事实就是如此。

答案1

默认情况下，PHP 会检查文件时间戳以使 opcache 条目无效。可以关闭此功能，这是我能想到的唯一opcache_reset()会使用此功能的场景。当然，这也会导致您遇到的问题。

我建议回到默认值：

opcache.validate_timestamps = 1
opcache.revalidate_freq = 2
opcache.revalidate_path = 0

答案1

相关内容