Unexpected Apache/NFS slowdown on Ubuntu 22.04.2

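This question has been substantially edited following further investigation of the problem.

TL;DR: We have been running Apache on Ubuntu to serve pages from NFS for more than 20 years, and a newly emerged problem has us completely stumped. It started about four weeks ago (around 24 February).

The question is: what happened, and how do we fix it?

Network, disk space, memory, and CPU do not appear to be the problem here. It appears to be related to a sudden increase in requests for file metadata.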

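Update 23.2/03/2023

This appears to be related to https://serverfault.com/questions/1126320/heavy-nfs-metadata-traffic-flooding-nfsv4-1-server-aws-efs

Update 2023/03/23

Leaving User/Group unset in apache.conf has the side effect that Apache now runs as root, which it normally refuses to do. (We have not yet reported this as a bug; we consider it, in Apache's own words, a BIG_SECURITY_HOLE.)

Essentially, the difference in NFS traffic comes down to whether or not apache2 is running as root. Why that makes a difference is still unclear to us.

So why would commenting out User/Group change anything? (FYI, it does not matter which user/group is declared.)

Essentially, commenting out User/Group actually allows Apache2 to run as root. Clearly this would be an Apache bug, because if we define root/root in User/Group, we see (via apachectl -t):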

Apache has not been designed to serve pages while
running as root.  There are known race conditions that
will allow any local user to read any file on the system.
If you still desire to serve pages as root then
add -DBIG_SECURITY_HOLE to the CFLAGS env variable
and then rebuild the server.
It is strongly suggested that you instead modify the User
directive in your httpd.conf file to list a non-root
user.

Just to show what we see when User/Group is switched off:

root      402354  0.1  3.7 608224 36872 ?        Ss   12:17   0:00 /usr/sbin/apache2 -k start
root      402355  0.0  0.7  57448  7084 ?        S    12:17   0:00 /usr/sbin/apache2 -k start
root      402356  0.0  1.4 608596 13984 ?        S    12:17   0:00 /usr/sbin/apache2 -k start
root      402357  0.0  1.8 608788 18668 ?        S    12:17   0:00 /usr/sbin/apache2 -k start
root      402358  0.0  1.4 608596 13984 ?        S    12:17   0:00 /usr/sbin/apache2 -k start
root      402359  0.0  1.4 608596 13984 ?        S    12:17   0:00 /usr/sbin/apache2 -k start
root      402360  0.0  1.4 608596 13984 ?        S    12:17   0:00 /usr/sbin/apache2 -k start
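A quick way to confirm which account the workers run as is to tally the USER column of the process list. The sketch below assumes `ps aux`-style input like the listing above; the function name `apache_users` is our own and is not part of Apache or procps.

```shell
# Tally apache2 processes per user from `ps aux`-style input on stdin.
# apache_users is a hypothetical helper name, not a standard tool.
# In `ps aux` output the command path is the 11th column.
apache_users() {
    awk '$11 ~ /apache2$/ { count[$1]++ }
         END { for (u in count) print u, count[u] }' | sort
}

# Example: ps aux | apache_users
```

With User/Group set, one would normally expect a single root parent plus workers owned by the configured user, rather than every process owned by root as in the listing above.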

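It seems that either Apache needs to run as root in order to use the NFS cache, or, when it is not actually running as root (i.e. when User/Group is set in the config file), it ignores, corrupts, or destroys that cache entirely.

Note that switching off User/Group is an Apache thing and not an NFS thing.

Details follow.

We set up two pristine AWS Ubuntu 22.04.2 servers on a private network and brought them fully up to date with apt update / apt upgrade.

One of them, "nfs", is configured as an NFS server backed by a freshly reformatted 100GB EBS drive. These are no different from the many other systems we have set up over the years.

On "nfs", the /etc/exports file looks like this: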

/mnt/nx 10.0.0.0/16(rw,sync,insecure_locks,no_subtree_check,all_squash,anonuid=1002,anongid=1002)

(User 1002 is a generic user with the same permissions on both boxes. In any case, with all_squash, www-data has read/write access.)

Meanwhile, /etc/fstab contains:

/dev/xvdf /mnt/nx  ext4 defaults,nofail 0 2

On the Apache box, "apache", /etc/fstab contains:

nfs.private.net:/mnt/nx /websites nfs
sudo mount nfs

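This mounts the NFS share correctly.

Repeated local tests show that the drive is mounted correctly, and many tests show that the NFS server is doing what it should.

For testing purposes there are no other "moving parts": no other network devices and no other connected systems. The boxes are isolated, with no other connections and no non-system background services.

We have installed both apache and nginx, and both show the problem. But apache 2.4 is the one we know best.

So we set it up to serve a "vanilla" website, configured to serve over SSL.

There is a dummy flat file containing some random HTML that serves as a test page: 'test.html'.

From the "apache" box we can easily make a request through the Apache server and successfully retrieve the HTML, as expected.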

curl -o /dev/null -w "%{time_total} \n" -s --resolve apache.private.net:443:10.0.31.198 https://apache.private.net/test.html

We wrap it in a while loop, like this:

while true; do curl -o /dev/null -w "%{time_total} \n" -s --resolve apache.private.net:443:10.0.31.198 "https://apache.private.net/test.html"; done

Typical/acceptable results look like this (on a micro instance):

0.081165 
0.080225 
0.080009 
0.081856 
0.082625 
0.081589 

However, to get it to work as above, we did something odd. We commented out the User/Group lines in the Apache config file.

#User www-data
#Group www-data

If we uncomment them and then restart Apache

systemctl restart apache2

everything looks the same as above, but it is not. There is a small increase in the timings, which is greatly amplified in systems such as PHP.

0.090778 
0.089920 
0.089189 
0.089006 
0.088288 
0.089497 
0.089113 
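To put a number on that increase, the timings from each run can be averaged. This is a minimal sketch: `mean_time` is a hypothetical helper name, and the two file names in the usage comment are placeholders for wherever the curl output was saved.

```shell
# Print the mean of the first column of stdin (one curl time_total per line).
# mean_time is a hypothetical helper name; blank lines are skipped.
mean_time() {
    awk 'NF { sum += $1; n++ }
         END { if (n) printf "%.6f\n", sum / n }'
}

# Example: save each run to a file, then compare the two means.
#   mean_time < without_user_group.txt
#   mean_time < with_user_group.txt
```

For the two samples shown in this question, the means come out at roughly 0.081 s and 0.089 s, a difference of about 8 ms per request for a static file.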

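However, let's look at what is happening "under the hood".

I will use the request above, but monitor just a single request with tcpdump, with User/Group set (and no other traffic):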

tcpdump -Z root -s 9000 port 2049

This is typical output for each HTTPS request. We see 16+ NFS requests/responses. Every repeated request hammers the network in the same way.

18:02:31.777116 IP ip-10-0-31-198.eu-west-1.compute.internal.911 > ip-10-0-31-227.eu-west-1.compute.internal.nfs: Flags [P.], seq 2817:2989, ack 2689, win 8106, options [nop,nop,TS val 1038289658 ecr 3939993554], length 172: NFS request xid 3583950656 168 getattr fh 0,2/53
18:02:31.777685 IP ip-10-0-31-227.eu-west-1.compute.internal.nfs > ip-10-0-31-198.eu-west-1.compute.internal.911: Flags [P.], seq 2689:2861, ack 2989, win 9045, options [nop,nop,TS val 3940004552 ecr 1038289658], length 172: NFS reply xid 3583950656 reply ok 168 getattr NON 4 ids 0/-1469769628 sz 464598276
18:02:31.777700 IP ip-10-0-31-198.eu-west-1.compute.internal.911 > ip-10-0-31-227.eu-west-1.compute.internal.nfs: Flags [.], ack 2861, win 8105, options [nop,nop,TS val 1038289659 ecr 3940004552], length 0
18:02:31.777757 IP ip-10-0-31-198.eu-west-1.compute.internal.911 > ip-10-0-31-227.eu-west-1.compute.internal.nfs: Flags [P.], seq 2989:3169, ack 2861, win 8106, options [nop,nop,TS val 1038289659 ecr 3940004552], length 180: NFS request xid 3600727872 176 getattr fh 0,2/53
18:02:31.778188 IP ip-10-0-31-227.eu-west-1.compute.internal.nfs > ip-10-0-31-198.eu-west-1.compute.internal.911: Flags [P.], seq 2861:3033, ack 3169, win 9045, options [nop,nop,TS val 3940004553 ecr 1038289659], length 172: NFS reply xid 3600727872 reply ok 168 getattr NON 4 ids 0/-1469769628 sz 464598276
18:02:31.778220 IP ip-10-0-31-198.eu-west-1.compute.internal.911 > ip-10-0-31-227.eu-west-1.compute.internal.nfs: Flags [P.], seq 3169:3349, ack 3033, win 8106, options [nop,nop,TS val 1038289659 ecr 3940004553], length 180: NFS request xid 3617505088 176 getattr fh 0,2/53
18:02:31.778569 IP ip-10-0-31-227.eu-west-1.compute.internal.nfs > ip-10-0-31-198.eu-west-1.compute.internal.911: Flags [P.], seq 3033:3205, ack 3349, win 9045, options [nop,nop,TS val 3940004553 ecr 1038289659], length 172: NFS reply xid 3617505088 reply ok 168 getattr NON 4 ids 0/-1469769628 sz 464598276
18:02:31.778617 IP ip-10-0-31-198.eu-west-1.compute.internal.911 > ip-10-0-31-227.eu-west-1.compute.internal.nfs: Flags [P.], seq 3349:3521, ack 3205, win 8106, options [nop,nop,TS val 1038289660 ecr 3940004553], length 172: NFS request xid 3634282304 168 getattr fh 0,2/53
18:02:31.778976 IP ip-10-0-31-227.eu-west-1.compute.internal.nfs > ip-10-0-31-198.eu-west-1.compute.internal.911: Flags [P.], seq 3205:3377, ack 3521, win 9045, options [nop,nop,TS val 3940004553 ecr 1038289660], length 172: NFS reply xid 3634282304 reply ok 168 getattr NON 4 ids 0/-1469769628 sz 464598276
18:02:31.779011 IP ip-10-0-31-198.eu-west-1.compute.internal.911 > ip-10-0-31-227.eu-west-1.compute.internal.nfs: Flags [P.], seq 3521:3693, ack 3377, win 8106, options [nop,nop,TS val 1038289660 ecr 3940004553], length 172: NFS request xid 3651059520 168 getattr fh 0,2/53
18:02:31.779434 IP ip-10-0-31-227.eu-west-1.compute.internal.nfs > ip-10-0-31-198.eu-west-1.compute.internal.911: Flags [P.], seq 3377:3549, ack 3693, win 9045, options [nop,nop,TS val 3940004554 ecr 1038289660], length 172: NFS reply xid 3651059520 reply ok 168 getattr NON 4 ids 0/-1469769628 sz 464598276
18:02:31.779464 IP ip-10-0-31-198.eu-west-1.compute.internal.911 > ip-10-0-31-227.eu-west-1.compute.internal.nfs: Flags [P.], seq 3693:3873, ack 3549, win 8106, options [nop,nop,TS val 1038289660 ecr 3940004554], length 180: NFS request xid 3667836736 176 getattr fh 0,2/53
18:02:31.779885 IP ip-10-0-31-227.eu-west-1.compute.internal.nfs > ip-10-0-31-198.eu-west-1.compute.internal.911: Flags [P.], seq 3549:3721, ack 3873, win 9045, options [nop,nop,TS val 3940004554 ecr 1038289660], length 172: NFS reply xid 3667836736 reply ok 168 getattr NON 4 ids 0/-1469769628 sz 464598276
18:02:31.779932 IP ip-10-0-31-198.eu-west-1.compute.internal.911 > ip-10-0-31-227.eu-west-1.compute.internal.nfs: Flags [P.], seq 3873:4045, ack 3721, win 8106, options [nop,nop,TS val 1038289661 ecr 3940004554], length 172: NFS request xid 3684613952 168 getattr fh 0,2/53
18:02:31.780355 IP ip-10-0-31-227.eu-west-1.compute.internal.nfs > ip-10-0-31-198.eu-west-1.compute.internal.911: Flags [P.], seq 3721:3893, ack 4045, win 9045, options [nop,nop,TS val 3940004555 ecr 1038289661], length 172: NFS reply xid 3684613952 reply ok 168 getattr NON 4 ids 0/-1469769628 sz 464598276
18:02:31.780385 IP ip-10-0-31-198.eu-west-1.compute.internal.911 > ip-10-0-31-227.eu-west-1.compute.internal.nfs: Flags [P.], seq 4045:4225, ack 3893, win 8106, options [nop,nop,TS val 1038289661 ecr 3940004555], length 180: NFS request xid 3701391168 176 getattr fh 0,2/53
18:02:31.780749 IP ip-10-0-31-227.eu-west-1.compute.internal.nfs > ip-10-0-31-198.eu-west-1.compute.internal.911: Flags [P.], seq 3893:4065, ack 4225, win 9045, options [nop,nop,TS val 3940004555 ecr 1038289661], length 172: NFS reply xid 3701391168 reply ok 168 getattr NON 4 ids 0/-1469769628 sz 464598276
18:02:31.780774 IP ip-10-0-31-198.eu-west-1.compute.internal.911 > ip-10-0-31-227.eu-west-1.compute.internal.nfs: Flags [P.], seq 4225:4405, ack 4065, win 8106, options [nop,nop,TS val 1038289662 ecr 3940004555], length 180: NFS request xid 3718168384 176 getattr fh 0,2/53
18:02:31.781175 IP ip-10-0-31-227.eu-west-1.compute.internal.nfs > ip-10-0-31-198.eu-west-1.compute.internal.911: Flags [P.], seq 4065:4237, ack 4405, win 9045, options [nop,nop,TS val 3940004556 ecr 1038289662], length 172: NFS reply xid 3718168384 reply ok 168 getattr NON 4 ids 0/-1469769628 sz 464598276
18:02:31.781212 IP ip-10-0-31-198.eu-west-1.compute.internal.911 > ip-10-0-31-227.eu-west-1.compute.internal.nfs: Flags [P.], seq 4405:4577, ack 4237, win 8106, options [nop,nop,TS val 1038289662 ecr 3940004556], length 172: NFS request xid 3734945600 168 getattr fh 0,2/53
18:02:31.781558 IP ip-10-0-31-227.eu-west-1.compute.internal.nfs > ip-10-0-31-198.eu-west-1.compute.internal.911: Flags [P.], seq 4237:4409, ack 4577, win 9045, options [nop,nop,TS val 3940004556 ecr 1038289662], length 172: NFS reply xid 3734945600 reply ok 168 getattr NON 4 ids 0/-1469769628 sz 464598276
18:02:31.781584 IP ip-10-0-31-198.eu-west-1.compute.internal.911 > ip-10-0-31-227.eu-west-1.compute.internal.nfs: Flags [P.], seq 4577:4757, ack 4409, win 8106, options [nop,nop,TS val 1038289663 ecr 3940004556], length 180: NFS request xid 3751722816 176 getattr fh 0,2/53
18:02:31.781952 IP ip-10-0-31-227.eu-west-1.compute.internal.nfs > ip-10-0-31-198.eu-west-1.compute.internal.911: Flags [P.], seq 4409:4581, ack 4757, win 9045, options [nop,nop,TS val 3940004556 ecr 1038289663], length 172: NFS reply xid 3751722816 reply ok 168 getattr NON 4 ids 0/-1469769628 sz 464598276
18:02:31.781976 IP ip-10-0-31-198.eu-west-1.compute.internal.911 > ip-10-0-31-227.eu-west-1.compute.internal.nfs: Flags [P.], seq 4757:4937, ack 4581, win 8106, options [nop,nop,TS val 1038289663 ecr 3940004556], length 180: NFS request xid 3768500032 176 getattr fh 0,2/53
18:02:31.782394 IP ip-10-0-31-227.eu-west-1.compute.internal.nfs > ip-10-0-31-198.eu-west-1.compute.internal.911: Flags [P.], seq 4581:4753, ack 4937, win 9045, options [nop,nop,TS val 3940004557 ecr 1038289663], length 172: NFS reply xid 3768500032 reply ok 168 getattr NON 4 ids 0/-1469769628 sz 464598276
18:02:31.782467 IP ip-10-0-31-198.eu-west-1.compute.internal.911 > ip-10-0-31-227.eu-west-1.compute.internal.nfs: Flags [P.], seq 4937:5109, ack 4753, win 8106, options [nop,nop,TS val 1038289663 ecr 3940004557], length 172: NFS request xid 3785277248 168 getattr fh 0,2/53
18:02:31.782834 IP ip-10-0-31-227.eu-west-1.compute.internal.nfs > ip-10-0-31-198.eu-west-1.compute.internal.911: Flags [P.], seq 4753:4925, ack 5109, win 9045, options [nop,nop,TS val 3940004557 ecr 1038289663], length 172: NFS reply xid 3785277248 reply ok 168 getattr NON 4 ids 0/-1469769628 sz 464598276
18:02:31.782862 IP ip-10-0-31-198.eu-west-1.compute.internal.911 > ip-10-0-31-227.eu-west-1.compute.internal.nfs: Flags [P.], seq 5109:5289, ack 4925, win 8106, options [nop,nop,TS val 1038289664 ecr 3940004557], length 180: NFS request xid 3802054464 176 getattr fh 0,2/53
18:02:31.783239 IP ip-10-0-31-227.eu-west-1.compute.internal.nfs > ip-10-0-31-198.eu-west-1.compute.internal.911: Flags [P.], seq 4925:5097, ack 5289, win 9045, options [nop,nop,TS val 3940004558 ecr 1038289664], length 172: NFS reply xid 3802054464 reply ok 168 getattr NON 4 ids 0/-1469769628 sz 464598276
18:02:31.783267 IP ip-10-0-31-198.eu-west-1.compute.internal.911 > ip-10-0-31-227.eu-west-1.compute.internal.nfs: Flags [P.], seq 5289:5469, ack 5097, win 8106, options [nop,nop,TS val 1038289664 ecr 3940004558], length 180: NFS request xid 3818831680 176 getattr fh 0,2/53
18:02:31.783644 IP ip-10-0-31-227.eu-west-1.compute.internal.nfs > ip-10-0-31-198.eu-west-1.compute.internal.911: Flags [P.], seq 5097:5269, ack 5469, win 9045, options [nop,nop,TS val 3940004558 ecr 1038289664], length 172: NFS reply xid 3818831680 reply ok 168 getattr NON 4 ids 0/-1469769628 sz 464598276
18:02:31.783688 IP ip-10-0-31-198.eu-west-1.compute.internal.911 > ip-10-0-31-227.eu-west-1.compute.internal.nfs: Flags [P.], seq 5469:5633, ack 5269, win 8106, options [nop,nop,TS val 1038289665 ecr 3940004558], length 164: NFS request xid 3835608896 160 getattr fh 0,2/53
18:02:31.784051 IP ip-10-0-31-227.eu-west-1.compute.internal.nfs > ip-10-0-31-198.eu-west-1.compute.internal.911: Flags [P.], seq 5269:5377, ack 5633, win 9045, options [nop,nop,TS val 3940004558 ecr 1038289665], length 108: NFS reply xid 3835608896 reply ok 104 getattr NON 3 ids 0/-1469769628 sz 464598276
18:02:31.827894 IP ip-10-0-31-198.eu-west-1.compute.internal.911 > ip-10-0-31-227.eu-west-1.compute.internal.nfs: Flags [.], ack 5377, win 8106, options [nop,nop,TS val 1038289709 ecr 3940004558], length 0
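Counting the round trips by eye is error-prone, so the capture can be tallied instead. The sketch below assumes tcpdump's text output has been saved to a file or piped in; `count_nfs_requests` is a hypothetical helper name and `capture.txt` is a placeholder.

```shell
# Count NFS request lines in tcpdump text output read from stdin.
# count_nfs_requests is a hypothetical helper name.
# grep -c exits non-zero when nothing matches, so "|| true" keeps the
# function usable under "set -e" while still printing 0.
count_nfs_requests() {
    grep -c 'NFS request' || true
}

# Example (capture.txt is a placeholder for saved tcpdump output):
#   count_nfs_requests < capture.txt
```

Running it once per HTTPS request, with and without User/Group set, gives a direct comparison of how many getattr calls each configuration triggers.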

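Now switch User/Group off (and restart Apache). The first HTTPS request looks like the following (3 NFS requests/responses), and the next 10-15 HTTPS requests generate no NFS traffic at all (we assume thanks to the default NFS client caching).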

18:06:39.993397 IP ip-10-0-31-198.eu-west-1.compute.internal.911 > ip-10-0-31-227.eu-west-1.compute.internal.nfs: Flags [P.], seq 6669:6837, ack 6525, win 8106, options [nop,nop,TS val 1038537874 ecr 3940246142], length 168: NFS request xid 3969826624 164 getattr fh 0,2/53
18:06:39.994129 IP ip-10-0-31-227.eu-west-1.compute.internal.nfs > ip-10-0-31-198.eu-west-1.compute.internal.911: Flags [P.], seq 6525:6769, ack 6837, win 9045, options [nop,nop,TS val 3940252768 ecr 1038537874], length 244: NFS reply xid 3969826624 reply ok 240 getattr NON 3 ids 0/-1469769628 sz 464598276
18:06:39.994145 IP ip-10-0-31-198.eu-west-1.compute.internal.911 > ip-10-0-31-227.eu-west-1.compute.internal.nfs: Flags [.], ack 6769, win 8105, options [nop,nop,TS val 1038537875 ecr 3940252768], length 0
18:06:39.994372 IP ip-10-0-31-198.eu-west-1.compute.internal.911 > ip-10-0-31-227.eu-west-1.compute.internal.nfs: Flags [P.], seq 6837:7073, ack 6769, win 8106, options [nop,nop,TS val 1038537875 ecr 3940252768], length 236: NFS request xid 3986603840 232 getattr fh 0,2/53
18:06:39.994832 IP ip-10-0-31-227.eu-west-1.compute.internal.nfs > ip-10-0-31-198.eu-west-1.compute.internal.911: Flags [P.], seq 6769:7125, ack 7073, win 9045, options [nop,nop,TS val 3940252769 ecr 1038537875], length 356: NFS reply xid 3986603840 reply ok 352 getattr NON 5 ids 0/-1469769628 sz 464598276
18:06:39.995036 IP ip-10-0-31-198.eu-west-1.compute.internal.911 > ip-10-0-31-227.eu-west-1.compute.internal.nfs: Flags [P.], seq 7073:7249, ack 7125, win 8106, options [nop,nop,TS val 1038537876 ecr 3940252769], length 176: NFS request xid 4003381056 172 getattr fh 0,2/53
18:06:39.995520 IP ip-10-0-31-227.eu-west-1.compute.internal.nfs > ip-10-0-31-198.eu-west-1.compute.internal.911: Flags [P.], seq 7125:7241, ack 7249, win 9045, options [nop,nop,TS val 3940252770 ecr 1038537876], length 116: NFS reply xid 4003381056 reply ok 112 getattr NON 3 ids 0/-1469769628 sz 464598276
18:06:40.035889 IP ip-10-0-31-198.eu-west-1.compute.internal.911 > ip-10-0-31-227.eu-west-1.compute.internal.nfs: Flags [.], ack 7241, win 8106, options [nop,nop,TS val 1038537917 ecr 3940252770], length 0

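Mind you, this architecture has worked well for many years, despite people's grumbles about NFS. The behaviour we are seeing is new, and since we have made no significant changes to our systems, we think the culprit may be a recent system upgrade (via unattended-upgrades). We could be wrong. But answers that involve replacing NFS are not what we are looking for here.

Answer 1

Ubuntu bug

The issue has been resolved. See:

Our kernel version matches the one in the bug report.

We have verified and tested the behaviour against Kinetic (where the fix has been released).

While waiting for the fix, the kernel can be downgraded as follows (for 22.04):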

# super-user
sudo su

# grab the previous kernel.
grub-mkconfig | grep menuentry | grep 5.15.0-1030

# now copy the 'gnulinux-5.15.0-1030....' string.
vi /etc/default/grub

# replace the GRUB_DEFAULT=0 with the captured string as follows:
GRUB_DEFAULT='gnulinux-5.15.0-1030....' 

# save it and run update. 
update-grub

# this will tell you off, but have the correct (very long) signature to use. Use that.
vi /etc/default/grub

GRUB_DEFAULT='gnulinux-advanced-...>gnulinux-5.15.0-1030...'

# save it again and run update again
update-grub

# if there are no warnings or errors.
reboot

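We chose to name the entry explicitly (rather than rely on its position in the menu) so that later changes to the menu do not change which kernel boots.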
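Copying the long entry strings by hand is the fiddly part of the steps above. As a rough sketch, the IDs can be pulled out of the generated GRUB configuration. `grub_entry_ids` is a hypothetical helper name, and the parsing assumes each entry line has the usual form `menuentry 'Title' ... $menuentry_id_option 'ID' {`.

```shell
# Print the menu entry IDs found in GRUB config text on stdin.
# grub_entry_ids is a hypothetical helper; it assumes each entry line has
# the form: menuentry 'Title' ... $menuentry_id_option 'ID' {
# Splitting on single quotes makes the ID the 4th field; lines without
# enough quoted parts are ignored.
grub_entry_ids() {
    awk -F"'" 'NF >= 5 && /menuentry_id_option/ { print $4 }'
}

# Example (as root):
#   grub-mkconfig 2>/dev/null | grub_entry_ids | grep -e advanced -e 5.15.0-1030
```

The value used for GRUB_DEFAULT in the answer is then the submenu ID and the kernel entry ID joined with '>'.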
