Uneven CPU utilization on a dual-socket Linux server

2024-6-2

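I recently noticed that tasks are distributed strangely on two of our servers. Both are dual-CPU EPYC 7402 machines on physically identical platforms, running the same tasks; they differ in NUMA configuration, kernel version, and Ubuntu release.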

Server 1 configuration and load:

Linux sv-marmoset222 5.15.0-46-generic #49~20.04.1-Ubuntu SMP Thu Aug 4 19:15:44 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   43 bits physical, 48 bits virtual
CPU(s):                          96
On-line CPU(s) list:             0-95
Thread(s) per core:              2
Core(s) per socket:              24
Socket(s):                       2
NUMA node(s):                    2
Vendor ID:                       AuthenticAMD
CPU family:                      23
Model:                           49
Model name:                      AMD EPYC 7402 24-Core Processor
Stepping:                        0
CPU MHz:                         2794.626
BogoMIPS:                        5589.25
Virtualization:                  AMD-V
L1d cache:                       1.5 MiB
L1i cache:                       1.5 MiB
L2 cache:                        24 MiB
L3 cache:                        256 MiB
NUMA node0 CPU(s):               0-23,48-71
NUMA node1 CPU(s):               24-47,72-95
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
node 0 size: 128511 MB
node 0 free: 113713 MB
node 1 cpus: 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
node 1 size: 129005 MB
node 1 free: 121583 MB
node distances:
node   0   1
  0:  10  32
  1:  32  10

[Figure: server 1 load]

Server 2 configuration and load:

Linux sv-marmoset318 5.3.0-62-generic #56~18.04.1-Ubuntu SMP Wed Jun 24 16:17:03 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              96
On-line CPU(s) list: 0-95
Thread(s) per core:  2
Core(s) per socket:  24
Socket(s):           2
NUMA node(s):        8
Vendor ID:           AuthenticAMD
CPU family:          23
Model:               49
Model name:          AMD EPYC 7402 24-Core Processor
Stepping:            0
CPU MHz:             3340.149
BogoMIPS:            5589.69
Virtualization:      AMD-V
L1d cache:           32K
L1i cache:           32K
L2 cache:            512K
L3 cache:            16384K
NUMA node0 CPU(s):   0-5,48-53
NUMA node1 CPU(s):   6-11,54-59
NUMA node2 CPU(s):   12-17,60-65
NUMA node3 CPU(s):   18-23,66-71
NUMA node4 CPU(s):   24-29,72-77
NUMA node5 CPU(s):   30-35,78-83
NUMA node6 CPU(s):   36-41,84-89
NUMA node7 CPU(s):   42-47,90-95
available: 8 nodes (0-7)
node 0 cpus: 0 1 2 3 4 5 48 49 50 51 52 53
node 0 size: 0 MB
node 0 free: 0 MB
node 1 cpus: 6 7 8 9 10 11 54 55 56 57 58 59
node 1 size: 64085 MB
node 1 free: 52924 MB
node 2 cpus: 12 13 14 15 16 17 60 61 62 63 64 65
node 2 size: 0 MB
node 2 free: 0 MB
node 3 cpus: 18 19 20 21 22 23 66 67 68 69 70 71
node 3 size: 0 MB
node 3 free: 0 MB
node 4 cpus: 24 25 26 27 28 29 72 73 74 75 76 77
node 4 size: 0 MB
node 4 free: 0 MB
node 5 cpus: 30 31 32 33 34 35 78 79 80 81 82 83
node 5 size: 64489 MB
node 5 free: 43644 MB
node 6 cpus: 36 37 38 39 40 41 84 85 86 87 88 89
node 6 size: 0 MB
node 6 free: 0 MB
node 7 cpus: 42 43 44 45 46 47 90 91 92 93 94 95
node 7 size: 0 MB
node 7 free: 0 MB
node distances:
node   0   1   2   3   4   5   6   7
  0:  10  12  12  12  32  32  32  32
  1:  12  10  12  12  32  32  32  32
  2:  12  12  10  12  32  32  32  32
  3:  12  12  12  10  32  32  32  32
  4:  32  32  32  32  10  12  12  12
  5:  32  32  32  32  12  10  12  12
  6:  32  32  32  32  12  12  10  12
  7:  32  32  32  32  12  12  12  10

[Figure: server 2 load]

I believe this is why they have different response times as working backends: about 8 ms on server 1 and about 4-5 ms on server 2.

Is this caused by a NUMA misconfiguration? How can I get the same utilization on server 1 as on server 2?

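Edit: since these tasks are uwsgi processes, I can set CPU affinity for them in the uwsgi configuration and get the result I want. The behavior I described still seems strange to me, though.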
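
For readers who want to try the same kind of CPU binding outside uwsgi, here is a minimal Python sketch. It is not the uwsgi mechanism mentioned above, only an illustration under assumptions: `parse_cpulist` and `pin_to_node` are hypothetical helper names, the per-node CPU list is read from `/sys/devices/system/node/node<N>/cpulist`, and `os.sched_setaffinity` is Linux-only. The example string is node 0 of server 1, taken from the lscpu output.

```python
import os


def parse_cpulist(cpulist: str) -> set:
    """Expand a kernel-style CPU list such as "0-23,48-71" into a set of CPU ids."""
    cpus = set()
    for part in cpulist.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_to_node(node: int, pid: int = 0) -> set:
    """Restrict `pid` (0 means the calling process) to the CPUs of one NUMA node.

    Hypothetical helper: reads the node's CPU list from sysfs and applies it
    with os.sched_setaffinity (Linux only).
    """
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        cpus = parse_cpulist(f.read())
    os.sched_setaffinity(pid, cpus)
    return os.sched_getaffinity(pid)


# Node 0 on server 1, as reported by lscpu ("NUMA node0 CPU(s): 0-23,48-71").
NODE0_SERVER1 = parse_cpulist("0-23,48-71")
```

Calling `pin_to_node(0)` early in a worker process would keep that worker on node 0's 48 logical CPUs. Whether pinning per node or per core is appropriate depends on the workload.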

Answer 1

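After some research I think I found a solution, and I hope it helps others. What helped was Receive Packet Steering (RPS). The relevant documentation and more information can be found here: kernel documentation on network scaling

Useful article

Red Hat documentation on the topic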
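
RPS is configured per receive queue by writing a hexadecimal CPU bitmap to `/sys/class/net/<dev>/queues/rx-<n>/rps_cpus`. The answer does not say which mask was used, so the sketch below is only an assumption: it builds the bitmap for node 0 of server 1 (CPUs 0-23 and 48-71) in the comma-separated 32-bit format the kernel uses on machines with more than 32 CPUs. `rps_cpus_mask` and `rps_path` are hypothetical helper names, and `eth0` is a placeholder device.

```python
def rps_cpus_mask(cpus) -> str:
    """Format CPU ids as comma-separated 32-bit hex words, most significant first."""
    bits = 0
    for cpu in cpus:
        bits |= 1 << cpu
    words = []
    while True:
        words.append(f"{bits & 0xFFFFFFFF:08x}")
        bits >>= 32
        if bits == 0:
            break
    return ",".join(reversed(words))


def rps_path(dev: str, queue: int) -> str:
    """Path of the rps_cpus file for one receive queue of a network device."""
    return f"/sys/class/net/{dev}/queues/rx-{queue}/rps_cpus"


# Assumed target set: node 0 on server 1 (CPUs 0-23 and 48-71).
node0 = list(range(0, 24)) + list(range(48, 72))
mask = rps_cpus_mask(node0)
print(rps_path("eth0", 0), mask)  # mask is 000000ff,ffff0000,00ffffff
```

The sketch only prints the path and mask. Applying it means writing the mask to that file as root, once per receive queue, and the setting does not persist across reboots.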
