最近我注意到我们的两台服务器上任务分配很奇怪。两台服务器都是双 CPU EPYC 7402,物理上是相同的平台,运行相同的任务,numa 配置、内核和 ubuntu 有所不同。
服务器1配置和负载:
Linux sv-marmoset222 5.15.0-46-generic #49~20.04.1-Ubuntu SMP Thu Aug 4 19:15:44 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 43 bits physical, 48 bits virtual
CPU(s): 96
On-line CPU(s) list: 0-95
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 2
NUMA node(s): 2
Vendor ID: AuthenticAMD
CPU family: 23
Model: 49
Model name: AMD EPYC 7402 24-Core Processor
Stepping: 0
CPU MHz: 2794.626
BogoMIPS: 5589.25
Virtualization: AMD-V
L1d cache: 1.5 MiB
L1i cache: 1.5 MiB
L2 cache: 24 MiB
L3 cache: 256 MiB
NUMA node0 CPU(s): 0-23,48-71
NUMA node1 CPU(s): 24-47,72-95
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
node 0 size: 128511 MB
node 0 free: 113713 MB
node 1 cpus: 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
node 1 size: 129005 MB
node 1 free: 121583 MB
node distances:
node 0 1
0: 10 32
1: 32 10
服务器2配置和负载:
Linux sv-marmoset318 5.3.0-62-generic #56~18.04.1-Ubuntu SMP Wed Jun 24 16:17:03 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 96
On-line CPU(s) list: 0-95
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 2
NUMA node(s): 8
Vendor ID: AuthenticAMD
CPU family: 23
Model: 49
Model name: AMD EPYC 7402 24-Core Processor
Stepping: 0
CPU MHz: 3340.149
BogoMIPS: 5589.69
Virtualization: AMD-V
L1d cache: 32K
L1i cache: 32K
L2 cache: 512K
L3 cache: 16384K
NUMA node0 CPU(s): 0-5,48-53
NUMA node1 CPU(s): 6-11,54-59
NUMA node2 CPU(s): 12-17,60-65
NUMA node3 CPU(s): 18-23,66-71
NUMA node4 CPU(s): 24-29,72-77
NUMA node5 CPU(s): 30-35,78-83
NUMA node6 CPU(s): 36-41,84-89
NUMA node7 CPU(s): 42-47,90-95
available: 8 nodes (0-7)
node 0 cpus: 0 1 2 3 4 5 48 49 50 51 52 53
node 0 size: 0 MB
node 0 free: 0 MB
node 1 cpus: 6 7 8 9 10 11 54 55 56 57 58 59
node 1 size: 64085 MB
node 1 free: 52924 MB
node 2 cpus: 12 13 14 15 16 17 60 61 62 63 64 65
node 2 size: 0 MB
node 2 free: 0 MB
node 3 cpus: 18 19 20 21 22 23 66 67 68 69 70 71
node 3 size: 0 MB
node 3 free: 0 MB
node 4 cpus: 24 25 26 27 28 29 72 73 74 75 76 77
node 4 size: 0 MB
node 4 free: 0 MB
node 5 cpus: 30 31 32 33 34 35 78 79 80 81 82 83
node 5 size: 64489 MB
node 5 free: 43644 MB
node 6 cpus: 36 37 38 39 40 41 84 85 86 87 88 89
node 6 size: 0 MB
node 6 free: 0 MB
node 7 cpus: 42 43 44 45 46 47 90 91 92 93 94 95
node 7 size: 0 MB
node 7 free: 0 MB
node distances:
node 0 1 2 3 4 5 6 7
0: 10 12 12 12 32 32 32 32
1: 12 10 12 12 32 32 32 32
2: 12 12 10 12 32 32 32 32
3: 12 12 12 10 32 32 32 32
4: 32 32 32 32 10 12 12 12
5: 32 32 32 32 12 10 12 12
6: 32 32 32 32 12 12 10 12
7: 32 32 32 32 12 12 12 10
因此,我相信它们作为工作后端具有不同的响应时间,在服务器 1 上约为 8 毫秒,在第二台服务器上约为 4-5 毫秒。
此问题是否是由于 numa 配置错误造成的?我怎样才能在 server1 上实现与第二台服务器相同的利用率?
编辑:由于这些任务是 uwsgi 进程,我可以在 uwsgi 配置中为它们设置 CPU 绑定并获得我想要的结果。但我描述的行为对我来说仍然很奇怪。