
I wrote a very simple C program to demonstrate some of what I'm seeing with a more complex program I'm trying to optimize. Here is the simple example:

#include <stdio.h>
#include <inttypes.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    uint64_t loops = 0;
    uint64_t count;

    /* Note: SCNx64 reads the argument as HEXADECIMAL, so "1000000000"
       means 0x1000000000 (68,719,476,736) iterations. */
    if (argc > 1) {
        if (sscanf(argv[1], "%" SCNx64, &loops) != 1) {
            fprintf(stderr, "invalid loops %s\n", argv[1]);
            exit(EXIT_FAILURE);
        }
    }

    printf("loops = %" PRIx64 "\n", loops);

    /* Empty busy loop. Build without optimization (e.g. gcc -O0):
       with optimization enabled the compiler may delete it entirely. */
    for (count = 0; count < loops; ++count) {

    }

    return 0;
}

So I ran it a couple of times on my Ubuntu 20.04 Lenovo laptop with the same loop count, using the date and time commands to measure how long it takes:

jeff@jeff-ThinkPad-E15:~/opengl/matrix_code/timing$ date; time ./a.out 1000000000; date
Tue 11 May 2021 06:21:51 PM PDT
loops = 1000000000

real    2m15.489s
user    2m15.433s
sys 0m0.004s
Tue 11 May 2021 06:24:07 PM PDT
jeff@jeff-ThinkPad-E15:~/opengl/matrix_code/timing$ date; time ./a.out 1000000000; date
Tue 11 May 2021 06:25:56 PM PDT
loops = 1000000000

real    2m7.822s
user    2m7.792s
sys 0m0.001s
Tue 11 May 2021 06:28:04 PM PDT
jeff@jeff-ThinkPad-E15:~/opengl/matrix_code/timing$

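I expected the run time of such a simple program to be nearly constant, but as you can see it varies by about 6% (2m15.5s vs. 2m7.8s). In my more complex program the variation is much larger, even though it does no I/O: it allocates memory, performs a lot of integer and double-precision floating-point arithmetic, and frees the memory. That's it.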

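What can I do to make these numbers more consistent? I want to rely on differences in these timings to tell whether my optimizations have the intended effect, but if exactly the same code on exactly the same data varies this much, my optimization efforts will be drowned out by the noise.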

Or is there some other strategy I could use?

Thanks

Update, May 12, 2021:

Doug Smythies, in reply to your request for more information about the CPU: is this enough?

jeff@jeff-ThinkPad-E15:~$ cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 142
model name  : Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz
stepping    : 12
microcode   : 0xde
cpu MHz     : 763.853
cache size  : 6144 KB
physical id : 0
siblings    : 8
core id     : 0
cpu cores   : 4
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 22
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d arch_capabilities
vmx flags   : vnmi preemption_timer invvpid ept_x_only ept_ad ept_1gb flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest ple pml ept_mode_based_exec
bugs        : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit srbds
bogomips    : 4199.88
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model       : 142
model name  : Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz
stepping    : 12
microcode   : 0xde
cpu MHz     : 800.081
cache size  : 6144 KB
physical id : 0
siblings    : 8
core id     : 1
cpu cores   : 4
apicid      : 2
initial apicid  : 2
fpu     : yes
fpu_exception   : yes
cpuid level : 22
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d arch_capabilities
vmx flags   : vnmi preemption_timer invvpid ept_x_only ept_ad ept_1gb flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest ple pml ept_mode_based_exec
bugs        : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit srbds
bogomips    : 4199.88
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor   : 2
vendor_id   : GenuineIntel
cpu family  : 6
model       : 142
model name  : Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz
stepping    : 12
microcode   : 0xde
cpu MHz     : 752.384
cache size  : 6144 KB
physical id : 0
siblings    : 8
core id     : 2
cpu cores   : 4
apicid      : 4
initial apicid  : 4
fpu     : yes
fpu_exception   : yes
cpuid level : 22
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d arch_capabilities
vmx flags   : vnmi preemption_timer invvpid ept_x_only ept_ad ept_1gb flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest ple pml ept_mode_based_exec
bugs        : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit srbds
bogomips    : 4199.88
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor   : 3
vendor_id   : GenuineIntel
cpu family  : 6
model       : 142
model name  : Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz
stepping    : 12
microcode   : 0xde
cpu MHz     : 800.383
cache size  : 6144 KB
physical id : 0
siblings    : 8
core id     : 3
cpu cores   : 4
apicid      : 6
initial apicid  : 6
fpu     : yes
fpu_exception   : yes
cpuid level : 22
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d arch_capabilities
vmx flags   : vnmi preemption_timer invvpid ept_x_only ept_ad ept_1gb flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest ple pml ept_mode_based_exec
bugs        : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit srbds
bogomips    : 4199.88
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor   : 4
vendor_id   : GenuineIntel
cpu family  : 6
model       : 142
model name  : Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz
stepping    : 12
microcode   : 0xde
cpu MHz     : 719.028
cache size  : 6144 KB
physical id : 0
siblings    : 8
core id     : 0
cpu cores   : 4
apicid      : 1
initial apicid  : 1
fpu     : yes
fpu_exception   : yes
cpuid level : 22
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d arch_capabilities
vmx flags   : vnmi preemption_timer invvpid ept_x_only ept_ad ept_1gb flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest ple pml ept_mode_based_exec
bugs        : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit srbds
bogomips    : 4199.88
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor   : 5
vendor_id   : GenuineIntel
cpu family  : 6
model       : 142
model name  : Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz
stepping    : 12
microcode   : 0xde
cpu MHz     : 772.575
cache size  : 6144 KB
physical id : 0
siblings    : 8
core id     : 1
cpu cores   : 4
apicid      : 3
initial apicid  : 3
fpu     : yes
fpu_exception   : yes
cpuid level : 22
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d arch_capabilities
vmx flags   : vnmi preemption_timer invvpid ept_x_only ept_ad ept_1gb flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest ple pml ept_mode_based_exec
bugs        : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit srbds
bogomips    : 4199.88
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor   : 6
vendor_id   : GenuineIntel
cpu family  : 6
model       : 142
model name  : Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz
stepping    : 12
microcode   : 0xde
cpu MHz     : 800.224
cache size  : 6144 KB
physical id : 0
siblings    : 8
core id     : 2
cpu cores   : 4
apicid      : 5
initial apicid  : 5
fpu     : yes
fpu_exception   : yes
cpuid level : 22
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d arch_capabilities
vmx flags   : vnmi preemption_timer invvpid ept_x_only ept_ad ept_1gb flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest ple pml ept_mode_based_exec
bugs        : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit srbds
bogomips    : 4199.88
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor   : 7
vendor_id   : GenuineIntel
cpu family  : 6
model       : 142
model name  : Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz
stepping    : 12
microcode   : 0xde
cpu MHz     : 762.521
cache size  : 6144 KB
physical id : 0
siblings    : 8
core id     : 3
cpu cores   : 4
apicid      : 7
initial apicid  : 7
fpu     : yes
fpu_exception   : yes
cpuid level : 22
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d arch_capabilities
vmx flags   : vnmi preemption_timer invvpid ept_x_only ept_ad ept_1gb flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest ple pml ept_mode_based_exec
bugs        : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit srbds
bogomips    : 4199.88
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

jeff@jeff-ThinkPad-E15:~$ 

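Doug, the numbers you report from your server are closer to what I expected to see on my laptop. Our processor models aren't exactly the same, but both are Intel i5s.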

Update, May 13, 2021:

Doug, I installed linux-tools-common as you suggested, but running turbostat gives an error:

jeff@jeff-ThinkPad-E15:~$ which turbostat
jeff@jeff-ThinkPad-E15:~$ sudo apt install linux-tools-common
[sudo] password for jeff: 
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages were automatically installed and are no longer required:
  libfprint-2-tod1 libllvm10 python3-pyxattr
Use 'sudo apt autoremove' to remove them.
The following NEW packages will be installed:
  linux-tools-common
0 upgraded, 1 newly installed, 0 to remove and 15 not upgraded.
Need to get 217 kB of archives.
After this operation, 687 kB of additional disk space will be used.
Get:1 http://us.archive.ubuntu.com/ubuntu focal-updates/main amd64 linux-tools-common all 5.4.0-73.82 [217 kB]
Fetched 217 kB in 1s (395 kB/s)            
Selecting previously unselected package linux-tools-common.
(Reading database ... 200137 files and directories currently installed.)
Preparing to unpack .../linux-tools-common_5.4.0-73.82_all.deb ...
Unpacking linux-tools-common (5.4.0-73.82) ...
Setting up linux-tools-common (5.4.0-73.82) ...
Processing triggers for man-db (2.9.1-1) ...
jeff@jeff-ThinkPad-E15:~$ which turbostat
/usr/bin/turbostat
jeff@jeff-ThinkPad-E15:~$ sudo turbostat --Summary --quiet --show Busy%,Bzy_MHz,IRQ,PkgWatt,PkgTmp,RAMWatt,GFXWatt --interval 15
WARNING: turbostat not found for kernel 5.8.0-53

  You may need to install the following packages for this specific kernel:
    linux-tools-5.8.0-53-generic
    linux-cloud-tools-5.8.0-53-generic

  You may also want to install one of the following packages to keep up to date:
    linux-tools-generic
    linux-cloud-tools-generic
jeff@jeff-ThinkPad-E15:~$ 

When I googled the warning text, I found people who had followed turbostat's advice and then ran into more problems, including broken dependencies.

Any suggestions for the least risky thing to try would be much appreciated. 8^)

Update, May 14, 2021:

Doug, you wrote:

for your turbostat troubles see this bug report. Just by-pass the annoying wrapper and run it directly. Are you up to date with everything? you seem to be running the kernel for the hwe version whereas linux-tools-common expects the non-hwe kernel. – Doug Smythies 20 hours ago
installing linux-tools-5.8.0-53-generic should help and yet not break other dependencies. Note that I don't do it this way, I use the master turbostat as compiled directly from the master kernel source tree. – Doug Smythies 7 hours ago

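I looked at that script but couldn't find any way to run turbostat directly. Apart from the script, the package doesn't seem to install an actual executable named turbostat. In any case, rather than fumbling through the turbostat installation problem, I decided to do as you suggested and write a 1 to the sysfs no_turbo entry, like this: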

jeff@jeff-ThinkPad-E15:~$ cat /sys/devices/system/cpu/intel_pstate/no_turbo
0
jeff@jeff-ThinkPad-E15:~$ echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo
1
jeff@jeff-ThinkPad-E15:~$ cat /sys/devices/system/cpu/intel_pstate/no_turbo
1
jeff@jeff-ThinkPad-E15:~$ 

That worked great! The run times are now slower, as you predicted, but they are consistent (at least far more consistent than before):

jeff@jeff-ThinkPad-E15:~/opengl/matrix_code/timing$ date; time ./a.out 1000000000; date
Fri 14 May 2021 01:51:47 PM PDT
loops = 1000000000

real    4m7.149s
user    4m7.122s
sys 0m0.000s
Fri 14 May 2021 01:55:54 PM PDT
jeff@jeff-ThinkPad-E15:~/opengl/matrix_code/timing$ date; time ./a.out 1000000000; date
Fri 14 May 2021 01:55:59 PM PDT
loops = 1000000000

real    4m6.895s
user    4m6.882s
sys 0m0.001s
Fri 14 May 2021 02:00:06 PM PDT
jeff@jeff-ThinkPad-E15:~/opengl/matrix_code/timing$ 

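As for your question about the HWE kernel: I installed Ubuntu by downloading the latest stable Ubuntu 20.04 LTS ISO available at the time, putting it on a flash drive, and installing it from the flash drive over the Windows that came with the laptop. If that resulted in an HWE kernel being installed, that's news to me. 8^)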

Thank you very much! You are a gentleman and a scholar.

Answer

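Your processor is a mobile part with a low TDP of 15 watts, so its CPU frequency under 100% load may not be constant. It can be reduced for several reasons: multiple active cores, power limits, thermal limits, and so on.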

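To be able to proceed with your work and be confident that differences between runs come only from your code changes and optimizations, you first need to find a stable operating point for testing. That means limiting the maximum CPU frequency to a value at or below the point where the system will not throttle it down for any reason.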

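The suggested monitoring tool is turbostat (included in the linux-tools-common package). Also, although it probably won't make any difference, your example program is single-threaded, so try forcing CPU affinity. Example (I called your program ask.c and compiled it to ask):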

doug@s19:~/tmp$ time taskset -c 5 ./ask 1000000000
loops = 1000000000

real    1m23.237s
user    1m23.236s
sys     0m0.003s

Meanwhile, turbostat was running:

doug@s19:~$ sudo turbostat --Summary --quiet --show Busy%,Bzy_MHz,IRQ,PkgWatt,PkgTmp,RAMWatt,GFXWatt --interval 15
Busy%   Bzy_MHz IRQ     PkgTmp  PkgWatt GFXWatt RAMWatt
0.01    1224    446     36      1.36    0.00    0.89
0.01    800     354     35      1.36    0.00    0.89
4.30    4790    8122    48      12.20   0.00    0.89
8.32    4800    15430   48      22.49   0.00    0.89
8.32    4800    15358   48      22.60   0.00    0.89
8.32    4800    15377   48      22.63   0.00    0.89
8.32    4800    15394   48      22.58   0.00    0.89
8.32    4800    15435   49      22.61   0.00    0.89
0.23    4634    758     36      2.04    0.00    0.89
0.01    800     348     36      1.47    0.00    0.89

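Note that while the program is running, the CPU frequency is steady at 4.8 GHz, and the system is nowhere near any of the other limits that could cause throttling. Running turbostat without the --quiet option gives insight into some of these limits: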

...
cpu8: MSR_TURBO_RATIO_LIMIT: 0x303030303030
48 * 100.0 = 4800.0 MHz max turbo 6 active cores
48 * 100.0 = 4800.0 MHz max turbo 5 active cores
48 * 100.0 = 4800.0 MHz max turbo 4 active cores
48 * 100.0 = 4800.0 MHz max turbo 3 active cores
48 * 100.0 = 4800.0 MHz max turbo 2 active cores
48 * 100.0 = 4800.0 MHz max turbo 1 active cores
...
cpu0: MSR_PKG_POWER_INFO: 0x000003e8 (125 W TDP, RAPL 0 - 0 W, 0.000000 sec.)
cpu0: MSR_PKG_POWER_LIMIT: 0x428440001b83e8 (UNlocked)
cpu0: PKG Limit #1: ENabled (125.000000 Watts, 8.000000 sec, clamp ENabled)
cpu0: PKG Limit #2: ENabled (136.000000 Watts, 0.002441* sec, clamp DISabled)

I suggest you start by disabling turbo. By default you should be using the intel_pstate CPU frequency scaling driver, so:

$ echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo
1

turbostat during the program run:

Busy%   Bzy_MHz IRQ     PkgTmp  PkgWatt GFXWatt RAMWatt
8.32    4100    15613   39      13.04   0.00    0.89
8.32    4100    15386   39      13.12   0.00    0.89
8.32    4100    15382   39      13.02   0.00    0.89
8.32    4100    15384   40      13.05   0.00    0.89

Note the significant drop in power. Of course, the program also takes longer to run:

doug@s19:~/tmp$ time taskset -c 5 ./ask 1000000000
loops = 1000000000

real    1m37.531s
user    1m37.531s
sys     0m0.003s

To limit the CPU frequency directly, do something like this:

doug@s19:~/tmp$ echo 95 | sudo tee /sys/devices/system/cpu/intel_pstate/max_perf_pct
95

turbostat during the program run (obviously, I re-enabled turbo first):

Busy%   Bzy_MHz IRQ     PkgTmp  PkgWatt GFXWatt RAMWatt
0.02    1266    694     34      1.69    0.00    0.89
0.03    800     681     34      1.39    0.00    0.89
1.70    4553    3662    44      5.01    0.00    0.89
8.32    4600    15387   44      19.24   0.00    0.89
8.32    4600    15374   44      19.46   0.00    0.89
