Server CPU stuck at 800 MHz even under high load, with Ubuntu 20.04.4 and intel_pstate

My web server runs Ubuntu Server 20.04.4 (kernel 5.4.0-124) on an Intel Core i7-7700 CPU @ 3.60GHz. It is a dedicated, physical Supermicro server hosted in a remote datacenter.

# lshw

....
  *-core
       description: Motherboard
       product: X11SSD-F
       vendor: Supermicro
...
     *-cpu
          description: CPU
          product: Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
          vendor: Intel Corp.
          physical id: 12
          bus info: cpu@0
          version: Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
          serial: To Be Filled By O.E.M.
          slot: CPU
          size: 800MHz
          capacity: 4200MHz
          width: 64 bits
          clock: 100MHz
          capabilities: lm fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp x86-64 constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d arch_capabilities cpufreq
          configuration: cores=4 enabledcores=4 threads=8

https://askubuntu.com/questions/916382/ubuntu-get-actual-current-cpu-clock-speed
# lscpu | grep MHz
CPU MHz:                         800.010
CPU max MHz:                     4200.0000
CPU min MHz:                     800.0000

However, even at 100% load, the frequency stays capped at 800 MHz:

# watch -n1 "cat /proc/cpuinfo | grep -i Hz"

model name      : Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
cpu MHz         : 800.016
model name      : Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
cpu MHz         : 800.023
model name      : Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
cpu MHz         : 800.032
model name      : Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
cpu MHz         : 800.024
model name      : Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
cpu MHz         : 800.011
model name      : Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
cpu MHz         : 800.021
model name      : Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
cpu MHz         : 800.010
model name      : Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
cpu MHz         : 800.038

# sudo apt install i7z -y
# sudo i7z

Cpu speed from cpuinfo 3600.00Mhz
cpuinfo might be wrong if cpufreq is enabled. To guess correctly try estimating via tsc
Linux's inbuilt cpu_khz code emulated now
True Frequency (without accounting Turbo) 3599 MHz
  CPU Multiplier 36x || Bus clock frequency (BCLK) 99.97 MHz

Socket [0] - [physical cores=4, logical cores=8, max online cores ever=4]
  TURBO ENABLED on 4 Cores, Hyper Threading ON
  Max Frequency without considering Turbo 3698.97 MHz (99.97 x [37])
  Max TURBO Multiplier (if Enabled) with 1/2/3/4 Cores is  36x/36x/36x/36x
  Real Current Frequency 799.78 MHz [99.97 x 8.00] (Max of below)
        Core [core-id]  :Actual Freq (Mult.)      C0%   Halt(C1)%  C3 %   C6 %  Temp      VCore
        Core 1 [0]:       799.78 (8.00x)        13.3      97       0       0    27      0.6636
        Core 2 [1]:       799.78 (8.00x)          25    90.1     1.4    2.91    27      0.6636
        Core 3 [2]:       799.77 (8.00x)        3.61    95.7       1    2.55    27      0.6639
        Core 4 [3]:       799.78 (8.00x)         100    77.8       0       0    29      0.6644

With cpufreq-info I found that the governor was set to powersave, so I immediately switched it to performance (and rebooted):

https://askubuntu.com/a/1049313/181869
sudo apt install cpufrequtils -y
echo 'GOVERNOR="performance"' | sudo tee /etc/default/cpufrequtils
sudo systemctl disable ondemand

# cpufreq-set -g performance -r

# cpufreq-info

analyzing CPU 0:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: 4294.55 ms.
  hardware limits: 800 MHz - 4.20 GHz
  available cpufreq governors: performance, powersave
  current policy: frequency should be within 800 MHz and 4.20 GHz.
                  The governor "performance" may decide which speed to use
                  within this range.
  current CPU frequency is 800 MHz.

<repeat for each other core>

# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
performance
performance
performance
performance
performance
performance
performance
performance

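Disabling the ondemand service is crucial: if I don't, it switches the governor back to powersave.

Still, it makes no difference in terms of MHz: the cores remain capped at 800 MHz regardless of load.

Temperatures are very low, so this is not thermal throttling: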

# sudo apt install lm-sensors -y && sudo sensors-detect
...

# sudo sensors

power_meter-acpi-0
Adapter: ACPI interface
power1:       23.00 W  (interval = 4294967.29 s)

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +28.0°C  (high = +80.0°C, crit = +100.0°C)
Core 0:        +25.0°C  (high = +80.0°C, crit = +100.0°C)
Core 1:        +27.0°C  (high = +80.0°C, crit = +100.0°C)
Core 2:        +26.0°C  (high = +80.0°C, crit = +100.0°C)
Core 3:        +28.0°C  (high = +80.0°C, crit = +100.0°C)

acpitz-acpi-0
Adapter: ACPI interface
temp1:        +27.0°C  (crit = +119.0°C)
temp2:        +27.0°C  (crit = +119.0°C)

pch_skylake-virtual-0
Adapter: Virtual device
temp1:        +32.5°C

i350bb-pci-0200
Adapter: PCI adapter
loc1:         +54.0°C  (high = +120.0°C, crit = +110.0°C)

max_perf_pct is not limited:

# cd /sys/devices/system/cpu/intel_pstate && grep -r .
no_turbo:0
num_pstates:35
status:active
turbo_pct:18
max_perf_pct:100
hwp_dynamic_boost:0
min_perf_pct:100

Note: I manually changed min_perf_pct to 100 with nano, in the directory shown above.

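I also tried setting GRUB_CMDLINE_LINUX_DEFAULT="intel_pstate=disable" in grub. It made a small difference: the active driver is now acpi-cpufreq, and cpufreq-info now claims the CPU is running at 3.60 GHz (as it should)... but that is wrong: as every other command I run confirms, the CPU is still at 800 MHz.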

# cpufreq-info
cpufrequtils 008: cpufreq-info (C) Dominik Brodowski 2004-2009
Report errors and bugs to [email protected], please.
analyzing CPU 0:
  driver: acpi-cpufreq
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: 10.0 us.
  hardware limits: 800 MHz - 3.60 GHz
  available frequency steps: 3.60 GHz, 3.60 GHz, 3.40 GHz, 3.20 GHz, 3.00 GHz, 2.80 GHz, 2.60 GHz, 2.40 GHz, 2.20 GHz, 2.00 GHz, 1.80 GHz, 1.60 GHz, 1.40 GHz, 1.20 GHz, 1000 MHz, 800 MHz
  available cpufreq governors: conservative, ondemand, userspace, powersave, performance, schedutil
  current policy: frequency should be within 800 MHz and 3.60 GHz.
                  The governor "performance" may decide which speed to use
                  within this range.
  current CPU frequency is 3.60 GHz (asserted by call to hardware).
  cpufreq stats: 3.60 GHz:100.00%, 3.60 GHz:0.00%, 3.40 GHz:0.00%, 3.20 GHz:0.00%, 3.00 GHz:0.00%, 2.80 GHz:0.00%, 2.60 GHz:0.00%, 2.40 GHz:0.00%, 2.20 GHz:0.00%, 2.00 GHz:0.00%, 1.80 GHz:0.00%, 1.60 GHz:0.00%, 1.40 GHz:0.00%, 1.20 GHz:0.00%, 1000 MHz:0.00%, 800 MHz:0.00%  (1)

....

As suggested by Doug in the comments:

# sudo apt install linux-tools-common linux-tools-generic -y
# sudo turbostat --Summary --show Busy%,Bzy_MHz,IRQ,PkgWatt,PkgTmp,RAMWatt,GFXWatt,CorWatt --interval 15
turbostat version 19.08.31 - Len Brown <[email protected]>
CPUID(0): GenuineIntel 0x16 CPUID levels; 0x80000008 xlevels; family:model:stepping 0x6:9e:9 (6:158:9)
CPUID(1): SSE3 MONITOR SMX EIST TM2 TSC MSR ACPI-TM HT TM
CPUID(6): APERF, TURBO, DTS, PTM, HWP, HWPnotify, HWPwindow, HWPepp, No-HWPpkg, EPB
cpu6: MSR_IA32_MISC_ENABLE: 0x00850089 (TCC EIST MWAIT PREFETCH TURBO)
CPUID(7): SGX
cpu6: MSR_IA32_FEATURE_CONTROL: 0x00000005 (Locked )
CPUID(0x15): eax_crystal: 2 ebx_tsc: 300 ecx_crystal_hz: 0
TSC: 3600 MHz (24000000 Hz * 300 / 2 / 1000000)
CPUID(0x16): base_mhz: 3600 max_mhz: 3600 bus_mhz: 100
cpu6: MSR_MISC_PWR_MGMT: 0x00401cc0 (ENable-EIST_Coordination DISable-EPB DISable-OOB)
RAPL: 4033 sec. Joule Counter Range, at 65 Watts
cpu6: MSR_PLATFORM_INFO: 0x88080838f1012400
8 * 100.0 = 800.0 MHz max efficiency frequency
36 * 100.0 = 3600.0 MHz base frequency
cpu6: MSR_IA32_POWER_CTL: 0x403c005d (C1E auto-promotion: DISabled)
cpu6: MSR_TURBO_RATIO_LIMIT: 0x24242424
36 * 100.0 = 3600.0 MHz max turbo 4 active cores
36 * 100.0 = 3600.0 MHz max turbo 3 active cores
36 * 100.0 = 3600.0 MHz max turbo 2 active cores
36 * 100.0 = 3600.0 MHz max turbo 1 active cores
cpu6: MSR_CONFIG_TDP_NOMINAL: 0x00000024 (base_ratio=36)
cpu6: MSR_CONFIG_TDP_LEVEL_1: 0x00000000 ()
cpu6: MSR_CONFIG_TDP_LEVEL_2: 0x00000000 ()
cpu6: MSR_CONFIG_TDP_CONTROL: 0x80000000 ( lock=1)
cpu6: MSR_TURBO_ACTIVATION_RATIO: 0x00000000 (MAX_NON_TURBO_RATIO=0 lock=0)
cpu6: MSR_PKG_CST_CONFIG_CONTROL: 0x7e008006 (UNdemote-C3, UNdemote-C1, demote-C3, demote-C1, locked, pkg-cstate-limit=6 (pc8))
cpu6: cpufreq driver: intel_pstate
cpu6: cpufreq governor: performance
cpufreq intel_pstate no_turbo: 0
cpu6: MSR_MISC_FEATURE_CONTROL: 0x00000000 (L2-Prefetch L2-Prefetch-pair L1-Prefetch L1-IP-Prefetch)
cpu0: MSR_PM_ENABLE: 0x00000001 (HWP)
cpu0: MSR_HWP_CAPABILITIES: 0x0108242a (high 42 guar 36 eff 8 low 1)
cpu0: MSR_HWP_REQUEST: 0x00002a2a (min 42 max 42 des 0 epp 0x0 window 0x0 pkg 0x0)
cpu0: MSR_HWP_INTERRUPT: 0x00000000 (Dis_Guaranteed_Perf_Change, Dis_Excursion_Min)
cpu0: MSR_HWP_STATUS: 0x00000004 (No-Guaranteed_Perf_Change, No-Excursion_Min)
cpu0: MSR_IA32_ENERGY_PERF_BIAS: 0x00000006 (balanced)
cpu0: MSR_RAPL_POWER_UNIT: 0x000a0e03 (0.125000 Watts, 0.000061 Joules, 0.000977 sec.)
cpu0: MSR_PKG_POWER_INFO: 0x00000208 (65 W TDP, RAPL 0 - 0 W, 0.000000 sec.)
cpu0: MSR_PKG_POWER_LIMIT: 0x8042028a001b8208 (locked)
cpu0: PKG Limit #1: ENabled (65.000000 Watts, 8.000000 sec, clamp ENabled)
cpu0: PKG Limit #2: DISabled (81.250000 Watts, 0.002441* sec, clamp DISabled)
cpu0: MSR_DRAM_POWER_LIMIT: 0x805400de00000000 (UNlocked)
cpu0: DRAM Limit: DISabled (0.000000 Watts, 0.000977 sec, clamp DISabled)
cpu0: MSR_PP0_POLICY: 0
cpu0: MSR_PP0_POWER_LIMIT: 0x00000000 (UNlocked)
cpu0: Cores Limit: DISabled (0.000000 Watts, 0.000977 sec, clamp DISabled)
cpu0: MSR_PP1_POLICY: 0
cpu0: MSR_PP1_POWER_LIMIT: 0x00000000 (UNlocked)
cpu0: GFX Limit: DISabled (0.000000 Watts, 0.000977 sec, clamp DISabled)
cpu0: MSR_IA32_TEMPERATURE_TARGET: 0x00641400 (100 C)
cpu0: MSR_IA32_PACKAGE_THERM_STATUS: 0x884a010c (26 C)
cpu0: MSR_IA32_PACKAGE_THERM_INTERRUPT: 0x00d5c100 (15 C, 35 C)
cpu6: MSR_PKGC3_IRTL: 0x0000884e (valid, 79872 ns)
cpu6: MSR_PKGC6_IRTL: 0x00008876 (valid, 120832 ns)
cpu6: MSR_PKGC7_IRTL: 0x00008894 (valid, 151552 ns)
cpu6: MSR_PKGC8_IRTL: 0x000088fa (valid, 256000 ns)
cpu6: MSR_PKGC9_IRTL: 0x0000894c (valid, 339968 ns)
cpu6: MSR_PKGC10_IRTL: 0x00008bf2 (valid, 1034240 ns)
Busy%   Bzy_MHz IRQ     PkgTmp  PkgWatt CorWatt GFXWatt RAMWatt
4.48    800     16059   26      2.08    0.34    0.00    1.56
5.81    800     17468   26      2.12    0.38    0.00    1.57
5.78    800     15871   26      2.12    0.38    0.00    1.57
4.55    800     13702   26      2.06    0.32    0.00    1.56

I tried a few other things:

https://www.reddit.com/r/GarudaLinux/comments/l73vfz/autocpufreq_stuck_at_800_mhz/
# auto-cpufreq --log
auto-cpufreq: command not found

# ls -l /sys/class/power_supply/
total 0

https://askubuntu.com/questions/1307773/in-ubuntu-20-10-cpu-clock-fixed-at-800mhz
# cat /sys/devices/system/cpu/cpu*/cpufreq/bios_limit
cat: '/sys/devices/system/cpu/cpu*/cpufreq/bios_limit': No such file or directory

# service thermald status
Unit thermald.service could not be found.

:/sys/devices/system/cpu/cpu0/cpufreq# grep -r .
energy_performance_available_preferences:default performance balance_performance balance_power power
scaling_min_freq:800000
scaling_available_governors:performance powersave
base_frequency:3600000
scaling_governor:performance
cpuinfo_max_freq:4200000
related_cpus:0
scaling_cur_freq:800010
scaling_setspeed:<unsupported>
affected_cpus:0
scaling_max_freq:4200000
cpuinfo_transition_latency:0
energy_performance_preference:performance
scaling_driver:intel_pstate
cpuinfo_min_freq:800000

I have read many other Q&As, forum posts, and reddit threads here, all reporting problems similar to mine, but nothing seems to work.

Answer 1

Two lines in the turbostat header information stand out:

cpu0: MSR_IA32_PACKAGE_THERM_STATUS: 0x884a010c (26 C)
cpu0: MSR_IA32_PACKAGE_THERM_INTERRUPT: 0x00d5c100 (15 C, 35 C)

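For unknown reasons, your thermal interrupt register (IA32_PACKAGE_THERM_INTERRUPT, 0x1B2) has been set to cause a level 2 over-temperature assertion if the processor package temperature is above 15 degrees C, and a level 1 over-temperature assertion if it is above 35 degrees C. Both thresholds are enabled. The offsets are relative to TCC (100 degrees C) and are 85 and 65 degrees, so my best guess is that the intended configuration was level 2 = 85 and level 1 = 65. I consider 65 too low and would suggest 75. Reverse-encoding those desired numbers gives 0X8F9300 instead of 0XD5C100.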
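The threshold arithmetic can be checked with a small shell sketch. It assumes the register layout as I read it from the Intel SDM (bits 14:8 = threshold #1 offset, bit 15 = its enable, bits 22:16 = threshold #2 offset, bit 23 = its enable) and the TCC of 100 degrees C from the turbostat header; the function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: decode and encode the two thresholds of IA32_PACKAGE_THERM_INTERRUPT (0x1B2).
# Assumed layout: bits 14:8  = threshold #1 offset, bit 15 = threshold #1 enable,
#                 bits 22:16 = threshold #2 offset, bit 23 = threshold #2 enable.
# Offsets are degrees below TCC. Function names are illustrative only.

TCC=100   # from MSR_IA32_TEMPERATURE_TARGET in the turbostat header

# Print the level 2 and level 1 temperatures encoded in a raw register value.
decode_therm_interrupt() {
    local msr=$(( $1 ))
    local off1=$(( (msr >> 8) & 0x7f ))
    local off2=$(( (msr >> 16) & 0x7f ))
    echo "level2=$(( TCC - off2 )) level1=$(( TCC - off1 ))"
}

# Print the raw value for the desired level 2 and level 1 temperatures,
# with both threshold enable bits set and all other bits cleared.
encode_therm_interrupt() {
    local level2=$1 level1=$2
    local off2=$(( TCC - level2 )) off1=$(( TCC - level1 ))
    printf '0x%06X\n' $(( (1 << 23) | (off2 << 16) | (1 << 15) | (off1 << 8) ))
}

decode_therm_interrupt 0xD5C100   # level2=15 level1=35
decode_therm_interrupt 0x8F9300   # level2=85 level1=81
encode_therm_interrupt 85 75      # 0x8F9900
```

Under that assumed layout, 0x8F9300 decodes to level 2 = 85 and level 1 = 81, while 85 and 75 would encode as 0x8F9900. Both values put the thresholds far above the 26 degrees C package reading, so either should serve for the test.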

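You can try to find the misconfiguration and fix it or, as a test and with caution, modify the MSR directly. Modifying an MSR (Machine Specific Register) requires two things:

  • The msr module must be loaded. It will already be loaded if turbostat has been run; otherwise sudo modprobe msr will load it.
  • If your kernel is new enough, MSR writes must be enabled, either at boot via the kernel command line option msr.allow_writes=on, or on the fly via echo on | sudo tee /sys/module/msr/parameters/allow_writes. Judging by your turbostat version, your kernel is probably older.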

The command would be sudo wrmsr 0x1b2 0x8f9300, and it can be checked with turbostat or with sudo rdmsr 0x1b2.
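The steps above could be strung together roughly as follows. This is only a sketch: it assumes rdmsr and wrmsr are available (on Ubuntu they come from the msr-tools package), and the run helper and function name are made up for illustration. By default it only prints what it would do.

```shell
#!/usr/bin/env bash
# Sketch of the test procedure. Dry run by default: commands are only printed.
# Set APPLY=1 and run as root to execute them for real.
# Assumption: rdmsr/wrmsr are installed (msr-tools package on Ubuntu).

MSR_ADDR=0x1b2
NEW_VALUE=0x8f9300
ALLOW_WRITES=/sys/module/msr/parameters/allow_writes

# Execute the command when APPLY=1, otherwise just print it.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

fix_therm_interrupt() {
    run modprobe msr                     # harmless if the module is already loaded
    # Newer kernels gate MSR writes behind this parameter; older ones do not have it.
    if [ -e "$ALLOW_WRITES" ]; then
        run sh -c "echo on > $ALLOW_WRITES"
    fi
    run rdmsr "$MSR_ADDR"                # note the old value before changing anything
    run wrmsr "$MSR_ADDR" "$NEW_VALUE"
    run rdmsr "$MSR_ADDR"                # read back to confirm the write
}

fix_therm_interrupt
```

A write made this way does not survive a reboot, which is what makes it suitable as a test before hunting for the real misconfiguration.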

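Now, the thermal status register (IA32_PACKAGE_THERM_STATUS, 0x1B1) indicates that the PROCHOT bit is set and that a level 2 thermal condition exists. I am assuming the PROCHOT bit is a result of the configuration issue above, but I might be wrong.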
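For reference, here is how the status value from the turbostat header breaks down. A minimal sketch, assuming the bit positions as I read them from the Intel SDM (bit 2 = PROCHOT status, bit 3 = PROCHOT log, bit 6 = threshold #1 status, bit 8 = threshold #2 status, bits 22:16 = digital readout below TCC); the function name is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: decode selected fields of IA32_PACKAGE_THERM_STATUS (0x1B1).
# Assumed bit positions: 2 = PROCHOT status, 3 = PROCHOT log,
# 6 = threshold #1 status, 8 = threshold #2 status,
# 22:16 = digital readout in degrees below TCC.

# $1 = raw register value, $2 = TCC in degrees C (defaults to 100).
decode_pkg_therm_status() {
    local msr=$(( $1 )) tcc=${2:-100}
    echo "prochot=$(( (msr >> 2) & 1 ))"
    echo "prochot_log=$(( (msr >> 3) & 1 ))"
    echo "threshold1=$(( (msr >> 6) & 1 ))"
    echo "threshold2=$(( (msr >> 8) & 1 ))"
    echo "temp_c=$(( tcc - ((msr >> 16) & 0x7f) ))"
}

decode_pkg_therm_status 0x884a010c
```

For 0x884a010c this prints prochot=1, prochot_log=1, threshold1=0, threshold2=1 and temp_c=26, which matches the 26 C that turbostat reports.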

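Note: I was unable to reproduce this on my own processor (I did try), because there are hardware dependencies (in my case, the processor did not throttle with 0x1b2 = 0xd5C100).

Reference: Intel® 64 and IA-32 Architectures Software Developer's Manual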