Ubuntu 20.04 - Shutdown after overheating

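I use a ThinkPad L13. Lately I have been having cooling problems, especially under full load. When I run a Python program that uses all cores, my laptop shuts down very quickly.

What have I tried so far? I installed TLP and thermald on my machine. In addition, I changed the Intel setting in the BIOS to "Balanced".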

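Two things have changed recently:

  1. I installed Ubuntu 20.04.

  2. Because of a graphics problem with my ThinkPad, they recently replaced my motherboard. Maybe it is a hardware issue, such as a badly fitted heatsink?

Before that, there were no problems.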

At this point, the command

grep -i -e temp -e therm /var/log/syslog*

produces the following output:

Apr 29 09:20:50 omikron systemd[1]: Started Daily Cleanup of Temporary Directories.
Apr 29 09:20:50 omikron systemd[1]: Starting Thermal Daemon Service...
Apr 29 09:20:50 omikron kernel: [    0.221560] mce: CPU0: Thermal monitoring enabled (TM1)
Apr 29 09:20:50 omikron kernel: [    0.376125] ACPI: \_SB_.PR00: _OSC native thermal LVT Acked
Apr 29 09:20:50 omikron kernel: [    0.539054] thermal_sys: Registered thermal governor 'fair_share'
Apr 29 09:20:50 omikron kernel: [    0.539055] thermal_sys: Registered thermal governor 'bang_bang'
Apr 29 09:20:50 omikron kernel: [    0.539056] thermal_sys: Registered thermal governor 'step_wise'
Apr 29 09:20:50 omikron kernel: [    0.539056] thermal_sys: Registered thermal governor 'user_space'
Apr 29 09:20:50 omikron kernel: [    0.539057] thermal_sys: Registered thermal governor 'power_allocator'
Apr 29 09:20:50 omikron kernel: [    0.725855] thermal LNXTHERM:00: registered as thermal_zone0
Apr 29 09:20:50 omikron kernel: [    0.725856] ACPI: Thermal Zone [THM0] (31 C)
Apr 29 09:20:50 omikron kernel: [    2.056100] proc_thermal 0000:00:04.0: enabling device (0000 -> 0002)
Apr 29 09:20:50 omikron kernel: [    2.147392] proc_thermal 0000:00:04.0: Creating sysfs group for PROC_THERMAL_PCI
Apr 29 09:20:50 omikron kernel: [    2.412750] thermal thermal_zone5: failed to read out thermal zone (-61)
Apr 29 09:20:50 omikron sensors[826]: temp1:            N/A
Apr 29 09:20:50 omikron sensors[826]: coretemp-isa-0000
Apr 29 09:20:50 omikron sensors[826]: temp1:         +1.0°C
Apr 29 09:20:50 omikron sensors[826]: temp2:         +1.0°C
Apr 29 09:20:50 omikron sensors[826]: temp3:         +4.0°C
Apr 29 09:20:50 omikron sensors[826]: temp4:         +0.0°C
Apr 29 09:20:50 omikron sensors[826]: temp5:       +121.0°C
Apr 29 09:20:50 omikron sensors[826]: temp6:       +121.0°C
Apr 29 09:20:50 omikron sensors[826]: temp7:         +0.0°C
Apr 29 09:20:50 omikron sensors[826]: temp8:         +0.0°C
Apr 29 09:20:50 omikron sensors[826]: temp9:        +64.0°C
Apr 29 09:20:50 omikron sensors[826]: temp10:        +3.0°C
Apr 29 09:20:50 omikron sensors[826]: temp11:       -80.0°C
Apr 29 09:20:50 omikron sensors[826]: temp12:        +0.0°C
Apr 29 09:20:50 omikron sensors[826]: temp13:        +0.0°C
Apr 29 09:20:50 omikron sensors[826]: temp14:        +0.0°C
Apr 29 09:20:50 omikron sensors[826]: temp15:        +0.0°C
Apr 29 09:20:50 omikron sensors[826]: temp16:        +0.0°C
Apr 29 09:20:50 omikron sensors[826]: temp1:        +48.0°C  (crit = +98.0°C)
Apr 29 09:20:50 omikron thermald[822]: [WARN]22 CPUID levels; family:model:stepping 0x6:8e:c (6:142:12)
Apr 29 09:20:50 omikron thermald[822]: [WARN]Polling mode is enabled: 4
Apr 29 09:20:50 omikron thermald[822]: [WARN]sensor id 10 : No temp sysfs for reading raw temp
Apr 29 09:20:50 omikron thermald[822]: message repeated 2 times: [ [WARN]sensor id 10 : No temp sysfs for reading raw temp]
Apr 29 09:20:50 omikron thermald[822]: I/O warning : failed to load external entity "/etc/thermald/thermal-conf.xml"
Apr 29 09:20:50 omikron thermald[822]: [WARN]error: could not parse file /etc/thermald/thermal-conf.xml
Apr 29 09:20:50 omikron thermald[822]: [WARN]sysfs open failed
Apr 29 09:20:50 omikron thermald[822]: I/O warning : failed to load external entity "/etc/thermald/thermal-conf.xml"
Apr 29 09:20:50 omikron thermald[822]: [WARN]error: could not parse file /etc/thermald/thermal-conf.xml
Apr 29 09:20:50 omikron systemd[1]: Started Thermal Daemon Service.
Apr 29 09:20:50 omikron thermald[822]: I/O warning : failed to load external entity "/etc/thermald/thermal-conf.xml"
Apr 29 09:20:50 omikron thermald[822]: [WARN]error: could not parse file /etc/thermald/thermal-conf.xml
Apr 29 09:21:04 omikron gsd-print-notif[1262]: Source ID 3 was not found when attempting to remove it
Apr 29 09:29:01 omikron kernel: [  493.759292] mce: CPU0: Core temperature above threshold, cpu clock throttled (total events = 1)
Apr 29 09:29:01 omikron kernel: [  493.759293] mce: CPU4: Core temperature above threshold, cpu clock throttled (total events = 1)
Apr 29 09:29:01 omikron kernel: [  493.759295] mce: CPU5: Package temperature above threshold, cpu clock throttled (total events = 1)
Apr 29 09:29:01 omikron kernel: [  493.759296] mce: CPU6: Package temperature above threshold, cpu clock throttled (total events = 1)
Apr 29 09:29:01 omikron kernel: [  493.759298] mce: CPU2: Package temperature above threshold, cpu clock throttled (total events = 1)
Apr 29 09:29:01 omikron kernel: [  493.759299] mce: CPU1: Package temperature above threshold, cpu clock throttled (total events = 1)
Apr 29 09:29:01 omikron kernel: [  493.759300] mce: CPU4: Package temperature above threshold, cpu clock throttled (total events = 1)
Apr 29 09:29:01 omikron kernel: [  493.759302] mce: CPU0: Package temperature above threshold, cpu clock throttled (total events = 1)
Apr 29 09:29:01 omikron kernel: [  493.759326] mce: CPU3: Package temperature above threshold, cpu clock throttled (total events = 1)
Apr 29 09:29:01 omikron kernel: [  493.759327] mce: CPU7: Package temperature above threshold, cpu clock throttled (total events = 1)
Apr 29 09:29:01 omikron kernel: [  493.760277] mce: CPU4: Core temperature/speed normal
Apr 29 09:29:01 omikron kernel: [  493.760278] mce: CPU0: Core temperature/speed normal
Apr 29 09:29:01 omikron kernel: [  493.760279] mce: CPU5: Package temperature/speed normal
Apr 29 09:29:01 omikron kernel: [  493.760280] mce: CPU1: Package temperature/speed normal
Apr 29 09:29:01 omikron kernel: [  493.760281] mce: CPU6: Package temperature/speed normal
Apr 29 09:29:01 omikron kernel: [  493.760282] mce: CPU2: Package temperature/speed normal
Apr 29 09:29:01 omikron kernel: [  493.760283] mce: CPU0: Package temperature/speed normal
Apr 29 09:29:01 omikron kernel: [  493.760284] mce: CPU4: Package temperature/speed normal
Apr 29 09:29:01 omikron kernel: [  493.760317] mce: CPU7: Package temperature/speed normal
Apr 29 09:29:01 omikron kernel: [  493.760318] mce: CPU3: Package temperature/speed normal
Apr 29 09:35:50 omikron systemd[1]: Starting Cleanup of Temporary Directories...
Apr 29 09:35:50 omikron systemd[1]: Finished Cleanup of Temporary Directories.
Apr 29 10:14:58 omikron kernel: [ 3250.661431] mce: CPU3: Core temperature above threshold, cpu clock throttled (total events = 1)
Apr 29 10:14:58 omikron kernel: [ 3250.661431] mce: CPU7: Core temperature above threshold, cpu clock throttled (total events = 1)
Apr 29 10:14:58 omikron kernel: [ 3250.661433] mce: CPU0: Package temperature above threshold, cpu clock throttled (total events = 196)
Apr 29 10:14:58 omikron kernel: [ 3250.661434] mce: CPU2: Package temperature above threshold, cpu clock throttled (total events = 196)
Apr 29 10:14:58 omikron kernel: [ 3250.661435] mce: CPU6: Package temperature above threshold, cpu clock throttled (total events = 196)
Apr 29 10:14:58 omikron kernel: [ 3250.661436] mce: CPU4: Package temperature above threshold, cpu clock throttled (total events = 196)
Apr 29 10:14:58 omikron kernel: [ 3250.661437] mce: CPU5: Package temperature above threshold, cpu clock throttled (total events = 196)
Apr 29 10:14:58 omikron kernel: [ 3250.661438] mce: CPU1: Package temperature above threshold, cpu clock throttled (total events = 196)
Apr 29 10:14:58 omikron kernel: [ 3250.661438] mce: CPU7: Package temperature above threshold, cpu clock throttled (total events = 196)
Apr 29 10:14:58 omikron kernel: [ 3250.661440] mce: CPU3: Package temperature above threshold, cpu clock throttled (total events = 196)
Apr 29 10:14:58 omikron kernel: [ 3250.665320] mce: CPU3: Core temperature/speed normal
Apr 29 10:14:58 omikron kernel: [ 3250.665321] mce: CPU7: Core temperature/speed normal
Apr 29 10:14:58 omikron kernel: [ 3250.665322] mce: CPU2: Package temperature/speed normal
Apr 29 10:14:58 omikron kernel: [ 3250.665323] mce: CPU0: Package temperature/speed normal
Apr 29 10:14:58 omikron kernel: [ 3250.665324] mce: CPU4: Package temperature/speed normal
Apr 29 10:14:58 omikron kernel: [ 3250.665325] mce: CPU5: Package temperature/speed normal
Apr 29 10:14:58 omikron kernel: [ 3250.665325] mce: CPU6: Package temperature/speed normal
Apr 29 10:14:58 omikron kernel: [ 3250.665326] mce: CPU1: Package temperature/speed normal
Apr 29 10:14:58 omikron kernel: [ 3250.665327] mce: CPU7: Package temperature/speed normal
Apr 29 10:14:58 omikron kernel: [ 3250.665328] mce: CPU3: Package temperature/speed normal
Apr 29 10:20:05 omikron kernel: [ 3557.746988] mce: CPU4: Core temperature above threshold, cpu clock throttled (total events = 323)
Apr 29 10:20:05 omikron kernel: [ 3557.746989] mce: CPU0: Core temperature above threshold, cpu clock throttled (total events = 323)
Apr 29 10:20:05 omikron kernel: [ 3557.746991] mce: CPU7: Package temperature above threshold, cpu clock throttled (total events = 650)
Apr 29 10:20:05 omikron kernel: [ 3557.746992] mce: CPU3: Package temperature above threshold, cpu clock throttled (total events = 650)
Apr 29 10:20:05 omikron kernel: [ 3557.746993] mce: CPU0: Package temperature above threshold, cpu clock throttled (total events = 650)
Apr 29 10:20:05 omikron kernel: [ 3557.746994] mce: CPU4: Package temperature above threshold, cpu clock throttled (total events = 650)
Apr 29 10:20:05 omikron kernel: [ 3557.747022] mce: CPU1: Package temperature above threshold, cpu clock throttled (total events = 650)
Apr 29 10:20:05 omikron kernel: [ 3557.747023] mce: CPU5: Package temperature above threshold, cpu clock throttled (total events = 650)
Apr 29 10:20:05 omikron kernel: [ 3557.747025] mce: CPU2: Package temperature above threshold, cpu clock throttled (total events = 650)
Apr 29 10:20:05 omikron kernel: [ 3557.747026] mce: CPU6: Package temperature above threshold, cpu clock throttled (total events = 650)
Apr 29 10:20:05 omikron kernel: [ 3557.749589] mce: CPU4: Core temperature/speed normal
Apr 29 10:20:05 omikron kernel: [ 3557.749590] mce: CPU0: Core temperature/speed normal
Apr 29 10:20:05 omikron kernel: [ 3557.749591] mce: CPU7: Package temperature/speed normal
Apr 29 10:20:05 omikron kernel: [ 3557.749591] mce: CPU3: Package temperature/speed normal
Apr 29 10:20:05 omikron kernel: [ 3557.749592] mce: CPU0: Package temperature/speed normal
Apr 29 10:20:05 omikron kernel: [ 3557.749593] mce: CPU4: Package temperature/speed normal
Apr 29 10:20:05 omikron kernel: [ 3557.749625] mce: CPU5: Package temperature/speed normal
Apr 29 10:20:05 omikron kernel: [ 3557.749626] mce: CPU1: Package temperature/speed normal
Apr 29 10:20:05 omikron kernel: [ 3557.749627] mce: CPU6: Package temperature/speed normal
Apr 29 10:20:05 omikron kernel: [ 3557.749628] mce: CPU2: Package temperature/speed normal
Apr 29 10:23:09 omikron kernel: [ 3741.654959] thermal thermal_zone0: critical temperature reached (100 C), shutting down
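
A long log like the one above is easier to judge once it is condensed to the two numbers that matter: how high the throttle counter climbed and at what temperature the kernel forced the shutdown. The following is a minimal sketch, assuming syslog lines in the same format as shown in the question; the function name summarize_thermal is made up for this illustration.

```sh
#!/bin/sh
# Summarise thermal events from syslog-style lines read on stdin.
# Prints the largest "total events" counter seen and, if present,
# the temperature reported by a "critical temperature reached" line.
summarize_thermal() {
    awk '
        /cpu clock throttled \(total events = [0-9]+\)/ {
            n = $0
            sub(/.*total events = /, "", n)   # drop everything before the counter
            sub(/\).*/, "", n)                # drop the closing parenthesis
            if (n + 0 > max) max = n + 0
        }
        /critical temperature reached/ {
            t = $0
            sub(/.*critical temperature reached \(/, "", t)
            sub(/ C\).*/, "", t)
            crit = t
        }
        END {
            printf "max_throttle_events=%d\n", max
            if (crit != "") printf "critical_shutdown_at=%sC\n", crit
        }
    '
}
```

It could be fed from the same command used above, for example: grep -i -e temp -e therm /var/log/syslog* | summarize_thermal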

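Edit (May 1, 2020):

Today I was in a Zoom meeting and the laptop got so hot that it shut down during the meeting. That should not happen, should it? What is going on here? I was not doing any heavy computation. Maybe it has something to do with the power supply, since I had it plugged in?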


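Edit (May 9, 2020):

I set the performance setting to the highest level and ran the same stress tests that were used in various temperature reviews of my laptop. On Windows, I get values similar to theirs. So I think it must be a problem with the new Ubuntu 20.04. Somehow, Ubuntu does not lower the clock frequency so that the temperature can drop.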
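
Whether the clock really stays high while the temperature climbs can be checked by sampling both values during a stress test. This is a minimal sketch, assuming the standard Linux sysfs files /sys/class/thermal/thermal_zone*/temp (millidegrees Celsius) and /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq (kHz) are present; the function names are invented for this example.

```sh
#!/bin/sh
# Convert a sysfs millidegree value (e.g. 48000) to whole degrees Celsius.
millideg_to_c() {
    echo $(( $1 / 1000 ))
}

# Convert a cpufreq value in kHz (e.g. 400000) to MHz.
khz_to_mhz() {
    echo $(( $1 / 1000 ))
}

# Print one line with the hottest readable thermal zone and the fastest core clock.
# Both directory arguments are optional and default to the usual sysfs locations.
thermal_snapshot() {
    tz_dir=${1:-/sys/class/thermal}
    cpu_dir=${2:-/sys/devices/system/cpu}
    max_t=0
    max_f=0
    for f in "$tz_dir"/thermal_zone*/temp; do
        [ -r "$f" ] || continue
        # Some zones cannot be read (the log shows one such failure); skip them.
        v=$(cat "$f" 2>/dev/null) || continue
        [ "$v" -gt "$max_t" ] 2>/dev/null && max_t=$v
    done
    for f in "$cpu_dir"/cpu[0-9]*/cpufreq/scaling_cur_freq; do
        [ -r "$f" ] || continue
        v=$(cat "$f" 2>/dev/null) || continue
        [ "$v" -gt "$max_f" ] 2>/dev/null && max_f=$v
    done
    echo "temp=$(millideg_to_c "$max_t")C freq=$(khz_to_mhz "$max_f")MHz"
}
```

Running "while sleep 2; do thermal_snapshot; done" in a second terminal during the stress test would show whether the frequency drops as the temperature approaches the critical value.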


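Edit (July 19, 2020):

I contacted Lenovo support and they repaired my laptop (whatever it was they did). It ran fine for a few weeks. Now I have the same problem again.

I updated the BIOS, which helped but introduced another problem: as soon as the temperature gets close to overheating, the CPU throttles down to 400 MHz. As a result, my laptop is almost unusable for demanding tasks.

As a possible workaround, I disabled Intel Turbo Boost. Now the temperature stays within a tolerable range and everything runs smoothly enough. This is a compromise I am willing to accept.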
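
The post does not say how Turbo Boost was turned off (it may have been a BIOS option). One way to do it from Linux is sketched below, assuming the intel_pstate driver is in use so that the file /sys/devices/system/cpu/intel_pstate/no_turbo exists; the function names are hypothetical, and a value written this way does not survive a reboot.

```sh
#!/bin/sh
# Default location of the intel_pstate turbo switch (1 = turbo off, 0 = turbo on).
NO_TURBO=/sys/devices/system/cpu/intel_pstate/no_turbo

# Report the current state; the path argument is optional.
turbo_status() {
    f=${1:-$NO_TURBO}
    if [ ! -r "$f" ]; then
        echo "unknown"
        return 1
    fi
    case $(cat "$f") in
        1) echo "turbo disabled" ;;
        0) echo "turbo enabled" ;;
        *) echo "unknown"; return 1 ;;
    esac
}

# Write 1 (disable turbo) or 0 (enable turbo) to the switch.
# On a real system this needs root privileges.
set_no_turbo() {
    value=$1
    f=${2:-$NO_TURBO}
    case $value in
        0|1) printf '%s\n' "$value" > "$f" ;;
        *) echo "usage: set_no_turbo 0|1 [path]" >&2; return 2 ;;
    esac
}
```

In a root shell, "set_no_turbo 1" followed by "turbo_status" would switch turbo off and confirm the change.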

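Answer 1

In your case, a full diagnosis of a combined hardware and software system through Ask Ubuntu is difficult. Hardware problems are especially hard.

As an alternative first diagnostic step, install another operating system alongside Ubuntu 20.04 and run intensive tests there.

You could run the same Python program (if you can configure it to use all cores). Even so, it might not run under exactly the same conditions as the ones that led to the shutdown. There are quite a few applications for performance testing that should be good enough (and even more demanding than your program). And the test would not be "contaminated" by anything in your Ubuntu 20.04 configuration.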
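
If no dedicated benchmark is at hand, an all-core load can be approximated with a few lines of shell on any Unix-like system. This is only a rough sketch: the function name burn_cpus is made up here, and a plain busy loop heats the CPU less than a purpose-built stress tool would.

```sh
#!/bin/sh
# Start one busy loop per worker and stop them all after the given time.
# Usage: burn_cpus SECONDS [WORKERS]
# WORKERS defaults to the number of online CPUs.
burn_cpus() {
    seconds=$1
    workers=${2:-$(getconf _NPROCESSORS_ONLN)}
    pids=""
    i=0
    while [ "$i" -lt "$workers" ]; do
        ( while :; do :; done ) &     # one background busy loop per worker
        pids="$pids $!"
        i=$((i + 1))
    done
    sleep "$seconds"
    kill $pids 2>/dev/null            # stop every worker
    wait $pids 2>/dev/null
    echo "ran $workers workers for ${seconds}s"
}
```

For example, "burn_cpus 300" would keep every core busy for five minutes, which gives a comparable load on both operating systems while the temperature is watched.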

Later, once the diagnosis is complete, you can remove that operating system and reclaim the space for Ubuntu.

Answer 2

Try this:

mkdir ~/helper

curl https://raw.githubusercontent.com/Sepero/temp-throttle/stable/temp_throttle.sh -o ~/helper/temp_throttle.sh
chmod +x ~/helper/temp_throttle.sh

cat <<EOF > ~/helper/temp_down.sh 
#!/bin/bash
/usr/bin/sudo -H -S <<< "yourpassword" -p GNOME_SUDO_PASS -u root bash -c '$HOME/helper/temp_throttle.sh 65'
EOF

chmod +x ~/helper/temp_down.sh

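Test it:

  bash ~/helper/temp_down.sh

This is only meant to test whether it works; I do not recommend putting your password in an easily readable text file.

You can add it to Startup Applications.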

Answer 3

A BIOS update did fix this problem.
