Optimizing thermald

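I just discovered thermald as a way to keep my machine from overheating, and I would like some basic advice on how to modify its XML configuration file. Below is my current /etc/thermald/thermal-conf.xml. Judging from some examples I found online, it appears to be set to start throttling at 55 C (if I am reading <Temperature>55000</Temperature> correctly), yet my cores reach as much as 94 C even with the fan running.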

The machine has an Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz.

<?xml version="1.0"?>

<!--
use "man thermal-conf.xml" for details
-->

<!-- BEGIN -->
<ThermalConfiguration>
<Platform>
    <Name>Generic X86 Laptop Device</Name>
    <ProductName>EXAMPLE_SYSTEM</ProductName>
    <Preference>QUIET</Preference>
    <ThermalSensors>
        <ThermalSensor>
            <Type>TSKN</Type>
            <AsyncCapable>1</AsyncCapable>
        </ThermalSensor>
    </ThermalSensors>
    <ThermalZones>
        <ThermalZone>
            <Type>SKIN</Type>
            <TripPoints>
                <TripPoint>
                    <SensorType>TSKN</SensorType>
                    <Temperature>55000</Temperature>
                    <type>passive</type>
                    <ControlType>SEQUENTIAL</ControlType>
                    <CoolingDevice>
                        <index>1</index>
                        <type>rapl_controller</type>
                        <influence> 100 </influence>
                        <SamplingPeriod> 16 </SamplingPeriod>
                    </CoolingDevice>
                    <CoolingDevice>
                        <index>2</index>
                        <type>intel_powerclamp</type>
                        <influence> 100 </influence>
                        <SamplingPeriod> 12 </SamplingPeriod>
                    </CoolingDevice>
                </TripPoint>
            </TripPoints>
        </ThermalZone>
    </ThermalZones>
</Platform>

<!-- Thermal configuration example only -->
<Platform>
    <Name>Example Platform Name</Name>
    <!--UUID is optional, if present this will be matched -->
    <!-- Both product name and UUID can contain
        wild card "*", which matches any platform
     -->
    <UUID>Example UUID</UUID>
    <ProductName>Example Product Name</ProductName>
    <Preference>QUIET</Preference>
    <ThermalSensors>
        <ThermalSensor>
            <!-- New Sensor with a type and path -->
            <Type>example_sensor_1</Type>
            <Path>/some_path</Path>
            <AsyncCapable>0</AsyncCapable>
        </ThermalSensor>
        <ThermalSensor>
            <!-- Already present in thermal sysfs,
                enable this or add/change config
                For example, here we are indicating that
                sensor can do async events to avoid polling
            -->
            <Type>example_thermal_sysfs_sensor</Type>
            <!-- If async capable, then we don't need to poll -->
            <AsyncCapable>1</AsyncCapable>
        </ThermalSensor>
        <ThermalSensor>
            <!-- Example of a virtual sensor. This sensor
                depends on other real sensor or
                virtual sensor.
                E.g. here the temp will be
                 temp of example_sensor_1 * 0.5 + 10
            -->
            <Type>example_virtual_sensor</Type>
            <Virtual>1</Virtual>
            <SensorLink>
                <SensorType>example_sensor_1</SensorType>
                <Multiplier> 0.5 </Multiplier>
                <Offset> 10 </Offset>
            </SensorLink>
        </ThermalSensor>

    </ThermalSensors>
    <ThermalZones>
        <ThermalZone>
            <Type>Example Zone type</Type>
            <TripPoints>
                <TripPoint>
                    <SensorType>example_sensor_1</SensorType>
                    <!-- Temperature at which to take action -->
                    <Temperature> 75000 </Temperature>
                    <!-- max/passive/active
                        If a MAX type is specified, then
                        daemon will use PID control
                        to aggressively throttle to avoid
                        reaching this temp.
                     -->
                    <type>max</type>
                    <!-- SEQUENTIAL | PARALLEL
                    When a trip point temp is violated, then
                    number of cooling device can be activated.
                    If control type is SEQUENTIAL then
                    It will exhaust first cooling device before trying
                    next.
                    -->
                    <ControlType>SEQUENTIAL</ControlType>
                    <CoolingDevice>
                        <index>1</index>
                        <type>example_cooling_device</type>
                        <!-- Influence will be used to order cooling devices.
                            First cooling device will be used, which has
                            highest influence.
                        -->
                        <influence> 100 </influence>
                        <!-- Delay in using this cdev, this takes some time
                        to actually cool a zone
                        -->
                        <SamplingPeriod> 12 </SamplingPeriod>
                    </CoolingDevice>
                </TripPoint>

            </TripPoints>
        </ThermalZone>
    </ThermalZones>
    <CoolingDevices>
        <CoolingDevice>
            <!--
                Cooling device can be specified
                by a type and optionally a sysfs path
                If the type already present in thermal sysfs
                no need of a path.
                Compensation can use min/max and step size
                to increasingly cool the system.
                Debounce period can be used to force
                a waiting period for action
            -->
            <Type>example_cooling_device</Type>
            <MinState>0</MinState>
            <IncDecStep>10</IncDecStep>
            <ReadBack> 0 </ReadBack>
            <MaxState>50</MaxState>
            <DebouncePeriod>5000</DebouncePeriod>
            <!--
                If there are no PID parameter
                compensation increase step wise and exponentially
                if single step is not able to change trend.
                Alternatively a PID parameters can be specified
                then next step will use PID calculation using
                provided PID constants.
            -->
            <PidControl>
                <kp>0.001</kp>
                <kd>0.0001</kd>
                <ki>0.0001</ki>
            </PidControl>
        </CoolingDevice>
    </CoolingDevices>
</Platform>
</ThermalConfiguration>
<!-- END -->

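Following @heynnema's suggestion, I deleted the configuration file, stopped thermald, and ran sudo thermald --no-daemon --loglevel=info. The output is below, but how does it help me build a new, more efficient configuration file?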

$ sudo thermald --no-daemon --loglevel=info
[1649408071][INFO]RAPL domain count 1
[1649408071][INFO]RAPL domain count 1
[1649408071][MSG]22 CPUID levels; family:model:stepping 0x6:a5:2 (6:165:2)
[1649408071][INFO]Running on a vanilla kernel
[1649408071][MSG]Polling mode is enabled: 4
[1649408071][INFO]sensor_update: type TSKN
[1649408071][INFO]sensor_update: type acpitz
[1649408071][INFO]sensor_update: type x86_pkg_temp
[1649408071][INFO]sensor_update: type pch_cometlake
[1649408071][INFO]sensor_update: type NGFF
[1649408071][INFO]sensor_update: type TMEM
[1649408071][INFO]sensor_update: type B0D4
[1649408071][INFO]sensor_update: type TVGA
[1649408071][INFO]thd_read_default_thermal_sensors loaded 8 sensors 
[1649408071][INFO]dts /sys/devices/platform/coretemp.0/name doesn't exist
[1649408071][WARN]sensor id 11 : No temp sysfs for reading raw temp
[1649408071][WARN]sensor id 11 : No temp sysfs for reading raw temp
[1649408071][WARN]sensor id 11 : No temp sysfs for reading raw temp
[1649408071][INFO]INT3400 Base path is 
[1649408071][INFO]failed to open /dev/acpi_thermal_rel 
[1649408071][INFO]failed to open /dev/acpi_thermal_rel 
[1649408071][INFO]TRT/ART read failed
[1649408071][INFO]Using config file /etc/thermald/thermal-conf.xml
I/O warning : failed to load external entity "/etc/thermald/thermal-conf.xml"
[1649408071][WARN]error: could not parse file /etc/thermald/thermal-conf.xml
[1649408071][INFO]sensor index:2 TSKN /sys/class/thermal/thermal_zone2/ Async:0 
[1649408071][INFO]sensor index:0 acpitz /sys/class/thermal/thermal_zone0/ Async:0 
[1649408071][INFO]sensor index:7 x86_pkg_temp /sys/class/thermal/thermal_zone7/ Async:1 
[1649408071][INFO]sensor index:5 pch_cometlake /sys/class/thermal/thermal_zone5/ Async:0 
[1649408071][INFO]sensor index:3 NGFF /sys/class/thermal/thermal_zone3/ Async:0 
[1649408071][INFO]sensor index:1 TMEM /sys/class/thermal/thermal_zone1/ Async:0 
[1649408071][INFO]sensor index:6 B0D4 /sys/class/thermal/thermal_zone6/ Async:0 
[1649408071][INFO]sensor index:4 TVGA /sys/class/thermal/thermal_zone4/ Async:0 
[1649408071][INFO]sensor index:8 hwmon /sys/class/hwmon/hwmon5/temp1_input Async:0 
[1649408071][INFO]sensor index:9 hwmon /sys/class/hwmon/hwmon5/temp2_input Async:0 
[1649408071][INFO]sensor index:10 hwmon /sys/class/hwmon/hwmon5/temp3_input Async:0 
[1649408071][INFO]thd_read_default_cooling devices loaded 14 cdevs 
[1649408071][INFO]ppcc limits max:47000000 min:10000000  min_win:28000000 step:1000000
[1649408071][INFO]set_pid_param 14 [-1000.100,10]
[1649408071][INFO]Use Default pstate drv settings
[1649408071][INFO]sysfs create failed 
[1649408071][INFO]INT3400 Base path is 
[1649408071][INFO]failed to open /dev/acpi_thermal_rel 
[1649408071][INFO]failed to open /dev/acpi_thermal_rel 
[1649408071][INFO]TRT/ART read failed
[1649408071][INFO]Using config file /etc/thermald/thermal-conf.xml
I/O warning : failed to load external entity "/etc/thermald/thermal-conf.xml"
[1649408071][WARN]error: could not parse file /etc/thermald/thermal-conf.xml
[1649408071][INFO]name = package-0
[1649408071][INFO]name = dram
[1649408071][INFO]sysfs read failed /sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/intel-rapl:0:2/constraint_0_max_power_uw
[1649408071][INFO]:powercap RAPL invalid max power limit range 
[1649408071][INFO]Calculate dynamically phy_max 
[1649408071][INFO]set_pid_param 18 [-0.4.0,0]
[1649408071][INFO]13: ath10k_thermal, C:0 MN: 0 MX:100 ST:1 pt:/sys/class/thermal/ rd_bk 1 
[1649408071][INFO]1: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0 
[1649408071][INFO]11: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0 
[1649408071][INFO]8: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0 
[1649408071][INFO]6: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0 
[1649408071][INFO]4: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0 
[1649408071][INFO]2: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0 
[1649408071][INFO]12: intel_powerclamp, C:-1 MN: 0 MX:50 ST:5 pt:/sys/class/thermal/ rd_bk 0 
[1649408071][INFO]0: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0 
[1649408071][INFO]10: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0 
[1649408071][INFO]9: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0 
[1649408071][INFO]7: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0 
[1649408071][INFO]5: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0 
[1649408071][INFO]3: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0 
[1649408071][INFO]14: rapl_controller, C:47000000 MN: 47000000 MX:10000000 Inc ST:-2000000 Dec ST:-1000000 pt:/sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/ rd_bk 1 
[1649408071][INFO]15: intel_pstate, C:0 MN: 0 MX:10 ST:1 pt:/sys/devices/system/cpu/intel_pstate/ rd_bk 1 
[1649408071][INFO]16: rapl_controller_dram, C:100000000 MN: 100000000 MX:0 ST:-500000 pt:/sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/intel-rapl:0:2/ rd_bk 1 
[1649408071][INFO]17: LCD, C:0 MN: 0 MX:120000 ST:12000 pt:/sys/class/backlight/intel_backlight/ rd_bk 1 
[1649408071][INFO]18: amdgpu, C:0 MN: 0 MX:0 ST:0 pt: rd_bk 1 
[1649408071][INFO]thd_read_default_thermal_zones loaded 7 zones 
[1649408071][INFO]INT3400 Base path is 
[1649408071][INFO]zone cpu will be created 
[1649408071][INFO]dts zone /sys/devices/platform/coretemp.0/name doesn't exist
[1649408071][INFO]/sys/class/hwmon/hwmon6/name->dell_smm
[1649408071][INFO]/sys/class/hwmon/hwmon4/name->pch_cometlake
[1649408071][INFO]/sys/class/hwmon/hwmon2/name->BAT0
[1649408071][INFO]/sys/class/hwmon/hwmon0/name->AC
[1649408071][INFO]/sys/class/hwmon/hwmon7/name->ath10k_hwmon
[1649408071][INFO]/sys/class/hwmon/hwmon5/name->coretemp
[1649408071][INFO]Buggy max temp: to close to critical 90000
[1649408071][INFO]Core temp DTS :critical 100000, max 90000, psv 95000
[1649408071][INFO]node type: Element, name: CoolingDevice value: rapl_controller
[1649408071][INFO]node type: Element, name: CoolingDevice value: intel_pstate
[1649408071][INFO]node type: Element, name: CoolingDevice value: intel_powerclamp
[1649408071][INFO]node type: Element, name: CoolingDevice value: cpufreq
[1649408071][INFO]node type: Element, name: CoolingDevice value: Processor
[1649408071][INFO]CDEVS order specified in thermal-cpu-cdev-order.xml
[1649408071][INFO]/sys/class/hwmon/hwmon3/name->nouveau
[1649408071][INFO]/sys/class/hwmon/hwmon1/name->acpitz
[1649408071][INFO]INT3400 Base path is 
[1649408071][INFO]failed to open /dev/acpi_thermal_rel 
[1649408071][INFO]failed to open /dev/acpi_thermal_rel 
[1649408071][INFO]TRT/ART read failed
[1649408071][INFO]Using config file /etc/thermald/thermal-conf.xml
I/O warning : failed to load external entity "/etc/thermald/thermal-conf.xml"
[1649408071][WARN]error: could not parse file /etc/thermald/thermal-conf.xml
[1649408071][INFO]

 ZONE DUMP BEGIN
[1649408071][INFO]
[1649408071][INFO]Zone 8: cpu, Active:1 Bind:0 Sensor_cnt:1
[1649408071][INFO]..sensors.. 
[1649408071][INFO]sensor index:7 x86_pkg_temp /sys/class/thermal/thermal_zone7/ Async:1 
[1649408071][INFO]..trips.. 
[1649408071][INFO]index 0: type:passive temp:95000 hyst:0 zone id:8 sensor id:65535 control_type:1 cdev size:4
[1649408071][INFO]cdev[0] rapl_controller, Sampling period: 0
[1649408071][INFO]   target_state:not defined
[1649408071][INFO]cdev[1] intel_pstate, Sampling period: 0
[1649408071][INFO]   target_state:not defined
[1649408071][INFO]cdev[2] intel_powerclamp, Sampling period: 0
[1649408071][INFO]   target_state:not defined
[1649408071][INFO]cdev[3] Processor, Sampling period: 0
[1649408071][INFO]   target_state:not defined
[1649408071][INFO]index 1: type:polling temp:85500 hyst:0 zone id:8 sensor id:7 control_type:0 cdev size:0
[1649408071][INFO]
[1649408071][INFO]

 ZONE DUMP END
[1649408071][INFO]Current user preference is 0
[1649408071][INFO]thd_engine_thread begin
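
One way to put this output to use is to pull out the sensor types and cooling device names that thermald discovered, since those are the names that go into <SensorType> and the <type> of a <CoolingDevice> in thermal-conf.xml. The sketch below assumes the output was saved to a file (thermald.log is a made-up name) and that the lines keep the format shown above; extract_sensors and extract_cdevs are helper names invented for illustration.

```shell
#!/bin/sh
# Sketch: list sensor types and cooling-device names from a saved thermald log.
# Assumes the "[timestamp][INFO]..." line format shown above.

# Print the unique sensor types from "sensor index:N <type> <path>" lines.
extract_sensors() {
    awk -F']' '$3 ~ /^sensor index:[0-9]+ / { split($3, f, " "); print f[3] }' "$1" \
        | LC_ALL=C sort -u
}

# Print the unique cooling-device names from "N: <name>, C:..." lines.
extract_cdevs() {
    awk -F']' '$3 ~ /^[0-9]+: [^,]+, C:/ {
        name = $3
        sub(/^[0-9]+: /, "", name)   # drop the leading index
        sub(/,.*/, "", name)         # drop everything after the name
        print name
    }' "$1" | LC_ALL=C sort -u
}

log="${1:-thermald.log}"   # hypothetical file name
if [ -r "$log" ]; then
    echo "Sensors:"
    extract_sensors "$log"
    echo "Cooling devices:"
    extract_cdevs "$log"
fi
```

For the log above this would list x86_pkg_temp among the sensors and rapl_controller, intel_pstate, intel_powerclamp and Processor among the cooling devices.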

After editing, this is my configuration file, but the core temperature still climbs to 90 C:

~$ cat /etc/thermald/thermal-conf.xml
<?xml version="1.0"?>
<ThermalConfiguration>
<Platform>
        <Name>Generic X86 Laptop Device</Name>
        <ProductName>*</ProductName>
        <Preference>QUIET</Preference>
        <ThermalZones>
                <ThermalZone>
                        <Type>cpu</Type>
                        <TripPoints>
                                <TripPoint>
                                        <SensorType>x86_pkg_temp</SensorType>
                                        <Temperature>55000</Temperature>
                                        <type>passive</type>
                                        <ControlType>PARALLEL</ControlType>
                                </TripPoint>
                        </TripPoints>
                </ThermalZone>
        </ThermalZones>
</Platform>
</ThermalConfiguration>
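
To check what the sensor named in <SensorType> actually reports against the <Temperature> value, the kernel exposes each zone under /sys/class/thermal (the same paths that appear in the thermald log). Both the sysfs files and thermal-conf.xml use millidegrees Celsius, so 55000 means 55 C. The following is only a sketch; millideg_to_c and zone_temp are helper names invented here.

```shell
#!/bin/sh
# Sketch: compare a thermal zone's current reading with a trip temperature.
# Values are in millidegrees Celsius, as in thermal-conf.xml and sysfs.

# Convert millidegrees Celsius to degrees with one decimal, e.g. 55000 -> 55.0
millideg_to_c() {
    printf '%d.%d\n' "$(( $1 / 1000 ))" "$(( ($1 % 1000) / 100 ))"
}

# Print the current temperature (millidegrees) of the first readable zone
# whose type matches $1; return 1 if there is none.
zone_temp() {
    for zone in /sys/class/thermal/thermal_zone*; do
        [ -r "$zone/type" ] || continue
        if [ "$(cat "$zone/type")" = "$1" ]; then
            cat "$zone/temp" 2>/dev/null && return 0
        fi
    done
    return 1
}

sensor="x86_pkg_temp"   # <SensorType> from the config above
trip=55000              # <Temperature> from the config above

if now=$(zone_temp "$sensor"); then
    echo "$sensor: $(millideg_to_c "$now") C (trip point: $(millideg_to_c "$trip") C)"
    if [ "$now" -gt "$trip" ]; then
        echo "above the trip point"
    fi
else
    echo "no readable thermal zone of type $sensor"
fi
```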

Additional information:

~$ ls -al /etc/thermald
total 32
drwxr-xr-x   2 root      root       4096 Apr  8 16:32 .
drwxr-xr-x 159 root      root      12288 Apr  5 09:03 ..
-rw-r--r--   1 root      root       4605 Jan 15  2019 backup
-rw-rw-r--   1 username username   816 Apr  8 16:32 thermal-conf.xml
-rw-r--r--   1 root      root        508 Jan 15  2019 thermal-cpu-cdev-order.xml

This also seems relevant (thermald is inactive?):

$ sudo systemctl status thermald
● thermald.service - Thermal Daemon Service
     Loaded: loaded (/lib/systemd/system/thermald.service; enabled; vendor preset: enabled)
     Active: inactive (dead) since Fri 2022-04-08 10:54:28 CEST; 1 weeks 0 days ago
   Main PID: 1328 (code=exited, status=0/SUCCESS)

Apr 07 11:51:51 Precision-3551 thermald[1328]: error: could not parse file /etc/thermald/thermal-conf.xml
Apr 07 11:51:51 Precision-3551 thermald[1328]: I/O warning : failed to load external entity "/etc/thermald/thermal-conf.xml"
Apr 07 11:51:51 Precision-3551 thermald[1328]: error: could not parse file /etc/thermald/thermal-conf.xml
Apr 07 11:51:51 Precision-3551 thermald[1328]: I/O warning : failed to load external entity "/etc/thermald/thermal-conf.xml"
Apr 07 11:51:51 Precision-3551 thermald[1328]: error: could not parse file /etc/thermald/thermal-conf.xml
Apr 08 10:54:26 Precision-3551 systemd[1]: Stopping Thermal Daemon Service...
Apr 08 10:54:26 Precision-3551 thermald[1328]: Terminating ...
Apr 08 10:54:27 Precision-3551 thermald[1328]: terminating on user request ..
Apr 08 10:54:28 Precision-3551 systemd[1]: thermald.service: Succeeded.
Apr 08 10:54:28 Precision-3551 systemd[1]: Stopped Thermal Daemon Service.

I have now restarted it with sudo service thermald restart, and the status is now:

$ sudo systemctl status thermald
● thermald.service - Thermal Daemon Service
     Loaded: loaded (/lib/systemd/system/thermald.service; enabled; vendor preset: enabled)
     Active: active (running) since Fri 2022-04-15 22:26:23 CEST; 2s ago
   Main PID: 609438 (thermald)
      Tasks: 2 (limit: 18622)
     Memory: 1.3M
     CGroup: /system.slice/thermald.service
             └─609438 /usr/sbin/thermald --systemd --dbus-enable --adaptive

Apr 15 22:26:23 Precision-3551 systemd[1]: Starting Thermal Daemon Service...
Apr 15 22:26:23 Precision-3551 systemd[1]: Started Thermal Daemon Service.
Apr 15 22:26:23 Precision-3551 thermald[609438]: 22 CPUID levels; family:model:stepping 0x6:a5:2 (6:165:2)
Apr 15 22:26:23 Precision-3551 thermald[609438]: 22 CPUID levels; family:model:stepping 0x6:a5:2 (6:165:2)
Apr 15 22:26:23 Precision-3551 thermald[609438]: Polling mode is enabled: 4
Apr 15 22:26:23 Precision-3551 thermald[609438]: sensor id 11 : No temp sysfs for reading raw temp
Apr 15 22:26:23 Precision-3551 thermald[609438]: sensor id 11 : No temp sysfs for reading raw temp
Apr 15 22:26:23 Precision-3551 thermald[609438]: sensor id 11 : No temp sysfs for reading raw temp

Answer 1

From the comments:

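A lesson on how to configure thermald could take a while. Start by reading man thermald and man thermal-conf.xml. The thermal-conf.xml file you are using is a generic one that serves only as an example. Begin by deleting it entirely and restarting thermald. If thermald cannot find the .xml file, it will try to run with its default configuration. See how that works. Otherwise, stop thermald and run it manually with sudo thermald --no-daemon --loglevel=info so that thermald tells you what it finds on its own, then use that information to write your own .xml file.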
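
The steps above, written out as commands. This is a sketch: it moves the file aside instead of deleting it so that it can be restored later (the .bak name is an arbitrary choice), and by default it only prints the commands; set DRY_RUN=0 to actually run them. The function names are made up for illustration.

```shell
#!/bin/sh
# Sketch of the procedure described above. Prints the commands unless DRY_RUN=0.
CONF=/etc/thermald/thermal-conf.xml

# Run a command, or just print it when in dry-run mode (the default).
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Option 1: take the example config out of the way and let thermald
# fall back to its default configuration.
try_defaults() {
    run sudo mv "$CONF" "$CONF.bak"      # .bak suffix is an arbitrary choice
    run sudo systemctl restart thermald
}

# Option 2: stop the service and run thermald in the foreground to see
# which sensors, zones and cooling devices it detects.
run_foreground() {
    run sudo systemctl stop thermald
    run sudo thermald --no-daemon --loglevel=info
}

try_defaults
# run_foreground
```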

Here is my thermal-conf.xml file...

<?xml version="1.0"?>
<ThermalConfiguration>
<Platform>
        <Name>Dell Inspiron-7700-AIO</Name>
        <ProductName>*</ProductName>
        <Preference>QUIET</Preference>
        <ThermalZones>
                <ThermalZone>
                        <Type>cpu</Type>
                        <TripPoints>
                                <TripPoint>
                                        <SensorType>x86_pkg_temp</SensorType>
                                        <Temperature>65000</Temperature>
                                        <type>passive</type>
                                        <ControlType>PARALLEL</ControlType>
                                        <CoolingDevice>
                                                <index>0</index>
                                                <type>Fan</type>
                                                <influence>30</influence>
                                                <SamplingPeriod>10</SamplingPeriod>
                                        </CoolingDevice>
                                        <CoolingDevice>
                                                <index>5</index>
                                                <type>Processor</type>
                                                <influence>80</influence>
                                                <SamplingPeriod>5</SamplingPeriod>
                                        </CoolingDevice>
                                        <CoolingDevice>
                                                <index>13</index>
                                                <type>intel_powerclamp</type>
                                                <influence>100</influence>
                                                <SamplingPeriod>5</SamplingPeriod>
                                        </CoolingDevice>
                                </TripPoint>
                        </TripPoints>
                </ThermalZone>
        </ThermalZones>
</Platform>
</ThermalConfiguration>

Update #1:

A minimal thermal-conf.xml file...

Just edit the <Name>, <SensorType>, and <Temperature> values. Then restart thermald as a daemon, or run it manually to watch what happens.
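
When the file is missing or cannot be parsed, thermald logs "could not parse file" and carries on without it, so it can help to check the XML before restarting. A sketch, assuming xmllint is installed (it ships with libxml2; on Ubuntu it is in the libxml2-utils package). It checks well-formedness only, not whether thermald accepts the values, and check_conf is a made-up helper name.

```shell
#!/bin/sh
# Sketch: verify that a thermal-conf.xml is well-formed XML before restarting thermald.
# Returns 0 if well-formed, 1 if not, 2 if xmllint is not installed.
check_conf() {
    conf="${1:-/etc/thermald/thermal-conf.xml}"
    if ! command -v xmllint >/dev/null 2>&1; then
        echo "xmllint not found" >&2
        return 2
    fi
    if xmllint --noout "$conf"; then
        echo "$conf: well-formed"
    else
        echo "$conf: not well-formed or unreadable" >&2
        return 1
    fi
}

# Example:
#   check_conf /etc/thermald/thermal-conf.xml && sudo systemctl restart thermald
```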

<?xml version="1.0"?>
<ThermalConfiguration>
<Platform>
        <Name>Generic</Name>
        <ProductName>*</ProductName>
        <Preference>QUIET</Preference>
        <ThermalZones>
                <ThermalZone>
                        <Type>cpu</Type>
                        <TripPoints>
                                <TripPoint>
                                        <SensorType>x86_pkg_temp</SensorType>
                                        <Temperature>55000</Temperature>
                                </TripPoint>
                        </TripPoints>
                </ThermalZone>
        </ThermalZones>
</Platform>
</ThermalConfiguration>

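To stress-test the CPU and watch how the temperature behaves, first install Vitals (https://extensions.gnome.org/extension/1460/vitals/) and set it to display the CPU package temperature and fan speed. Then run yes in a terminal and watch the CPU temperature change. You can also install the stress application, which does the same job as yes but gives you more control.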
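
A terminal-only version of the same test, as a sketch: it starts one yes process per core (a single yes keeps only one core busy), prints the sensor reading once per second, and stops the load afterwards. stress_and_watch is a made-up helper name, and x86_pkg_temp is the sensor type used in the config above; pass a different name as the second argument if your system labels the CPU sensor differently.

```shell
#!/bin/sh
# Sketch: load all cores with `yes` for N seconds and print the package temperature.
# Usage: stress_and_watch [seconds] [sensor_type]
stress_and_watch() {
    duration="${1:-30}"
    sensor="${2:-x86_pkg_temp}"

    # Locate the sysfs thermal zone whose type matches the sensor name.
    zone=""
    for z in /sys/class/thermal/thermal_zone*; do
        if [ -r "$z/type" ] && [ "$(cat "$z/type")" = "$sensor" ]; then
            zone="$z"
            break
        fi
    done

    # Start one busy process per CPU core.
    pids=""
    i=0
    cores=$(nproc 2>/dev/null || echo 1)
    while [ "$i" -lt "$cores" ]; do
        yes > /dev/null &
        pids="$pids $!"
        i=$((i + 1))
    done

    # Sample the temperature (sysfs reports millidegrees) once per second.
    t=0
    while [ "$t" -lt "$duration" ]; do
        if [ -n "$zone" ] && temp=$(cat "$zone/temp" 2>/dev/null); then
            echo "${t}s: $((temp / 1000)) C"
        else
            echo "${t}s: no reading for $sensor"
        fi
        sleep 1
        t=$((t + 1))
    done

    # Stop the load; word splitting of $pids is intended.
    kill $pids 2>/dev/null
    wait $pids 2>/dev/null || true
    return 0
}

# Example: stress_and_watch 60
```

If the reading climbs well past the configured trip temperature while this runs, the trip point or its cooling devices are probably not taking effect.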
