Hardware (hardinfo):
I hope this isn't a hardware problem...
OS: Ubuntu 21.10
CPU: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, 1 physical processor; 4 cores; 8 threads
RAM: 7858608 KiB (AKA 8GB)
Motherboard: Lenovo YOGA 730-13IKB / LNVNB161216 (LENOVO)
Graphics: 1920x1080 (Unknown) The X.Org foundation
Storage: (Shows nothing for some reason, but I already opened my computer to clean it while trying to fix this issue, and I confirmed it is an NVMe SSD.)
Printers: (Irrelevant)
Audio: USB-Audio - USB Device 0x46d
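Symptoms:
The thing I despise.
Ever since upgrading from Ubuntu 20.04 LTS to Ubuntu 21.04, I have been getting crashes which:
- Do not reboot the machine automatically
- Are spontaneous
- Only happen when AC power is plugged in
- Leave no trace in any log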
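Attempts:
Things that did not work
I tried reinstalling the system at least twice (I have somehow lost count, but at least twice), updating from 21.04 to 21.10 in the process. It is also worth noting that I hand-picked the programs to back up, keeping only those which were:
- Not automatically installed
- Not local (I can reinstall those debs myself later)
- Development libraries that were not automatically installed (all of them)
As far as the crashes go, there is no notable difference between 21.04 and 21.10 (IIRC).
Other things I have tried: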
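- Updating the BIOS
- Reinstalling thermald
- Disabling c-states (re-enabled them since it did not help)
- Trying log kernel (could not set it up properly, a manually triggered crash did not produce any logs)
- Setting up a persistent journal (did not find anything useful, but I can post it if necessary)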
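Extra
Some extra information that may help
The last piece of information I can provide is a text file where I wrote down some of the things I tried, suspected, and failed at. It is very messy (especially at the end, where I got angry and started swearing), but I will include it anyway.
Personal log: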
When I updated to Ubuntu 21.04, things started going wrong.
I assume schedutil is doing something, as the computer crashes sometimes, with no log or anything either.
I checked /var/log/kern.log among others, and I found nothing.
I suspect it's something to do with "P-states" and "C-states".
P-states, which stand for performance states, are used to optimize power consumption during code execution. The OS can switch between them to change the CPU's frequency and voltage.
C-states on the other hand, are used to optimize/reduce power consumption during idle mode (when no code is being executed).
The typical C-states are:
C0 - CPU is actively running code (P-states)
C1 - CPU uses HLT instruction when idle, the clock is gated off to parts of the core, but it is relatively quick to wake up
C1E - This is actually just C1, except when C1E is enabled, the CPU lowers the CPU's speed & voltage when it is in C1
C2 & up - The CPU will shut off various parts of the core for greater power savings, at the cost of taking longer to wake up.
Source: "Controlling Processor C-State Usage in Linux, A Dell technical white paper describing the use of C-states with Linux operating systems"
Anyways, all of this is still happening now, even in 21.10, so this has to be a kernel issue.
Although setting "intel_idle.max_cstate=0" does not stop the crashes, so maybe it's a different problem.
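Before reading too much into that result, it may be worth confirming that the parameter actually reached the running kernel. The following is a minimal sketch, assuming a Linux system that exposes `/proc/cmdline`; the helper names `parse_cmdline` and `read_cmdline` are made up for illustration, and quoted values containing spaces are not handled.

```python
from pathlib import Path


def parse_cmdline(text):
    """Split a kernel command line into a {name: value} dict.

    Flags without '=' (for example 'quiet') map to None.
    Quoted values containing spaces are not handled by this sketch.
    """
    params = {}
    for token in text.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def read_cmdline(path="/proc/cmdline"):
    """Parse the command line the running kernel was booted with."""
    return parse_cmdline(Path(path).read_text())


if __name__ == "__main__":
    key = "intel_idle.max_cstate"
    if Path("/proc/cmdline").exists():
        params = read_cmdline()
        if key in params:
            print(f"{key} is set to {params[key]}")
        else:
            print(f"{key} is not on the kernel command line")
    else:
        print("/proc/cmdline not found (not a Linux system?)")
```

If the key is reported as missing, the bootloader configuration was probably not regenerated after editing, and the crash test did not really run with the setting applied.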
I already used "memtest86" and my system is fine.
I'm going to restart my computer and see if there are any c-state settings in the BIOS/UEFI (are the settings still called BIOS?).
Yeah, I checked and couldn't find anything.
The Dell C-state PDF (same in source above) has this section right below the C-states one (which is the first one www) called "Checking C-State Usage". It says:
There are several ways to see how much idle time is being spent in the various C-states.
First check the kernel messages from boot (“dmesg |grep idle” or “grep idle /var/log/messages”, for instance) to see which idle driver is in use.
This is what I got:
sudo dmesg |grep idle
[sudo] password for ws:
[ 0.028186] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[ 0.076265] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 79635855245 ns
[ 0.100211] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x39a8208cdd2, max_idle_ns: 881590748921 ns
[ 0.104538] process: using mwait in idle threads
[ 0.128722] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[ 0.132319] cpuidle: using governor ladder
[ 0.132322] cpuidle: using governor menu
[ 0.389960] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[ 1.426615] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x396d4ffc055, max_idle_ns: 881590662783 ns
[ 4.981048] systemd-journald[323]: varlink-22: varlink: setting state idle-server
[ 4.981116] systemd-journald[323]: varlink-22: varlink: changing state idle-server → processing-method
[ 5.336734] systemd-journald[323]: varlink-22: varlink: changing state processed-method → idle-server
[ 5.339242] systemd-journald[323]: varlink-22: varlink: changing state idle-server → pending-disconnect
I don't seem to see anything here, but I remember in the BIOS/UEFI settings something said ACPI instead of RXS or whatever it's called.
I can't open the folder "/proc/acpi/processor/CPU0/power", as it doesn't exist (only reaches "acpi").
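On recent kernels the per-state idle statistics are exposed through sysfs rather than `/proc/acpi`, which would explain the missing folder. Below is a rough sketch, assuming the standard cpuidle layout under `/sys/devices/system/cpu`; the function names are illustrative only, and which states show up depends on the idle driver in use.

```python
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")


def read_text(path):
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def idle_states(cpu=0, root=CPU_ROOT):
    """List (name, usage_count, time_us) for each idle state of one CPU."""
    base = Path(root) / f"cpu{cpu}" / "cpuidle"
    # Sort numerically so state10 comes after state9.
    dirs = sorted(base.glob("state[0-9]*"), key=lambda p: int(p.name[5:]))
    states = []
    for state_dir in dirs:
        name = read_text(state_dir / "name")
        usage = read_text(state_dir / "usage")
        time_us = read_text(state_dir / "time")  # microseconds
        if name is None or usage is None or time_us is None:
            continue
        states.append((name, int(usage), int(time_us)))
    return states


if __name__ == "__main__":
    print("idle driver:", read_text(CPU_ROOT / "cpuidle" / "current_driver"))
    for name, usage, time_us in idle_states():
        print(f"{name:10s} entered {usage:>10d} times, {time_us / 1e6:10.1f} s total")
```

Running it before and after changing a C-state setting gives a quick check of whether the deeper states are still being entered.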
Some time later, I decided to reinstall Ubuntu, and so I did. Things worked well for the first (and second) day, and then it crashed.
After some fiddling around I decided to run "watch sensors", and I discovered something: when playing osu!, my temperature spiked up to ~95ºC, reaching 99ºC!
Just want to mention that the PC's killswitch is triggered at 100ºC, I was 1 degree away from it, and most of the time, 2 away (~96-98ºC most of the time)!
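Since "watch sensors" leaves nothing behind after a hard crash, one option is to append temperatures to a file and force each line to disk. This is only a sketch, assuming the kernel exposes thermal zones under `/sys/class/thermal` with `temp` in millidegrees Celsius; the function names and log path are placeholders, not an existing tool.

```python
import os
import time
from pathlib import Path

THERMAL_ROOT = Path("/sys/class/thermal")


def read_zone_temps(root=THERMAL_ROOT):
    """Return {"zone_name:zone_type": degrees_celsius} for readable zones."""
    temps = {}
    for zone in sorted(Path(root).glob("thermal_zone*")):
        try:
            zone_type = (zone / "type").read_text().strip()
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # some zones refuse reads or report garbage
        temps[f"{zone.name}:{zone_type}"] = millideg / 1000.0
    return temps


def log_temps(log_path, samples=1, interval_s=1.0, root=THERMAL_ROOT):
    """Append one timestamped line per sample and fsync it right away."""
    with open(log_path, "a") as log:
        for i in range(samples):
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            readings = " ".join(
                f"{name}={value:.1f}" for name, value in read_zone_temps(root).items()
            )
            log.write(f"{stamp} {readings}\n")
            log.flush()
            os.fsync(log.fileno())  # push the line to disk before a possible crash
            if i + 1 < samples:
                time.sleep(interval_s)


if __name__ == "__main__":
    print(read_zone_temps())
```

Calling something like `log_temps("temps.log", samples=3600, interval_s=1.0)` while gaming would leave the last reading before a crash in the file, which helps tell a thermal shutdown apart from a power problem.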
Another idea, this may be a PSU problem, as I've never seen it crash unplugged..
"kernelUpdateCrash" was this file's old name, now it is "cleaningComputer", I opened up the computer and holy hell there was so much crud in the fans.
Hasn't crashed since I cleaned it! Not really elaborating because this file is so long and also I've opened another computer, which will be in another story (I think I'll call it "firstUbuntuInstallation").
Update: it crashed again.
It didn't while it was on its side, so I think it's a fan airflow problem.
An askubuntu question had their computer shutting down due to heating, I'm not sure if it is heating in my case but a bios update helped.
Source: "https://askubuntu.com/questions/1232813/ubuntu-20-04-shutdown-after-overheating"
I did it, I had to boot into a Windows PE USB to run the program, but the program didn't work...
So instead of ticking the "Install" option, I chose the "Unpack" option, and it unpacked another executable with the same name except that all the letters were capital now!
Anyways I ran it and it was this weird sketchy setup that appeared to be using WinAPI to put text where it shouldn't be and it wouldn't run without AC power.
I proceeded to plug it in and re-run it, it had a weird and probably broken image of a mascot that was like a pencil?
I attached 2 images I took with my phone, that's why this story is in a folder.
PC rebooted, it worked, and then the fans started whirring up as if the thing was gonna blow up, never seen it like that, probably a temporary overvoltage of the fans while the computer tried to reboot.
So yeah I changed the title of this again.
It crashed again...
The last time I modified this file was: 2021年10月26日 19時55分59秒.
Now it's: 2021年11月06日 23時10分36秒
I just reinstalled thermald, seems to work, I'm not sure, throttles well I guess.
The "setPerformanceMode.sh" and "setPowersaveMode.sh" scripts that I created (using cpufreq) no longer seem to change anything.
So let's just hope this works, even if thermald was installed by default...
PS: I have i7z on a terminal set to "Always on Top" so I can monitor frequency, C-states, and the temperature of the CPU cores (4 physical, 8 logical).
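For the cpufreq scripts mentioned above that no longer seem to change anything, a first step could be to look at which scaling driver and governor are actually active. The sketch below assumes the usual cpufreq sysfs attributes under `/sys/devices/system/cpu/cpuN/cpufreq`; `cpufreq_summary` is a hypothetical helper name. If the driver turns out to be intel_pstate, the governors it offers may differ from the ones older scripts expect.

```python
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")

WANTED = (
    "scaling_driver",
    "scaling_governor",
    "scaling_available_governors",
    "scaling_cur_freq",  # kHz
)


def cpufreq_summary(root=CPU_ROOT):
    """Return {cpu_name: {attribute: value or None}} for CPUs with cpufreq."""
    summary = {}
    # "cpu[0-9]*" skips sibling directories such as cpufreq/ and cpuidle/.
    for freq_dir in sorted(Path(root).glob("cpu[0-9]*/cpufreq")):
        info = {}
        for attr in WANTED:
            try:
                info[attr] = (freq_dir / attr).read_text().strip()
            except OSError:
                info[attr] = None  # attribute missing or unreadable
        summary[freq_dir.parent.name] = info
    return summary


if __name__ == "__main__":
    for cpu, info in cpufreq_summary().items():
        print(cpu, info)
```

If a script writes "performance" to `scaling_governor` and this still reports another value afterwards, either the write is being rejected or something else (possibly thermald or a power profile daemon) is changing it back.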
Bruh thermald is throttling down to 400MHz while playing.
I tab out of the program it goes back to 1GHz what?
Okay, I made Osu! set the FPS cap to V-Sync (60fps) instead of double that (120fps, which it was before) and it seems to be good even when the computer is not on its side (it usually didn't crash when on its side, as the fans were pointing out).
Okay so I was checking the journalctl logs and I got:
"thermald.service: Changed running -> stop-sigterm"
huh...
Oh wait this is at the end of the journal it's probably shutdown 笑.
Keywords checked with "journalctl -g ???":
thermal
shutting
crash
panic
spark
It just crashed while searching in journalctl... Let's investigate with "journalctl -b -1". Huh, the last log is 4 minutes before the crash, okay...
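To put a number on that gap, the last record of the previous boot can be pulled out with a timestamp. This is a small sketch, assuming a persistent journal and that `journalctl -o json` emits one JSON object per line with a `__REALTIME_TIMESTAMP` field in microseconds; the function names are invented for the example.

```python
import json
import subprocess
from datetime import datetime, timezone


def entry_time(json_line):
    """Convert one `journalctl -o json` line to (UTC datetime, message)."""
    entry = json.loads(json_line)
    micros = int(entry["__REALTIME_TIMESTAMP"])  # microseconds since the epoch
    when = datetime.fromtimestamp(micros / 1_000_000, tz=timezone.utc)
    return when, entry.get("MESSAGE")


def last_entry_of_previous_boot():
    """Ask journalctl for the final record of the previous boot."""
    result = subprocess.run(
        ["journalctl", "-b", "-1", "-n", "1", "-o", "json", "--no-pager"],
        capture_output=True,
        text=True,
        check=True,  # raises if there is no previous boot in the journal
    )
    return entry_time(result.stdout.strip().splitlines()[-1])
```

Comparing the time returned by `last_entry_of_previous_boot()` with the moment the machine froze shows how long the system was silent before dying. A gap of several minutes suggests the journal was not being flushed to disk, or that nothing was logging at all.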
Yeah that's it, I'm asking for help on AskUbuntu, should've done that a long time ago!
Alright now I just have to copy this into the question.
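Footnote
If there is any other information I can provide, please leave a comment. I check the post regularly and will update it accordingly. Again, this may be a hardware problem, but it started when I updated the system, and at the moment, because of Wayland and a few other reasons, I cannot downgrade to 20.04 LTS and keep using that.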