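I'm having a strange problem. On a fresh install of Ubuntu 18.04 the system seems to run fine. Then suddenly, for no apparent reason, it hangs for anywhere from 10 seconds to several minutes, and I can't do anything.
I tried keeping an instance of top open, and RAM/CPU usage looks fine. I'm on an i5 machine with 6 GB of RAM and 12 GB of swap. I just tested the RAM and the disk, and neither showed any errors.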
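EDIT: Some additional information. I set the CPU frequency governor to performance, so the CPU always runs at maximum speed.
The problem shows up more often during CPU-intensive work (such as data analysis). Once it happens, the GUI becomes completely unresponsive, and it is hard or impossible to get back to work.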
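To confirm that the performance governor is really active on every core, the cpufreq sysfs files can be read directly. A minimal Python sketch, assuming the standard /sys/devices/system/cpu/cpuN/cpufreq/scaling_governor layout (some systems, such as many virtual machines, do not expose it, in which case the result is simply empty); read_governors is a hypothetical helper name:

```python
import glob
import os


def read_governors(root="/sys/devices/system/cpu"):
    """Return {"cpu0": "performance", ...} from the cpufreq sysfs files."""
    governors = {}
    # cpu[0-9]* skips sibling directories such as "cpufreq" and "cpuidle".
    pattern = os.path.join(root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    for path in sorted(glob.glob(pattern)):
        cpu = path.split(os.sep)[-3]  # .../cpuN/cpufreq/scaling_governor
        with open(path) as f:
            governors[cpu] = f.read().strip()
    return governors


if __name__ == "__main__":
    for cpu, governor in read_governors().items():
        print(cpu, governor)
```

Running it as a normal user should be enough, since these files are world-readable.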
EDIT
Output of grep . -r /sys/firmware/acpi/interrupts:
/sys/firmware/acpi/interrupts/gpe2F: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe23: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe1F: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe13: 0 EN enabled unmasked
/sys/firmware/acpi/interrupts/gpe0F: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe03: 0 disabled unmasked
/sys/firmware/acpi/interrupts/gpe3D: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe31: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe2D: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe21: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe1D: 0 EN enabled unmasked
/sys/firmware/acpi/interrupts/ff_pwr_btn: 0 EN enabled unmasked
/sys/firmware/acpi/interrupts/gpe11: 0 STS invalid unmasked
/sys/firmware/acpi/interrupts/gpe0D: 0 disabled unmasked
/sys/firmware/acpi/interrupts/gpe01: 0 EN enabled unmasked
/sys/firmware/acpi/interrupts/gpe3B: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe2B: 0 invalid unmasked
/sys/firmware/acpi/interrupts/ff_rt_clk: 0 disabled unmasked
/sys/firmware/acpi/interrupts/ff_pmtimer: 0 STS invalid unmasked
/sys/firmware/acpi/interrupts/gpe1B: 0 STS invalid unmasked
/sys/firmware/acpi/interrupts/gpe38: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe0B: 0 disabled unmasked
/sys/firmware/acpi/interrupts/gpe28: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe18: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe08: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe36: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe26: 0 invalid unmasked
/sys/firmware/acpi/interrupts/error: 0
/sys/firmware/acpi/interrupts/gpe16: 0 EN enabled unmasked
/sys/firmware/acpi/interrupts/sci: 4
/sys/firmware/acpi/interrupts/gpe06: 4 EN enabled unmasked
/sys/firmware/acpi/interrupts/gpe34: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe24: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe14: 0 EN enabled unmasked
/sys/firmware/acpi/interrupts/gpe04: 0 disabled unmasked
/sys/firmware/acpi/interrupts/gpe3E: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe32: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe2E: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe22: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe1E: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe12: 0 EN enabled unmasked
/sys/firmware/acpi/interrupts/gpe0E: 0 disabled unmasked
/sys/firmware/acpi/interrupts/gpe02: 0 EN enabled unmasked
/sys/firmware/acpi/interrupts/gpe3C: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe30: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe2C: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe20: 0 disabled unmasked
/sys/firmware/acpi/interrupts/gpe1C: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe10: 0 STS invalid unmasked
/sys/firmware/acpi/interrupts/gpe39: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe0C: 0 disabled unmasked
/sys/firmware/acpi/interrupts/gpe00: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe3A: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe_all: 4
/sys/firmware/acpi/interrupts/gpe29: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe2A: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe19: 0 STS invalid unmasked
/sys/firmware/acpi/interrupts/gpe1A: 0 STS invalid unmasked
/sys/firmware/acpi/interrupts/gpe09: 0 disabled unmasked
/sys/firmware/acpi/interrupts/gpe37: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe0A: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe27: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe17: 0 STS invalid unmasked
/sys/firmware/acpi/interrupts/ff_gbl_lock: 0 EN enabled unmasked
/sys/firmware/acpi/interrupts/gpe07: 0 enabled unmasked
/sys/firmware/acpi/interrupts/sci_not: 0
/sys/firmware/acpi/interrupts/gpe35: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe25: 0 disabled unmasked
/sys/firmware/acpi/interrupts/gpe15: 0 EN enabled unmasked
/sys/firmware/acpi/interrupts/gpe05: 0 disabled unmasked
/sys/firmware/acpi/interrupts/gpe3F: 0 invalid unmasked
/sys/firmware/acpi/interrupts/gpe33: 0 invalid unmasked
/sys/firmware/acpi/interrupts/ff_slp_btn: 0 invalid unmasked
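One frequently suggested cause of freezes like this is an ACPI "interrupt storm", in which a single GPE counter climbs by thousands per second and a kworker thread keeps a CPU busy. In the snapshot above no counter exceeds 4, but a single snapshot cannot show growth, so comparing two readings taken a few seconds apart is more informative. A minimal Python sketch (read_interrupts and busiest are hypothetical helper names):

```python
import glob
import os


def read_interrupts(root="/sys/firmware/acpi/interrupts"):
    """Return {counter_name: count} for every file under root."""
    counts = {}
    for path in glob.glob(os.path.join(root, "*")):
        try:
            with open(path) as f:
                fields = f.read().split()
        except OSError:
            continue  # skip anything that cannot be read
        # Each file starts with the count, e.g. "4  EN  enabled  unmasked".
        if fields and fields[0].isdigit():
            counts[os.path.basename(path)] = int(fields[0])
    return counts


def busiest(before, after, top=5):
    """Rank counters by growth between two snapshots (largest first).

    Aggregate counters such as sci and gpe_all are ranked too.
    """
    deltas = {name: after[name] - before.get(name, 0) for name in after}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

For example, call read_interrupts(), wait about ten seconds (or until a freeze starts), call it again, and pass both results to busiest(); a GPE that grew by thousands would be the one to investigate.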
EDIT 04/03/2019: I ran a full SMART test, and now things don't look good, at least as far as I can tell.
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 179 176 021 Pre-fail Always - 4025
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 218
5 Reallocated_Sector_Ct 0x0033 154 154 140 Pre-fail Always - 364
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 034 034 000 Old_age Always - 48741
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 217
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 100
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 117
194 Temperature_Celsius 0x0022 089 080 000 Old_age Always - 58
196 Reallocated_Event_Count 0x0032 022 022 000 Old_age Always - 178
197 Current_Pending_Sector 0x0032 199 199 000 Old_age Always - 234
198 Offline_Uncorrectable 0x0030 199 199 000 Old_age Offline - 245
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 188 188 000 Old_age Offline - 2436
240 Head_Flying_Hours 0x0032 038 038 000 Old_age Always - 45709
241 Total_LBAs_Written 0x0032 200 200 000 Old_age Always - 81196791754
242 Total_LBAs_Read 0x0032 200 200 000 Old_age Always - 75991010629
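Attributes 5 (Reallocated_Sector_Ct), 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable) are the ones most commonly treated as warning signs of a failing disk, and all three have non-zero raw values in the table above (364, 234 and 245); Power_On_Hours is 48741, roughly 5.6 years of runtime. A minimal Python sketch that flags those attributes in smartctl -A output, assuming the ten-column ATA attribute layout shown above; parse_attributes and red_flags are hypothetical names:

```python
def parse_attributes(text):
    """Parse rows of a smartctl -A attribute table into dicts keyed by ID."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least 10 columns;
        # this skips the "ID# ATTRIBUTE_NAME ..." header.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        raw = fields[9]
        attrs[int(fields[0])] = {
            "name": fields[1],
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "raw": int(raw) if raw.isdigit() else raw,
        }
    return attrs


def red_flags(attrs, watched=(5, 197, 198)):
    """Return (name, raw) for watched attributes with a non-zero raw count."""
    flags = []
    for attr_id in watched:
        raw = attrs.get(attr_id, {}).get("raw")
        if isinstance(raw, int) and raw > 0:
            flags.append((attrs[attr_id]["name"], raw))
    return flags
```

For example, pass the output of sudo smartctl -A /dev/sda to parse_attributes() and then to red_flags() (the device name /dev/sda is an assumption; use the disk in question).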
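Answer 1
I would also check the CPU temperature and make sure the cooling fan is working. If the fan is fine, you may want to check for malware or viruses.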
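The usual tool for this is sensors from the lm-sensors package. As a dependency-free alternative, the kernel's thermal interface under /sys/class/thermal reports each zone's temperature in millidegrees Celsius. Which zone corresponds to the CPU varies between machines (on many Intel systems it is labelled x86_pkg_temp), so this sketch simply lists them all; read_temperatures is a hypothetical helper:

```python
import glob
import os


def read_temperatures(root="/sys/class/thermal"):
    """Return {"thermal_zoneN:type": degrees_C} for every readable zone."""
    temps = {}
    for zone in sorted(glob.glob(os.path.join(root, "thermal_zone*"))):
        try:
            with open(os.path.join(zone, "type")) as f:
                kind = f.read().strip()
            with open(os.path.join(zone, "temp")) as f:
                millidegrees = int(f.read().strip())
        except (OSError, ValueError):
            continue  # zone missing a file or reporting an error
        temps[f"{os.path.basename(zone)}:{kind}"] = millidegrees / 1000.0
    return temps
```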
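Also, depending on the system, you may sometimes need to update the BIOS so that it fully supports the features of a newer operating system.
Another cause of system freezes I have come across is a dropped Internet connection, especially during updates and similar operations, so check your Internet connection as well and make sure it isn't dropping.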
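To check whether the connection really drops, one option is to log periodic reachability checks and compare their timestamps with the moments the system hangs. A minimal Python sketch that only tests whether a TCP connection can be opened; the host and port to check (for example your router, or a public DNS server on port 53) are your choice and are not from the original answer, and is_reachable and status_line are hypothetical names:

```python
import socket
import time


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure or timeout
        return False


def status_line(host, port, timeout=3.0):
    """Format a timestamped up/DOWN line suitable for appending to a log."""
    state = "up" if is_reachable(host, port, timeout) else "DOWN"
    return f"{time.strftime('%Y-%m-%d %H:%M:%S')} {host}:{port} {state}"
```

Calling status_line() every few seconds and appending the result to a file gives a simple outage log.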
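This is a bit of a shot in the dark, but maybe one of these suggestions will help. More information about your system (for example the motherboard brand, model and revision) would also help.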
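The motherboard and BIOS details asked for here (which you also need to find out whether a BIOS update exists) can be read without root from /sys/class/dmi/id; sudo dmidecode shows the same data in more detail. A minimal sketch; read_dmi is a hypothetical helper, and some fields may be empty or missing depending on the firmware:

```python
import os

# World-readable DMI fields; serial numbers are root-only and left out.
DMI_FIELDS = ("board_vendor", "board_name", "board_version",
              "bios_vendor", "bios_version", "bios_date")


def read_dmi(root="/sys/class/dmi/id"):
    """Return motherboard and BIOS identification strings (None if absent)."""
    info = {}
    for field in DMI_FIELDS:
        try:
            with open(os.path.join(root, field)) as f:
                info[field] = f.read().strip()
        except OSError:
            info[field] = None  # file absent or unreadable
    return info
```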
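Answer 2
This is just personal experience, but if the other suggestions don't help and your CPU temperature is fine, you might consider finding another similar CPU that is compatible with your motherboard and seeing whether swapping it in solves the problem. I recently had a CPU fail, and before it died completely it did almost exactly what you describe. It could also be some kind of motherboard problem, but I would check the CPU first. I know that getting and testing other parts may not be entirely practical, but in my experience this kind of problem usually turns out to be hardware-related.
If neither of those is the problem, I would use the Disks utility to run a SMART test on the hard drive. Details are here: How do I check the SMART status of an SSD or HDD on current versions of Ubuntu 14.04 through 18.10?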
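Besides the Disks utility, smartctl from the smartmontools package can run the check from a terminal: smartctl -H prints an overall verdict, and smartctl -t long starts the extended self-test. Note that for ATA drives the overall verdict typically only becomes FAILED once a normalized value falls to its threshold, so a drive with many reallocated sectors can still report PASSED (in the question's table, Reallocated_Sector_Ct is at 154 against a threshold of 140). A minimal Python wrapper, assuming smartctl is installed and the script runs as root; the helper names are hypothetical:

```python
import subprocess

MARKER = "overall-health self-assessment test result:"


def parse_health(output):
    """Return the verdict (e.g. "PASSED") from smartctl -H output, or None."""
    for line in output.splitlines():
        if MARKER in line:
            return line.split(":", 1)[1].strip()
    return None


def smart_health(device):
    """Run smartctl -H on device (root required) and return the verdict."""
    # smartctl's exit status is a bit mask, so a non-zero code is not treated
    # as an error here; the text output is parsed instead.
    result = subprocess.run(["smartctl", "-H", device],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True)
    return parse_health(result.stdout)
```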
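Answer 3
Try adjusting the swap settings, for example by running sudo sysctl vm.swappiness=20. This change is reverted when you reboot. Even before your RAM is completely used, the kernel starts swapping some of it out to disk to keep some memory free. Choosing a fairly low value means less memory is kept free, but also less swapping. The best value depends on how much RAM you have and on the workloads you run.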
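The sysctl setting vm.swappiness is backed by the file /proc/sys/vm/swappiness (Ubuntu's default is 60), so it can also be read and changed from a script. A minimal Python sketch; writing the real file needs root and, just like sysctl, the change is lost on reboot. The helper names are hypothetical, and the path is a parameter so the functions can be tried on an ordinary file:

```python
SWAPPINESS = "/proc/sys/vm/swappiness"


def get_swappiness(path=SWAPPINESS):
    """Read the current vm.swappiness value."""
    with open(path) as f:
        return int(f.read().strip())


def set_swappiness(value, path=SWAPPINESS):
    """Write a new vm.swappiness value (root needed for the /proc file)."""
    with open(path, "w") as f:
        f.write(f"{int(value)}\n")
```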
Once you have found a value that works, you can make it permanent by adding a line like this to /etc/sysctl.conf:
vm.swappiness=20
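Editing /etc/sysctl.conf by hand is simplest, but a script that may be run more than once should replace an existing active vm.swappiness line rather than append a duplicate. A minimal Python sketch operating on the file's text (upsert_sysctl is a hypothetical name; commented-out lines are left untouched):

```python
import re


def upsert_sysctl(text, key, value):
    """Return config text with every active `key=...` line set to value.

    If no active line exists, one is appended at the end.
    """
    pattern = re.compile(r"^\s*" + re.escape(key) + r"\s*=")
    new_line = f"{key}={value}"
    lines = text.splitlines()
    found = False
    for i, line in enumerate(lines):
        if pattern.match(line):  # lines starting with # or ; never match
            lines[i] = new_line
            found = True
    if not found:
        lines.append(new_line)
    return "\n".join(lines) + "\n"
```

For example, read /etc/sysctl.conf, pass its contents through upsert_sysctl(text, "vm.swappiness", 20), write the result back as root, and apply it without rebooting with sudo sysctl -p.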
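For more background, see: What is swappiness and how can I change it?
Answer 4
Collect data from system monitoring tools (for example the sensors command from lm-sensors; a GUI monitor probably won't be much use to you here) and dump it to a file so you can do a post-mortem analysis. RRDtool may be useful.
You can log the information with a date and time stamp, choose the interval at which data is dumped, record hard-disk temperatures, and so on.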
See:
https://ubuntuforums.org/showthread.php?t=1998005
https://ubuntuforums.org/showthread.php?t=2364408
http://manpages.ubuntu.com/manpages/bionic/man8/turbostat.8.html
http://manpages.ubuntu.com/manpages/trusty/man8/hddtemp.8.html
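If installing sensors or RRDtool is not an option, a small script can produce the same kind of post-mortem log: it appends a timestamped line at a chosen interval and calls fsync after each write, so lines written before a freeze are more likely to survive a hard reset. A minimal sketch that records the 1-minute load average and two /proc/meminfo fields; the fields chosen and the helper names are assumptions, not from the original answer:

```python
import os
import time


def meminfo(path="/proc/meminfo"):
    """Return MemAvailable, SwapTotal and SwapFree from /proc/meminfo in kB."""
    wanted = {"MemAvailable", "SwapTotal", "SwapFree"}
    values = {}
    with open(path) as f:
        for line in f:
            name, _, rest = line.partition(":")
            if name in wanted:
                values[name] = int(rest.split()[0])
    return values


def log_samples(log_path, interval=5.0, count=None, meminfo_path="/proc/meminfo"):
    """Append one timestamped line per sample; count=None runs forever."""
    n = 0
    while count is None or n < count:
        load1, _, _ = os.getloadavg()
        mem = meminfo(meminfo_path)
        line = (f"{time.strftime('%Y-%m-%d %H:%M:%S')} load1={load1:.2f} "
                f"avail_kB={mem.get('MemAvailable')} "
                f"swapfree_kB={mem.get('SwapFree')}\n")
        with open(log_path, "a") as f:
            f.write(line)
            f.flush()
            os.fsync(f.fileno())  # force the line to disk before sleeping
        n += 1
        if count is None or n < count:
            time.sleep(interval)
```

For example, log_samples("/var/tmp/freeze.log", interval=5) runs until interrupted (the log path is hypothetical); after the next freeze, the last lines before the gap in timestamps show the state of the system just before it hung.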