Ubuntu freezes periodically


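I have run into a strange problem. On a fresh install of Ubuntu 18.04 the system seems to run fine. Then, for no apparent reason, it hangs for anywhere from 10 seconds to a few minutes, and I can't do anything.

I tried keeping an instance of top open, and RAM/CPU usage looked fine. I'm on an i5 machine with 6 GB of RAM and 12 GB of swap. I just tested the RAM and the disk, and neither showed errors.

EDIT: Some additional information. I set the CPU frequency governor to performance, so the CPU always runs at maximum speed.

The problem shows up more often during CPU-intensive work (for example data analysis). Once the work finishes, the GUI becomes completely unresponsive, and getting back to work is difficult or impossible.

EDIT: Output of grep . -r /sys/firmware/acpi/interrupts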

/sys/firmware/acpi/interrupts/gpe2F:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe23:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe1F:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe13:       0  EN     enabled      unmasked
/sys/firmware/acpi/interrupts/gpe0F:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe03:       0         disabled     unmasked
/sys/firmware/acpi/interrupts/gpe3D:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe31:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe2D:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe21:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe1D:       0  EN     enabled      unmasked
/sys/firmware/acpi/interrupts/ff_pwr_btn:       0  EN     enabled      unmasked
/sys/firmware/acpi/interrupts/gpe11:       0     STS invalid      unmasked
/sys/firmware/acpi/interrupts/gpe0D:       0         disabled     unmasked
/sys/firmware/acpi/interrupts/gpe01:       0  EN     enabled      unmasked
/sys/firmware/acpi/interrupts/gpe3B:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe2B:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/ff_rt_clk:       0         disabled     unmasked
/sys/firmware/acpi/interrupts/ff_pmtimer:       0     STS invalid      unmasked
/sys/firmware/acpi/interrupts/gpe1B:       0     STS invalid      unmasked
/sys/firmware/acpi/interrupts/gpe38:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe0B:       0         disabled     unmasked
/sys/firmware/acpi/interrupts/gpe28:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe18:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe08:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe36:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe26:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/error:       0
/sys/firmware/acpi/interrupts/gpe16:       0  EN     enabled      unmasked
/sys/firmware/acpi/interrupts/sci:       4
/sys/firmware/acpi/interrupts/gpe06:       4  EN     enabled      unmasked
/sys/firmware/acpi/interrupts/gpe34:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe24:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe14:       0  EN     enabled      unmasked
/sys/firmware/acpi/interrupts/gpe04:       0         disabled     unmasked
/sys/firmware/acpi/interrupts/gpe3E:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe32:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe2E:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe22:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe1E:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe12:       0  EN     enabled      unmasked
/sys/firmware/acpi/interrupts/gpe0E:       0         disabled     unmasked
/sys/firmware/acpi/interrupts/gpe02:       0  EN     enabled      unmasked
/sys/firmware/acpi/interrupts/gpe3C:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe30:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe2C:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe20:       0         disabled     unmasked
/sys/firmware/acpi/interrupts/gpe1C:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe10:       0     STS invalid      unmasked
/sys/firmware/acpi/interrupts/gpe39:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe0C:       0         disabled     unmasked
/sys/firmware/acpi/interrupts/gpe00:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe3A:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe_all:       4
/sys/firmware/acpi/interrupts/gpe29:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe2A:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe19:       0     STS invalid      unmasked
/sys/firmware/acpi/interrupts/gpe1A:       0     STS invalid      unmasked
/sys/firmware/acpi/interrupts/gpe09:       0         disabled     unmasked
/sys/firmware/acpi/interrupts/gpe37:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe0A:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe27:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe17:       0     STS invalid      unmasked
/sys/firmware/acpi/interrupts/ff_gbl_lock:       0  EN     enabled      unmasked
/sys/firmware/acpi/interrupts/gpe07:       0         enabled      unmasked
/sys/firmware/acpi/interrupts/sci_not:       0
/sys/firmware/acpi/interrupts/gpe35:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe25:       0         disabled     unmasked
/sys/firmware/acpi/interrupts/gpe15:       0  EN     enabled      unmasked
/sys/firmware/acpi/interrupts/gpe05:       0         disabled     unmasked
/sys/firmware/acpi/interrupts/gpe3F:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/gpe33:       0         invalid      unmasked
/sys/firmware/acpi/interrupts/ff_slp_btn:       0         invalid      unmasked
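With this many counters it is easy to miss the ones that actually fired. In the output above only sci, gpe06 and gpe_all are non-zero (4 each). A minimal sketch for filtering such a listing, assuming the count is always the second whitespace-separated field as it is here (the function name is made up for illustration):

```shell
# Keep only ACPI interrupt counters whose count (second field) is above zero.
# Reads lines in the `grep . -r /sys/firmware/acpi/interrupts` format on stdin.
# The name nonzero_acpi_counters is hypothetical, chosen for this example.
nonzero_acpi_counters() {
    awk '$2 + 0 > 0'
}
```

Usage: grep . -r /sys/firmware/acpi/interrupts | nonzero_acpi_counters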

EDIT 04/03/2019: I ran a full SMART test, and the result doesn't look good, at least to me.

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
3 Spin_Up_Time            0x0027   179   176   021    Pre-fail  Always       -       4025
4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       218
5 Reallocated_Sector_Ct   0x0033   154   154   140    Pre-fail  Always       -       364
7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
9 Power_On_Hours          0x0032   034   034   000    Old_age   Always       -       48741
10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       217
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       100
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       117
194 Temperature_Celsius     0x0022   089   080   000    Old_age   Always       -       58
196 Reallocated_Event_Count 0x0032   022   022   000    Old_age   Always       -       178
197 Current_Pending_Sector  0x0032   199   199   000    Old_age   Always       -       234
198 Offline_Uncorrectable   0x0030   199   199   000    Old_age   Offline      -       245
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   188   188   000    Old_age   Offline      -       2436
240 Head_Flying_Hours       0x0032   038   038   000    Old_age   Always       -       45709
241 Total_LBAs_Written      0x0032   200   200   000    Old_age   Always       -       81196791754
242 Total_LBAs_Read         0x0032   200   200   000    Old_age   Always       -       75991010629
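The rows that describe bad sectors most directly are attributes 5, 197 and 198, which read 364, 234 and 245 in the table above. Non-zero raw values there are commonly treated as a sign that a drive is wearing out. A rough sketch for pulling just those rows out of a table in this layout, assuming the raw value is the tenth column as it is here (the function name is made up for illustration):

```shell
# Print "NAME RAW_VALUE" for the sector-health attributes (IDs 5, 197, 198)
# whose raw value is non-zero. Reads a SMART attribute table on stdin.
# The name failing_sector_attrs is hypothetical, chosen for this example.
failing_sector_attrs() {
    awk '($1 == 5 || $1 == 197 || $1 == 198) && $10 + 0 > 0 { print $2, $10 }'
}
```

Usage, assuming the drive is /dev/sda (adjust to your device): sudo smartctl -A /dev/sda | failing_sector_attrs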

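Answer 1

I would also check the CPU temperature and make sure the cooling fan is working. If the fan is fine, you may want to check for malware/viruses.

In addition, depending on the system, you sometimes need to update the BIOS so that it fully supports new features in a newer operating system.

Another cause of system freezes I have found is a dropped internet connection, especially during something like an update, so check your internet connection as well and make sure it isn't dropping.

This is a bit of a "shot in the dark", but perhaps one of these suggestions helps. More information about your system (for example motherboard make, model and revision) might help.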

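Answer 2

This is only personal experience, but if the other suggestions don't help because your CPU temperature is fine, you might consider finding another similar CPU that is compatible with your motherboard and seeing whether swapping it in fixes the problem. I recently had a CPU fail, and before it died completely it behaved almost exactly as you describe. It could also be some kind of motherboard problem, but I would check the CPU first. I know that obtaining and testing other parts may not be entirely practical, but in my experience this kind of issue tends to be a hardware problem of some sort.

If neither of those turns out to be the problem, I would run a SMART test on the hard drive using the Disks utility, as described here: How can I check the SMART status of an SSD or HDD on current versions of Ubuntu 14.04 through 18.10?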

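Answer 3

Try tweaking the swap settings, for example by running sudo sysctl vm.swappiness=20. A value set this way is reverted on the next reboot. Even when your memory is not fully used, the kernel starts swapping parts of it out to disk to keep some space free. Choosing a fairly low value leaves less free memory but also causes less swapping. The best value depends on how much memory you have and on the workload you are running.

Once you have found a value that works, you can make it permanent by adding a line like the following to /etc/sysctl.conf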

vm.swappiness=20

For more background, see: What is swappiness and how do I change it?
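To confirm that the setting actually took effect, you can read the value back from the kernel. A small sketch, assuming the standard location /proc/sys/vm/swappiness; the function name and its optional file argument are made up for illustration:

```shell
# Compare the current swappiness with the value you intended to set.
#   $1 wanted value (default 20, the value used in this answer)
#   $2 file to read (hypothetical override; default is the kernel's own file)
# The name check_swappiness is hypothetical, chosen for this example.
check_swappiness() {
    local want="${1:-20}"
    local file="${2:-/proc/sys/vm/swappiness}"
    local current
    current=$(cat "$file")
    if [ "$current" -eq "$want" ]; then
        echo "ok"
    else
        echo "current=$current wanted=$want"
    fi
}
```

After editing /etc/sysctl.conf, sudo sysctl -p reloads the file without a reboot, and check_swappiness 20 should then print ok.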

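Answer 4

Collect information from the system monitors (for example sensors; a GUI monitor probably won't be much use to you here) and dump it to a file for post-mortem analysis. RRDtool may be useful.

You can write the readings out with a time and date, choose the interval at which data is dumped, record hard disk temperature, and so on.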
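As a rough illustration of that kind of logging, here is a sketch that prints one timestamped sample of load and temperature. It assumes the machine exposes a temperature in millidegrees Celsius at /sys/class/thermal/thermal_zone0/temp, which varies between systems; the function name and both file arguments are made up for illustration:

```shell
# Print one log line: timestamp, 1-minute load average, temperature in deg C.
#   $1 temperature file in millidegrees Celsius (assumed default: thermal_zone0)
#   $2 load average file (default: /proc/loadavg)
# The name log_sample is hypothetical, chosen for this example.
log_sample() {
    local temp_file="${1:-/sys/class/thermal/thermal_zone0/temp}"
    local load_file="${2:-/proc/loadavg}"
    local millideg load
    millideg=$(cat "$temp_file")
    # The first word of /proc/loadavg is the 1-minute load average.
    read -r load _ < "$load_file"
    printf '%s load=%s temp=%d.%d\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$load" \
        "$((millideg / 1000))" "$((millideg % 1000 / 100))"
}
```

To sample every 10 seconds into a file (the file name is only an example): while true; do log_sample >> ~/freeze.log; sleep 10; done. After a freeze, the last lines of the log show what the load and temperature were just before it.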

How to monitor and log server hardware temperatures and load

Temperature monitoring help

https://ubuntuforums.org/showthread.php?t=1998005

https://ubuntuforums.org/showthread.php?t=2364408

http://manpages.ubuntu.com/manpages/bionic/man8/turbostat.8.html

http://manpages.ubuntu.com/manpages/trusty/man8/hddtemp.8.html
