Ubuntu 18.04 LTS hangs every time when using an AMD GPU


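I recently installed Ubuntu 18.04 LTS on my laptop, and I hit this problem every day: after a few hours of use the laptop hangs and nothing responds, not even the mouse or keyboard. I have run dist-upgrade and installed the graphics driver, but nothing has helped.

Please help.

Edit

As suggested by @ElderGeek, I installed lm-sensors. The temperatures I see are between 43 and 48 °C.

Here is my system information: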

ajit-soman@ajitsoman-X542BA:~$ sudo lshw -short
[sudo] password for ajit-soman: 
H/W path      Device      Class       Description
=================================================
                          system      X542BA
/0                        bus         X542BA
/0/0                      memory      64KiB BIOS
/0/4                      memory      160KiB L1 cache
/0/5                      memory      1MiB L2 cache
/0/28                     memory      8GiB System Memory
/0/28/0                   memory      4GiB SODIMM DDR4 Synchronous Unbuffered (U
/0/28/1                   memory      4GiB SODIMM DDR4 Synchronous Unbuffered (U
/0/30                     processor   AMD A9-9420 RADEON R5, 5 COMPUTE CORES 2C+
/0/100                    bridge      Family 15h (Models 60h-6fh) Processor Root
/0/100/0.2                generic     Family 15h (Models 60h-6fh) I/O Memory Man
/0/100/1                  display     Stoney [Radeon R2/R3/R4/R5 Graphics]
/0/100/1.1                multimedia  Advanced Micro Devices, Inc. [AMD/ATI]
/0/100/2.2                bridge      Family 15h (Models 60h-6fh) Processor Root
/0/100/2.2/0  wlp1s0      network     QCA9565 / AR9565 Wireless Network Adapter
/0/100/2.3                bridge      Family 15h (Models 60h-6fh) Processor Root
/0/100/2.3/0  enp2s0      network     RTL8111/8168/8411 PCI Express Gigabit Ethe
/0/100/2.4                bridge      Family 15h (Models 60h-6fh) Processor Root
/0/100/2.4/0              storage     ASM1062 Serial ATA Controller
/0/100/8                  generic     Advanced Micro Devices, Inc. [AMD]
/0/100/9.2                multimedia  Family 15h (Models 60h-6fh) Audio Controll
/0/100/10                 bus         FCH USB XHCI Controller
/0/100/11                 storage     FCH SATA Controller [AHCI mode]
/0/100/12                 bus         FCH USB EHCI Controller
/0/100/14                 bus         FCH SMBus Controller
/0/100/14.3               bridge      FCH LPC Bridge
/0/100/14.7               generic     FCH SD Flash Controller
/0/101                    bridge      Family 15h (Models 60h-6fh) Host Bridge
/0/102                    bridge      Family 15h (Models 60h-6fh) Host Bridge
/0/103                    bridge      Advanced Micro Devices, Inc. [AMD]
/0/104                    bridge      Advanced Micro Devices, Inc. [AMD]
/0/105                    bridge      Advanced Micro Devices, Inc. [AMD]
/0/106                    bridge      Advanced Micro Devices, Inc. [AMD]
/0/107                    bridge      Advanced Micro Devices, Inc. [AMD]
/0/108                    bridge      Advanced Micro Devices, Inc. [AMD]
/0/109                    bridge      Advanced Micro Devices, Inc. [AMD]
/0/1          scsi0       storage     
/0/1/0.0.0    /dev/sda    disk        1TB ST1000LM035-1RK1
/0/1/0.0.0/1              volume      511MiB Windows FAT volume
/0/1/0.0.0/2  /dev/sda2   volume      931GiB EXT4 volume
/0/2          scsi1       storage     
/0/2/0.0.0    /dev/cdrom  disk        DVDRAM GUE1N
ajit-soman@ajitsoman-X542BA:~$ 

The output of uname -a is as follows:

ajit-soman@ajitsoman-X542BA:~$ uname -a
Linux ajitsoman-X542BA 4.15.0-22-generic #24-Ubuntu SMP Wed May 16 12:15:17 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
ajit-soman@ajitsoman-X542BA:~$ 

Edit

As suggested by @WinEunuuchs2Unix, I ran journalctl -b-1 and found the following lines highlighted in red. I have copied them one by one below:

Jun 12 22:10:23 ajitsoman-X542BA kernel: ata2: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen
Jun 12 22:10:23 ajitsoman-X542BA kernel: ata2: ACPI event

Jun 12 22:22:47 ajitsoman-X542BA kernel: ACPI Error: [^^^PB2_.VGA_.AFN7] Namespace lookup failure, AE_NOT_FOUND (20170831/psargs-364)

Jun 12 22:22:47 ajitsoman-X542BA kernel: ACPI Error: Method parse/execution failed \_SB.PCI0.VGA.LCDD._BCM, AE_NOT_FOUND (20170831/psparse-550
Jun 12 22:22:47 ajitsoman-X542BA kernel: ACPI Error: Evaluating _BCM failed (20170831/video-364)

Jun 12 22:22:47 ajitsoman-X542BA kernel: [drm:hwss_wait_for_blank_complete [amdgpu]] *ERROR* DC: failed to blank crtc!


Jun 12 22:23:09 ajitsoman-X542BA bluetoothd[781]: Failed to set mode: Blocked through rfkill (0x12)


Jun 12 23:39:54 ajitsoman-X542BA kernel: [Firmware Bug]: cpu 0, invalid threshold interrupt offset 1 for bank 4, block 0 (MSR00000413=0xd00000


Jun 12 23:39:54 ajitsoman-X542BA rtkit-daemon[973]: The canary thread is apparently starving. Taking action.
Jun 12 23:39:54 ajitsoman-X542BA kernel: ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen
Jun 12 23:39:54 ajitsoman-X542BA kernel: ata2.00: ACPI event
Jun 12 23:39:54 ajitsoman-X542BA kernel: ata2.00: cmd a0/00:00:00:08:00/00:00:00:00:00/a0 tag 10 pio 16392 in
                                                  Get event status notification 4a 01 00 00 10 00 00 00 08 00res 50/00:03:00:00:00/00:00:00:00
Jun 12 23:39:54 ajitsoman-X542BA kernel: ata2.00: status: { DRDY }
Jun 12 23:39:54 ajitsoman-X542BA kernel: ata2: hard resetting link


Jun 13 00:01:53 ajitsoman-X542BA gdm3[840]: GLib: g_variant_new_string: assertion 'string != NULL' failed

Jun 13 00:01:53 ajitsoman-X542BA gdm3[840]: GLib: g_hash_table_find: assertion 'version == hash_table->version' failed

Answer 1

Update, June 14, 2018

Based on this Arch Linux forum thread, it looks like you need to add:

amdgpu.dc=0

to the LINUX line in your /etc/default/grub, after quiet splash. Then run sudo update-grub and reboot.
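The edit can also be scripted. The following is a minimal sketch, assuming the line that holds quiet splash is GRUB_CMDLINE_LINUX_DEFAULT (the Ubuntu default) and that GNU sed is available; the function name add_kernel_param is made up for this example.

```shell
# add_kernel_param FILE PARAM
# Appends PARAM to the GRUB_CMDLINE_LINUX_DEFAULT="..." line in FILE
# unless it is already there. Assumes the value is double-quoted.
add_kernel_param() {
    # Fail if the expected line is missing rather than silently doing nothing.
    grep -q '^GRUB_CMDLINE_LINUX_DEFAULT="' "$1" || return 1
    # Extract the current value between the quotes.
    current=$(sed -n 's/^GRUB_CMDLINE_LINUX_DEFAULT="\(.*\)"$/\1/p' "$1")
    case " $current " in
        *" $2 "*) return 0 ;;   # parameter already present, nothing to do
    esac
    # Insert the parameter just before the closing quote (GNU sed -i).
    sed -i "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 $2\"|" "$1"
}

# Usage: try it on a copy first, then apply it to the real file.
#   cp /etc/default/grub /tmp/grub.test
#   add_kernel_param /tmp/grub.test amdgpu.dc=0 && diff /etc/default/grub /tmp/grub.test
#   sudo cp /tmp/grub.test /etc/default/grub && sudo update-grub
```

After rebooting, grep -o 'amdgpu.dc=[01]' /proc/cmdline should print the parameter if the kernel actually received it.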


Since this is a fresh install of Ubuntu 18.04, you are one of the lucky ones who can use journalctl to view the previous boot (the one that locked up). Use:

journalctl -b-1

Then press the End key to jump to EOF (end of file). For my last successful boot it shows:

Jun 10 16:18:51 alien systemd[1]: Unmounting /mnt/d...
Jun 10 16:18:51 alien systemd[1]: Unmounted /run/user/1000.
Jun 10 16:18:51 alien systemd[1]: Unmounted /media/rick/Ubuntu 18.04 LTS amd64.
Jun 10 16:18:51 alien systemd[1]: Unmounted /boot/efi.
Jun 10 16:18:51 alien ntfs-3g[648]: Unmounting /dev/nvme0n1p8 (Shared_WSL+Linux)
Jun 10 16:18:51 alien ntfs-3g[648]: Permissions cache : 21 writes, 4033288 reads, 99.9% hits
Jun 10 16:18:51 alien systemd[1]: Unmounted /media/rick/casper-rw.
Jun 10 16:18:51 alien systemd[1]: Unmounted /mnt/e.
Jun 10 16:18:51 alien ntfs-3g[736]: Unmounting /dev/sda3 (HGST_Win10)
Jun 10 16:18:51 alien ntfs-3g[736]: Permissions cache : 754 writes, 4108560 reads, 99.9% hits
Jun 10 16:18:51 alien ntfs-3g[637]: Unmounting /dev/nvme0n1p4 (NVMe_Win10)
Jun 10 16:18:51 alien ntfs-3g[637]: Permissions cache : 987 writes, 4983239 reads, 99.9% hits
Jun 10 16:18:51 alien systemd[1]: Unmounted /mnt/d.
Jun 10 16:18:51 alien systemd[1]: Unmounted /mnt/c.
Jun 10 16:18:51 alien systemd[1]: Reached target Unmount All Filesystems.
Jun 10 16:18:51 alien systemd[1]: Stopped target Local File Systems (Pre).
Jun 10 16:18:51 alien systemd[1]: Stopped Remount Root and Kernel File Systems.
Jun 10 16:18:51 alien systemd[1]: Stopped Create Static Device Nodes in /dev.
Jun 10 16:18:51 alien systemd[1]: Reached target Shutdown.
Jun 10 16:18:51 alien systemd[1]: Reached target Final Step.
Jun 10 16:18:51 alien systemd[1]: dev-disk-by\x2dpartlabel-Basic\x5cx20data\x5cx20partition.device: Dev dev-
Jun 10 16:18:51 alien systemd[1]: Received SIGRTMIN+20 from PID 18665 (plymouthd).
Jun 10 16:18:51 alien systemd[1]: Started Show Plymouth Reboot Screen.
Jun 10 16:18:51 alien systemd[1]: Starting Reboot...
Jun 10 16:18:51 alien systemd[1]: Shutting down.
Jun 10 16:18:51 alien kernel: systemd-shutdow: 36 output lines suppressed due to ratelimiting
Jun 10 16:18:51 alien systemd-shutdown[1]: Sending SIGTERM to remaining processes...
Jun 10 16:18:51 alien dnsmasq[1393]: exiting on receipt of SIGTERM
Jun 10 16:18:51 alien systemd-journald[288]: Journal stopped
lines 46804-46832/46832 (END)

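You need to look for error messages in this output.

You may have to use the Page Up key to see them.

When you have found what you are looking for (or given up), press Q to quit.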
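Paging through tens of thousands of lines is tedious, so it may help to filter the output. The sketch below keeps only lines that mention the subsystems seen in the question's log (amdgpu/drm, ACPI errors, ATA link exceptions). The pattern list and the function name suspect_lines are assumptions for illustration, not a complete list of things that can cause a hang.

```shell
# Read log lines on stdin and print only the ones that look related to
# the GPU driver, ACPI errors, or ATA link trouble.
suspect_lines() {
    grep -E 'amdgpu|\[drm|ACPI Error|ata[0-9]+(\.[0-9]+)?: (exception|hard resetting)'
}

# Usage on a real system: previous boot, priority "err" and worse, no pager.
#   journalctl -b -1 -p err --no-pager | suspect_lines
```

The -p err option restricts journalctl to messages of error priority or higher, which is roughly what the pager shows in red.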

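If overheating is causing the shutdowns, you can install Intel Powerclamp; see: Prevent CPU from overheating

Besides lm-sensors, you can get temperature readings for all thermal zones directly from the command line with: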

$ paste <(cat /sys/class/thermal/thermal_zone*/type) <(cat /sys/class/thermal/thermal_zone*/temp) | column -s $'\t' -t | sed 's/\(.\)..$/.\1°C/'

INT3400 Thermal  20.0°C
SEN1             44.0°C
SEN2             52.0°C
SEN3             64.0°C
SEN4             59.0°C
B0D4             73.0°C
pch_skylake      76.5°C
x86_pkg_temp     73.0°C

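The kernel reports these values in millidegrees Celsius; the sed expression drops the last two digits and inserts a decimal point, giving degrees Celsius.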
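The same readings can be produced with a plain loop, which may be easier to adapt than the one-liner. This is a sketch under the assumption that each thermal_zone* directory has readable type and temp files; the function name print_thermal_zones and its optional directory argument are invented for this example.

```shell
# Print "<zone type>  <temperature>°C" for every thermal zone.
# $1 is the sysfs base directory (defaults to /sys/class/thermal).
print_thermal_zones() {
    base="${1:-/sys/class/thermal}"
    for zone in "$base"/thermal_zone*; do
        # Skip zones whose temperature cannot be read.
        milli=$(cat "$zone/temp" 2>/dev/null) || continue
        name=$(cat "$zone/type" 2>/dev/null) || continue
        # temp is in millidegrees Celsius: split into whole degrees and tenths.
        printf '%-16s %d.%d°C\n' "$name" "$((milli / 1000))" "$((milli % 1000 / 100))"
    done
}

# Usage:
#   print_thermal_zones
```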

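Answer 2

As an alternative to the amdgpu.dc=0 kernel option, upgrading to the Linux 4.18-based kernel from Ubuntu 18.10 fixed the problem for me, and the amdgpu.dc=0 boot parameter is no longer needed for graphics to work properly. (AMD Stoney hardware)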
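To see whether the running kernel is already at 4.18 or later, the release string can be compared with version sort. This is a rough sketch that assumes GNU sort with the -V option; the helper name kernel_at_least is hypothetical.

```shell
# kernel_at_least REQUIRED [ACTUAL]
# Succeeds if ACTUAL (default: output of uname -r) is at least REQUIRED.
kernel_at_least() {
    actual="${2:-$(uname -r)}"
    # With version sort, the smaller version comes first; if that is the
    # required one, the actual kernel is the same or newer.
    lowest=$(printf '%s\n%s\n' "$1" "$actual" | sort -V | head -n 1)
    [ "$lowest" = "$1" ]
}

# Usage:
#   if kernel_at_least 4.18; then echo "4.18 or newer"; else echo "older than 4.18"; fi
```

With the kernel from the question (4.15.0-22-generic) the check fails, so the amdgpu.dc=0 workaround would still apply there.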

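Answer 3

The kernel is already installed, right? ;-)

Determine your kernel version:

uname -ar

Then search for the matching kernel headers package (the headers version must match the kernel version) and install it.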

Alternatively, type this in a terminal:

sudo apt-get install linux-headers-$(uname -r)
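If you want to avoid reinstalling a package that is already present, the name can be built once and checked first. A small sketch; the helper name headers_package is made up, and the dpkg -s test is only a rough check of whether dpkg knows the package.

```shell
# Print the headers package name for a kernel release
# ($1 defaults to the running kernel's release from uname -r).
headers_package() {
    printf 'linux-headers-%s\n' "${1:-$(uname -r)}"
}

# Usage on a real system: install only if dpkg does not already list it.
#   pkg=$(headers_package)
#   dpkg -s "$pkg" >/dev/null 2>&1 || sudo apt-get install "$pkg"
```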

Reboot.

After the reboot, Linux should crash less often.
