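R9-290/290X Hawaii series cards do not work with Linux kernels 4.19.x and 4.20.x on Ubuntu 18. The last fully working kernel version is 4.18.20, with the latest stable mesa drivers and the in-kernel amdgpu drm driver.
4.19.x and 4.20.x either fail during boot or do not boot at all (black screen after grub, no tty).
Depending on the grub linux command-line parameters, I was able to boot to an unstable desktop to investigate further and collect evidence of the state. Here it is...
Kernel and command line
Kernel: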
Linux version 4.20.0-042000-generic (kernel@tangerine) (gcc version 8.2.0 (Ubuntu 8.2.0-12ubuntu1)) #201812232030 SMP Mon Dec 24 01:32:58 UTC 2018
Kernel command line:
BOOT_IMAGE=/vmlinuz-4.20.0-042000-generic root=/dev/mapper/ubuntu--vg-root ro quiet splash radeon.si_support=0 radeon.cik_support=0 amdgpu.si_support=1 amdgpu.cik_support=1 amdgpu.dc=1
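To check which driver parameters the running kernel actually received, the command line can be read back from /proc/cmdline. A minimal sketch follows; the helper name cmdline_value is made up for illustration, and it assumes simple key=value parameters without spaces or a second '=' in the value.

```shell
#!/bin/sh
# Print the value of one key=value parameter from a kernel command line
# read on stdin. cmdline_value is a hypothetical helper, not a standard tool.
cmdline_value() {
    # Put each parameter on its own line, then match the key exactly.
    tr ' ' '\n' | awk -F= -v key="$1" '$1 == key { print $2 }'
}

# Example on a running system:
#   cmdline_value amdgpu.cik_support < /proc/cmdline
```

A parameter that is absent from the command line prints nothing, which makes it easy to tell "not set" apart from "set to 0".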
lspci -v
For Linux kernel 4.20.0
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT / Grenada XT [Radeon R9 290X/390X] (prog-if 00 [VGA controller])
Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT / Grenada XT [Radeon R9 290X/390X]
Flags: fast devsel, IRQ 16
Memory at d0000000 (64-bit, prefetchable) [size=256M]
Memory at ef800000 (64-bit, prefetchable) [size=8M]
I/O ports at ae00 [size=256]
Memory at fb980000 (32-bit, non-prefetchable) [size=256K]
[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Capabilities: [58] Express Legacy Endpoint, MSI 00
Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150] Advanced Error Reporting
Capabilities: [270] #19
Capabilities: [2b0] Address Translation Service (ATS)
Capabilities: [2c0] Page Request Interface (PRI)
Capabilities: [2d0] Process Address Space ID (PASID)
Kernel modules: radeon, amdgpu
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii HDMI Audio [Radeon R9 290/290X / 390/390X]
Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii HDMI Audio [Radeon R9 290/290X / 390/390X]
Flags: bus master, fast devsel, latency 0, IRQ 32
Memory at fb9fc000 (64-bit, non-prefetchable) [size=16K]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Capabilities: [58] Express Legacy Endpoint, MSI 00
Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150] Advanced Error Reporting
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel
On the last boot with kernel 4.20, only one display worked.
The other displays were forced to mirror. The other GPU ports did not work. Output of journalctl -b | grep drm:
[drm] amdgpu kernel modesetting enabled.
[drm] initializing kernel modesetting (HAWAII 0x1002:0x67B0 0x1002:0x0B00 0x00).
[drm] register mmio base: 0xFB980000
[drm] register mmio size: 262144
[drm] add ip block number 0 <cik_common>
[drm] add ip block number 1 <gmc_v7_0>
[drm] add ip block number 2 <cik_ih>
[drm] add ip block number 3 <gfx_v7_0>
[drm] add ip block number 4 <cik_sdma>
[drm] add ip block number 5 <powerplay>
[drm] add ip block number 6 <dm>
[drm] add ip block number 7 <uvd_v4_2>
[drm] add ip block number 8 <vce_v2_0>
[drm] vm size is 128 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
[drm:gmc_v7_0_sw_init [amdgpu]] *ERROR* Failed to load mc firmware!
[drm:amdgpu_device_init.cold.31 [amdgpu]] *ERROR* sw_init of IP block <gmc_v7_0> failed -2
[drm] amdgpu: finishing device.
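The two *ERROR* lines are the relevant part of this log. In Linux errno numbering, -2 corresponds to ENOENT (no such file or directory), which suggests the driver could not find the memory controller firmware file. A minimal sketch for pulling the failing IP block and its error code out of such a log; the function name failed_ip_blocks is made up for illustration, and the pattern assumes the message wording shown above.

```shell
#!/bin/sh
# Read kernel log lines on stdin and print "<ip block> <error code>" for
# every IP block whose sw_init failed. failed_ip_blocks is a hypothetical
# helper; it only matches the message format seen in the log above.
failed_ip_blocks() {
    sed -n 's/.*sw_init of IP block <\([^>]*\)> failed \(-[0-9]*\).*/\1 \2/p'
}

# Example on a running system:
#   journalctl -b | grep drm | failed_ip_blocks
```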
The last successful boot was with Linux kernel 4.18.20.
All displays work correctly. Everything is fine. For reference, here is the output of journalctl | grep drm:
[drm] amdgpu kernel modesetting enabled.
fb: switching to amdgpudrmfb from VESA VGA
[drm] initializing kernel modesetting (HAWAII 0x1002:0x67B0 0x1002:0x0B00 0x00).
[drm] register mmio base: 0xFB980000
[drm] register mmio size: 262144
[drm] probing gen 2 caps for device 8086:151 = 261ac83/e
[drm] probing mlw for device 8086:151 = 261ac83
[drm] add ip block number 0 <cik_common>
[drm] add ip block number 1 <gmc_v7_0>
[drm] add ip block number 2 <cik_ih>
[drm] add ip block number 3 <ci_dpm>
[drm] add ip block number 4 <dm>
[drm] add ip block number 5 <gfx_v7_0>
[drm] add ip block number 6 <cik_sdma>
[drm] add ip block number 7 <uvd_v4_2>
[drm] add ip block number 8 <vce_v2_0>
[drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
[drm] Detected VRAM RAM=4096M, BAR=256M
[drm] RAM width 512bits GDDR5
[drm] amdgpu: 4096M of VRAM memory ready
[drm] amdgpu: 4096M of GTT memory ready.
[drm] GART: num cpu pages 262144, num gpu pages 262144
[drm] PCIE GART of 1024M enabled (table at 0x000000F4007E9000).
[drm] Internal thermal controller with fan control
[drm] Invalid PCC GPIO: 13!
[drm] amdgpu: dpm initialized
[drm] Found UVD firmware Version: 1.64 Family ID: 9
[drm] Found VCE firmware Version: 50.10 Binary ID: 2
[drm] PCIE gen 3 link speeds already enabled
[drm] dce110_link_encoder_construct: Failed to get encoder_cap_info from VBIOS with error code 4!
[drm] dce110_link_encoder_construct: Failed to get encoder_cap_info from VBIOS with error code 4!
[drm] dce110_link_encoder_construct: Failed to get encoder_cap_info from VBIOS with error code 4!
[drm] Display Core initialized with v3.1.44!
[drm] SADs count is: -524, don't need to read it
[drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[drm] Driver supports precise vblank timestamp query.
[drm] UVD initialized successfully.
[drm] VCE initialized successfully.
[drm] fb mappable at 0xD0BD0000
[drm] vram apper at 0xD0000000
[drm] size 8294400
[drm] fb depth is 24
[drm] pitch is 7680
fbcon: amdgpudrmfb (fb0) is primary device
[drm] dce_get_required_clocks_state: clocks unsupported disp_clk 681000 pix_clk 148500
amdgpu 0000:01:00.0: fb0: amdgpudrmfb frame buffer device
[drm] Initialized amdgpu 3.26.0 20150101 for 0000:01:00.0 on minor 0
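Answer 1
Thanks to Alex Deucher (AMD driver developer for Linux), who helped me troubleshoot and work out a solution to my own problem.
The problem and the workaround were first documented in this bug tracker... https://bugs.freedesktop.org/show_bug.cgi?id=108781
The problem that the workaround below addresses is unlikely to be fixed in Linux kernels 4.19.x and 4.20.x. I hope it will be resolved in a future kernel. If you want something simple, stick with 4.18.20 or lower. If you want to take advantage of any fixes in the 4.19.x/4.20.x kernels, you can try the steps below, which worked for me...
Workaround: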
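- Remove amdgpu.dpm=x from the Linux kernel command line entirely and update grub. Neither '0' nor '1' works; with either value the system does not boot, not even to a tty.
- Copy /lib/firmware/radeon/* to /lib/firmware/amdgpu/
- Back up everything in /lib/firmware/radeon/*
- Delete /lib/firmware/radeon/
- Make sure the initrd for 4.20.0 is present in /boot, then run:
~$ sudo update-initramfs -u
- Confirm the contents of the initrd of a functioning/working kernel:
lsinitramfs /boot/initrd.img-<YOUR-KERNEL>-generic | grep hawaii
It should still point to /lib/firmware/radeon, even though we have deleted that directory.
- Confirm the contents of the initrd of the new, non-working kernel. For me that was:
lsinitramfs /boot/initrd.img-4.20.0-042000-generic | grep hawaii
It should contain only /lib/firmware/amdgpu/*
- Restore /lib/firmware/radeon/* from the backup. This lets you revert to a previous kernel version if necessary.
- Reboot
- [OPTIONAL-IMPORTANT] If everything works (it did for me), then to avoid conflicts with future kernels, delete /lib/firmware/radeon and then remove all kernels older than the new one that is now running. If you do not do this, and you later install a new kernel and run update-initramfs, you will get duplicate paths in the initrd for future kernels. I am not sure what happens in that case; I did not have time to test it.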
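The firmware steps above can be sketched as a few shell functions. This is an illustration under stated assumptions, not a tested installer: the function names and the fw_root and backup_dir parameters are hypothetical, chosen so the sequence can be tried against a scratch directory before touching /lib/firmware. Note that a plain copy overwrites same-named files already present in the amdgpu directory.

```shell
#!/bin/sh
# Sketch only. On a real system fw_root would be /lib/firmware and these
# functions would have to run as root. All names here are hypothetical.

# Copy radeon firmware into amdgpu/, back it up, then delete radeon/.
stage_amdgpu_firmware() {
    fw_root=${1:?firmware root required}
    backup_dir=${2:?backup directory required}
    [ -d "$fw_root/radeon" ] || return 1

    mkdir -p "$fw_root/amdgpu" "$backup_dir"
    cp -a "$fw_root/radeon/." "$fw_root/amdgpu/"   # overwrites same-named files
    cp -a "$fw_root/radeon/." "$backup_dir/"
    rm -rf "$fw_root/radeon"
}

# Put the radeon firmware back so older kernels keep working.
restore_radeon_firmware() {
    fw_root=${1:?firmware root required}
    backup_dir=${2:?backup directory required}
    mkdir -p "$fw_root/radeon"
    cp -a "$backup_dir/." "$fw_root/radeon/"
}

# Read an lsinitramfs listing on stdin and print the distinct firmware
# directories (e.g. "amdgpu", "radeon") that contain hawaii files.
hawaii_firmware_dirs() {
    grep hawaii | sed -n 's#.*lib/firmware/\([^/]*\)/.*#\1#p' | sort -u
}

# Rough order on a real system (the backup path is an assumption):
#   stage_amdgpu_firmware /lib/firmware /root/radeon-fw-backup
#   update-initramfs -u
#   lsinitramfs /boot/initrd.img-4.20.0-042000-generic | hawaii_firmware_dirs
#   restore_radeon_firmware /lib/firmware /root/radeon-fw-backup
```

If hawaii_firmware_dirs prints only amdgpu for the new kernel's initrd, that matches the state the steps above aim for.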