Installing OpenCL 2.0 drivers for an AMD GPU on Ubuntu 19.10? What are my options?

Question

I finally got it working. Here is the output from clinfo (the binary bundled with ROCm).

Number of platforms:              1
  Platform Profile:              FULL_PROFILE
  Platform Version:              OpenCL 2.1 AMD-APP (3004.6)
  Platform Name:                 AMD Accelerated Parallel Processing
  Platform Vendor:               Advanced Micro Devices, Inc.
  Platform Extensions:               cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 


  Platform Name:                 AMD Accelerated Parallel Processing
Number of devices:               1
  Device Type:                   CL_DEVICE_TYPE_GPU
  Vendor ID:                     1002h
  Board name:                    Unknown AMD GPU
  Device Topology:               PCI[ B#7, D#0, F#0 ]
  Max compute units:                 8
  Max work items dimensions:             3
    Max work items[0]:               1024
    Max work items[1]:               1024
    Max work items[2]:               1024
  Max work group size:               256
  Preferred vector width char:           4
  Preferred vector width short:          2
  Preferred vector width int:            1
  Preferred vector width long:           1
  Preferred vector width float:          1
  Preferred vector width double:         1
  Native vector width char:          4
  Native vector width short:             2
  Native vector width int:           1
  Native vector width long:          1
  Native vector width float:             1
  Native vector width double:            1
  Max clock frequency:               1100Mhz
  Address bits:                  64
  Max memory allocation:             1825361100
  Image support:                 Yes
  Max number of images read arguments:       128
  Max number of images write arguments:      64
  Max image 2D width:                16384
  Max image 2D height:               16384
  Max image 3D width:                2048
  Max image 3D height:               2048
  Max image 3D depth:                2048
  Max samplers within kernel:            16
  Max size of kernel argument:           1024
  Alignment (bits) of base address:      2048
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                     No
    Quiet NaNs:                  Yes
    Round to nearest even:           Yes
    Round to zero:               Yes
    Round to +ve and infinity:           Yes
    IEEE754-2008 fused multiply-add:         Yes
  Cache type:                    Read/Write
  Cache line size:               64
  Cache size:                    16384
  Global memory size:                7301444400
  Constant buffer size:              1825361100
  Max number of constant args:           8
  Local memory type:                 Scratchpad
  Local memory size:                 65536
  Max pipe arguments:                16
  Max pipe active reservations:          16
  Max pipe packet size:              1825361100
  Max global variable size:          1642824960
  Max global variable preferred total size:  7301444400
  Max read/write image args:             64
  Max on device events:              1024
  Queue on device max size:          8388608
  Max on device queues:              1
  Queue on device preferred size:        262144
  SVM capabilities:              
    Coarse grain buffer:             Yes
    Fine grain buffer:               Yes
    Fine grain system:               No
    Atomics:                     No
  Preferred platform atomic alignment:       0
  Preferred global atomic alignment:         0
  Preferred local atomic alignment:      0
  Kernel Preferred work group size multiple:     64
  Error correction support:          0
  Unified memory for Host and Device:        1
  Profiling timer resolution:            1
  Device endianess:              Little
  Available:                     Yes
  Compiler available:                Yes
  Execution capabilities:                
    Execute OpenCL kernels:          Yes
    Execute native function:             No
  Queue on Host properties:              
    Out-of-Order:                No
    Profiling :                  Yes
  Queue on Device properties:                
    Out-of-Order:                Yes
    Profiling :                  Yes
  Platform ID:                   0x7f6233d65f10
  Name:                      gfx902
  Vendor:                    Advanced Micro Devices, Inc.
  Device OpenCL C version:           OpenCL C 2.0 
  Driver version:                3004.6 (PAL,HSAIL)
  Profile:                   FULL_PROFILE
  Version:                   OpenCL 2.0 AMD-APP (3004.6)
  Extensions:                    cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_khr_gl_depth_images cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_gl_event cl_khr_depth_images cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_amd_copy_buffer_p2p

Let's go through it. My system currently is:

Kernel version: 5.3.18 generic
Graphics API: Mesa (the latest version is fine)
OpenCL: 2.1 via ROCm 3.1, or whichever version installing through apt provides.

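The key step comes after the ROCm installation is finished: you must download the latest AMD driver. As of this writing the version is 19.50, specifically amdgpu-pro-19.50-967956-ubuntu-18.04.tar.xz. Even if you are on 19.10 like me, don't worry about the Ubuntu release in the file name. What probably matters is staying on kernel 5.3.x to be safe, which suits both ROCm and the AMD driver.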
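The kernel constraint above can be checked before going any further. This is a minimal sketch and not part of the original procedure: the helper names kernel_series, is_kernel_5_3 and running_kernel_ok are made up for illustration, and the only assumption is that the release string starts with "major.minor".

```python
import platform
import re
from typing import Tuple


def kernel_series(release: str) -> Tuple[int, int]:
    """Parse (major, minor) from a release string such as '5.3.0-40-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def is_kernel_5_3(release: str) -> bool:
    """True when the release belongs to the 5.3.x series mentioned above."""
    return kernel_series(release) == (5, 3)


def running_kernel_ok() -> bool:
    """Check the running kernel; platform.release() is what `uname -r` prints."""
    return is_kernel_5_3(platform.release())
```

Calling running_kernel_ok() on the target machine returns True only when the running kernel is in the 5.3 series.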

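Then use or modify the script from tuxutku. You can comment out the line that downloads the remote file, since you have now downloaded it yourself, and let the script work on that file offline in your filesystem. Look carefully at the last command, which copies the resulting files into /. You can even comment that line out and do the copy manually to be safe.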
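If you do comment out the final copy, it can help to preview what would land in / before doing it by hand. The sketch below is an illustration only and is not taken from tuxutku's script: preview_copy is a hypothetical helper, and it assumes the extracted files sit in a staging directory that mirrors the layout they should have under /.

```python
import os


def preview_copy(staging_dir, target_root="/"):
    """Return sorted (destination, already_exists) pairs for every file in staging_dir.

    Nothing is copied; this only reports where each file would end up.
    """
    plan = []
    for dirpath, _dirnames, filenames in os.walk(staging_dir):
        for name in filenames:
            source = os.path.join(dirpath, name)
            relative = os.path.relpath(source, staging_dir)
            destination = os.path.join(target_root, relative)
            # lexists also reports dangling symlinks, which a copy would overwrite
            plan.append((destination, os.path.lexists(destination)))
    return sorted(plan)
```

Any pair whose second element is True names a file that the copy would overwrite, which is worth checking if another driver is already installed.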

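The gist of the script is that we don't actually install anything from the AMD driver package. We only take out the shared libraries and a few configuration files and place them into our system, and ROCm then interacts with them. If you already have a working AMDGPU-PRO installation, you need to rename the shared libraries to avoid conflicts, which keeps the two setups isolated.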

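Remember that the content of /opt/OpenCL/vendors/amdocl64.icd is libamdocl64.so. We have already copied the extracted .so file into /, so the ICD entry refers to that file rather than the .so bundled with ROCm. If it refers to ROCm's copy, it will not work and will complain loudly that no device was found.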
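To see which library an ICD file names, you can simply read it. The following is a small sketch with hypothetical helper names (read_icd_library, is_bare_library_name); it assumes the ICD file is a plain text file whose first non-empty line is the library name or path, as in the case described above.

```python
import os


def read_icd_library(icd_path):
    """Return the library name or path stored in an OpenCL ICD file."""
    with open(icd_path, encoding="utf-8") as handle:
        for line in handle:
            entry = line.strip()
            if entry:
                return entry
    raise ValueError(f"{icd_path} contains no library entry")


def is_bare_library_name(entry):
    """True when the entry has no directory part.

    In that case the dynamic loader's search path decides which
    libamdocl64.so is actually loaded, not the ICD file itself.
    """
    return os.path.dirname(entry) == ""
```

For example, read_icd_library("/opt/OpenCL/vendors/amdocl64.icd") should return libamdocl64.so on the setup described here. Because that is a bare name, it matters which copy of the library the loader finds first.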

Now everything is done. You can verify the result with either the standalone clinfo binary or the one bundled with ROCm to check that all went well.
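The verification can also be scripted. This sketch assumes the output format of the ROCm-bundled clinfo shown at the top of this post (the standalone clinfo prints a different layout) and that the binary is reachable as clinfo on PATH; parse_clinfo and gpu_devices_from_clinfo are names chosen for illustration.

```python
import shutil
import subprocess


def parse_clinfo(text):
    """Collect (device_type, name) pairs from ROCm-style clinfo output."""
    devices = []
    current_type = None
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # section headers such as "SVM capabilities" carry no value
        key, value = key.strip(), value.strip()
        if key == "Device Type":
            current_type = value
        elif key == "Name" and current_type is not None:
            # "Platform Name" and "Board name" do not match; only the device "Name" does
            devices.append((current_type, value))
            current_type = None
    return devices


def gpu_devices_from_clinfo(binary="clinfo"):
    """Run clinfo and return the names of the GPU devices it reports."""
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"{binary} not found on PATH")
    result = subprocess.run([binary], capture_output=True, text=True, check=False)
    return [name for kind, name in parse_clinfo(result.stdout)
            if kind == "CL_DEVICE_TYPE_GPU"]
```

On the system described in this post, gpu_devices_from_clinfo() would be expected to include gfx902. An empty list suggests the ICD is pointing at a library that cannot see the device.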

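It is best to have some OpenCL-based applications to test with. I used the Phoronix Test Suite, namely pts/juliagpu and pts/luxmark, to test OpenCL capability. As for Blender, it detects the hardware and lists the available devices.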

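The first entry (Unknown) appears to be the real GPU as far as Blender is concerned. When the second one is selected, the scene's "GPU Compute" setting is greyed out.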

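So yes, you need to make sure the setting is not greyed out to be sure the scene is really rendered on the GPU. Alternatively, you can run radeontop while rendering a scene to monitor GPU activity.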
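If you would rather log GPU load than watch radeontop interactively, something like the following may work. It is a sketch under assumptions: the flags -d - (dump to stdout) and -l 1 (stop after one line) and the "gpu NN.NN%" field in the dump line reflect my understanding of radeontop and should be confirmed with radeontop --help on your version. The function names are hypothetical.

```python
import re
import subprocess

# Matches the overall "gpu 45.83%" field in a radeontop dump line (assumed format).
_GPU_FIELD = re.compile(r"\bgpu (\d+(?:\.\d+)?)%")


def gpu_busy_percent(dump_line):
    """Extract the GPU utilisation from one dump line, or None if absent."""
    match = _GPU_FIELD.search(dump_line)
    return float(match.group(1)) if match else None


def sample_gpu_load(timeout=10):
    """Take one sample with `radeontop -d - -l 1` (flags are an assumption)."""
    result = subprocess.run(
        ["radeontop", "-d", "-", "-l", "1"],
        capture_output=True, text=True, timeout=timeout, check=False,
    )
    for line in result.stdout.splitlines():
        value = gpu_busy_percent(line)
        if value is not None:
            return value
    return None
```

Calling sample_gpu_load() a few times while Blender renders should show a clearly non-zero value if the GPU is doing the work.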

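Finally, although it works now and Blender can detect my GPU, performance is worse than it was when rendering on the CPU, and there is a long lag while loading after clicking to render a scene. This Blender issue says that only the proprietary driver is officially supported. So the next step is to try AMDGPU-PRO on its own and test again, but it takes some effort to switch cleanly between the open-source and the closed-source driver.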

Update:

I have summarized the solution above, with some improvements that make it easier, in a video.

Answer 1