How do I find out how much GPU memory is available to OpenCL?

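How can I find out how much GPU memory is available to programs that use OpenCL for computation, such as darktable?

I know about lspci, which gives some general information, but not the information I am looking for.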

$ sudo lspci -v -s 01:00.0
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Curacao XT [Radeon R9 270X] (prog-if 00 [VGA controller])
    Subsystem: Gigabyte Technology Co., Ltd Device 227d
    Flags: bus master, fast devsel, latency 0, IRQ 49
    Memory at d0000000 (64-bit, prefetchable) [size=256M]
    Memory at fe780000 (64-bit, non-prefetchable) [size=256K]
    I/O ports at c000 [size=256]
    Expansion ROM at fe7c0000 [disabled] [size=128K]
    Capabilities: [48] Vendor Specific Information: Len=08 <?>
    Capabilities: [50] Power Management version 3
    Capabilities: [58] Express Legacy Endpoint, MSI 00
    Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
    Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
    Capabilities: [150] Advanced Error Reporting
    Capabilities: [270] #19
    Capabilities: [2b0] Address Translation Service (ATS)
    Capabilities: [2c0] #13
    Capabilities: [2d0] #1b
    Kernel driver in use: fglrx_pci

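It shows 256 MB, which is unrealistic and far too little (the GPU has 4 GB of memory in total), since darktable works with OpenCL and requires at least 768 MB.

Then there is clinfo (from the clinfo package), which gives the following: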

Number of platforms:                 1
  Platform Profile:              FULL_PROFILE
  Platform Version:              OpenCL 1.2 AMD-APP (1411.4)
  Platform Name:                 AMD Accelerated Parallel Processing
  Platform Vendor:               Advanced Micro Devices, Inc.
  Platform Extensions:               cl_khr_icd cl_amd_event_callback cl_amd_offline_devices cl_amd_hsa 


  Platform Name:                 AMD Accelerated Parallel Processing
Number of devices:               2
  Device Type:                   CL_DEVICE_TYPE_GPU
  Device ID:                     4098
  Board name:                    AMD Radeon R9 200 Series
  Device Topology:               PCI[ B#1, D#0, F#0 ]
  Max compute units:                 20
  Max work items dimensions:             3
    Max work items[0]:               256
    Max work items[1]:               256
    Max work items[2]:               256
  Max work group size:               256
  Preferred vector width char:           4
  Preferred vector width short:          2
  Preferred vector width int:            1
  Preferred vector width long:           1
  Preferred vector width float:          1
  Preferred vector width double:         1
  Native vector width char:          4
  Native vector width short:             2
  Native vector width int:           1
  Native vector width long:          1
  Native vector width float:             1
  Native vector width double:            1
  Max clock frequency:               1100Mhz
  Address bits:                  32
  Max memory allocation:             1073741824
  Image support:                 Yes
  Max number of images read arguments:       128
  Max number of images write arguments:      8
  Max image 2D width:                16384
  Max image 2D height:               16384
  Max image 3D width:                2048
  Max image 3D height:               2048
  Max image 3D depth:                2048
  Max samplers within kernel:            16
  Max size of kernel argument:           1024
  Alignment (bits) of base address:      2048
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                     No
    Quiet NaNs:                  Yes
    Round to nearest even:           Yes
    Round to zero:               Yes
    Round to +ve and infinity:           Yes
    IEEE754-2008 fused multiply-add:         Yes
  Cache type:                    Read/Write
  Cache line size:               64
  Cache size:                    16384
  Global memory size:                3221225472
  Constant buffer size:              65536
  Max number of constant args:           8
  Local memory type:                 Scratchpad
  Local memory size:                 32768
  Kernel Preferred work group size multiple:     64
  Error correction support:          0
  Unified memory for Host and Device:        0
  Profiling timer resolution:            1
  Device endianess:              Little
  Available:                     Yes
  Compiler available:                Yes
  Execution capabilities:                
    Execute OpenCL kernels:          Yes
    Execute native function:             No
  Queue properties:              
    Out-of-Order:                No
    Profiling :                  Yes
  Platform ID:                   0x00007fce5d932500
  Name:                      Pitcairn
  Vendor:                    Advanced Micro Devices, Inc.
  Device OpenCL C version:           OpenCL C 1.2 
  Driver version:                1411.4 (VM)
  Profile:                   FULL_PROFILE
  Version:                   OpenCL 1.2 AMD-APP (1411.4)
  Extensions:                    cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir 


  Device Type:                   CL_DEVICE_TYPE_CPU
  Device ID:                     4098
  Board name:                    
  Max compute units:                 2
  Max work items dimensions:             3
    Max work items[0]:               1024
    Max work items[1]:               1024
    Max work items[2]:               1024
  Max work group size:               1024
  Preferred vector width char:           16
  Preferred vector width short:          8
  Preferred vector width int:            4
  Preferred vector width long:           2
  Preferred vector width float:          4
  Preferred vector width double:         2
  Native vector width char:          16
  Native vector width short:             8
  Native vector width int:           4
  Native vector width long:          2
  Native vector width float:             4
  Native vector width double:            2
  Max clock frequency:               2664Mhz
  Address bits:                  64
  Max memory allocation:             2147483648
  Image support:                 Yes
  Max number of images read arguments:       128
  Max number of images write arguments:      8
  Max image 2D width:                8192
  Max image 2D height:               8192
  Max image 3D width:                2048
  Max image 3D height:               2048
  Max image 3D depth:                2048
  Max samplers within kernel:            16
  Max size of kernel argument:           4096
  Alignment (bits) of base address:      1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                     Yes
    Quiet NaNs:                  Yes
    Round to nearest even:           Yes
    Round to zero:               Yes
    Round to +ve and infinity:           Yes
    IEEE754-2008 fused multiply-add:         Yes
  Cache type:                    Read/Write
  Cache line size:               64
  Cache size:                    32768
  Global memory size:                6258630656
  Constant buffer size:              65536
  Max number of constant args:           8
  Local memory type:                 Global
  Local memory size:                 32768
  Kernel Preferred work group size multiple:     1
  Error correction support:          0
  Unified memory for Host and Device:        1
  Profiling timer resolution:            1
  Device endianess:              Little
  Available:                     Yes
  Compiler available:                Yes
  Execution capabilities:                
    Execute OpenCL kernels:          Yes
    Execute native function:             Yes
  Queue properties:              
    Out-of-Order:                No
    Profiling :                  Yes
  Platform ID:                   0x00007fce5d932500
  Name:                      Intel(R) Core(TM)2 Duo CPU     E6750  @ 2.66GHz
  Vendor:                    GenuineIntel
  Device OpenCL C version:           OpenCL C 1.2 
  Driver version:                1411.4 (sse2)
  Profile:                   FULL_PROFILE
  Version:                   OpenCL 1.2 AMD-APP (1411.4)
  Extensions:                    cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_spir cl_amd_svm 

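Several values have "memory" in their name, but which of them is the total memory available? And in which unit? If the values were in bits, Global memory size would be 512 MB and Max memory allocation 256 MB. Local memory size could be 4 GB (if the value is in MB). clinfo has no man page and no built-in help via -h.

How do I interpret all these values correctly to get the amount of usable GPU memory? Is there another program I could use?

Also: why is there no OpenCL tag yet?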

Answer 1

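You probably have your answer by now, but the clinfo output is in bytes, not bits. The Global memory size is therefore about 3 GB (3221225472 bytes), not 512 MB.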
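To make the arithmetic concrete, here is a small sketch that converts the byte counts from the GPU section of the clinfo output in the question into binary units. The helper name `human_readable` is made up for this illustration, and the three values are simply copied from the output above.

```python
# Byte counts copied from the GPU device in the clinfo output above.
CLINFO_BYTES = {
    "Max memory allocation": 1073741824,
    "Global memory size": 3221225472,
    "Local memory size": 32768,
}


def human_readable(num_bytes):
    """Format a byte count using binary units (1 KiB = 1024 bytes)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit where the value drops below 1024,
        # or at the largest unit we know about.
        if value < 1024 or unit == units[-1]:
            return f"{value:g} {unit}"
        value /= 1024


for name, size in CLINFO_BYTES.items():
    print(f"{name}: {human_readable(size)}")
```

Read this way, the GPU reports 3 GiB of global memory, a largest single allocation of 1 GiB, and 32 KiB of local memory per work group.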

Answer 2

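So you need a generic, works-everywhere Linux script or utility that reports this? I am afraid information this specific is not going to be easy to get. I am not familiar with the clinfo package; I guess you have to install it with sudo apt-get install.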

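If it does not have to be generic, you could write an OpenCL application that retrieves this information. I believe OpenCL should have some way of giving you such information; it is just a matter of writing a simple application that initializes an OpenCL context and printfs GPU_MEMORY (or something like that) to the console.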
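A minimal sketch of such a program, assuming the third-party pyopencl package and a working OpenCL driver are installed. The property the answer guesses at as GPU_MEMORY is CL_DEVICE_GLOBAL_MEM_SIZE in the OpenCL API, which pyopencl exposes as `device.global_mem_size`. The function names `gib`, `format_device` and `report_lines` are invented for this example. Querying device properties does not require creating a context, so the sketch skips that step.

```python
def gib(num_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return num_bytes / 2**30


def format_device(name, global_mem, max_alloc, local_mem):
    """Build one report line from the byte counts OpenCL returns."""
    return (f"{name}: global {gib(global_mem):.2f} GiB, "
            f"max allocation {gib(max_alloc):.2f} GiB, "
            f"local {local_mem // 1024} KiB")


def report_lines():
    """Describe the memory of every OpenCL device on every platform."""
    # Assumption: pyopencl is installed. It is imported here rather than
    # at the top so the helpers above can be used without it.
    import pyopencl as cl

    lines = []
    for platform in cl.get_platforms():
        for device in platform.get_devices():
            lines.append(format_device(
                device.name.strip(),
                device.global_mem_size,     # CL_DEVICE_GLOBAL_MEM_SIZE
                device.max_mem_alloc_size,  # CL_DEVICE_MAX_MEM_ALLOC_SIZE
                device.local_mem_size,      # CL_DEVICE_LOCAL_MEM_SIZE
            ))
    return lines
```

Calling `print("\n".join(report_lines()))` on a machine with an OpenCL driver should print one line per device. All three sizes are returned in bytes, the same values clinfo prints.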

As for the OpenCL tag, I think you will have better luck at https://stackoverflow.com/questions/tagged/opencl
