Installing the CUDA driver in a Proxmox virtual container

I have kernel 2.6.32-17-pve installed on the host, where the GPU shows up as:

    02:00.0 VGA compatible controller: NVIDIA Corporation Device 11c6 (rev a1) (prog-if 00 [VGA controller])
    Subsystem: Giga-byte Technology Device 3557
    Flags: fast devsel, IRQ 16
    Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
    Memory at c0000000 (64-bit, prefetchable) [size=256M]
    Memory at d0000000 (64-bit, prefetchable) [size=32M]
    I/O ports at e000 [size=128]
    Expansion ROM at fb000000 [disabled] [size=512K]
    Capabilities: [60] Power Management version 3
    Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
    Capabilities: [78] Express Endpoint, MSI 00
    Capabilities: [b4] Vendor Specific Information: Len=14 <?>
    Capabilities: [100] Virtual Channel
    Capabilities: [128] Power Budgeting <?>
    Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
    Capabilities: [900] #19

After creating an Ubuntu 12.04 virtual container, I tried to install the CUDA driver as follows:

  1. I ran vzctl set 100 --pci_add 02:00.0 on the host. Inside the container (vz), lspci -v now prints:

    02:00.0 VGA compatible controller: NVIDIA Corporation Device 11c6 (rev a1) (prog-if 00 [VGA controller])
    Subsystem: Giga-byte Technology Device 3557
    Flags: fast devsel, IRQ 16
    Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
    Memory at c0000000 (64-bit, prefetchable) [size=256M]
    Memory at d0000000 (64-bit, prefetchable) [size=32M]
    I/O ports at e000 [size=128]
    Expansion ROM at fb000000 [disabled] [size=512K]
    Capabilities: <access denied>
    Kernel modules: nouveau, nvidiafb
    
  2. I installed the packages recommended for installing CUDA and switched gcc to version 4.4.

  3. To install the PVE kernel headers, I added the PVE repository to sources.list (deb http://download.proxmox.com/debian squeeze pve) and ran sudo apt-get install pve-headers-2.6.32-17-pve.

  4. Now I am trying to install the driver, but it fails with:

    ERROR: Unable to load the kernel module 'nvidia.ko'.  This happens most
    frequently when this kernel module was built against the wrong or
    improperly configured kernel sources, with a version of gcc that
    differs from the one used to build the target kernel, or if a driver
    such as rivafb, nvidiafb, or nouveau is present and prevents the
    NVIDIA kernel module from obtaining ownership of the NVIDIA graphics
    device(s), or NVIDIA GPU installed in this system is not supported
    by this NVIDIA Linux graphics driver release.
    

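My guess is that the container (vz) never obtained ownership of the graphics device, but I am not sure, and I do not know how to fix it. Can anyone give me some advice?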
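
One part of that guess can be checked directly. The error text names rivafb, nvidiafb and nouveau as drivers that can block nvidia.ko, and the "Kernel modules:" line in lspci only lists modules that could handle the device, not modules that are actually loaded. The sketch below, meant to be run on the host since the kernel is shared, looks for those three names in /proc/modules. The function name conflicting_gpu_modules is made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads /proc/modules-style text on stdin and prints
# the name of every loaded module that the NVIDIA installer error lists
# as a possible conflict (nouveau, nvidiafb, rivafb).
conflicting_gpu_modules() {
    awk '$1 == "nouveau" || $1 == "nvidiafb" || $1 == "rivafb" { print $1 }'
}

if [ -r /proc/modules ]; then
    found="$(conflicting_gpu_modules < /proc/modules)"
    if [ -n "$found" ]; then
        echo "Possibly conflicting drivers currently loaded:"
        echo "$found"
    else
        echo "No nouveau/nvidiafb/rivafb module is loaded."
    fi
fi
```

An empty result only rules out this one cause; it says nothing about whether the container is allowed to load modules at all.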

Thanks

Answer 1

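Loading kernel modules from inside an OpenVZ container is prohibited. This is a security measure, because the kernel is shared between the host and all containers.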
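
Because of this, it matters which side a command is run on. The following is a minimal sketch of a commonly used heuristic, not an official interface: on OpenVZ kernels /proc/vz is present on both the host and in containers, while /proc/bc is present only on the host. The function name openvz_role and its optional path prefix argument are assumptions made for this example.

```shell
#!/bin/sh
# Heuristic only (an assumption, not an official OpenVZ API):
#   /proc/vz present, /proc/bc absent  -> inside a container
#   /proc/vz present, /proc/bc present -> on the host node
openvz_role() {
    root="${1:-}"   # optional path prefix, only useful for testing
    if [ -e "$root/proc/vz" ] && [ ! -e "$root/proc/bc" ]; then
        echo "container"
    elif [ -e "$root/proc/vz" ]; then
        echo "host"
    else
        echo "not-openvz"
    fi
}

case "$(openvz_role)" in
    container) echo "Inside an OpenVZ container: nvidia.ko must be loaded on the host." ;;
    host)      echo "On the OpenVZ host: kernel modules can be loaded here." ;;
    *)         echo "No OpenVZ kernel detected." ;;
esac
```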

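What might work: load the required kernel driver on the host, use --devnodes to grant the container access to any relevant devices, and, just in case, use --capability to enable all capabilities.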
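
A rough sketch of those steps, to be run on the host, might look like the script below. Several things in it are assumptions rather than part of the answer: the container ID 100 is taken from the question, the device nodes nvidiactl and nvidia0 are the usual NVIDIA node names and must already exist under /dev on the host, and the CAPS list is left empty because the answer does not say which capabilities are actually needed. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: run on the Proxmox/OpenVZ host, not inside the container.
CTID="${CTID:-100}"                            # container ID from the question
GPU_NODES="${GPU_NODES:-nvidiactl nvidia0}"    # assumed device node names
CAPS="${CAPS:-}"                               # e.g. "sys_rawio" (illustrative only)
DRY_RUN="${DRY_RUN:-1}"                        # 1 = print commands, 0 = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

expose_gpu() {
    # 1. Load the driver into the shared kernel on the host.
    run modprobe nvidia
    # 2. Give the container read/write access to each device node.
    for node in $GPU_NODES; do
        run vzctl set "$CTID" --devnodes "$node:rw" --save
    done
    # 3. Optionally switch on extra capabilities, one per call.
    for cap in $CAPS; do
        run vzctl set "$CTID" --capability "$cap:on" --save
    done
}

expose_gpu
```

With the defaults this prints one modprobe line and one vzctl line per device node, which makes it easy to review the commands before running them for real.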
