How do I make the GPU available in a fresh Ubuntu 20.04 VM?

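I have spent the whole day trying to get this GPU (a V100) working on a fresh Ubuntu VM. I tried installing the drivers and rebooting, and I also purged/uninstalled everything NVIDIA-related, but none of it seems to work.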

Specifically, I ran this:

apt update;
apt install build-essential;

sudo add-apt-repository ppa:graphics-drivers
sudo apt install ubuntu-drivers-common
ubuntu-drivers devices
sudo apt-get install nvidia-driver-460
sudo reboot now

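After that, nvidia-smi sometimes seems to work (it is not working as I write this question, so I can't copy and paste what it prints when it does), but when it doesn't work it says something like this: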

(synthesis) miranda9@miranda9:~$ nvidia-smi
Unable to determine the device handle for GPU 0000:00:06.0: Unknown Error

Any help would be appreciated.

Note that I also don't have access to the VM's vmx file, so this question and its answer are of no use to me: https://forums.developer.nvidia.com/t/nvidia-smi-reports-unable-to-determine-the-device-handle-for-gpu/46835

I also tried uninstalling everything NVIDIA-related and then reinstalling:

sudo apt-get --purge remove "*nvidia*"
sudo /usr/bin/nvidia-uninstall

and then

apt update;
apt install build-essential;

sudo add-apt-repository ppa:graphics-drivers
sudo apt install ubuntu-drivers-common
ubuntu-drivers devices
sudo apt-get install nvidia-driver-460
sudo reboot now

but that doesn't seem to work either.


More information in case it helps:

(synthesis) miranda9@miranda9:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.2 LTS
Release:        20.04
Codename:       focal

Also:

(synthesis) miranda9@miranda9:~$ python
Python 3.9.5 (default, Jun  4 2021, 12:28:51) 
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
/home/miranda9/miniconda3/envs/synthesis/lib/python3.9/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 101: invalid device ordinal (Triggered internally at  /opt/conda/conda-bld/pytorch_1623448238472/work/c10/cuda/CUDAFunctions.cpp:115.)
  return torch._C._cuda_getDeviceCount() > 0
False

As requested in the comments:

# lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.2 USB controller: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] (rev 01)
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 01)
00:02.0 VGA compatible controller: Cirrus Logic GD 5446
00:03.0 SCSI storage controller: XenSource, Inc. Xen Platform Device (rev 01)
00:05.0 System peripheral: XenSource, Inc. Citrix XenServer PCI Device for Windows Update (rev 01)
00:06.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] (rev a1)

On another VM:

$ lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.2 USB controller: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] (rev 01)
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 01)
00:02.0 VGA compatible controller: Cirrus Logic GD 5446
00:03.0 SCSI storage controller: XenSource, Inc. Xen Platform Device (rev 01)
00:05.0 System peripheral: XenSource, Inc. Citrix XenServer PCI Device for Windows Update (rev 01)
00:06.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] (rev a1)
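
The lspci listings above can also be checked programmatically. The following is a minimal Python sketch, not part of the original question, that picks the NVIDIA entries out of lspci text. The function name find_nvidia_devices and the regular expression are illustrative assumptions; the sample lines are copied from the listing above.

```python
import re

# Matches lines such as "00:06.0 3D controller: NVIDIA Corporation ...".
# An optional PCI domain prefix (e.g. "0000:") is accepted as well.
LSPCI_LINE = re.compile(
    r"^((?:[0-9a-fA-F]{4}:)?[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\s+([^:]+):\s+(.*)$"
)


def find_nvidia_devices(lspci_text):
    """Return (slot, device_class, description) for every NVIDIA line.

    Hypothetical helper for illustration; it only inspects text and does
    not call lspci itself.
    """
    devices = []
    for line in lspci_text.splitlines():
        match = LSPCI_LINE.match(line.strip())
        if match and "nvidia" in match.group(3).lower():
            devices.append(match.groups())
    return devices


# Two lines taken from the lspci output in the question.
SAMPLE = """\
00:02.0 VGA compatible controller: Cirrus Logic GD 5446
00:06.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] (rev a1)
"""

for slot, device_class, description in find_nvidia_devices(SAMPLE):
    print(slot, device_class, description, sep=" | ")
```

If the function returns an entry, the card is at least visible on the guest's PCI bus, so the remaining question is whether a working driver is attached to it. To run it against a live system, pass it the text printed by lspci instead of SAMPLE.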

Resources I have consulted for help:

Answer 1

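A virtual machine emulates the graphics card, so the native card in the host system should be transparent to the guest system. Virtual machines are meant for "sharing" resources; they are not real systems with direct access to the hardware. Installing the Nvidia driver in the guest system therefore makes no sense. You can verify this by checking which driver the virtual machine is currently using: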

inxi -G

(run in a terminal) will show the VM/Oracle driver rather than your native card.
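
If inxi is not installed, the same check can be approximated by reading sysfs. This is a rough sketch under stated assumptions: it relies on the standard Linux layout in which /sys/bus/pci/devices/<address>/driver is a symlink to the bound kernel driver, the function name bound_driver is made up for illustration, and the address 0000:00:06.0 is the one from the nvidia-smi error message in the question.

```python
import os


def bound_driver(pci_address, sysfs_root="/sys/bus/pci/devices"):
    """Return the name of the kernel driver bound to a PCI device, or None.

    Hypothetical helper: sysfs_root is a parameter only so the function
    can be pointed at a different directory (for example in a test).
    """
    driver_link = os.path.join(sysfs_root, pci_address, "driver")
    if not os.path.islink(driver_link):
        # Either the device does not exist or no driver is bound to it.
        return None
    # The link target ends in the driver's name, e.g. ".../drivers/<name>".
    return os.path.basename(os.readlink(driver_link))


# Address taken from the nvidia-smi error in the question.
print(bound_driver("0000:00:06.0"))
```

A result of None means no driver is bound to the device (or the address does not exist); any other value is the name of the kernel module currently handling it.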

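With tweaks and tricks you may be able to get high-performance graphics output, but virtual machines are not really suited to that kind of work...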
