Nvidia rendering on Docker

I have an Ubuntu 20.04 server with the NVIDIA driver installed and running. The server is headless. If I run nvidia-smi on the host, I get:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.38       Driver Version: 455.38       CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  TITAN X (Pascal)    Off  | 00000000:01:00.0 Off |                  N/A |
| 23%   35C    P8    16W / 250W |     51MiB / 12192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       963      G   /usr/lib/xorg/Xorg                 49MiB |
+-----------------------------------------------------------------------------+

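If I run glxgears -display :0 on the host, I can confirm via nvidia-smi that the GPU is being used. Great! Now I want to do the same thing inside a Docker container. As an example, I will use the standard nvidia-docker image: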

sudo docker run -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=:0 --rm --gpus all --entrypoint /bin/bash -it nvidia/cuda:11.0-base

Inside the Docker container, I install glxgears:

apt update
apt install mesa-utils -y

Inside the container, nvidia-smi still reports the GPU:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.38       Driver Version: 455.38       CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  TITAN X (Pascal)    Off  | 00000000:01:00.0 Off |                  N/A |
| 23%   34C    P8    16W / 250W |     51MiB / 12192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

But glxgears does not actually work:

root@75776a0b57b1:/# glxgears -display :0
libGL error: No matching fbConfigs or visuals found
libGL error: failed to load driver: swrast
X Error of failed request:  GLXBadContext
  Major opcode of failed request:  151 (GLX)
  Minor opcode of failed request:  6 (X_GLXIsDirect)
  Serial number of failed request:  43
  Current serial number in output stream:  42

What am I doing wrong?

Answer 1

You will need to use the nvidia/cudagl:11.0-base image instead of nvidia/cuda:11.0-base in order to take advantage of NVIDIA's OpenGL support libraries.
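As a rough sketch, this is the docker run invocation from the question with only the image changed. The function name run_gl_shell and its optional image argument are hypothetical, introduced here for illustration. Every flag is copied from the question unchanged, and the sketch assumes the rest of the host setup (X server on :0, NVIDIA container runtime) stays the same.

```shell
# Hypothetical helper, not part of the original post: wraps the question's
# docker run command and lets the image be passed as an optional argument.
run_gl_shell() {
    # Default to the OpenGL-enabled image named in the answer.
    image="${1:-nvidia/cudagl:11.0-base}"
    sudo docker run \
        -v /tmp/.X11-unix:/tmp/.X11-unix \
        -e DISPLAY=:0 \
        --rm --gpus all \
        --entrypoint /bin/bash \
        -it "$image"
}
```

Calling run_gl_shell with no argument should open a shell in the cudagl container. From there the steps from the question can be repeated (apt update, apt install mesa-utils -y, glxgears -display :0) to check whether GLX now works.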
