I have an Ubuntu 20.04 server with the NVIDIA driver installed and running. The server is headless. If I run nvidia-smi on the host, I get:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.38       Driver Version: 455.38       CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  TITAN X (Pascal)    Off  | 00000000:01:00.0 Off |                  N/A |
| 23%   35C    P8    16W / 250W |     51MiB / 12192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       963      G   /usr/lib/xorg/Xorg                 49MiB |
+-----------------------------------------------------------------------------+
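If I run glxgears -display :0 on the host, I can confirm via nvidia-smi that the GPU is being used. Great! Now I want to do the same thing inside a Docker container. As an example, I will use the standard nvidia-docker image: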
sudo docker run -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=:0 --rm --gpus all --entrypoint /bin/bash -it nvidia/cuda:11.0-base
Inside the Docker container, I install glxgears:
apt update && apt install mesa-utils -y
Inside the container, the nvidia-smi output looks fine:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.38       Driver Version: 455.38       CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  TITAN X (Pascal)    Off  | 00000000:01:00.0 Off |                  N/A |
| 23%   34C    P8    16W / 250W |     51MiB / 12192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
But glxgears does not actually work:
root@75776a0b57b1:/# glxgears -display :0
libGL error: No matching fbConfigs or visuals found
libGL error: failed to load driver: swrast
X Error of failed request: GLXBadContext
Major opcode of failed request: 151 (GLX)
Minor opcode of failed request: 6 (X_GLXIsDirect)
Serial number of failed request: 43
Current serial number in output stream: 42
What am I doing wrong?
Answer 1
You need to use the nvidia/cudagl:11.0-base image instead of nvidia/cuda:11.0-base so that NVIDIA's OpenGL support libraries are available in the container.
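To make the change concrete, here is a rough sketch of the same invocation as in the question with only the image swapped. The IMAGE variable and the gl_container_cmd helper are names made up for this illustration, and it assumes the nvidia/cudagl:11.0-base tag is still available to pull. The helper only prints the command so it can be checked before running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Docker or any NVIDIA tool): prints the
# docker run command from the question, using the OpenGL-enabled image
# named in the answer. Assumes the tag can still be pulled.
IMAGE="${IMAGE:-nvidia/cudagl:11.0-base}"

gl_container_cmd() {
    # Flags are copied unchanged from the question; only the image differs.
    echo "docker run -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=:0 --rm --gpus all --entrypoint /bin/bash -it $IMAGE"
}

# Print the command for inspection.
gl_container_cmd

# To start the container for real:
#   sudo sh -c "$(gl_container_cmd)"
# Then, inside the container, repeat the steps from the question:
#   apt update && apt install mesa-utils -y
#   glxgears -display :0
```

If glxgears then starts without the libGL errors, the host's nvidia-smi should list an additional process using the GPU, just as it did when glxgears was run directly on the host.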