docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]]

The machine has 4 GPUs, each with about 20 GB of vRAM, yet Docker fails to start a container with the command below. How can I fix this?

[20:08:28] jalal@echo:~/research/code$ docker run --shm-size 2GB -it --gpus all docurdt/heal
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
ERRO[0000] error waiting for container: context canceled 


[20:08:20] jalal@echo:~/research/code$ nvidia-smi
Fri Apr  1 20:08:28 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| 31%   41C    P8    23W / 350W |    301MiB / 24576MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:21:00.0 Off |                  N/A |
| 30%   39C    P8    18W / 350W |     14MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  Off  | 00000000:4A:00.0 Off |                  N/A |
| 30%   32C    P8    23W / 350W |     14MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  Off  | 00000000:4B:00.0 Off |                  N/A |
| 30%   40C    P8    18W / 350W |     14MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

I also have:

$ uname -a
Linux echo 5.4.0-99-generic #112-Ubuntu SMP Thu Feb 3 13:50:55 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
$ lsb_release -a
LSB Version:    core-11.1.0ubuntu2-noarch:security-11.1.0ubuntu2-noarch
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.3 LTS
Release:    20.04
Codename:   focal

$ docker -v
Docker version 20.10.7, build 20.10.7-0ubuntu5~20.04.2

Also:

$ df -h | grep /dev/shm
tmpfs                                126G  199M  126G   1% /dev/shm

$  cat /boot/config-$(uname -r) | grep -i seccomp
CONFIG_SECCOMP=y
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
CONFIG_SECCOMP_FILTER=y

[20:33:17] (dpcc) jalal@echo:~$ lspci -vv | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev a1) (prog-if 00 [VGA controller])
    Kernel driver in use: nvidia
    Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
01:00.1 Audio device: NVIDIA Corporation Device 1aef (rev a1)
21:00.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev a1) (prog-if 00 [VGA controller])
    Kernel driver in use: nvidia
    Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
21:00.1 Audio device: NVIDIA Corporation Device 1aef (rev a1)
4a:00.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev a1) (prog-if 00 [VGA controller])
    Kernel driver in use: nvidia
    Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
4a:00.1 Audio device: NVIDIA Corporation Device 1aef (rev a1)
4b:00.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev a1) (prog-if 00 [VGA controller])
    Kernel driver in use: nvidia
    Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
4b:00.1 Audio device: NVIDIA Corporation Device 1aef (rev a1)

Answer 1

  1. $ distribution=$(. /etc/os-release;echo $ID$VERSION_ID) && curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | sudo apt-key add - && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

  2. $ sudo apt-get update

  3. $ sudo apt-get install -y nvidia-docker2

  4. $ sudo systemctl restart docker

  5. $ docker run --shm-size 2GB -it --gpus all docurdt/heal
     (base) root@9f66ed7b7c1b:/Workspace#
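Before or after running the steps above, it can help to see which pieces are actually present on the host. The following is a minimal sketch, not an official diagnostic: `missing_tools` is a hypothetical helper, and the tool names `nvidia-container-cli` and `nvidia-container-runtime-hook` are assumptions about what the NVIDIA container packages install on PATH. As far as I understand, the Docker daemon only offers the `gpu` capability if it finds the NVIDIA hook binary when it starts, which would explain why a restart is part of the fix.

```shell
#!/bin/sh
# Hypothetical helper: print each named command that is NOT found on PATH,
# one per line. Prints nothing if every command is found.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# The NVIDIA tool names below are assumptions about what the packages install.
missing=$(missing_tools docker nvidia-smi nvidia-container-cli nvidia-container-runtime-hook)

if [ -n "$missing" ]; then
    printf 'Not found on PATH:\n%s\n' "$missing"
else
    echo "All expected tools found; restart docker if --gpus still fails."
fi
```

If the NVIDIA container tools show up as missing while `nvidia-smi` is found, the host driver is fine and only the container toolkit needs installing, which matches the situation in the question.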

Many thanks to grym for the link.

Answer 2

The following three commands solved my problem:

sudo apt install -y nvidia-docker2 
sudo systemctl daemon-reload
sudo systemctl restart docker

Source: https://forums.developer.nvidia.com/t/could-not-select-device-driver-with-capabilities-gpu/80200
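To confirm that the install and restart took effect, one indirect sign is whether the daemon now lists a runtime named `nvidia`. The sketch below assumes that `nvidia-docker2` registers such a runtime in `/etc/docker/daemon.json`; `has_nvidia_runtime` is a hypothetical helper that only scans text on stdin, so treat a negative result as a hint rather than proof.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the JSON text on stdin contains a key or
# value spelled exactly "nvidia" (as in the daemon's runtime list).
has_nvidia_runtime() {
    grep -q '"nvidia"'
}

# Only query the daemon if the docker CLI is available.
if command -v docker >/dev/null 2>&1; then
    if docker info --format '{{json .Runtimes}}' 2>/dev/null | has_nvidia_runtime; then
        echo "nvidia runtime is registered with the Docker daemon"
    else
        echo "nvidia runtime not listed; check /etc/docker/daemon.json and restart docker"
    fi
fi
```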
