Solution with Docker, but without your image

Question

Answer 1

This part of the error message from this NVIDIA GitHub issue:

--require=cuda>=10.0 brand=tesla,driver>=384,driver<385

suggests that this is a driver issue. I'm not sure exactly why.
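One possible reading, offered as a sketch rather than a confirmed diagnosis: as far as I understand the NVIDIA container runtime documentation, space-separated groups in such a requirement are alternatives (OR) and comma-separated constraints within a group must all hold (AND). The helper names below (`constraint_holds`, `requirement_holds`) are hypothetical, and the CUDA version assigned to the host is an assumption, not something reported in the error message.

```python
import re

# Comparison operators that may appear inside a single constraint.
_OPS = {
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    "!=": lambda a, b: a != b,
    "=": lambda a, b: a == b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
}
_CONSTRAINT = re.compile(r"^(\w+)(>=|<=|!=|=|>|<)(.+)$")


def _version(text):
    """Turn '396.44' into (396, 44) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def constraint_holds(constraint, host):
    """Check one constraint such as 'driver>=384' against the host properties."""
    key, op, wanted = _CONSTRAINT.match(constraint).groups()
    actual = host[key]
    if key in ("cuda", "driver"):
        return _OPS[op](_version(actual), _version(wanted))
    return _OPS[op](actual, wanted)


def requirement_holds(requirement, host):
    """Space-separated groups are ORed; comma-separated constraints are ANDed."""
    return any(
        all(constraint_holds(c, host) for c in group.split(","))
        for group in requirement.split()
    )


REQUIREMENT = "cuda>=10.0 brand=tesla,driver>=384,driver<385"

# Driver version and GPU brand come from the nvidia-smi output in this answer.
# The CUDA version is an assumption: the 396 driver branch is taken to
# support CUDA 9.2 at most.
host = {"cuda": "9.2", "driver": "396.44", "brand": "tesla"}

print(requirement_holds(REQUIREMENT, host))  # False
```

Under these assumptions neither alternative is met: the driver is too old for CUDA 10.0, and 396.44 falls outside the 384.x range allowed by the second group.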

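Solution with Docker, but without your image

The simplest solution is to use a different Azure image: both NVIDIA GPU Cloud Image and NVIDIA GPU Cloud Image for Deep Learning and HPC run that Docker image.

Solution with your image, but without Docker

Alternatively, you can keep using Data Science Virtual Machine for Linux (Ubuntu), but without Docker containerization. For example, Conda can set up an environment (the leading yes | automatically answers the prompt for installing packages):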

yes | conda create -n TF python=2.7 scipy==1.0.0 tensorflow-gpu==1.8 Keras==2.1.3 pandas==0.22.0 numpy==1.14.0 matplotlib scikit-learn
export PATH=$PATH:/data/anaconda/envs/TF/bin
export PATH=$PATH:/data/anaconda/envs/py35/bin

These commands fetch the official models from TensorFlow:

git clone https://github.com/tensorflow/models.git
export PYTHONPATH="$PYTHONPATH:./models"
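Before starting a training run, it may be worth confirming that the interpreter on the PATH can actually import both TensorFlow and the cloned models. The following is a minimal sketch; the function name `check_environment` is hypothetical, and the module name `official` is assumed from the models/official directory of the cloned repository.

```python
import importlib
import sys


def check_environment(modules=("tensorflow", "official")):
    """Return a dict mapping each module name to True if it can be imported."""
    status = {}
    for name in modules:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


if __name__ == "__main__":
    # Shows which Python binary the PATH settings resolved to.
    print("interpreter: %s" % sys.executable)
    for name, ok in sorted(check_environment().items()):
        print("%s: %s" % (name, "importable" if ok else "NOT importable"))
```

If `official` is reported as not importable, the PYTHONPATH entry is probably relative to a different working directory than the one the script is started from.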

An initial call to nvidia-smi shows that no processes are running on the GPU:

$ nvidia-smi
Mon Jan 21 16:26:02 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.44                 Driver Version: 396.44                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M60           On   | 0000DB4D:00:00.0 Off |                  Off |
| N/A   39C    P8    14W / 150W |      0MiB /  8129MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Once the official MNIST model has been running in the background for a while, nvidia-smi shows a process using the GPU:

$ python models/official/mnist/mnist.py &
[1] 25967
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.44                 Driver Version: 396.44                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M60           On   | 0000DB4D:00:00.0 Off |                  Off |
| N/A   37C    P0    77W / 150W |   7851MiB /  8129MiB |     93%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     26077      C   python                                      7840MiB |
+-----------------------------------------------------------------------------+
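The same check can be scripted instead of read off the table by eye. This is a rough sketch that relies on the CSV query mode of nvidia-smi; the function names `parse_compute_apps` and `gpu_processes` are hypothetical, and the exact CSV layout is assumed to be one "pid, name, memory" row per process.

```python
import subprocess

# Ask nvidia-smi for compute processes as plain CSV rows: pid, name, MiB used.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Parse rows such as '26077, python, 7840' into a list of dicts."""
    apps = []
    for line in csv_text.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 3:
            continue  # skip anything that is not a pid/name/memory row
        pid, name, memory = fields
        try:
            used_mib = int(memory)
        except ValueError:
            used_mib = None  # memory usage not reported as a number
        apps.append({"pid": int(pid), "name": name, "used_mib": used_mib})
    return apps


def gpu_processes():
    """Run nvidia-smi and return the compute processes currently on the GPU."""
    output = subprocess.check_output(QUERY)
    return parse_compute_apps(output.decode("utf-8"))
```

Calling `gpu_processes()` on the VM should return an empty list before training starts and one entry for the python process once the MNIST model is running.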
