CUDA libraries not found for CUDA 11 on Ubuntu 20.04

I am using an NVIDIA VM on Azure with Ubuntu 20.04. I have installed the NVIDIA driver and CUDA, but my program still reports that libraries cannot be found when it runs.

nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01   Driver Version: 470.103.01   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000001:00:00.0 Off |                  Off |
| N/A   32C    P0    25W /  70W |      0MiB / 16127MiB |      6%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

These are the errors I see for several CUDA libraries:

Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
2022-05-02 05:33:53.131224: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
2022-05-02 05:33:53.131235: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Could not load dynamic library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory
2022-05-02 05:33:55.738515: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1592] Cannot dlopen some GPU libraries. 

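I am not sure whether this error is caused by the GPG key rotation problem or by something else, since I also tried installing the driver separately and kept getting an error that the driver could not be found.

I also tried: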

sudo apt-get -y install cuda

Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 cuda : Depends: cuda-11-6 (>= 11.6.2) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.

I am using tensorflow-gpu 2.1.3 because my program requires it.

I have also posted this on the NVIDIA forums.

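Answer 1

Some options I can suggest:

  1. Search Google for your problem using different keyword combinations. This issue is most likely related to the path of the correct "libcudnn.so.7" library. Does the libcudnn.so.7 named in the error output actually match the CUDA Version: 11.4 reported by nvidia-smi? You need to verify this.

  2. Use the ready-made nvidia-docker plugin. With it you can find containers for tensorflow, torch, and so on that are ready to run. Your current nvidia-smi output suggests this is possible.

  3. A different packaging and configuration system, such as Anaconda or Miniconda, can give you a more stable and complete development environment. If necessary, you can use separate environments to keep 3 different TF and CUDA versions side by side.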
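
As a rough way to act on option 1, the sketch below (not part of the original answer) pulls the missing library names out of a TensorFlow log and then asks the dynamic loader whether each one can be opened. The helper names missing_from_log and can_load are hypothetical, and the sample log lines are shortened copies of the ones in the question. For context, the CUDA Version printed by nvidia-smi is the highest CUDA version the driver supports, not necessarily the toolkit that is installed, and the TensorFlow 2.1 release binaries were, as far as I know, built against CUDA 10.1 and cuDNN 7.6, so a CUDA 11 install on its own would not be expected to provide libcudnn.so.7.

```python
import ctypes
import re

# Shortened copies of log lines from the question (illustrative input only).
SAMPLE_LOG = """\
Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file
Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file
Could not load dynamic library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file
"""

# Matches the library name quoted in TensorFlow's dso_loader warning.
MISSING_RE = re.compile(r"Could not load dynamic library '([^']+)'")


def missing_from_log(log_text):
    """Return the library names a log reports as unloadable, in order, without duplicates."""
    names = []
    for name in MISSING_RE.findall(log_text):
        if name not in names:
            names.append(name)
    return names


def can_load(lib_name):
    """Return True if the dynamic loader can open lib_name on this machine."""
    try:
        ctypes.CDLL(lib_name)
        return True
    except OSError:
        # Raised when the loader cannot find or open the shared object.
        return False


if __name__ == "__main__":
    for lib in missing_from_log(SAMPLE_LOG):
        status = "loadable" if can_load(lib) else "NOT FOUND by the loader"
        print(f"{lib}: {status}")
```

If a library is reported as not found even though the file exists somewhere on disk, its directory is probably missing from the loader's search path (for example LD_LIBRARY_PATH). If the file does not exist at all, the matching cuDNN or TensorRT version still has to be installed.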
