我正在尝试在 Ubuntu 16.04 64x 上为带有 Python 3.6 的 conda 环境安装支持 GPU 的 Tensorflow。
我尝试安装所有GPU 要求然后pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.10.0-cp36-cp36m-linux_x86_64.whl
从我的 conda 环境运行。
但是,当我打开 Python 终端并尝试时,import tensorflow as tf
我得到了一个ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
。
需求状态如下:
- NVIDIA 驱动程序版本:384.130(这是的输出
nvidia-smi
) - CUDA 编译器驱动程序:(
release 7.5, V7.5.17
这是的输出nvcc -v
) - CUDA:版本 9.2.148(这是的输出
cat /usr/local/cuda/version.txt
)。我真的很困惑,因为我在其他地方读到 CUDA 版本和 nvcc 版本应该匹配。 - cuDNN:我想我已经安装了它?我下载了 .deb 包,然后
sudo dpkg -i /path/to/deb/file
按照 进行操作sudo apt-get install -f
。但互联网告诉我执行cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
应该会给我 cuDNN 版本,并且它抱怨该文件不存在。 - CUPTI:我
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/extras/CUPTI/lib64
按照 TensorFlow 要求指南中的指示进行运行。
我现在应该尝试什么?
完整错误跟踪:
>>> import tensorflow
Traceback (most recent call last):
File "/home/jsevillamol/anaconda3/envs/ctlearn/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
from tensorflow.python.pywrap_tensorflow_internal import *
File "/home/jsevillamol/anaconda3/envs/ctlearn/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
_pywrap_tensorflow_internal = swig_import_helper()
File "/home/jsevillamol/anaconda3/envs/ctlearn/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "/home/jsevillamol/anaconda3/envs/ctlearn/lib/python3.6/imp.py", line 243, in load_module
return load_dynamic(name, filename, file)
File "/home/jsevillamol/anaconda3/envs/ctlearn/lib/python3.6/imp.py", line 343, in load_dynamic
return _load(spec)
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/jsevillamol/anaconda3/envs/ctlearn/lib/python3.6/site-packages/tensorflow/__init__.py", line 22, in <module>
from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import
File "/home/jsevillamol/anaconda3/envs/ctlearn/lib/python3.6/site-packages/tensorflow/python/__init__.py", line 49, in <module>
from tensorflow.python import pywrap_tensorflow
File "/home/jsevillamol/anaconda3/envs/ctlearn/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
raise ImportError(msg)
ImportError: Traceback (most recent call last):
File "/home/jsevillamol/anaconda3/envs/ctlearn/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
from tensorflow.python.pywrap_tensorflow_internal import *
File "/home/jsevillamol/anaconda3/envs/ctlearn/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
_pywrap_tensorflow_internal = swig_import_helper()
File "/home/jsevillamol/anaconda3/envs/ctlearn/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "/home/jsevillamol/anaconda3/envs/ctlearn/lib/python3.6/imp.py", line 243, in load_module
return load_dynamic(name, filename, file)
File "/home/jsevillamol/anaconda3/envs/ctlearn/lib/python3.6/imp.py", line 343, in load_dynamic
return _load(spec)
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
答案1
可以使用以下教程中的代码安装 Cuda 9.0
https://www.tensorflow.org/install/gpu
# Add NVIDIA package repository
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.1.85-1_amd64.deb
sudo apt install ./cuda-repo-ubuntu1604_9.1.85-1_amd64.deb
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/nvidia-machine-learning-repo-ubuntu1604_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1604_1.0.0-1_amd64.deb
sudo apt update
# Install CUDA and tools. Include optional NCCL 2.x
sudo apt install cuda9.0 cuda-cublas-9-0 cuda-cufft-9-0 cuda-curand-9-0 \
cuda-cusolver-9-0 cuda-cusparse-9-0 libcudnn7=7.2.1.38-1+cuda9.0 \
libnccl2=2.2.13-1+cuda9.0 cuda-command-line-tools-9-0
# Optional: Install the TensorRT runtime (must be after CUDA install)
sudo apt update
sudo apt install libnvinfer4=4.1.2-1+cuda9.0
答案2
已修复!原来默认安装的 TF 发行版不支持 CUDA 9.2。我降级到 CUDA 9.0,现在暂时可以正常工作。