How do I install TensorFlow with CUDA 9.0 and cuDNN 7.0?

I installed CUDA 9.0 and cuDNN 7.0 successfully, but building TensorFlow 1.4 from source fails.

The error output:

sam@sam:~/code/download/CNN/tensorflow_1.4/tensorflow$ bazel build -c opt --copt=-march="haswell" --config=cuda //tensorflow/tools/pip_package:build_pip_package
.................
WARNING: The lower priority option '-c opt' does not override the previous value '-c opt'
ERROR: Skipping '//tensorflow/tools/pip_package:build_pip_package': error loading package 'tensorflow/tools/pip_package': Encountered error while reading extension file 'cuda/build_defs.bzl': no such package '@local_config_cuda//cuda': Traceback (most recent call last):
    File "/home/sam/code/download/CNN/tensorflow_1.4/tensorflow/third_party/gpus/cuda_configure.bzl", line 1042
        _create_local_cuda_repository(repository_ctx)
    File "/home/sam/code/download/CNN/tensorflow_1.4/tensorflow/third_party/gpus/cuda_configure.bzl", line 905, in _create_local_cuda_repository
        _get_cuda_config(repository_ctx)
    File "/home/sam/code/download/CNN/tensorflow_1.4/tensorflow/third_party/gpus/cuda_configure.bzl", line 662, in _get_cuda_config
        _cudnn_version(repository_ctx, cudnn_install_base..., ...)
    File "/home/sam/code/download/CNN/tensorflow_1.4/tensorflow/third_party/gpus/cuda_configure.bzl", line 360, in _cudnn_version
        _find_cudnn_header_dir(repository_ctx, cudnn_install_base...)
    File "/home/sam/code/download/CNN/tensorflow_1.4/tensorflow/third_party/gpus/cuda_configure.bzl", line 612, in _find_cudnn_header_dir
        auto_configure_fail(("Cannot find cudnn.h under %s" ...))
    File "/home/sam/code/download/CNN/tensorflow_1.4/tensorflow/third_party/gpus/cuda_configure.bzl", line 129, in auto_configure_fail
        fail(("\n%sCuda Configuration Error:%...)))

Cuda Configuration Error: Cannot find cudnn.h under /usr/lib/x86_64-linux-gnu
WARNING: Target pattern parsing failed.
ERROR: error loading package 'tensorflow/tools/pip_package': Encountered error while reading extension file 'cuda/build_defs.bzl': no such package '@local_config_cuda//cuda': Traceback (most recent call last):
    File "/home/sam/code/download/CNN/tensorflow_1.4/tensorflow/third_party/gpus/cuda_configure.bzl", line 1042
        _create_local_cuda_repository(repository_ctx)
    File "/home/sam/code/download/CNN/tensorflow_1.4/tensorflow/third_party/gpus/cuda_configure.bzl", line 905, in _create_local_cuda_repository
        _get_cuda_config(repository_ctx)
    File "/home/sam/code/download/CNN/tensorflow_1.4/tensorflow/third_party/gpus/cuda_configure.bzl", line 662, in _get_cuda_config
        _cudnn_version(repository_ctx, cudnn_install_base..., ...)
    File "/home/sam/code/download/CNN/tensorflow_1.4/tensorflow/third_party/gpus/cuda_configure.bzl", line 360, in _cudnn_version
        _find_cudnn_header_dir(repository_ctx, cudnn_install_base...)
    File "/home/sam/code/download/CNN/tensorflow_1.4/tensorflow/third_party/gpus/cuda_configure.bzl", line 612, in _find_cudnn_header_dir
        auto_configure_fail(("Cannot find cudnn.h under %s" ...))
    File "/home/sam/code/download/CNN/tensorflow_1.4/tensorflow/third_party/gpus/cuda_configure.bzl", line 129, in auto_configure_fail
        fail(("\n%sCuda Configuration Error:%...)))

Cuda Configuration Error: Cannot find cudnn.h under /usr/lib/x86_64-linux-gnu
INFO: Elapsed time: 3.466s
FAILED: Build did NOT complete successfully (0 packages loaded)
    currently loading: tensorflow/tools/pip_package
sam@sam:~/code/download/CNN/tensorflow_1.4/tensorflow$ 

My CUDA 9.0 installation steps:

mkdir -p ~/code/download/lib/cuda/
cd ~/code/download/lib/cuda/
wget -c https://developer.nvidia.com/compute/cuda/9.0/Prod/local_installers/cuda_9.0.176_384.81_linux-run
chmod 777 cuda_9.0.176_384.81_linux-run
sudo apt-get install nvidia-375
sudo sh ./cuda_9.0.176_384.81_linux-run
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CUDA_HOME=/usr/local/cuda
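
One thing that may be worth checking in the steps above: the installer file name carries the driver version 384.81, while the apt command installs `nvidia-375`. The sketch below compares two version strings with GNU `sort -V`. It assumes 384.81 (read off the installer file name, not confirmed against NVIDIA's documentation) is the minimum driver for CUDA 9.0. `version_ge` and `check_driver` are hypothetical helper names.

```shell
# Succeeds (exit 0) if version $1 >= version $2, using GNU sort -V.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# ASSUMPTION: taken from the installer file name cuda_9.0.176_384.81_linux-run.
required_driver="384.81"

# Hypothetical helper: reports whether a driver version meets the assumed minimum.
check_driver() {
    if version_ge "$1" "$required_driver"; then
        echo "driver $1 is new enough for CUDA 9.0"
    else
        echo "driver $1 is older than $required_driver"
    fi
}

# Example (needs an installed NVIDIA driver):
# check_driver "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
```

If the installed driver turns out to be older, installing the driver bundled with the `.run` file or a newer driver package is one possible fix.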

My cuDNN 7.0 installation steps:

sudo dpkg -i libcudnn7_7.0.4.31-1+cuda9.0_amd64.deb

My TensorFlow 1.4 configuration steps:

echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -
sudo apt-get update
sudo apt-get install bazel
sudo apt install python-dev python-pip python-nose gcc g++ git gfortran vim libopenblas-dev liblapack-dev libatlas-base-dev openjdk-8-jdk
sudo pip install -U --pre pip setuptools wheel
sudo pip install -U --pre numpy scipy matplotlib scikit-learn scikit-image
mkdir -p ~/code/download/CNN/tensorflow_1.4/
cd ~/code/download/CNN/tensorflow_1.4/
git clone https://github.com/tensorflow/tensorflow.git -b r1.4
cd tensorflow
./configure
sam@sam:~/code/download/CNN/tensorflow_1.4/tensorflow$ ./configure
You have bazel 0.7.0 installed.
Please specify the location of python. [Default is /usr/bin/python]: 


Found possible Python library paths:
  /usr/local/lib/python2.7/dist-packages
  /usr/lib/python2.7/dist-packages
  /home/sam/code/download/CNN/caffe_1.0_RC5/caffe-rc5/python
Please input the desired Python library path to use.  Default is [/usr/local/lib/python2.7/dist-packages]

Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: n
No jemalloc as malloc support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n
No Google Cloud Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n
No Hadoop File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n
No Amazon S3 File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with XLA JIT support? [y/N]: N
No XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with GDR support? [y/N]: N
No GDR support will be enabled for TensorFlow.

Do you wish to build TensorFlow with VERBS support? [y/N]: N
No VERBS support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL support? [y/N]: N
No OpenCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 8.0]: 9.0


Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 


Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 6.0]: 7.0


Please specify the location where cuDNN 7.0 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:/usr/lib/x86_64-linux-gnu/


Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 5.0]


Do you want to use clang as CUDA compiler? [y/N]: N
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: 


Do you wish to build TensorFlow with MPI support? [y/N]: y
MPI support will be enabled for TensorFlow.

Please specify the MPI toolkit folder. [Default is /usr]: 


Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: 


Add "--config=mkl" to your bazel command to build with MKL support.
Please note that MKL on MacOS or windows is still not supported.
If you would like to use a local MKL instead of downloading, please set the environment variable "TF_MKL_ROOT" every time before build.
Configuration finished
sam@sam:~/code/download/CNN/tensorflow_1.4/tensorflow$ 

What should I do next?

Thanks~

=======================

Update: I solved the problem above by installing one more deb, the cuDNN development package:

sudo dpkg -i libcudnn7-dev_7.0.4.31-1+cuda9.0_amd64.deb
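
The earlier error was "Cannot find cudnn.h", which fits the runtime package shipping only the shared library while the `-dev` package provides the header. A minimal sketch for checking where `cudnn.h` actually lives before running `./configure`. `find_cudnn_header` is a hypothetical helper, and the candidate directories in the example are assumptions, not the exact list TensorFlow's configure script searches.

```shell
# Hypothetical helper: prints the first directory that contains cudnn.h,
# checking each given base directory and its include/ subdirectory.
find_cudnn_header() {
    local base dir
    for base in "$@"; do
        for dir in "$base" "$base/include"; do
            if [ -f "$dir/cudnn.h" ]; then
                echo "$dir"
                return 0
            fi
        done
    done
    return 1
}

# Example (candidate paths are assumptions for a typical Ubuntu install):
# find_cudnn_header /usr/lib/x86_64-linux-gnu /usr/local/cuda /usr \
#     || echo "cudnn.h not found"
```

`dpkg -L libcudnn7-dev` lists the files the package installed, which is another way to locate the header.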

Then I built TensorFlow with the following command:

bazel build -c opt --copt=-march="haswell" --config=cuda //tensorflow/tools/pip_package:build_pip_package

It failed with a different error:

tensorflow/contrib/batching/kernels/batch_kernels.cc:258:19: note: 'batcher_queue' was declared here
     BatcherQueue* batcher_queue;
                   ^
ERROR: /home/sam/code/download/CNN/tensorflow_1.4/tensorflow/tensorflow/python/BUILD:1232:1: Linking of rule '//tensorflow/python:gen_checkpoint_ops_py_wrappers_cc' failed (Exit 1)
/usr/bin/ld: warning: libcufft.so.9.0, needed by bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so, not found (try using -rpath or -rpath-link)
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `[email protected]'
collect2: error: ld returned 1 exit status
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 2439.556s, Critical Path: 155.31s
FAILED: Build did NOT complete successfully
sam@sam:~/code/download/CNN/tensorflow_1.4/tensorflow$ 

What should I do next?

Thanks~

Answer 1

I found the answer!

The linker could not find libcufft.so.9.0, so I created a symlink for it:

sudo ln -s /usr/local/cuda-9.0/lib64/libcufft.so /usr/lib/libcufft.so.9.0

Then I re-ran ./configure and answered N to MPI support.

After that, the build command succeeded:

At global scope:
cc1plus: warning: unrecognized command line option '-Wno-self-assign'
Target //tensorflow/tools/pip_package:build_pip_package up-to-date:
  bazel-bin/tensorflow/tools/pip_package/build_pip_package
INFO: Elapsed time: 275.306s, Critical Path: 36.05s
INFO: Build completed successfully, 602 total actions
sam@sam:~/code/download/CNN/tensorflow_1.4/tensorflow$ 

Then I built the pip package:

bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

sam@sam:~/code/download/CNN/tensorflow_1.4/tensorflow$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
一 11月 20 09:53:08 CST 2017 : === Using tmpdir: /tmp/tmp.xpC8nRamZR
~/code/download/CNN/tensorflow_1.4/tensorflow/bazel-bin/tensorflow/tools/pip_package/build_pip_package.runfiles ~/code/download/CNN/tensorflow_1.4/tensorflow
~/code/download/CNN/tensorflow_1.4/tensorflow
/tmp/tmp.xpC8nRamZR ~/code/download/CNN/tensorflow_1.4/tensorflow
一 11月 20 09:53:10 CST 2017 : === Building wheel
warning: no files found matching '*.dll' under directory '*'
warning: no files found matching '*.lib' under directory '*'
warning: no files found matching '*.h' under directory 'tensorflow/include/tensorflow'
warning: no files found matching '*' under directory 'tensorflow/include/Eigen'
warning: no files found matching '*' under directory 'tensorflow/include/external'
warning: no files found matching '*.h' under directory 'tensorflow/include/google'
warning: no files found matching '*' under directory 'tensorflow/include/third_party'
warning: no files found matching '*' under directory 'tensorflow/include/unsupported'
~/code/download/CNN/tensorflow_1.4/tensorflow
一 11月 20 09:53:35 CST 2017 : === Output wheel file is in: /tmp/tensorflow_pkg
sam@sam:~/code/download/CNN/tensorflow_1.4/tensorflow$ 

That produced the TensorFlow .whl file:

sam@sam:~/code/download/CNN/tensorflow_1.4/tensorflow$ ls /tmp/tensorflow_pkg
tensorflow-1.4.1-cp27-cp27mu-linux_x86_64.whl
sam@sam:~/code/download/CNN/tensorflow_1.4/tensorflow$ 

Then I removed the old TensorFlow packages:

sudo pip uninstall tensorflow-gpu
sudo pip uninstall tensorflow-tensorboard

Then I installed the wheel I had just built:

sudo pip install --upgrade /tmp/tensorflow_pkg/tensorflow-1.4.1-cp27-cp27mu-linux_x86_64.whl

Then I created a symlink for another CUDA library, libcusolver:

sudo ln -s /usr/local/cuda-9.0/lib64/libcusolver.so /usr/lib/libcusolver.so.9.0
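
Rather than finding missing libraries one error at a time, it may help to check up front which versioned names exist in the CUDA library directory. A minimal sketch: `missing_libs` is a hypothetical helper, and the directory and library names in the example are taken from the commands and error messages in this thread.

```shell
# Hypothetical helper: prints every library name from the argument list
# that does not exist in the given directory.
missing_libs() {
    local dir lib
    dir="$1"
    shift
    for lib in "$@"; do
        [ -e "$dir/$lib" ] || echo "$lib"
    done
}

# Example:
# missing_libs /usr/local/cuda-9.0/lib64 libcufft.so.9.0 libcusolver.so.9.0

# Possible alternative to per-library symlinks (an assumption, not what was
# done above): register the CUDA library directory with the dynamic linker.
# echo /usr/local/cuda-9.0/lib64 | sudo tee /etc/ld.so.conf.d/cuda-9-0.conf
# sudo ldconfig
```

If the libraries are present there but still not found at link or run time, the search path is the more likely culprit than the install itself.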

Then I tested that TensorFlow imports successfully:

sam@sam:~/code/download/lib/cudnn7$ python -c 'import os; import inspect; import tensorflow; print(os.path.dirname(inspect.getfile(tensorflow)))'
/usr/local/lib/python2.7/dist-packages/tensorflow
sam@sam:~/code/download/lib/cudnn7$ 

Thanks~
