Inconsistent CUDA toolkit path: /usr vs /usr/lib when building TensorFlow from source

I get an "Illegal instruction (core dumped)" error when I use the default TensorFlow build.

According to my research (https://stackoverflow.com/questions/60858317/how-to-fix-illegal-instruction-core-dumped and https://github.com/tensorflow/tensorflow/issues/17411), I need to build TensorFlow from source.

I started building TensorFlow from source following https://www.tensorflow.org/install/source

When I ran ./configure, I got this error:

Could not find any cuda.h matching version '10' in any subdirectory:
    ''
    'include'
    'include/cuda'
    'include/*-linux-gnu'
    'extras/CUPTI/include'
    'include/cuda/CUPTI'
    'local/cuda/extras/CUPTI/include'
of:
    '/lib'
    '/lib/i386-linux-gnu'
    '/lib/x86_64-linux-gnu'
    '/usr'
    '/usr/lib/x86_64-linux-gnu/libfakeroot'

So I installed the CUDA toolkit and cuDNN following this article: https://towardsdatascience.com/installing-tensorflow-gpu-in-ubuntu-20-04-4ee3ca4cb75d

I now have:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

and cuDNN:

cudnn-10.1-linux-x64-v7.6.5.32 

My CUDA installation is located at:

whereis cuda
cuda: /usr/lib/cuda /usr/include/cuda.h
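The whereis output above puts cuda.h under /usr/include while the rest of the toolkit sits under /usr/lib/cuda. To see the split in more detail, something like the following sketch may help. The function name locate_cuda_files is a hypothetical helper, not part of CUDA or TensorFlow, and the directories passed to it are simply the ones reported above.

```shell
#!/bin/sh
# Hypothetical helper: list the CUDA header, compiler, and runtime library
# found under the given root directories. Missing roots are silently skipped.
locate_cuda_files() {
    find "$@" \( -name 'cuda.h' -o -name 'nvcc' -o -name 'libcudart.so*' \) 2>/dev/null | sort
}

# Directories taken from the whereis output; adjust them for your system.
locate_cuda_files /usr/include /usr/lib/cuda
```

If the header and nvcc show up under different parent directories, that matches the layout the configure script is complaining about.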

and nvidia-smi returns:

nvidia-smi

Now, when I run ./configure, I get the following messages:

WARNING: current bazel installation is not a release version.
Make sure you are running at least bazel 3.7.2
Please specify the location of python. [Default is /usr/bin/python3]: 


 Found possible Python library paths:
   /usr/lib/python3/dist-packages
    /usr/local/lib/python3.8/dist-packages
   Please input the desired Python library path to use.  Default is [/usr/lib/python3/dist-packages]

  Do you wish to build TensorFlow with ROCm support? [y/N]: 
  No ROCm support will be enabled for TensorFlow.

  Do you wish to build TensorFlow with CUDA support? [y/N]: y
  CUDA support will be enabled for TensorFlow.

  Do you wish to build TensorFlow with TensorRT support? [y/N]: 
  No TensorRT support will be enabled for TensorFlow.

  Inconsistent CUDA toolkit path: /usr vs /usr/lib
  Asking for detailed CUDA configuration...


 Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 10]: 


 Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7]: 


 Please specify the locally installed NCCL version you want to use. [Leave empty to use 
 http://github.com/nvidia/nccl]: 


 Please specify the comma-separated list of base paths to look for CUDA libraries and headers. 
 [Leave empty to use the default]: 

But I cannot get past this error:

Inconsistent CUDA toolkit path: /usr vs /usr/lib
Asking for detailed CUDA configuration...

What is going wrong, and how can I fix it?

Answer 1


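I ran into the same problem when trying to build DeepSpeech and solved it by installing CUDA from the NVIDIA repository.
For example, to install CUDA 11.3 on Ubuntu 20.04 over the network, enter the following commands: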
$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
$ sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
$ sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
$ sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
$ sudo apt-get update
$ sudo apt-get -y install cuda

Source: https://developer.nvidia.com/cuda-11.3.0-download-archive

In my case, these instructions installed CUDA under /usr/local/ instead of directly under /usr/. That way, the configure process was able to find the CUDA installation.
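As a rough sketch of how one might confirm this before rerunning ./configure: the snippet below checks that the header and nvcc live under a single prefix and, if so, exports TF_CUDA_PATHS. It assumes the toolkit ended up at /usr/local/cuda and that your TensorFlow version's configure script reads TF_CUDA_PATHS as the answer to the "comma-separated list of base paths" prompt. The function name cuda_prefix_ok and the CUDA_PREFIX variable are hypothetical names used only for this example.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if cuda.h and nvcc share one prefix.
cuda_prefix_ok() {
    [ -f "$1/include/cuda.h" ] && [ -x "$1/bin/nvcc" ]
}

# Assumed install location; override CUDA_PREFIX if yours differs.
CUDA_PREFIX="${CUDA_PREFIX:-/usr/local/cuda}"

if cuda_prefix_ok "$CUDA_PREFIX"; then
    # Assumption: configure picks this up instead of asking for base paths.
    export TF_CUDA_PATHS="$CUDA_PREFIX"
    echo "TF_CUDA_PATHS=$TF_CUDA_PATHS"
else
    echo "No complete CUDA toolkit found under $CUDA_PREFIX" >&2
fi
```

After this, ./configure can be run from the TensorFlow source tree in the same shell. If cuDNN was unpacked somewhere else, its directory would have to be appended to the same comma-separated list.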

I hope this helps someone.