When I use the default TensorFlow build, I get an "Illegal instruction (core dumped)" error.
According to my research (https://stackoverflow.com/questions/60858317/how-to-fix-illegal-instruction-core-dumped and https://github.com/tensorflow/tensorflow/issues/17411), I need to build TensorFlow from source.
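Before committing to a source build, it may be worth confirming that the CPU really lacks the instructions the prebuilt package expects. The sketch below assumes an x86 Linux machine where /proc/cpuinfo contains a "flags" line. The helper name cpu_has_flag is hypothetical, and the list of flags checked (avx, avx2, fma) is an assumption based on the linked GitHub issue, which concerns prebuilt TensorFlow binaries compiled with AVX.

```shell
#!/bin/sh
# Report whether a CPU flag (e.g. avx) appears in a cpuinfo-style file.
# The second argument is optional and defaults to /proc/cpuinfo (Linux only).
cpu_has_flag() {
    flag="$1"
    cpuinfo="${2:-/proc/cpuinfo}"
    # -w matches the flag as a whole word, so "avx" does not match "avx2".
    grep -E '^flags' "$cpuinfo" 2>/dev/null | grep -q -w -- "$flag"
}

# The flags listed here are an assumption; adjust them as needed.
for f in avx avx2 fma; do
    if cpu_has_flag "$f"; then
        echo "$f: supported"
    else
        echo "$f: missing"
    fi
done
```

If avx is reported as missing, that is consistent with the "Illegal instruction" crash, and a source build (or a package built without AVX) is the likely way forward.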
I started building TensorFlow from source by following https://www.tensorflow.org/install/source
When I run ./configure, I get this error:
Could not find any cuda.h matching version '10' in any subdirectory:
''
'include'
'include/cuda'
'include/*-linux-gnu'
'extras/CUPTI/include'
'include/cuda/CUPTI'
'local/cuda/extras/CUPTI/include'
of:
'/lib'
'/lib/i386-linux-gnu'
'/lib/x86_64-linux-gnu'
'/usr'
'/usr/lib/x86_64-linux-gnu/libfakeroot'
So I installed the CUDA toolkit and cuDNN by following this article: https://towardsdatascience.com/installing-tensorflow-gpu-in-ubuntu-20-04-4ee3ca4cb75d
I now have:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
and cuDNN:
cudnn-10.1-linux-x64-v7.6.5.32
My CUDA installation is located at:
whereis cuda
cuda: /usr/lib/cuda /usr/include/cuda.h
and nvidia-smi returns:
Now I expected ./configure to run cleanly, but I get the following messages:
WARNING: current bazel installation is not a release version.
Make sure you are running at least bazel 3.7.2
Please specify the location of python. [Default is /usr/bin/python3]:
Found possible Python library paths:
/usr/lib/python3/dist-packages
/usr/local/lib/python3.8/dist-packages
Please input the desired Python library path to use. Default is [/usr/lib/python3/dist-packages]
Do you wish to build TensorFlow with ROCm support? [y/N]:
No ROCm support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.
Do you wish to build TensorFlow with TensorRT support? [y/N]:
No TensorRT support will be enabled for TensorFlow.
Inconsistent CUDA toolkit path: /usr vs /usr/lib
Asking for detailed CUDA configuration...
Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 10]:
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7]:
Please specify the locally installed NCCL version you want to use. [Leave empty to use
http://github.com/nvidia/nccl]:
Please specify the comma-separated list of base paths to look for CUDA libraries and headers.
[Leave empty to use the default]:
But I cannot get past this error:
Inconsistent CUDA toolkit path: /usr vs /usr/lib
Asking for detailed CUDA configuration...
What is going wrong, and how can I fix it?
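Answer 1
I ran into the same problem while trying to build DeepSpeech, and I solved it by installing CUDA from the NVIDIA repository.
For example, to install CUDA 11.3 on Ubuntu 20.04 over the network, run the following commands: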
$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
$ sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
$ sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
$ sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
$ sudo apt-get update
$ sudo apt-get -y install cuda
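Source: https://developer.nvidia.com/cuda-11.3.0-download-archive
In my case, these instructions installed CUDA under /usr/local/ rather than directly under /usr/. With that layout, the configure step was able to find the CUDA installation.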
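To see whether a given prefix holds a complete CUDA installation, a rough check like the one below may help. It is only a sketch and not the check that ./configure itself performs. It assumes the header lives at include/cuda.h and the runtime library at lib64/ or lib/ under the same prefix. The helper name cuda_prefix_ok is hypothetical, and /usr/local/cuda is an assumed location that may differ on your system.

```shell
#!/bin/sh
# Succeed if PREFIX holds both the CUDA header and the CUDA runtime library.
# Layout assumption: include/cuda.h plus libcudart.so* under lib64/ or lib/.
cuda_prefix_ok() {
    prefix="$1"
    [ -f "$prefix/include/cuda.h" ] || return 1
    for dir in "$prefix/lib64" "$prefix/lib"; do
        for lib in "$dir"/libcudart.so*; do
            # An unmatched glob stays literal, so confirm the file exists.
            [ -e "$lib" ] && return 0
        done
    done
    return 1
}

# /usr/local/cuda is an assumed default; /usr is the layout from the question.
for candidate in /usr/local/cuda /usr; do
    if cuda_prefix_ok "$candidate"; then
        echo "consistent CUDA prefix: $candidate"
    else
        echo "incomplete CUDA prefix: $candidate"
    fi
done
```

A prefix reported as consistent is a reasonable value to enter at the "comma-separated list of base paths to look for CUDA libraries and headers" prompt shown in the question.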
I hope this helps someone.