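I am very interested in neural networks and am currently trying to install and run tensorflow and keras. Naturally I want to run all training on my GPU, but something is odd about my installation. I can get everything to run, but the libcublas.so.10 library cannot be found. I completely uninstalled everything once or twice and then reinstalled it following the tensorflow installation guide. When that did not work, I tried this guide, https://towardsdatascience.com/installing-nvidia-drivers-cuda-10-cudnn-for-tensorflow-2-1-on-ubuntu-18-04-lts-f1db8bff9ea, with only moderate success. As far as I know, I have only installed CUDA 10.1. However, when I check the nvidia-smi output, it tells me that CUDA 11 is installed. If I look at the /usr/local folder, I get the following output: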
/usr/local$ ls -la
total 48
drwxr-xr-x 12 root root 4096 Aug 29 20:44 .
drwxr-xr-x 13 root root 4096 Mai 19 2019 ..
drwxr-xr-x 2 root root 4096 Aug 29 20:44 bin
lrwxrwxrwx 1 root root 9 Aug 29 20:44 cuda -> cuda-10.1
drwxr-xr-x 15 root root 4096 Aug 29 20:44 cuda-10.1
drwxr-xr-x 3 root root 4096 Aug 29 20:42 cuda-10.2
drwxr-xr-x 2 root root 4096 Apr 17 2014 etc
drwxr-xr-x 2 root root 4096 Apr 17 2014 games
drwxr-xr-x 3 root root 4096 Mär 17 09:37 include
drwxr-xr-x 6 root root 4096 Mär 26 20:56 lib
lrwxrwxrwx 1 root root 9 Dez 24 2014 man -> share/man
drwxr-xr-x 2 root root 4096 Apr 17 2014 sbin
drwxr-xr-x 11 root root 4096 Mär 26 20:33 share
drwxr-xr-x 2 root root 4096 Apr 17 2014 src
This makes no sense to me at all. Can anyone help me solve this problem?
Below are the nvidia-smi output and the pip output:
Mon Aug 31 13:26:32 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.66 Driver Version: 450.66 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 980 Off | 00000000:01:00.0 On | N/A |
| 26% 32C P0 51W / 195W | 353MiB / 4042MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1414 G /usr/lib/xorg/Xorg 158MiB |
| 0 N/A N/A 2234 G /usr/bin/gnome-shell 190MiB |
+-----------------------------------------------------------------------------+
2020-08-31 11:28:29.764303: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-08-31 11:28:30.627193: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-08-31 11:28:30.658355: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-08-31 11:28:30.658633: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 980 computeCapability: 5.2
coreClock: 1.2785GHz coreCount: 16 deviceMemorySize: 3.95GiB deviceMemoryBandwidth: 208.91GiB/s
2020-08-31 11:28:30.658659: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-08-31 11:28:30.658786: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcublas.so.10'; dlerror: libcublas.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.1/lib64
2020-08-31 11:28:30.659849: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-08-31 11:28:30.660053: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-08-31 11:28:30.661125: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-08-31 11:28:30.661716: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-08-31 11:28:30.663991: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-08-31 11:28:30.664007: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1753] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
2020-08-31 11:28:30.895203: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-08-31 11:28:30.899538: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 4000070000 Hz
2020-08-31 11:28:30.899808: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5b63340 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-31 11:28:30.899817: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-08-31 11:28:30.900635: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-31 11:28:30.900643: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]
Edit: I have now installed tensorflow following the official installation guide.
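In case it helps with diagnosing this, here is a minimal sketch of how I could check where (if anywhere) a libcublas file actually lives on disk, since the log above only searches /usr/local/cuda-10.1/lib64. The function name `find_shared_library` is just something I made up for illustration, and including /usr/lib/x86_64-linux-gnu as a search root is an assumption on my part; the /usr/local/cuda* directories are the ones from the listing above.

```python
import glob
import os


def find_shared_library(name, roots):
    """Return sorted paths under any of `roots` whose file name starts with `name`."""
    matches = []
    for root in roots:
        # "**" with recursive=True also matches files directly inside `root`.
        pattern = os.path.join(root, "**", name + "*")
        matches.extend(glob.glob(pattern, recursive=True))
    return sorted(set(matches))


if __name__ == "__main__":
    # Search roots: the CUDA directories from the listing above, plus one
    # system library directory (an assumption, adjust for your system).
    roots = sorted(glob.glob("/usr/local/cuda*")) + ["/usr/lib/x86_64-linux-gnu"]
    for path in find_shared_library("libcublas.so", roots):
        print(path)
```

If this prints a path outside /usr/local/cuda-10.1/lib64, that would at least show the library is installed but not in a directory listed in my LD_LIBRARY_PATH.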