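I ran into a strange problem after installing CUDA 10.1 on an Ubuntu 18.04 server: all the folders under the CUDA directory are empty! Can anyone help me solve this? Thanks in advance for your time.
Here are the details.
First, I removed the existing CUDA installation and installed the latest version with the following steps:
Uninstall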
sudo apt-get purge nvidia*
sudo apt-get autoremove
sudo apt-get autoclean
sudo rm -rf /usr/local/cuda*
Install
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo ubuntu-drivers autoinstall
sudo apt install nvidia-cuda-toolkit
Everything looked fine at this point. Here is the output after installation:
$ nvidia-smi
Mon May 25 02:32:42 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82 Driver Version: 440.82 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Quadro K600 Off | 00000000:02:00.0 On | N/A |
| 31% 62C P0 N/A / N/A | 422MiB / 974MiB | 1% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 106... Off | 00000000:03:00.0 Off | N/A |
| 0% 40C P8 4W / 120W | 15MiB / 6078MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1432 G /usr/lib/xorg/Xorg 31MiB |
| 0 1500 G /usr/bin/gnome-shell 51MiB |
| 0 1681 G /usr/lib/xorg/Xorg 95MiB |
| 0 1799 G /usr/bin/gnome-shell 89MiB |
| 0 2687 G ...AAAAAAAAAAAAAAgAAAAAAAAA --shared-files 148MiB |
| 1 1432 G /usr/lib/xorg/Xorg 5MiB |
| 1 1681 G /usr/lib/xorg/Xorg 6MiB |
+-----------------------------------------------------------------------------+
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85
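Then the problem showed up. CUDA normally lives under /usr/local, but I could not find it there, so I searched for it:
(I refreshed the locate database first with sudo updatedb.)
$ locate cuda | grep /cuda$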
/home/zihan/.local/lib/python3.6/site-packages/torch/cuda
/home/zihan/.local/lib/python3.6/site-packages/torch/backends/cuda
/home/zihan/.local/lib/python3.6/site-packages/torch/include/ATen/cuda
/home/zihan/.local/lib/python3.6/site-packages/torch/include/c10/cuda
/home/zihan/.local/lib/python3.6/site-packages/torch/include/torch/csrc/cuda
/home/zihan/anaconda3/lib/python3.7/site-packages/numba/cuda
/usr/include/thrust/system/cuda
/usr/lib/cuda
So I went to /usr/lib/cuda to check. However, every folder inside it is empty. I mean, everything is gone!
/usr/lib/cuda$ ls
bin include lib64 nvvm version.txt
# version.txt shows CUDA Version 9.1.85
/usr/lib/cuda/bin$ ls -l
total 0
# (the same happens in the other folders, e.g. lib64)
/usr/lib/cuda/lib64$ ls -l
total 0
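In case it helps with diagnosis, here is a minimal sketch of the checks I can run to see where nvcc actually resolves and whether those directories really contain nothing (including hidden entries). The helper names resolve_cmd and count_entries are made up for illustration, the directory list is just the paths from my listing above, and the dpkg step assumes a Debian-based system.

```shell
#!/bin/sh
# Hypothetical helper: print the real file a command resolves to.
resolve_cmd() {
    path=$(command -v "$1" 2>/dev/null) || { echo "$1: not found"; return 0; }
    # readlink -f follows any symlinks to the final target
    echo "$1 -> $(readlink -f "$path")"
}

# Hypothetical helper: count directory entries, hidden ones included.
count_entries() {
    if [ -d "$1" ]; then
        n=$(ls -A "$1" | wc -l | tr -d ' ')
        echo "$1: $n entries"
    else
        echo "$1: no such directory"
    fi
}

resolve_cmd nvcc

# Paths taken from the listing above.
for d in /usr/lib/cuda/bin /usr/lib/cuda/lib64; do
    count_entries "$d"
done

# Ask dpkg which package, if any, owns the directory (Debian/Ubuntu only).
if command -v dpkg >/dev/null 2>&1; then
    dpkg -S /usr/lib/cuda 2>/dev/null || echo "no package owns /usr/lib/cuda"
fi
```

If nvcc resolves to a path outside /usr/lib/cuda, that would suggest the toolkit files were installed somewhere else and /usr/lib/cuda is only a placeholder layout.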
Where did these files go? Why did this happen, and how can I fix it? Any help would be greatly appreciated.