Nvidia graphics driver and CUDA broken after apt-get upgrade

I previously installed CUDA 7.5 on Ubuntu 14.04 using Nvidia's "deb (network)" installer. It worked for several months, until I ran sudo apt-get upgrade today. After that, I get the following problem:

$ nvidia-smi
modprobe: ERROR: ../libkmod/libkmod-module.c:809 kmod_module_insert_module() could not find module by name='nvidia_352'
modprobe: ERROR: could not insert 'nvidia_352': Function not implemented
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

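Running sudo nvidia-smi makes no difference. I can't log in to the GUI (after I enter my password it just returns to the login screen), but I can still get to a terminal.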

I have managed to restore graphics, but I then ran into trouble reinstalling CUDA. Can you help?

Restoring graphics

I found that I could get graphics working again as follows:

$ sudo apt-get remove --purge 'nvidia*'
$ sudo apt-get autoremove

Then I edited /etc/apt/sources.list.d/cuda.list to delete all of its lines, and ran

$ sudo apt-get install nvidia-352

and rebooted. After that, nvidia-smi worked normally again. However, I still need to reinstall CUDA.
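Instead of deleting the lines of cuda.list, a reversible alternative is to comment them out, which turns the later "restore" step into a one-liner. A minimal sketch, assuming GNU sed (for -i) and the usual one-entry-per-line format of apt source files; disable_apt_source and enable_apt_source are hypothetical helper names, not part of any tool. Run apt-get update after either change so apt re-reads its package lists.

```shell
#!/bin/sh
# Sketch: toggle an apt source file on and off by commenting its "deb" lines.
# Assumes GNU sed (-i edits the file in place).

# Comment out every active line starting with "deb" (this also covers deb-src).
disable_apt_source() {
    sed -i 's/^deb/# deb/' "$1"
}

# Undo disable_apt_source by stripping the "# " in front of "deb".
enable_apt_source() {
    sed -i 's/^# deb/deb/' "$1"
}
```

For example, from a root shell: disable_apt_source /etc/apt/sources.list.d/cuda.list, then apt-get update. Note that enable_apt_source also uncomments any "deb" lines that were already commented out beforehand.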

Attempting to reinstall CUDA

I tried restoring the contents of /etc/apt/sources.list.d/cuda.list and then ran sudo apt-get install cuda. I noticed this error message:

Loading new nvidia-352-352.93 DKMS files...
Building only for 3.13.0-68-generic
Building for architecture x86_64
Building initial module for 3.13.0-68-generic
ERROR: Cannot create report: [Errno 17] File exists: '/var/crash/nvidia-352.0.crash'
Error! Bad return status for module build on kernel: 3.13.0-68-generic (x86_64)
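The "Bad return status" line only says that the module failed to compile; the actual compiler error is normally in the DKMS build log. The apport line about /var/crash/nvidia-352.0.crash only means an older crash report already exists, so it is a side effect rather than the build failure itself. A sketch for finding the log, assuming the standard DKMS tree layout /var/lib/dkms/<module>/<version>/build/make.log; dkms_make_log and show_dkms_failure are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: locate and show the log of a failed DKMS module build.
# The path layout below is the usual DKMS tree; treat it as an assumption.

# Print where DKMS normally keeps the compiler output for <module> <version>.
dkms_make_log() {
    echo "/var/lib/dkms/$1/$2/build/make.log"
}

# Print DKMS status (if dkms is installed) and the tail of the build log (if present).
show_dkms_failure() {
    if command -v dkms >/dev/null 2>&1; then
        dkms status
    fi
    log=$(dkms_make_log "$1" "$2")
    if [ -r "$log" ]; then
        tail -n 40 "$log"
    else
        echo "no build log at $log" >&2
    fi
}
```

For the failure above, after sourcing the helpers:

$ show_dkms_failure nvidia-352 352.93

If the log points at missing kernel headers, installing the headers for the running kernel may help:

$ sudo apt-get install linux-headers-$(uname -r)

Deleting the stale report lets apport record a fresh one next time:

$ sudo rm /var/crash/nvidia-352.0.crash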

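After this, the system is back to the behavior I started with. For example, nvidia-smi prints the error message above, and after building and running deviceQuery I get a similar error: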

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

modprobe: ERROR: ../libkmod/libkmod-module.c:809 kmod_module_insert_module() could not find module by name='nvidia_352'
modprobe: ERROR: could not insert 'nvidia_352': Function not implemented
cudaGetDeviceCount returned 38
-> no CUDA-capable device is detected
Result = FAIL

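I seem to remember that when I first installed CUDA, it only worked if I did not update the nvidia-352 package from the Nvidia repository. Now, however, I don't seem to have that option, because running sudo apt-get install cuda automatically upgrades the nvidia-352 package: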

Unpacking nvidia-352 (352.93-0ubuntu1) over (352.63-0ubuntu0.14.04.1) ...

If I try to set the version explicitly, I get:

$ sudo apt-get install cuda-drivers nvidia-352=352.63-0ubuntu0.14.04.1 nvidia-352-dev=352.63-0ubuntu0.14.04.1
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies.
 cuda-drivers : Depends: nvidia-352 (>= 352.93) but 352.63-0ubuntu0.14.04.1 is to be installed
                Depends: nvidia-352-dev (>= 352.93) but 352.63-0ubuntu0.14.04.1 is to be installed
E: Unable to correct problems, you have held broken packages.
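The conflict comes down to a plain version comparison: cuda-drivers declares nvidia-352 (>= 352.93), and 352.63-0ubuntu0.14.04.1 sorts below 352.93, so apt refuses the combination. The sketch below reproduces that check. version_ge is a hypothetical helper that uses dpkg --compare-versions when dpkg is available and otherwise falls back to GNU sort -V, which agrees with Debian version ordering for simple versions like these but not in every edge case (for example "~" suffixes).

```shell
#!/bin/sh
# Sketch: does version $1 satisfy ">= $2" the way apt would judge it?

version_ge() {
    if command -v dpkg >/dev/null 2>&1; then
        # dpkg implements the real Debian version ordering.
        dpkg --compare-versions "$1" ge "$2"
    else
        # Fallback: $1 >= $2 if $2 sorts first (or they are equal) under sort -V.
        [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
    fi
}
```

For example, version_ge 352.63-0ubuntu0.14.04.1 352.93 fails, matching the "unmet dependencies" message. To see which versions each repository offers and which one apt would choose:

$ apt-cache policy nvidia-352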

In fact, if I do the same with version 352.63-0ubuntu1 instead of 352.63-0ubuntu0.14.04.1,

$ sudo apt-get install nvidia-352=352.63-0ubuntu1

that alone is enough to break the graphical login and make nvidia-smi show the error message above.

Diagnostics

$ lspci | grep -i vga
01:00.0 VGA compatible controller: NVIDIA Corporation GM200 [GeForce GTX TITAN X] (rev a1)

$ dpkg -l | grep -i nvidia
ii  bbswitch-dkms                                         0.7-2ubuntu1                                        amd64        Interface for toggling the power on nVidia Optimus video cards
ii  libcuda1-352                                          352.93-0ubuntu1                                     amd64        NVIDIA CUDA runtime library
ii  nvidia-352                                            352.93-0ubuntu1                                     amd64        NVIDIA binary driver - version 352.93
ii  nvidia-352-dev                                        352.93-0ubuntu1                                     amd64        NVIDIA binary Xorg driver development files
ii  nvidia-352-uvm                                        352.93-0ubuntu1                                     amd64        Transitional package for nvidia-352
ii  nvidia-modprobe                                       352.93-0ubuntu1                                     amd64        Load the NVIDIA kernel driver and create device files
ii  nvidia-opencl-icd-352                                 352.93-0ubuntu1                                     amd64        NVIDIA OpenCL ICD
ii  nvidia-prime                                          0.6.2                                               amd64        Tools to enable NVIDIA's Prime
ii  nvidia-settings                                       352.93-0ubuntu1                                     amd64        Tool for configuring the NVIDIA graphics driver
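The dpkg listing shows which packages are installed, not whether the kernel module is actually loaded. A quick complementary check is to look for an nvidia module in lsmod output. The sketch reads lsmod-style text on stdin and matches any module whose name starts with "nvidia", which covers both plain nvidia and Ubuntu's nvidia_352 naming (an assumption based on the modprobe errors above); nvidia_module_loaded is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: exit 0 if lsmod-style input (module name in column 1) lists an nvidia* module.

nvidia_module_loaded() {
    awk '$1 ~ /^nvidia/ { found = 1 } END { exit !found }'
}
```

Usage:

$ lsmod | nvidia_module_loaded && echo loaded || echo "not loaded"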

Answer 1

I had a similar problem. Installing the recommended version of the nvidia driver solved it:

sudo apt-get install ubuntu-drivers-common

sudo ubuntu-drivers devices

sudo apt-get install <recommended version>
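The recommended package can also be picked out of the ubuntu-drivers output automatically. A sketch, assuming the output contains lines of the form "driver : <package> - ... recommended" (check your own output first); recommended_driver is a hypothetical helper name. ubuntu-drivers also offers an autoinstall subcommand that installs the recommended driver directly.

```shell
#!/bin/sh
# Sketch: print the package name from the first "driver" line marked "recommended".
# Expected input line: "driver   : nvidia-352 - distro non-free recommended"
# (field 1 = "driver", field 2 = ":", field 3 = package name).

recommended_driver() {
    awk '$1 == "driver" && /recommended/ { print $3; exit }'
}
```

Usage:

$ sudo apt-get install "$(ubuntu-drivers devices | recommended_driver)"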

Answer 2

A friend helped me solve this!

The solution he showed me was (after removing all the nvidia packages as before):

$ sudo add-apt-repository ppa:graphics-drivers/ppa
$ sudo apt-get install nvidia-364

Then download the .run CUDA installer from Nvidia (for me it was cuda_7.5.18_linux.run), and when it asks whether to install the driver packaged with CUDA, be careful to answer "No".
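The same toolkit-only install can also be run non-interactively, after which the toolkit's bin and lib64 directories need to be on the search paths. A sketch assuming the installer's --silent and --toolkit options (confirm with sh cuda_7.5.18_linux.run --help for your version) and the default install prefix /usr/local/cuda-7.5; install_cuda_toolkit_only and add_cuda_to_env are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: toolkit-only CUDA install plus environment setup.
# Assumed: installer flags --silent/--toolkit and the default prefix /usr/local/cuda-7.5.

# Install only the toolkit; the bundled driver is not selected,
# so the driver installed from the PPA stays in place.
install_cuda_toolkit_only() {
    sudo sh "${1:-cuda_7.5.18_linux.run}" --silent --toolkit
}

# Prepend the toolkit's bin and lib64 directories to PATH and LD_LIBRARY_PATH.
add_cuda_to_env() {
    cuda_home="${1:-/usr/local/cuda-7.5}"
    PATH="$cuda_home/bin:$PATH"
    LD_LIBRARY_PATH="$cuda_home/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export PATH LD_LIBRARY_PATH
}
```

For example: install_cuda_toolkit_only ./cuda_7.5.18_linux.run, then add_cuda_to_env (or put equivalent exports in ~/.bashrc), and check the result with nvcc --version and the deviceQuery sample.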
