NVIDIA-SMI couldn't find the libnvidia-ml.so library, and /usr/lib/nvidia is empty

I get the following error when I run nvidia-smi:

$ nvidia-smi
NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.

I searched for the places where libnvidia-ml.so might be:

$ locate libnvidia-ml.so
/usr/lib/i386-linux-gnu/libnvidia-ml.so
/usr/lib/x86_64-linux-gnu/libnvidia-ml.so
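
Because locate reads a periodically updated database, a path it prints may no longer exist, or may be a symlink whose target was removed along with an older driver. A minimal sketch for checking that, assuming a POSIX shell with readlink available; check_links is a hypothetical helper name, not part of any NVIDIA tool:

```shell
#!/bin/sh
# check_links is a hypothetical helper: for each path given, report whether
# it resolves to an existing file, is a dangling symlink, or is absent.
check_links() {
    for lib in "$@"; do
        if [ -e "$lib" ]; then
            # -e follows symlinks, so the final target exists
            printf 'OK      %s -> %s\n' "$lib" "$(readlink -f "$lib")"
        elif [ -L "$lib" ]; then
            # the symlink itself exists, but its target does not
            printf 'BROKEN  %s -> %s\n' "$lib" "$(readlink "$lib")"
        else
            printf 'MISSING %s\n' "$lib"
        fi
    done
}

# The two paths reported by locate above
check_links /usr/lib/i386-linux-gnu/libnvidia-ml.so \
            /usr/lib/x86_64-linux-gnu/libnvidia-ml.so
```

A BROKEN or MISSING line would suggest that the library files for the installed driver are not actually on disk, which is consistent with the nvidia-smi error.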

The system does detect the graphics card:

$ lspci -vnn | grep VGA -A 12
03:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104 [GeForce GTX 1070 Ti] [10de:1b82] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: eVga.com. Corp. GP104 [GeForce GTX 1070 Ti] [3842:5671]
        Flags: bus master, fast devsel, latency 0, IRQ 59
        Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
        Memory at c0000000 (64-bit, prefetchable) [size=256M]
        Memory at d0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at e000 [size=128]
        [virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
        Capabilities: <access denied>
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

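Looking for the NVIDIA driver files, I find nothing for the driver I installed (440); the only reference I can find is this one for 384, and /usr/lib/nvidia itself is empty. Should I delete that folder?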

$ ls /usr/lib/nvidia*
/usr/lib/nvidia/
/usr/lib/nvidia-384/

System information:

$ uname -m && cat /etc/*release
x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.3 LTS"

$ uname -r
4.15.0-65-generic

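I must not have purged the previous NVIDIA driver properly before installing the new one. I have now tried the following steps many times, but cannot get it to work: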

$ sudo apt-get remove --purge nvidia-*
$ sudo add-apt-repository ppa:graphics-drivers/ppa
$ sudo apt update
$ ubuntu-drivers devices
$ sudo apt-get install nvidia-driver-440 # the one recommended by ubuntu-drivers
$ sudo reboot

Answer 1

Solved by following ubfan1's suggestions, together with advice from various other sources. Specifically:

By the way, I did all of this in console mode (for me, Alt+Ctrl+F2).

Log in with your username and password as usual.

Remove all NVIDIA software:

sudo apt-get purge 'nvidia*'  # quoted so the shell passes the pattern to apt unexpanded

Check what is left:

dpkg -l | grep nvidia
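
A plain grep also matches lines where "nvidia" appears only in the package description. As a sketch of a narrower check, the filter below matches on the package name column only and prints the dpkg status next to it; nvidia_packages is a hypothetical helper name, and the script assumes awk is available:

```shell
#!/bin/sh
# nvidia_packages is a hypothetical helper: it reads `dpkg -l` output on
# stdin and prints "<status> <name>" for packages whose NAME contains "nvidia".
nvidia_packages() {
    awk '$2 ~ /nvidia/ { print $1, $2 }'
}

# Only call dpkg where it exists (Debian/Ubuntu systems)
if command -v dpkg >/dev/null 2>&1; then
    dpkg -l | nvidia_packages
fi
```

In the status column, "ii" means the package is installed, while "rc" means it was removed but its configuration files remain; purging is what clears the latter.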

Then I removed whatever showed up (mostly libnvidia-* packages, but also xserver-xorg-video-nvidia-xxx):

sudo apt-get purge 'libnvidia*' xserver-xorg-video-nvidia-440
sudo apt autoremove

Now reinstall everything, including nvidia-common:

sudo apt-get install nvidia-common

Find the right driver again:

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
ubuntu-drivers devices
sudo apt-get install nvidia-driver-440 # the one recommended by ubuntu-drivers
sudo update-initramfs -u # I think this was needed so the configuration would survive the reboot
sudo reboot
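
After the reboot, the result of the steps above could be checked with something like the following. This is only a sketch: check is a hypothetical helper, and the three checks are assumptions about what a working install looks like (kernel module loaded, library known to the dynamic linker, nvidia-smi exiting successfully), not an official verification procedure:

```shell
#!/bin/sh
# check is a hypothetical helper: run a command quietly and print
# PASS or FAIL followed by a label describing what was checked.
check() {
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS  $label"
    else
        echo "FAIL  $label"
    fi
}

# /proc/modules lists loaded kernel modules, one per line, name first
check "nvidia kernel module loaded" grep -q '^nvidia ' /proc/modules
# ldconfig -p prints the libraries in the dynamic linker cache
check "libnvidia-ml.so in linker cache" sh -c 'ldconfig -p | grep -q libnvidia-ml.so'
# nvidia-smi exits non-zero when it cannot load the library or reach the driver
check "nvidia-smi runs" nvidia-smi
```

If the first line passes but the second fails, the kernel driver is present while the userspace libraries are not, which matches the situation described in the question.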
