NVIDIA-SMI couldn't find libnvidia-ml.so library

My laptop has the following NVIDIA graphics card:

ant@Anthill ~> lspci -k | grep -EA2 'VGA|3D'
00:02.0 VGA compatible controller: Intel Corporation 4th Gen Core Processor Integrated Graphics Controller (rev 06)
    Subsystem: Lenovo 4th Gen Core Processor Integrated Graphics Controller
    Kernel driver in use: i915
--
07:00.0 3D controller: NVIDIA Corporation GK208M [GeForce GT 740M] (rev a1)
    Subsystem: Lenovo GK208M [GeForce GT 740M]
    Kernel modules: nvidiafb, nouveau

I installed the driver as follows:

sudo apt-add-repository ppa:graphics-drivers/ppa
sudo apt-get install nvidia-370 nvidia-prime

and got the CUDA toolkit by downloading the cuda-7.5 binary from the official NVIDIA website:

sudo ./NVidia-cuda-7.5.run

All of these installations were done after switching to a text console and stopping Xorg:

sudo service lightdm stop

Now, after rebooting:

ant@Anthill ~> nvidia-smi
NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.

libnvidia-ml.so is right here:

ant@Anthill ~> ls /usr/lib/nvidia-370
alt_ld.so.conf                 libGLX_indirect.so.0@            libnvidia-fatbinaryloader.so.370.28
bin/                           libGLX_nvidia.so.0@              libnvidia-fbc.so.370.28
ld.so.conf                     libGLX_nvidia.so.370.28          libnvidia-glcore.so.370.28
libEGL_nvidia.so.0@            libGLX.so@                       libnvidia-glsi.so.370.28
libEGL_nvidia.so.370.28        libGLX.so.0                      libnvidia-ifr.so@
libEGL.so@                     libnvcuvid.so@                   libnvidia-ifr.so.1@
libEGL.so.1                    libnvcuvid.so.1@                 libnvidia-ifr.so.370.28
libGLdispatch.so.0             libnvcuvid.so.370.28             libnvidia-ml.so@
libGLESv1_CM_nvidia.so.1@      libnvidia-cfg.so@                libnvidia-ml.so.1@
libGLESv1_CM_nvidia.so.370.28  libnvidia-cfg.so.1@              libnvidia-ml.so.370.28
libGLESv1_CM.so@               libnvidia-cfg.so.370.28          libnvidia-ptxjitcompiler.so.370.28
libGLESv1_CM.so.1              libnvidia-compiler.so@           libnvidia-tls.so.370.28
libGLESv2_nvidia.so.2@         libnvidia-compiler.so.1@         libnvidia-wfb.so.370.28
libGLESv2_nvidia.so.370.28     libnvidia-compiler.so.370.28     libOpenGL.so@
libGLESv2.so@                  libnvidia-eglcore.so.370.28      libOpenGL.so.0
libGLESv2.so.2                 libnvidia-egl-wayland.so.370.28  tls/
libGL.so@                      libnvidia-encode.so@             vdpau/
libGL.so.1@                    libnvidia-encode.so.1@           xorg/
libGL.so.1.0.0                 libnvidia-encode.so.370.28

I also tried adding this directory to PATH and to LD_LIBRARY_PATH. Neither worked.

Also,

ls /dev | grep nvidia

produces nothing. In other words, no /dev/nvidia* devices exist.

Any suggestions for getting this to work? Where does nvidia-smi look for libnvidia-ml.so?

Answer 1

LD_PRELOAD=/usr/lib/nvidia-367/libnvidia-ml.so nvidia-smi
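
LD_PRELOAD asks the dynamic loader to load the named shared object before the program starts, so the path has to match the driver version that is actually installed (the question lists /usr/lib/nvidia-370, not nvidia-367). A minimal sketch for locating the library first, assuming it lives somewhere under /usr/lib; the helper name find_nvml is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the first libnvidia-ml.so.1 found under a directory.
# $1: directory to search (defaults to /usr/lib, which is an assumption).
find_nvml() {
    find "${1:-/usr/lib}" -name 'libnvidia-ml.so.1' 2>/dev/null | head -n 1
}

# Example (not run here): preload whatever copy was found.
#   lib=$(find_nvml /usr/lib)
#   [ -n "$lib" ] && LD_PRELOAD="$lib" nvidia-smi
```

This only works around the lookup failure for a single invocation. It does not register the directory with the linker cache.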

Answer 2

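My error was resolved this way:

This led me to another solution. In /etc/nvidia-container-runtime/config.toml, ldconfig is set to "@/sbin/ldconfig" by default. For some reason this does not seem to work, and it also produces the error above: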

root@banshee:/var/log# docker run --rm --gpus=all nvidia/cuda:11.4-base nvidia-smi
NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.

Changing the ldconfig path to "/sbin/ldconfig" (instead of "@/sbin/ldconfig") does fix the problem:

root@banshee:/var/log# docker run --rm --gpus=all nvidia/cuda:11.4-base nvidia-smi
Sun Jan  5 20:39:45 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.64       Driver Version: 430.64       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 970     On   | 00000000:01:00.0  On |                  N/A |
| 32%   39C    P8    16W / 170W |    422MiB /  4038MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

Source: GitHub
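
The edit described above can also be scripted. This is a rough sketch rather than an official procedure: it assumes the setting appears on a line of its own in the form ldconfig = "@...", and the helper name fix_ldconfig_path is hypothetical. It strips only the leading "@" and leaves the rest of the path alone, keeping a .bak copy of the original file.

```shell
#!/bin/sh
# Hypothetical helper: drop the leading "@" from the ldconfig path in a
# config.toml-style file. $1 is the file to edit; a "$1.bak" backup is kept.
fix_ldconfig_path() {
    sed -i.bak 's|^\([[:space:]]*ldconfig[[:space:]]*=[[:space:]]*"\)@|\1|' "$1"
}

# Example (as root, not run here):
#   fix_ldconfig_path /etc/nvidia-container-runtime/config.toml
```

After the change, rerunning the docker command from this answer shows whether the error is gone.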

Answer 3

I ran into this problem after some nvidia-docker containers crashed. libnvidia-ml.so was present in /usr/lib/nvidia-<version>, but nvidia-smi kept complaining.

I solved the problem by running sudo ldconfig.real
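
To see whether rebuilding the cache had the intended effect, the listing printed by ldconfig -p can be searched for the library. A small sketch, with the hypothetical helper lib_in_cache reading that listing from standard input:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the library name given as $1 appears in the
# linker cache listing supplied on standard input (the output of `ldconfig -p`).
lib_in_cache() {
    grep -F -q -- "$1"
}

# Example (not run here):
#   ldconfig -p | lib_in_cache libnvidia-ml.so.1 && echo "in linker cache"
```

If the library is still missing from the listing, its directory is probably not among the paths that ldconfig scans.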

Answer 4

I ran into this problem after a driver upgrade.

I fixed it by changing the ldconfig configuration file:

sudo vi /etc/ld.so.conf.d/cuda-8-0.conf 

with the following contents:

/usr/local/cuda-8.0/targets/x86_64-linux/lib 
/usr/lib/nvidia-<PUT_YOUR_VERSION_HERE>
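
Files under /etc/ld.so.conf.d list directories that ldconfig scans when it rebuilds the linker cache, so the new entries normally take effect only after ldconfig has been run again. A minimal sketch of writing the same two lines without an editor; the helper name write_nvidia_ld_conf and the version 370 in the example are assumptions for illustration:

```shell
#!/bin/sh
# Hypothetical helper: write an ld.so.conf.d-style file listing the CUDA and
# driver library directories from this answer.
# $1: output file, $2: installed driver version (for example 370).
write_nvidia_ld_conf() {
    printf '%s\n' \
        "/usr/local/cuda-8.0/targets/x86_64-linux/lib" \
        "/usr/lib/nvidia-$2" > "$1"
}

# Example (as root, not run here):
#   write_nvidia_ld_conf /etc/ld.so.conf.d/cuda-8-0.conf 370 && ldconfig
```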
