I just bought an RTX 2060, and so far everything in my environment/setup has been working well. However, I still cannot profile my code:
(nvidia) brandon@b350-gaming-pc:~/projects/nvidia$ nvprof ./example.py
==29983== NVPROF is profiling process 29983, command: python3 ./example.py
Time: 0.05056905746459961
==29983== Warning: ERR_NVGPUCTRPERM - The user does not have permission to profile on the target device. See the following link for instructions to enable permissions and get more information: https://developer.nvidia.com/ERR_NVGPUCTRPERM
==29983== Profiling application: python3 ./example.py
==29983== Profiling result:
No kernels were profiled.
No API activities were profiled.
==29983== Warning: Some profiling data are not recorded. Make sure cudaProfilerStop() or cuProfilerStop() is called before application exit to flush profile data.
I understand that this is clearly a permissions "error", so I went ahead and added the following:
(nvidia) brandon@b350-gaming-pc:~/projects/nvidia$ cat /etc/modprobe.d/cuda.conf
NVreg_RestrictProfilingToAdminUsers=0
However, after rebooting, I get the same message when I try to profile my code. In addition,
(nvidia) brandon@b350-gaming-pc:~/projects/nvidia$ sudo update-initramfs -u
[sudo] password for brandon:
update-initramfs: Generating /boot/initrd.img-4.15.0-55-generic
libkmod: ERROR ../libkmod/libkmod-config.c:656 kmod_config_parse: /etc/modprobe.d/cuda.conf line 1: ignoring bad line starting with 'NVreg_RestrictProfilingToAdminUsers=0'
libkmod: ERROR ../libkmod/libkmod-config.c:656 kmod_config_parse: /etc/modprobe.d/cuda.conf line 1: ignoring bad line starting with 'NVreg_RestrictProfilingToAdminUsers=0'
libkmod: ERROR ../libkmod/libkmod-config.c:656 kmod_config_parse: /etc/modprobe.d/cuda.conf line 1: ignoring bad line starting with 'NVreg_RestrictProfilingToAdminUsers=0'
...
This error line appears to repeat indefinitely.
Am I missing something here?
Here is some more information about the driver and my environment:
(base) brandon@b350-gaming-pc:~$ nvidia-smi
Mon Sep 9 11:12:51 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2060    On   | 00000000:0A:00.0  On |                  N/A |
|  0%   45C    P8    20W / 170W |   1323MiB /  5903MiB |     38%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2603      G   /usr/lib/firefox/firefox                       3MiB |
|    0      4300      G   /usr/lib/xorg/Xorg                            34MiB |
|    0      4894      G   /usr/bin/gnome-shell                          51MiB |
|    0      5806      G   /usr/lib/xorg/Xorg                           254MiB |
|    0      5920      G   /usr/bin/gnome-shell                         899MiB |
|    0     10378      G   ...quest-channel-token=3880407371781342003    36MiB |
+-----------------------------------------------------------------------------+
(base) brandon@b350-gaming-pc:~$ uname -r
4.15.0-55-generic
(base) brandon@b350-gaming-pc:~$ lsmod | grep -i nvidia
nvidia_uvm 798720 0
nvidia_drm 45056 8
nvidia_modeset 1093632 17 nvidia_drm
nvidia 18194432 718 nvidia_uvm,nvidia_modeset
drm_kms_helper 167936 1 nvidia_drm
drm 401408 11 drm_kms_helper,nvidia_drm
ipmi_msghandler 53248 2 ipmi_devintf,nvidia
(base) brandon@b350-gaming-pc:~$ which nvprof
/usr/local/cuda-10.1/bin/nvprof
(base) brandon@b350-gaming-pc:~$ which python
/home/brandon/anaconda3/bin/python
If you would like to see any other details or output from my system, let me know.
Answer 1
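I think you simply left out the options directive in /etc/modprobe.d/cuda.conf. Every line in a modprobe.d file has to start with a directive such as options, which is why libkmod reports "ignoring bad line". Try this: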
options nvidia "NVreg_RestrictProfilingToAdminUsers=0"
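A rough sketch of how this fix could be written out and sanity-checked before rebuilding the initramfs. The helper names `write_profiling_conf` and `check_modprobe_conf` are hypothetical, and the directive list is taken from modprobe.d(5) as I remember it, so treat it as an assumption rather than a complete parser.

```shell
#!/bin/sh
# Hypothetical helper: write the corrected option line to the given file.
# Writing under /etc/modprobe.d/ requires root.
write_profiling_conf() {
    conf_path="$1"
    printf '%s\n' 'options nvidia "NVreg_RestrictProfilingToAdminUsers=0"' > "$conf_path"
}

# Hypothetical helper: succeed only if every non-blank, non-comment line
# begins with a modprobe.d directive (assumed list: alias, blacklist,
# install, options, remove, softdep).
check_modprobe_conf() {
    ! grep -Ev '^[[:space:]]*(#|$)' "$1" |
        grep -Evq '^[[:space:]]*(alias|blacklist|install|options|remove|softdep)[[:space:]]'
}
```

With these, `check_modprobe_conf /etc/modprobe.d/cuda.conf` would fail on the original one-line file, because that line does not begin with a directive. Once the corrected file is in place, the remaining steps are the ones already used in the question: `sudo update-initramfs -u`, then a reboot.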
Answer 2
If you are not logged in as root, run the following commands from your own login, using sudo where needed:
systemctl isolate multi-user.target    # stop the window manager
modprobe -r nvidia_uvm nvidia_drm nvidia_modeset nvidia-vgpu-vfio nvidia
sudo setcap cap_sys_admin+ep
modprobe nvidia NVreg_RestrictProfilingToAdminUsers=0    # or add this option to /etc/modprobe.d/<.conf>
systemctl isolate graphical.target
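The window manager should be stopped and all of the old modules unloaded before the module key is set or unset. After inserting the module with the key, make sure to start the window manager again.
If you still see the error, please post the output of the following command, run as the current user:
$ capsh --print | grep -i "cap_sys_admin"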
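To confirm whether the option actually took effect once the module has been reloaded, one can also look at the driver's parameter dump. The sketch below assumes the driver exposes the setting as a line of the form `RmProfilingAdminOnly: 0` in /proc/driver/nvidia/params, which is how I understand NVIDIA's ERR_NVGPUCTRPERM page to describe it; the function name `profiling_open_to_users` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the params file reports that profiling
# is NOT restricted to admin users.
# Assumption: the driver writes a line like "RmProfilingAdminOnly: 0".
profiling_open_to_users() {
    params_file="${1:-/proc/driver/nvidia/params}"
    # Extract the value after the key; empty if the key or file is missing.
    value=$(sed -n 's/^RmProfilingAdminOnly:[[:space:]]*//p' "$params_file" 2>/dev/null)
    [ "$value" = "0" ]
}
```

Running `profiling_open_to_users && echo "non-root profiling enabled"` after the reload gives a quick yes/no. If it prints nothing, the module was most likely loaded without the option.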