Unable to resolve NVIDIA/nvprof ERR_NVGPUCTRPERM with NVreg_RestrictProfilingToAdminUsers=0

I just bought an RTX 2060, and so far everything runs well in my environment/setup. However, I still cannot profile my code:

(nvidia) brandon@b350-gaming-pc:~/projects/nvidia$ nvprof ./example.py 
==29983== NVPROF is profiling process 29983, command: python3 ./example.py
Time: 0.05056905746459961
==29983== Warning: ERR_NVGPUCTRPERM - The user does not have permission to profile on the target device. See the following link for instructions to enable permissions and get more information: https://developer.nvidia.com/ERR_NVGPUCTRPERM 
==29983== Profiling application: python3 ./example.py
==29983== Profiling result:
No kernels were profiled.
No API activities were profiled.
==29983== Warning: Some profiling data are not recorded. Make sure cudaProfilerStop() or cuProfilerStop() is called before application exit to flush profile data.

I understand this is clearly a permissions "error", so I went ahead and added the following:

(nvidia) brandon@b350-gaming-pc:~/projects/nvidia$ cat /etc/modprobe.d/cuda.conf 
NVreg_RestrictProfilingToAdminUsers=0

However, after rebooting, I get the same message when I try to profile my code. In addition,

(nvidia) brandon@b350-gaming-pc:~/projects/nvidia$ sudo update-initramfs -u
[sudo] password for brandon: 
update-initramfs: Generating /boot/initrd.img-4.15.0-55-generic
libkmod: ERROR ../libkmod/libkmod-config.c:656 kmod_config_parse: /etc/modprobe.d/cuda.conf line 1: ignoring bad line starting with 'NVreg_RestrictProfilingToAdminUsers=0'
libkmod: ERROR ../libkmod/libkmod-config.c:656 kmod_config_parse: /etc/modprobe.d/cuda.conf line 1: ignoring bad line starting with 'NVreg_RestrictProfilingToAdminUsers=0'
libkmod: ERROR ../libkmod/libkmod-config.c:656 kmod_config_parse: /etc/modprobe.d/cuda.conf line 1: ignoring bad line starting with 'NVreg_RestrictProfilingToAdminUsers=0'
...

This error message seems to repeat forever.

Am I missing something here?

Here is some more information about the driver and my environment:

(base) brandon@b350-gaming-pc:~$ nvidia-smi 
Mon Sep  9 11:12:51 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2060    On   | 00000000:0A:00.0  On |                  N/A |
|  0%   45C    P8    20W / 170W |   1323MiB /  5903MiB |     38%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2603      G   /usr/lib/firefox/firefox                       3MiB |
|    0      4300      G   /usr/lib/xorg/Xorg                            34MiB |
|    0      4894      G   /usr/bin/gnome-shell                          51MiB |
|    0      5806      G   /usr/lib/xorg/Xorg                           254MiB |
|    0      5920      G   /usr/bin/gnome-shell                         899MiB |
|    0     10378      G   ...quest-channel-token=3880407371781342003    36MiB |
+-----------------------------------------------------------------------------+
(base) brandon@b350-gaming-pc:~$ uname -r
4.15.0-55-generic
(base) brandon@b350-gaming-pc:~$ lsmod | grep -i nvidia
nvidia_uvm            798720  0
nvidia_drm             45056  8
nvidia_modeset       1093632  17 nvidia_drm
nvidia              18194432  718 nvidia_uvm,nvidia_modeset
drm_kms_helper        167936  1 nvidia_drm
drm                   401408  11 drm_kms_helper,nvidia_drm
ipmi_msghandler        53248  2 ipmi_devintf,nvidia
(base) brandon@b350-gaming-pc:~$ which nvprof 
/usr/local/cuda-10.1/bin/nvprof
(base) brandon@b350-gaming-pc:~$ which python
/home/brandon/anaconda3/bin/python

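Let me know if you would like to see anything else from my system.

Answer 1

I think you just left out the leading options keyword in /etc/modprobe.d/cuda.conf. A line in that file has to start with a modprobe command such as options, followed by the module name; a bare parameter is rejected, which is why libkmod reports "ignoring bad line". Try this: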

options nvidia "NVreg_RestrictProfilingToAdminUsers=0"
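
The "ignoring bad line" error in the question can be caught before running update-initramfs. Below is a minimal sketch of such a check. The function name `check_modprobe_conf` is hypothetical, and the keyword list is an assumption based on the commands described in modprobe.d(5); newer kmod versions may accept more.

```shell
#!/bin/sh
# Hypothetical helper: report lines in a modprobe.d file that do not start
# with a known command keyword. The keyword list is an assumption taken
# from modprobe.d(5) and may be incomplete on newer systems.
check_modprobe_conf() {
    awk '
        NF == 0 || $1 ~ /^#/ { next }   # skip blank lines and comments
        $1 !~ /^(alias|blacklist|install|options|remove|softdep)$/ {
            printf "bad line %d: %s\n", NR, $0
            bad = 1
        }
        END { exit bad }                # exit status 1 if any bad line was seen
    ' "$1"
}
```

For example, `check_modprobe_conf /etc/modprobe.d/cuda.conf` would flag the original one-line file from the question and stay silent once the line starts with options nvidia. After fixing the file, rerun sudo update-initramfs -u and reboot as in the question.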

Answer 2

If you are not logged in as root, run the following commands from your own login (using sudo where needed):

systemctl isolate multi-user   # Stop the window manager.
modprobe -r nvidia_uvm nvidia_drm nvidia_modeset nvidia-vgpu-vfio nvidia
sudo setcap cap_sys_admin+ep
modprobe nvidia NVreg_RestrictProfilingToAdminUsers=0   # Add the following to /etc/modprobe.d/<.conf>
systemctl isolate graphical

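Before inserting the module with the parameter set or unset, stop the window manager and unload all of the old modules. After inserting the module with the parameter, make sure to start the window manager again.

If you still see the error, please post the output of the following command, run as the same user who ran the commands above: $ capsh --print|grep -i "cap_sys_admin"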
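
Once the module has been reloaded, it can help to confirm that the driver actually picked up the parameter before rerunning nvprof. The sketch below is only an illustration: the function name `profiling_restricted` is hypothetical, and both the default path /proc/driver/nvidia/params and the key name RmProfilingAdminOnly are assumptions based on NVIDIA's documentation for this error, so check them against your driver version.

```shell
#!/bin/sh
# Hypothetical helper: read the driver's parameter dump and report whether
# GPU profiling is restricted to admin users.
# Assumptions: the dump lives at /proc/driver/nvidia/params and contains a
# line of the form "RmProfilingAdminOnly: <0|1>".
profiling_restricted() {
    params_file="${1:-/proc/driver/nvidia/params}"
    value=$(awk -F': *' '$1 == "RmProfilingAdminOnly" { print $2 }' "$params_file")
    case "$value" in
        0) echo "unrestricted" ;;
        1) echo "admin-only" ;;
        *) echo "unknown"; return 1 ;;
    esac
}
```

Calling `profiling_restricted` with no argument reads the live driver state; passing a file path makes it possible to try the function on a saved copy of the parameter dump.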
