Should I grant users sudo or add the CAP_SYS_ADMIN capability to use nvprof/ncu? Why?


Starting with CUDA 10.1, users need sudo privileges to collect advanced metrics with the CUDA profiling tools (e.g. nvprof or Nsight Compute's ncu).

Alternative solutions to this problem are described here:

The link above mentions that CAP_SYS_ADMIN can be used to enable collection of these metrics.

To understand the issue, I found this insightful Stack Overflow answer:

Correct me if I'm wrong, but to go the CAP_SYS_ADMIN route, I would need to enable the capability on the application for the user (if it is a non-root user).

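I'm not familiar with Linux capabilities, and I'm not sure whether it is better to grant CAP_SYS_ADMIN to the user/application or to simply give the user sudo access. Why is one better than the other?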
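To check which capabilities a process actually ends up with, one option is to read the `CapEff` (effective set) line from `/proc/<pid>/status`. CAP_SYS_ADMIN is capability number 21 in `linux/capability.h`, so it is present when bit 21 of that hex mask is set. A minimal bash sketch, assuming Linux `/proc`; `has_cap_sys_admin` and `cap_eff_of` are hypothetical helper names, not existing tools:

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not standard tools): check whether a process has
# CAP_SYS_ADMIN (capability number 21) in its effective capability set.
# Assumes bash and a Linux /proc filesystem.

CAP_SYS_ADMIN=21

# has_cap_sys_admin MASK
#   MASK: hex capability mask as shown in /proc/<pid>/status, e.g. 0000000000200000.
#   Returns 0 if bit 21 is set, 1 if not, 2 if MASK is not a hex string.
has_cap_sys_admin() {
    [[ $1 =~ ^[0-9a-fA-F]+$ ]] || return 2
    local mask=$((16#$1))
    (( (mask >> CAP_SYS_ADMIN) & 1 ))
}

# cap_eff_of [PID]
#   Prints the CapEff hex mask of PID (defaults to the current shell).
cap_eff_of() {
    local pid=${1:-$$}
    awk '/^CapEff:/ {print $2}' "/proc/$pid/status"
}
```

For example, `has_cap_sys_admin "$(cap_eff_of "$pid")" && echo present` run against the PID of the profiled application (not only nvprof) shows whether that process really holds the capability.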


Edit: As of now, I still can't get it to work.

# First I executed
$ sudo setcap cap_sys_admin+ep /usr/local/cuda/bin/nvprof
# This is the command that I am executing after installing the CUDA toolkit 10.2.
$ /usr/local/cuda/bin/nvprof -o output-detailed.nvvp -f --analysis-metrics /usr/local/cuda/extras/demo_suite/vectorAdd

[Vector addition of 50000 elements]
==142443== NVPROF is profiling process 142443, command: /usr/local/cuda/extras/demo_suite/vectorAdd
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
==142443== Some kernel(s) will be replayed on device 0 in order to collect all events/metrics.
==142443== Warning: ERR_NVGPUCTRPERM - The user does not have permission to profile on the target device. See the following link for instructions to enable permissions and get more information: https://developer.nvidia.com/ERR_NVGPUCTRPERM
Failed to launch vectorAdd kernel (error code unknown error)!
==142443== Warning: ERR_NVGPUCTRPERM - The user does not have permission to profile on the target device. See the following link for instructions to enable permissions and get more information: https://developer.nvidia.com/ERR_NVGPUCTRPERM
==142443== Warning: Some profiling data are not recorded. Make sure cudaProfilerStop() or cuProfilerStop() is called before application exit to flush profile data.
==142443== Generated result file: /results/nvprof/output-detailed.nvvp

But it only works when I run it with sudo:

$ sudo /usr/local/cuda/bin/nvprof -o output-detailed.nvvp -f --analysis-metrics /usr/local/cuda/extras/demo_suite/vectorAdd

[Vector addition of 50000 elements]
==142687== NVPROF is profiling process 142687, command: /usr/local/cuda/extras/demo_suite/vectorAdd
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
==142687== Some kernel(s) will be replayed on device 0 in order to collect all events/metrics.
Replaying kernel "vectorAdd(float const *, float const *, float*, int)" (done)
Copy output data from the CUDA device to the host memory
Test PASSED
Done
==142687== Generated result file: /home/agostini/Development/nvprof/output-detailed.nvvp

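Why isn't it enough to grant the capability to the executable and let a superuser run the application without sudo? Is the PAM setup really required, even for users in the sudo group?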
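The behavior in the edit is consistent with the `execve` rules in the capabilities(7) man page: capabilities set on a file with `setcap` apply to the nvprof process, but vectorAdd is started as a separate program that has no file capabilities of its own. The hypothetical `exec_caps` function below models the documented formulas for the new permitted and effective sets. It is a simplified sketch: it ignores securebits, set-user-ID-root handling and `no_new_privs`, and treats a file as "privileged" when any of its capability sets is non-empty.

```shell
#!/usr/bin/env bash
# Hypothetical, simplified model of the execve() capability rules from
# capabilities(7). Ignores securebits, setuid-root and no_new_privs.

# exec_caps P_INH P_AMB P_BSET F_PERM F_INH F_EFF
#   P_*: hex inheritable, ambient and bounding sets of the process calling execve()
#   F_*: hex permitted/inheritable file capabilities of the new program,
#        and its file effective bit (0 or 1)
# Prints "<new permitted> <new effective>" as 16-digit hex masks.
exec_caps() {
    local p_inh=$((16#$1)) p_amb=$((16#$2)) p_bset=$((16#$3))
    local f_perm=$((16#$4)) f_inh=$((16#$5)) f_eff=$6
    local new_amb new_perm new_eff

    # The ambient set is cleared when the program file carries capabilities.
    if (( f_perm != 0 || f_inh != 0 || f_eff != 0 )); then
        new_amb=0
    else
        new_amb=$p_amb
    fi

    # P'(permitted) = (P(inheritable) & F(inheritable)) | (F(permitted) & P(bounding)) | P'(ambient)
    new_perm=$(( (p_inh & f_inh) | (f_perm & p_bset) | new_amb ))

    # P'(effective) = F(effective) ? P'(permitted) : P'(ambient)
    if (( f_eff )); then
        new_eff=$new_perm
    else
        new_eff=$new_amb
    fi

    printf '%016x %016x\n' "$new_perm" "$new_eff"
}
```

Modeling both steps with, for example, a full bounding set of `000001ffffffffff` and no inheritable or ambient capabilities in the login shell: `exec_caps 0 0 000001ffffffffff 0000000000200000 0 1` (shell starts nvprof with `cap_sys_admin+ep`) prints `0000000000200000 0000000000200000`, while `exec_caps 0 0 000001ffffffffff 0 0 0` (nvprof starts vectorAdd) prints `0000000000000000 0000000000000000`. In this model, only ambient capabilities, or inheritable ones matched by the target file's inheritable set, survive an `execve` of a program without file capabilities.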

Answer 1

I don't know exactly how CAP_SYS_ADMIN works, but it may be easier to follow the steps below, which make the profiler available to non-root users.

echo 'options nvidia "NVreg_RestrictProfilingToAdminUsers=0"' | sudo tee -a /etc/modprobe.d/nvidia.conf
sudo update-initramfs -u 
sudo reboot
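After the reboot, one way to confirm the module option took effect is to look at the NVIDIA driver's runtime parameters. A minimal sketch, assuming the driver reports this setting as `RmProfilingAdminOnly` in `/proc/driver/nvidia/params` (1 meaning admin-only, 0 meaning all users); verify the key name against your driver version. `profiling_restriction` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether GPU performance counter profiling is
# restricted to admin users. ASSUMPTION: the driver exposes the setting as
# "RmProfilingAdminOnly: <0|1>" in /proc/driver/nvidia/params.

# profiling_restriction [PARAMS_FILE]
#   Prints "restricted", "unrestricted" or "unknown" (file or key missing).
profiling_restriction() {
    local file=${1:-/proc/driver/nvidia/params}
    local value
    # Split "Key: value" lines on the colon and any following spaces.
    value=$(awk -F': *' '$1 == "RmProfilingAdminOnly" {print $2}' "$file" 2>/dev/null)
    case "$value" in
        1) echo restricted ;;
        0) echo unrestricted ;;
        *) echo unknown ;;
    esac
}
```

Calling `profiling_restriction` with no arguments reads the live driver file; "unrestricted" would indicate the `NVreg_RestrictProfilingToAdminUsers=0` option was picked up.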
