nvidia-smi must be run by root before regular users can use it

On a freshly set up Ubuntu 16.04 machine, nvidia-smi fails when run as a regular user:

$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Running it as root works:

$ sudo nvidia-smi
[sudo] password for hanxue: 
Fri Jul 19 10:05:49 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130                Driver Version: 384.130                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   38C    P0    31W / 250W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  Off  | 00000000:5E:00.0 Off |                    0 |
| N/A   33C    P0    31W / 250W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla P100-PCIE...  Off  | 00000000:86:00.0 Off |                    0 |
| N/A   31C    P0    31W / 250W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla P100-PCIE...  Off  | 00000000:AF:00.0 Off |                    0 |
| N/A   31C    P0    28W / 250W |      0MiB / 16276MiB |      4%      Default |
+-------------------------------+----------------------+----------------------+

After that, running it as a regular user works as well:

$ nvidia-smi
Fri Jul 19 10:09:00 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130                Driver Version: 384.130                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   40C    P0    31W / 250W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  Off  | 00000000:5E:00.0 Off |                    0 |
| N/A   35C    P0    31W / 250W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla P100-PCIE...  Off  | 00000000:86:00.0 Off |                    0 |
| N/A   33C    P0    31W / 250W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla P100-PCIE...  Off  | 00000000:AF:00.0 Off |                    0 |
| N/A   33C    P0    27W / 250W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

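Is there a misconfiguration that requires nvidia-smi to be run by root first, and is there a way to fix it? For example, by manually loading the NVIDIA kernel modules?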
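To narrow this down, it may help to check, before the first sudo run after a reboot, whether the nvidia kernel module is already loaded and whether the /dev/nvidia* device nodes exist. One possible explanation (an assumption, not confirmed for this machine) is that the root invocation loads the module and creates the device nodes, which a regular user is not allowed to do. Below is a minimal sketch of such a check; the function names are hypothetical helpers made up for illustration and are not part of any NVIDIA tool.

```shell
#!/bin/sh
# Hypothetical diagnostic helpers (illustrative names, not NVIDIA tooling).

# Succeeds if a module named exactly "nvidia" is listed in the given
# modules file (defaults to /proc/modules). "nvidia_uvm" etc. do not count.
nvidia_module_loaded() {
    modules_file="${1:-/proc/modules}"
    grep -q '^nvidia ' "$modules_file" 2>/dev/null
}

# Prints the NVIDIA device nodes found under the given directory
# (defaults to /dev), one per line; prints nothing if there are none.
nvidia_device_nodes() {
    dev_dir="${1:-/dev}"
    for node in "$dev_dir"/nvidia*; do
        [ -e "$node" ] && printf '%s\n' "$node"
    done
    return 0
}

# Prints a one-word summary: "no-module", "no-device-nodes" or "ready".
# $1 = modules file (default /proc/modules), $2 = device dir (default /dev).
nvidia_state() {
    if ! nvidia_module_loaded "${1:-/proc/modules}"; then
        echo "no-module"
    elif [ -z "$(nvidia_device_nodes "${2:-/dev}")" ]; then
        echo "no-device-nodes"
    else
        echo "ready"
    fi
}
```

Running `nvidia_state` as a regular user right after boot, and again after `sudo nvidia-smi`, would show which of the two pieces the root run is supplying. If it reports "no-module", loading the module at boot (for instance with `sudo modprobe nvidia`) is worth trying. If it reports "no-device-nodes", the setuid `nvidia-modprobe` helper or the `nvidia-persistenced` daemon that ship with the driver may be missing or not running, though that is a guess for this setup.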