在新搭建的 Ubuntu 16.04 机器上,nvidia-smi
以普通用户身份运行失败
$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
以 root 身份运行
$ sudo nvidia-smi
[sudo] password for hanxue:
Fri Jul 19 10:05:49 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130 Driver Version: 384.130 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... Off | 00000000:3B:00.0 Off | 0 |
| N/A 38C P0 31W / 250W | 0MiB / 16276MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla P100-PCIE... Off | 00000000:5E:00.0 Off | 0 |
| N/A 33C P0 31W / 250W | 0MiB / 16276MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla P100-PCIE... Off | 00000000:86:00.0 Off | 0 |
| N/A 31C P0 31W / 250W | 0MiB / 16276MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla P100-PCIE... Off | 00000000:AF:00.0 Off | 0 |
| N/A 31C P0 28W / 250W | 0MiB / 16276MiB | 4% Default |
+-------------------------------+----------------------+----------------------+
随后以普通用户身份运行
$ nvidia-smi
Fri Jul 19 10:09:00 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130 Driver Version: 384.130 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... Off | 00000000:3B:00.0 Off | 0 |
| N/A 40C P0 31W / 250W | 0MiB / 16276MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla P100-PCIE... Off | 00000000:5E:00.0 Off | 0 |
| N/A 35C P0 31W / 250W | 0MiB / 16276MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla P100-PCIE... Off | 00000000:86:00.0 Off | 0 |
| N/A 33C P0 31W / 250W | 0MiB / 16276MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla P100-PCIE... Off | 00000000:AF:00.0 Off | 0 |
| N/A 33C P0 27W / 250W | 0MiB / 16276MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
是否存在需要首先由 root 用户运行的错误配置nvidia-smi
,是否有解决方案?例如手动加载 NVIDIA 内核模块