No script is running, but GPU memory is still allocated

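I am accessing a remote Linux server from my local machine. No scripts are running on the remote server, but GPU memory is still allocated. PS: this may have been caused by some crashes.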

Output of nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-PCI...  Off  | 00000000:31:00.0 Off |                    0 |
| N/A   34C    P0    42W / 250W |  19403MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-PCI...  Off  | 00000000:4B:00.0 Off |                    0 |
| N/A   35C    P0    59W / 250W |  10886MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       583      C                                    1001MiB |
|    0   N/A  N/A     16158      C                                    5065MiB |
|    0   N/A  N/A     35103      C                                    1291MiB |
|    0   N/A  N/A     46387      C                                    1337MiB |
|    0   N/A  N/A     54860      C                                    1273MiB |
|    0   N/A  N/A     71766      C                                    2077MiB |
|    0   N/A  N/A     80967      C                                    4991MiB |
|    0   N/A  N/A     83598      C                                    1071MiB |
|    0   N/A  N/A     93077      C                                    1293MiB |
|    1   N/A  N/A       583      C                                     917MiB |
|    1   N/A  N/A     47859      C                                    1297MiB |
|    1   N/A  N/A     74282      C                                    1273MiB |
|    1   N/A  N/A     90599      C                                    7397MiB |
+-----------------------------------------------------------------------------+

When I try to kill one of the listed processes, I get a "No such process" error:

>>> kill -9 16158
-bash: kill: (16158) - No such process

ps -p PID cannot find the process either:

>>> ps -p 583
 PID TTY          TIME CMD
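
To repeat the manual check above for every PID at once, a small script along these lines could help. It is only a sketch: it assumes the Processes table has the same layout as the output shown above, and the helper names (`parse_processes`, `pid_visible`, `stale_entries`) are made up for illustration. It does not free any memory. It only lists which PIDs reported by nvidia-smi have no `/proc/<pid>` entry visible from the current shell.

```python
import os
import re
import subprocess

# Matches one row of the "Processes" table as printed above, e.g.
# "|    0   N/A  N/A     16158      C             5065MiB |"
# Groups: GPU index, PID, memory in MiB. The process name column may be empty.
ROW = re.compile(r"^\|\s+(\d+)\s+\S+\s+\S+\s+(\d+)\s+\S+\s+.*?(\d+)MiB\s+\|$")


def parse_processes(smi_output):
    """Return (gpu_index, pid, used_mib) tuples from nvidia-smi's process table."""
    rows = []
    for line in smi_output.splitlines():
        match = ROW.match(line.strip())
        if match:
            gpu, pid, mib = (int(g) for g in match.groups())
            rows.append((gpu, pid, mib))
    return rows


def pid_visible(pid):
    """True if /proc/<pid> exists, i.e. the PID is visible here (Linux only)."""
    return os.path.isdir(f"/proc/{pid}")


def stale_entries(smi_output, exists=pid_visible):
    """Rows whose PID is listed by nvidia-smi but not visible via `exists`."""
    return [row for row in parse_processes(smi_output) if not exists(row[1])]


if __name__ == "__main__":
    # Runs plain `nvidia-smi` and parses its default table output.
    output = subprocess.run(
        ["nvidia-smi"], capture_output=True, text=True, check=True
    ).stdout
    for gpu, pid, mib in stale_entries(output):
        print(f"GPU {gpu}: PID {pid} holds {mib} MiB but has no /proc entry")
```

Run on the server itself, this prints one line per PID that nvidia-smi reports but that the current shell cannot see, which is the same situation as the `kill` and `ps` output above.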

How can I free this memory? The problem has persisted for several weeks, and today it caused an OOM error.
