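I am accessing a remote Linux server from my local machine. No scripts are running on the remote server, but GPU memory is still allocated. PS: this may be the result of some crashes.
nvidia-smi shows: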
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-PCI...  Off  | 00000000:31:00.0 Off |                    0 |
| N/A   34C    P0    42W / 250W |  19403MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-PCI...  Off  | 00000000:4B:00.0 Off |                    0 |
| N/A   35C    P0    59W / 250W |  10886MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       583      C                                    1001MiB |
|    0   N/A  N/A     16158      C                                    5065MiB |
|    0   N/A  N/A     35103      C                                    1291MiB |
|    0   N/A  N/A     46387      C                                    1337MiB |
|    0   N/A  N/A     54860      C                                    1273MiB |
|    0   N/A  N/A     71766      C                                    2077MiB |
|    0   N/A  N/A     80967      C                                    4991MiB |
|    0   N/A  N/A     83598      C                                    1071MiB |
|    0   N/A  N/A     93077      C                                    1293MiB |
|    1   N/A  N/A       583      C                                     917MiB |
|    1   N/A  N/A     47859      C                                    1297MiB |
|    1   N/A  N/A     74282      C                                    1273MiB |
|    1   N/A  N/A     90599      C                                    7397MiB |
+-----------------------------------------------------------------------------+
When I try to kill one of these processes, I get a "No such process" error:
>>> kill -9 16158
-bash: kill: (16158) - No such process
And ps -p PID cannot find the process either:
>>> ps -p 583
PID TTY TIME CMD
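To repeat this check for every PID in the nvidia-smi table rather than one at a time, I put together the rough sketch below. The PID list is copied by hand from the output above, and `pid_visible` is just a helper name I made up for illustration. It only reports whether each PID can be seen from the current shell; it does not free anything.

```shell
#!/usr/bin/env bash
# PIDs copied by hand from the "Processes" table above (GPU 0 and GPU 1).
pids=(583 16158 35103 46387 54860 71766 80967 83598 93077 47859 74282 90599)

# Hypothetical helper: succeeds if the PID is visible from this shell.
# `kill -0` sends no signal and only checks that the process can be signalled;
# the /proc test (Linux only) covers processes owned by other users, where
# `kill -0` fails with a permission error even though the process exists.
pid_visible() {
    kill -0 "$1" 2>/dev/null || [ -d "/proc/$1" ]
}

for pid in "${pids[@]}"; do
    if pid_visible "$pid"; then
        echo "$pid: visible"
    else
        echo "$pid: not visible"
    fi
done
```

On my server I would expect every line to say "not visible", consistent with the kill and ps results above, but I have only verified that by hand for the PIDs shown.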
How can I free this memory? The problem has persisted for several weeks, and today it caused an OOM error.