How can I get rid of a CUDA out-of-memory error without rebooting the machine?

Is there a workaround on Ubuntu 20.04 that clears the following CUDA out-of-memory error without rebooting the machine?

RuntimeError: CUDA out of memory. Tried to allocate 40.00 MiB (GPU 0; 7.80 GiB total capacity; 6.34 GiB already allocated; 32.44 MiB free; 6.54 GiB reserved in total by PyTorch)

I know the approach below works, but it also kills my Jupyter notebook. Is there a way to free the GPU memory without killing the Jupyter notebook?

(base) mona@mona:~/research/facial_landmark$ nvidia-smi
Tue Oct  6 20:28:05 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 2070    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   47C    P8     9W /  N/A |   7883MiB /  7982MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1306      G   /usr/lib/xorg/Xorg                255MiB |
|    0   N/A  N/A      1743      G   /usr/bin/gnome-shell              151MiB |
|    0   N/A  N/A      3273      G   /usr/lib/firefox/firefox            2MiB |
|    0   N/A  N/A      3359      G   /usr/lib/firefox/firefox            2MiB |
|    0   N/A  N/A      3844      G   /usr/lib/firefox/firefox            2MiB |
|    0   N/A  N/A      4222      G   /usr/lib/firefox/firefox            2MiB |
|    0   N/A  N/A      4587      C   ...mona/anaconda3/bin/python     7459MiB |
+-----------------------------------------------------------------------------+
(base) mona@mona:~/research/facial_landmark$ kill -9  4587
(base) mona@mona:~/research/facial_landmark$ nvidia-smi
Tue Oct  6 20:28:24 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 2070    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   47C    P8     9W /  N/A |    433MiB /  7982MiB |      4%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1306      G   /usr/lib/xorg/Xorg                255MiB |
|    0   N/A  N/A      1743      G   /usr/bin/gnome-shell              152MiB |
|    0   N/A  N/A      3273      G   /usr/lib/firefox/firefox            2MiB |
|    0   N/A  N/A      3359      G   /usr/lib/firefox/firefox            2MiB |
|    0   N/A  N/A      3844      G   /usr/lib/firefox/firefox            2MiB |
|    0   N/A  N/A      4222      G   /usr/lib/firefox/firefox            2MiB |
+-----------------------------------------------------------------------------+
(base) mona@mona:~/research/facial_landmark$ 

Answer 1

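You can try torch.cuda.empty_cache(), since PyTorch is what is holding the CUDA memory. Note that it only returns cached blocks that PyTorch has reserved but is no longer using; memory held by tensors that are still referenced in the notebook is not freed.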
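
In the error message above, 6.34 GiB of the 6.54 GiB reserved by PyTorch is still allocated, so calling torch.cuda.empty_cache() on its own would probably recover only a small amount. A minimal sketch of what could be run in a notebook cell, assuming the large tensors are reachable through ordinary Python variables: drop those references first, run the garbage collector, and only then empty the cache. The helper name release_cuda_cache and the variable names in the comment are hypothetical, not part of PyTorch or of the original notebook.

```python
import gc


def release_cuda_cache():
    """Run the garbage collector, then ask PyTorch to hand cached GPU blocks back to the driver.

    Returns (allocated_bytes, reserved_bytes) for the current device,
    or None when PyTorch or a CUDA device is not available.
    """
    # Collect unreachable Python objects so the tensors they own are actually freed.
    gc.collect()
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    # Return unused cached blocks to the driver; live tensors are not touched.
    torch.cuda.empty_cache()
    return torch.cuda.memory_allocated(), torch.cuda.memory_reserved()


# Before calling the helper, delete whatever still references GPU tensors, e.g.:
# del model, optimizer, batch   # hypothetical notebook variable names
print(release_cuda_cache())
```

If the allocated figure stays high afterwards, something in the notebook is probably still holding a reference to the tensors (for example an output stored by Jupyter or a saved traceback), and restarting only the kernel, rather than the machine, may be the remaining option.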
