
On my Windows 10 machine, if I directly create a GPU tensor, I can successfully release its memory.

import torch
a = torch.zeros(300000000, dtype=torch.int8, device='cuda')
del a
torch.cuda.empty_cache()

But if I create a normal tensor and convert it to a GPU tensor, I can no longer release its memory.

import torch
a = torch.zeros(300000000, dtype=torch.int8)
a.cuda()
del a
torch.cuda.empty_cache()

Why is this happening?

talonmies
John

4 Answers


At least on Ubuntu, your script does not release the memory when it is run in an interactive shell, but it works as expected when run as a script. I think there are reference issues with the a.cuda() call whose result is never assigned. The following works both in the interactive shell and as a script.

import torch
a = torch.zeros(300000000, dtype=torch.int8)
a = a.cuda()
del a
torch.cuda.empty_cache()
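
As a quick sanity check, here is a sketch that uses torch.cuda.memory_allocated(), which reports the memory currently occupied by live tensors, to confirm the allocation really goes away after the del:

import torch

a = torch.zeros(300000000, dtype=torch.int8)
a = a.cuda()                              # .cuda() returns a new GPU tensor; keep a reference to it
print(torch.cuda.memory_allocated())      # roughly 300 MB held by the tensor
del a
torch.cuda.empty_cache()
print(torch.cuda.memory_allocated())      # 0 once the last reference is gone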
hkchengrex

Yes, this also happens on my PC with the following configuration:

  • Ubuntu 20.04.1
  • PyTorch 1.7.1+cu110

According to information from this fastai discussion: https://forums.fast.ai/t/gpu-memory-not-being-freed-after-training-is-over/10265/8, this is related to the Python garbage collector in an IPython environment.

import gc
import torch

def pretty_size(size):
    """Pretty prints a torch.Size object"""
    assert isinstance(size, torch.Size)
    return " × ".join(map(str, size))

def dump_tensors(gpu_only=True):
    """Prints a list of the Tensors being tracked by the garbage collector."""
    total_size = 0
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj):
                if not gpu_only or obj.is_cuda:
                    print("%s:%s%s %s" % (type(obj).__name__,
                                          " GPU" if obj.is_cuda else "",
                                          " pinned" if obj.is_pinned() else "",
                                          pretty_size(obj.size())))
                    total_size += obj.numel()
            elif hasattr(obj, "data") and torch.is_tensor(obj.data):
                if not gpu_only or obj.data.is_cuda:
                    print("%s → %s:%s%s%s%s %s" % (type(obj).__name__,
                                                   type(obj.data).__name__,
                                                   " GPU" if obj.data.is_cuda else "",
                                                   " pinned" if obj.data.is_pinned() else "",
                                                   " grad" if obj.requires_grad else "",
                                                   " volatile" if obj.volatile else "",
                                                   pretty_size(obj.data.size())))
                    total_size += obj.data.numel()
        except Exception:
            pass
    print("Total size:", total_size)

If I do something like

import torch as th
a = th.randn(10, 1000, 1000)
aa = a.cuda()
del aa
th.cuda.empty_cache()

you will not see any decrease in nvidia-smi/nvtop. But you can find out what is happening using the handy function

dump_tensors()

and you may observe the following output:

Tensor: GPU 10 × 1000 × 1000
Total size: 10000000

That means the garbage collector still holds references to the tensor.

One may refer to further discussion of the Python gc mechanism; a short sketch follows the link below:

  1. Force garbage collection in Python to free memory
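
A minimal sketch of the idea behind that link: force a collection before emptying the cache. Whether this actually frees the memory depends on no other references (for example IPython's output cache) still pointing at the tensor.

import gc
import torch

gc.collect()               # ask the garbage collector to drop unreachable tensors
torch.cuda.empty_cache()   # return the now-unused cached blocks to the driver
dump_tensors()             # the 10 × 1000 × 1000 tensor should no longer be listed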
wstcegg

I met the same issue. Solution:

cuda = torch.device('cuda')
a = a.to(cuda)  # assign the result back
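
Note that, just like .cuda(), .to() returns a new tensor (the original a stays on the CPU), so the result has to be assigned back; only then can del a and torch.cuda.empty_cache() release the GPU copy.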
Hey TV

You should not use torch.cuda.empty_cache(), as it will slow down your code for no gain: https://discuss.pytorch.org/t/what-is-torch-cuda-empty-cache-do-and-where-should-i-add-it/40975
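
As a small sketch of why (assuming a PyTorch version that provides torch.cuda.memory_reserved()): empty_cache() only returns cached-but-unused blocks to the driver, while memory held by live tensors is released by dropping the tensors themselves.

import torch

a = torch.zeros(300000000, dtype=torch.int8, device='cuda')
torch.cuda.empty_cache()
print(torch.cuda.memory_allocated())   # still roughly 300 MB: the tensor itself is alive
print(torch.cuda.memory_reserved())    # memory kept by PyTorch's caching allocator
del a
print(torch.cuda.memory_allocated())   # drops to 0: deleting the tensor, not empty_cache(), freed it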

Igor