6

I'm working on Google Colab with tf.keras and TensorFlow 2.3.0. I'm going crazy because I can't use the model I've trained to run predictions with model.predict: it runs out of CPU RAM. I've been able to reproduce the issue with a very minimal example.

import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Input, Conv2D, Activation
from tensorflow.keras.models import Model

matrixSide = 512 #define a big enough matrix to give memory issues

inputL = Input([matrixSide,matrixSide,12]) #create a toy model
l1 = Conv2D(32,3,activation='relu',padding='same') (inputL) #120
l1 = Conv2D(64,1,activation='relu',padding='same')(l1)
l1 = Conv2D(64,3,activation='relu',padding='same')(l1)
l1 = Conv2D(1,1,padding='same')(l1)
l1 = Activation('linear')(l1)
model = Model(inputs= inputL,outputs = l1)


#run predictions
inImm = np.zeros((64,matrixSide,matrixSide,12))
for i in range (60):
  print(i)
  outImm = model.predict(inImm)
# K.clear_session() #somebody suggested it...

Basically, when working on GPU, it uses 3.0 GB of CPU RAM for the first 4 iterations, then it goes up to 7 GB, then to 10 GB, and then it crashes because it has exhausted all the available RAM! When running on CPU it lasts for more iterations, and sometimes it even drops from 9 GB back to 3 GB, but in the end it still crashes after 20 or so iterations.

A previous question (Keras predict loop memory leak using tf.data.Dataset but not with a numpy array) described similar issues when using tf.data but not with numpy. On a GitHub issue for TensorFlow 1.14, somebody suggested calling K.clear_session() in each loop... but it doesn't help!

Any idea on how to fix this?


4 Answers

6

This is my understanding after reporting this as a bug to TensorFlow.

Change the code to:

inImm = np.zeros((64,matrixSide,matrixSide,12))
for i in range(60):
  print(i)
  tensor = tf.convert_to_tensor(inImm, dtype=tf.float32)
  outImm = model.predict(tensor)

Using tf.keras.Model.predict in a for loop with a numpy input creates a new graph on every iteration, because each numpy array is passed with a different signature. Converting the numpy array to a tensor keeps the signature the same and avoids building new graphs.
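As a variation on the same idea (my own sketch, assuming the model and inImm defined in the question), the conversion can also be done once, outside the loop, so every predict() call sees the same tensor:

tensor = tf.convert_to_tensor(inImm, dtype=tf.float32)  # convert once, outside the loop
for i in range(60):
  print(i)
  outImm = model.predict(tensor)  # same input signature on every iteration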

    Thank you, I'll try this! By the way, is there a clear explanation somewhere of what these "graphs" actually are and how they behave? And if it was a graph problem, why didn't K.clear_sessions() work (but gc.collect() did)? – user26067 Nov 10 '20 at 10:22
3

I've found a fix for the memory leak. While K.clear_session() doesn't do anything in my case, adding a garbage collection pass after each call with _ = gc.collect() actually does the trick! The memory usage is constant now and I can run as many predictions as I want.
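Applied to the loop from the question, a minimal sketch of this fix looks like this:

import gc

inImm = np.zeros((64,matrixSide,matrixSide,12))
for i in range(60):
  print(i)
  outImm = model.predict(inImm)
  _ = gc.collect()  # explicitly collect the objects predict() leaves behind each iteration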

2

I solved the problem by using K.clear_session(). First of all, you need to define a session before you can clear it. The purpose of this is explained in both of these, here and here.

config = tf.compat.v1.ConfigProto(log_device_placement=True)  # the session API lives under tf.compat.v1 in TF 2.x
config.gpu_options.allow_growth = True
session = tf.compat.v1.Session(config=config)
tf.compat.v1.keras.backend.set_session(session)

At first, using K.clear_session() in the loop results in an error after the first prediction. In my opinion, tf loses the connection to the model. For this reason, I create a new model within every run of the loop. This negatively affects the code's speed for the first several runs, but it prevents RAM usage from accumulating.

The following code contains the suggested improvements:

import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Input, Conv2D, Activation
from tensorflow.keras.models import Model

matrixSide = 512 #define a big enough matrix to give memory issues

config = tf.compat.v1.ConfigProto(log_device_placement=True)
config.gpu_options.allow_growth = True
session = tf.compat.v1.Session(config=config)
tf.compat.v1.keras.backend.set_session(session)  # the session API lives under tf.compat.v1 in TF 2.x

def create_model(matrixSide_v):
    inputL = Input([matrixSide_v,matrixSide_v,12]) #create a toy model
    l1 = Conv2D(32,3,activation='relu',padding='same') (inputL) #120
    l1 = Conv2D(64,1,activation='relu',padding='same')(l1)
    l1 = Conv2D(64,3,activation='relu',padding='same')(l1)
    l1 = Conv2D(1,1,padding='same')(l1)
    l1 = Activation('linear')(l1)
    c_model = Model(inputs= inputL,outputs = l1)
    return c_model

#run predictions
inImm = np.zeros((64,matrixSide,matrixSide,12))
for i in range(64):
    print(i)
    model = create_model(matrixSide)
    outImm = model.predict(inImm)
    K.clear_session()
  • In my use case I use a trained model to predict a very large number of samples. It doesn't make sense to load a model and clear session for each prediction, as it is both slow, and loading a model has also a documented memory-leak issue. Any suggestion how to apply your solution to my use case? – Itamar Katz Apr 21 '21 at 07:12
  • I agree, reloading the model each iteration will slow things up 10000 times and is not really a practical solution. – Jeremy Sep 05 '21 at 03:14
0

I'm using a simple solution based on the Keras docs:

For a small number of inputs that fit in one batch, directly using call() is recommended for faster execution, e.g., model(x) or model(x, training=False)

for filename in image_filenames:
  # read the data
  x = load_image(filename)

  # prediction
  output = model(x)  # executes __call__() or call()

Using __call__(), i.e. model(x), avoids the memory leak inside the predict method, which creates a data generator holding one data item on every execution and doesn't release that memory.
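For inputs larger than a single batch, the same idea still works; here is my own sketch (assuming the model and inImm from the question, with a batch size chosen purely for illustration) that slices the array manually and feeds the model one batch at a time through a direct call:

batchSize = 8
outputs = []
for start in range(0, inImm.shape[0], batchSize):
  chunk = tf.convert_to_tensor(inImm[start:start + batchSize], dtype=tf.float32)
  outputs.append(model(chunk, training=False).numpy())  # direct __call__, no predict()
outImm = np.concatenate(outputs, axis=0)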