
I'm trying to train a neural net on a GPU using Keras and am getting a "Resource exhausted: OOM when allocating tensor" error. The specific tensor it's trying to allocate isn't very big, so I assume some previous tensor consumed almost all the VRAM. The error message comes with a hint that suggests this:

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

That sounds good, but how do I do it? RunOptions appears to be a TensorFlow construct, and what little documentation I can find for it ties it to a "session". I'm using Keras, so TensorFlow is hidden under a layer of abstraction, and its sessions under another layer below that.

How do I dig underneath everything to set this option in such a way that it will take effect?

dspeyer

4 Answers


TF1 solution:

It's not as hard as it seems. What you need to know is that, according to the documentation, the **kwargs parameter passed to model.compile will be passed on to session.run.

So you can do something like:

import tensorflow as tf

run_opts = tf.RunOptions(report_tensor_allocations_upon_oom=True)

model.compile(loss="...", optimizer="...", metrics=["..."], options=run_opts)

The options are then passed along every time session.run is called.
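
For context, here is a minimal end-to-end sketch of the same idea (assuming standalone Keras on the TF1 backend; the toy model and random data are placeholders I made up to keep it self-contained):

import numpy as np
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense

# ask TF1 to dump the allocation table if an OOM occurs
run_opts = tf.RunOptions(report_tensor_allocations_upon_oom=True)

# toy model and data, just so the example runs on its own
model = Sequential([Dense(64, activation="relu", input_shape=(100,)), Dense(1)])
x_train = np.random.rand(256, 100)
y_train = np.random.rand(256, 1)

# the extra kwarg is forwarded to session.run by the TF backend
model.compile(loss="mse", optimizer="adam", options=run_opts)

# if this step OOMs, the error now lists the live tensor allocations
model.fit(x_train, y_train, batch_size=32, epochs=1)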

TF2:

The solution above works only for TF1. For TF2, unfortunately, there appears to be no easy solution yet.
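
If you absolutely need this report under TF2, one workaround (my own sketch, not an official recipe) is to bypass Keras and drop down to the TF1 compatibility layer, where RunOptions can still be passed to Session.run:

import numpy as np
import tensorflow as tf

# disable eager mode so the graph/session API is usable again
tf.compat.v1.disable_eager_execution()

run_opts = tf.compat.v1.RunOptions(report_tensor_allocations_upon_oom=True)

x = tf.compat.v1.placeholder(tf.float32, shape=(None, 100))
y = tf.compat.v1.layers.dense(x, 10)  # deprecated, but still present in TF2

with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    # the options kwarg works here exactly as it did in TF1
    out = sess.run(y, feed_dict={x: np.zeros((8, 100), np.float32)}, options=run_opts)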

Manuel Popp
Dr. Snoopy

Currently, it is not possible to add the options to model.compile. See: https://github.com/tensorflow/tensorflow/issues/19911

Richard
    While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. – Enea Dume Aug 15 '18 at 15:03

OOM means out of memory. Maybe your model is using more memory than is available at that point. Decrease batch_size significantly; I set it to 16 and then it worked fine.
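
For example (hypothetical numbers; the right value depends entirely on your model and GPU):

# halve the batch size until the OOM goes away, e.g. 128 -> 64 -> 32 -> 16
model.fit(x_train, y_train, batch_size=16, epochs=10)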

naam
  • Whether that will work, and what batch size is appropriate, will depend entirely on the model in question, as well as the dataset. If one is attempting to debug a memory issue that doesn't depend on batch size, this doesn't help at all. – Adam Azarchs Feb 24 '21 at 23:37

I got the same error, but only when the training dataset was about the same size as my GPU memory. For example, with 4 GB of video card memory I could train a model on a dataset of roughly 3.5 GB. The workaround for me was to write a custom data_generator function using yield, indices, and a lookback window. The other suggestion I received was to drop down to the true TensorFlow framework and work with tf.Session directly (example).
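
Something along these lines (a rough sketch of the generator idea; the names, shapes, and lookback logic here are my own illustration):

import numpy as np

def data_generator(data, targets, batch_size=32, lookback=10):
    # yields one small batch at a time, so the full dataset
    # never needs to fit in GPU memory at once
    indices = np.arange(lookback, len(data))
    while True:
        np.random.shuffle(indices)
        for start in range(0, len(indices) - batch_size + 1, batch_size):
            batch_idx = indices[start:start + batch_size]
            # each sample is a window of the previous `lookback` steps
            x = np.stack([data[i - lookback:i] for i in batch_idx])
            y = targets[batch_idx]
            yield x, y

# smoke test with dummy data
data = np.random.rand(1000, 3).astype("float32")
targets = np.random.rand(1000).astype("float32")
x, y = next(data_generator(data, targets))
print(x.shape, y.shape)  # (32, 10, 3) (32,)

# then: model.fit_generator(data_generator(data, targets), steps_per_epoch=..., epochs=...)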

ouflak