
Allowing GPU memory growth command does not work #11584

Closed
Schiboni opened this issue Nov 5, 2018 · 15 comments
Assignees
Labels
stat:awaiting keras-eng Awaiting response from Keras engineer type:support User is asking for help / asking an implementation question. Stackoverflow would be better suited.

Comments

@Schiboni

Schiboni commented Nov 5, 2018

Hi, I have a memory problem.
I am running training on a server and get the following printout:

2018-11-05 21:08:07.907464: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:895] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-11-05 21:08:07.908090: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties:
name: GeForce GTX 980 major: 5 minor: 2 memoryClockRate(GHz): 1.2405
pciBusID: 0000:02:00.0
totalMemory: 3.95GiB freeMemory: 3.87GiB
2018-11-05 21:08:07.908116: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 980, pci bus id: 0000:02:00.0, compute capability: 5.2)

As you can see, the total memory is higher than the free memory. However, when I run my code I get an "Out of memory" error, so I added the following code at the beginning of my script:
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.3, allow_growth=True)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

Unfortunately the memory usage:
totalMemory: 3.95GiB freeMemory: 3.87GiB

does not change at all. What is the problem?
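For reference, the fraction is supposed to cap TensorFlow's allocator at fraction × total memory; a plain-Python sketch of that arithmetic, using the values from the log above (`allocator_cap_bytes` is just an illustrative helper, not a TensorFlow API):

```python
GIB = 1024 ** 3

def allocator_cap_bytes(total_bytes, fraction):
    """Upper bound the GPU allocator should request for this process."""
    return int(total_bytes * fraction)

total = int(3.95 * GIB)                  # totalMemory from the log above
cap = allocator_cap_bytes(total, 0.3)    # per_process_gpu_memory_fraction=0.3
print("cap = %.2f GiB" % (cap / GIB))    # cap = 1.18 GiB
```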

Thanks and best regards,
Giovanni

@gabrieldemarmiesse gabrieldemarmiesse added type:support User is asking for help / asking an implementation question. Stackoverflow would be better suited. type: TensorFlow labels Nov 5, 2018
@Harshini-Gadige Harshini-Gadige added the stat:awaiting keras-eng Awaiting response from Keras engineer label Nov 12, 2018
@omalleyt12
Contributor

Can you please try the following at the top of your code:

import keras
import tensorflow as tf

gpu_options = tf.GPUOptions(allow_growth=True)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
keras.backend.tensorflow_backend.set_session(sess)
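If you are on TensorFlow 2.x instead, the rough equivalent is the memory-growth configuration API (a sketch assuming TF ≥ 2.0; it must run before any GPU is initialized):

```python
import tensorflow as tf

# Request memory growth on every visible GPU so TensorFlow allocates
# memory incrementally instead of grabbing (nearly) all of it up front.
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
```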

@kevupton

kevupton commented Nov 14, 2018

Can confirm this issue.

Keras v2.2.4

I am using:

    import tensorflow as tf
    from keras import backend as K

    config = tf.ConfigProto(allow_soft_placement=True)
    config.gpu_options.per_process_gpu_memory_fraction = 0.9
    config.gpu_options.allow_growth = True
    sess = tf.Session(config=config)
    K.set_session(sess)

Also, per_process_gpu_memory_fraction doesn't work if the allow_growth option is True.

Errors in:

Limit:                  7730941132
InUse:                  3328523776
MaxInUse:               3328523776
NumAllocs:                      55
MaxAllocSize:           3315597312

If I remove allow_growth, then per_process_gpu_memory_fraction works.

Also @omalleyt12: I just tested that solution and it didn't work for me.

@Schiboni
Author

@kevupton The code below does not work for me:

from keras import backend as K
import tensorflow as tf
config = tf.ConfigProto(allow_soft_placement=True)
config.gpu_options.per_process_gpu_memory_fraction = 0.2
# config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
K.set_session(sess)

@kevupton

kevupton commented Nov 14, 2018

@Schiboni
The per_process_gpu_memory_fraction property determines what percentage of GPU memory you will use. If you want to keep memory usage small, then 0.2 would be ideal; otherwise, the larger the better, right? haha
Did you try a larger number, such as 0.9 instead of 0.2? That way you are utilizing 90% instead of 20%:

config.gpu_options.per_process_gpu_memory_fraction = 0.9

@Schiboni
Author

@kevupton
My main problem is that if I set use_multiprocessing=True while using fit_generator and keras.utils.Sequence, the code gets stuck and GPU activity remains at 0%. No errors are shown.
So I am guessing I have an out-of-memory problem in one of the workers, or something like that. But I am just guessing; I have no actual idea.
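For what it's worth, the indexing contract a keras.utils.Sequence has to satisfy can be sketched as a plain-Python stand-in (`BatchIndexer` is hypothetical; the real class would subclass keras.utils.Sequence, and with use_multiprocessing=True it must be picklable since each worker process calls __getitem__ by index):

```python
import math

class BatchIndexer:
    """Plain-Python stand-in for the Sequence batching contract."""

    def __init__(self, n_samples, batch_size):
        self.n_samples = n_samples
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch; the last batch may be short.
        return math.ceil(self.n_samples / self.batch_size)

    def __getitem__(self, idx):
        # Return the sample indices belonging to batch `idx`.
        start = idx * self.batch_size
        stop = min(start + self.batch_size, self.n_samples)
        return list(range(start, stop))

seq = BatchIndexer(10, 4)
print(len(seq))   # 3
print(seq[2])     # [8, 9]
```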

@kevupton

Hmmm, I think I had a stuck scenario once, but I cannot remember how I fixed it. How long is it stuck for before you retry?
What does your model code look like ?

@Schiboni
Author

It can be stuck uninterruptedly for hours, no upper bound.
"What does your model code look like ?" What do you mean?

@kevupton

are you compiling your own model ? Like model.compile() somewhere ?

@Schiboni
Author

Yes, of course:

print("[INFO] training with {} GPUs...".format(ngpus))
with tf.device("/cpu:0"):
    model = build_model(x_shape, class_number, filters, lstm_dims, regularization_rate)
model = multi_gpu_model(model, gpus=ngpus)
model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(),
              metrics=['accuracy'])

@Fordacre

@omalleyt12 I have this problem too. It seems gpu_options.allow_growth doesn't work together with gpu_options.per_process_gpu_memory_fraction. Here is my code:
tf_config = tf.ConfigProto()
tf_config.gpu_options.allow_growth = True
tf_config.log_device_placement = False
tf_config.allow_soft_placement = True
tf_config.gpu_options.per_process_gpu_memory_fraction = 0.9
sess = tf.Session(config=tf_config)
set_session(sess)
Is there something wrong in my code? Can you please help me fix it? Thanks.

@saysx

saysx commented Mar 20, 2019

Has anybody solved this problem?

@buivancuong

You can try clear_session() (from keras.backend import clear_session) before loading your model and after training.
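Roughly like this (a usage sketch; `build_model`, `x_train`, and `y_train` stand in for your own model-construction function and data):

```python
from keras.backend import clear_session

# Drop the previous TF graph/session state so GPU memory held by an
# earlier model can be reclaimed before building a new one.
clear_session()
model = build_model()          # your own model-construction function
model.fit(x_train, y_train)    # your training data
clear_session()                # release graph state again after training
```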

@duplessisaa

duplessisaa commented Jan 28, 2020

@buivancuong Thanks for the suggestion, I tried this too... also not working for me:

sess = tf.keras.backend.get_session()
tf.keras.backend.clear_session()
sess.close()
sess = tf.keras.backend.get_session()

# GPU allow-growth
config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # dynamically grow the memory used on the GPU
config.log_device_placement = True      # log device placement
sess = tf.Session(config=config)
set_session(sess)                       # set this TF session as the default session for Keras

@vasilevskykv

vasilevskykv commented Oct 23, 2020

Hello! I have the same problem

config = tf.compat.v1.ConfigProto(log_device_placement=True)
config.gpu_options.visible_device_list = '0'
config.gpu_options.allow_growth = True
config.gpu_options.per_process_gpu_memory_fraction = 0.9
tf.compat.v1.enable_eager_execution()
tf.compat.v1.reset_default_graph()
with tf.compat.v1.Session(config=config) as sess:
    x_train_multi, y_train_multi = multivariate_data(dataset, dataset[:, 1], 0,
                                                     TRAIN_SPLIT, past_history,
                                                     future_target, STEP)

    a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    c = tf.matmul(a, b)
    print(sess.run(x_train_multi, y_train_multi))
sess.close()

As a result: 40% CPU, 97% Physical memory and 2% GPU

@sd3ntato

Why is the issue closed if the problem is unsolved?
