
Allowing GPU memory growth command does not work #11584

Closed
Schiboni opened this issue Nov 5, 2018 · 15 comments
Assignees
Labels
stat:awaiting keras-eng Awaiting response from Keras engineer type:support User is asking for help / asking an implementation question. Stackoverflow would be better suited.

Comments

@Schiboni

Schiboni commented Nov 5, 2018

Hi, I have a memory problem.
I am running training on a server and get the following printout:

2018-11-05 21:08:07.907464: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:895] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-11-05 21:08:07.908090: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties:
name: GeForce GTX 980 major: 5 minor: 2 memoryClockRate(GHz): 1.2405
pciBusID: 0000:02:00.0
totalMemory: 3.95GiB freeMemory: 3.87GiB
2018-11-05 21:08:07.908116: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 980, pci bus id: 0000:02:00.0, compute capability: 5.2)

As you can see, the total memory is higher than the free memory. However, when I run my code I get an "Out of memory" error, so I added the following code at the beginning of my script:
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.3, allow_growth=True)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

Unfortunately the memory usage:
totalMemory: 3.95GiB freeMemory: 3.87GiB

does not change at all. What is the problem?
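For reference, the fraction is supposed to cap TensorFlow's allocator at fraction × total memory; a plain-Python sketch of that arithmetic, using the values from the log above (`allocator_cap_bytes` is just an illustrative helper, not a TensorFlow API):

```python
GIB = 1024 ** 3

def allocator_cap_bytes(total_bytes, fraction):
    """Upper bound the GPU allocator should request for this process."""
    return int(total_bytes * fraction)

total = int(3.95 * GIB)                  # totalMemory from the log above
cap = allocator_cap_bytes(total, 0.3)    # per_process_gpu_memory_fraction=0.3
print("cap = %.2f GiB" % (cap / GIB))    # cap = 1.18 GiB
```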

Thanks and best regards,
Giovanni

@gabrieldemarmiesse gabrieldemarmiesse added type:support User is asking for help / asking an implementation question. Stackoverflow would be better suited. type: TensorFlow labels Nov 5, 2018
@Harshini-Gadige Harshini-Gadige added the stat:awaiting keras-eng Awaiting response from Keras engineer label Nov 12, 2018
@omalleyt12
Contributor

Can you please try the following at the top of your code:

import keras
import tensorflow as tf

gpu_options = tf.GPUOptions(allow_growth=True)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
keras.backend.tensorflow_backend.set_session(sess)
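If you are on TensorFlow 2.x instead, the rough equivalent is the memory-growth configuration API (a sketch assuming TF ≥ 2.0; it must run before any GPU is initialized):

```python
import tensorflow as tf

# Request memory growth on every visible GPU so TensorFlow allocates
# memory incrementally instead of grabbing (nearly) all of it up front.
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
```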

@kevupton

kevupton commented Nov 14, 2018

Can confirm this issue.

Keras v2.2.4

I am using:

    import tensorflow as tf
    from keras import backend as K

    config = tf.ConfigProto(allow_soft_placement=True)
    config.gpu_options.per_process_gpu_memory_fraction = 0.9
    config.gpu_options.allow_growth = True
    sess = tf.Session(config=config)
    K.set_session(sess)

Also, per_process_gpu_memory_fraction doesn't work if the allow_growth option is True.

Errors in:

Limit:                  7730941132
InUse:                  3328523776
MaxInUse:               3328523776
NumAllocs:                      55
MaxAllocSize:           3315597312

If I remove allow_growth, then per_process_gpu_memory_fraction works.

Also @omalleyt12: I just tested that solution and it didn't work for me.

@Schiboni
Author

@kevupton The code below does not work for me:

from keras import backend as K
import tensorflow as tf
config = tf.ConfigProto(allow_soft_placement=True)
config.gpu_options.per_process_gpu_memory_fraction = 0.2
# config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
K.set_session(sess)

@kevupton

kevupton commented Nov 14, 2018

@Schiboni
The per_process_gpu_memory_fraction property determines what percentage of GPU memory you will use. If you want to keep memory usage small, then 0.2 would be ideal; otherwise, the larger the better, right? haha
Did you try a larger number, such as 0.9 instead of 0.2? That way you are utilizing 90% instead of 20%:

config.gpu_options.per_process_gpu_memory_fraction = 0.9

@Schiboni
Author

@kevupton
My main problem is that if I set use_multiprocessing=True while using fit_generator and keras.utils.Sequence, the code gets stuck and GPU activity remains at 0%. No errors are shown.
So I am guessing I have an out-of-memory problem in one of the workers, or something like that. But I am just guessing; I have no actual idea.
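For what it's worth, the indexing contract a keras.utils.Sequence has to satisfy can be sketched as a plain-Python stand-in (`BatchIndexer` is hypothetical; the real class would subclass keras.utils.Sequence, and with use_multiprocessing=True it must be picklable since each worker process calls __getitem__ by index):

```python
import math

class BatchIndexer:
    """Plain-Python stand-in for the Sequence batching contract."""

    def __init__(self, n_samples, batch_size):
        self.n_samples = n_samples
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch; the last batch may be short.
        return math.ceil(self.n_samples / self.batch_size)

    def __getitem__(self, idx):
        # Return the sample indices belonging to batch `idx`.
        start = idx * self.batch_size
        stop = min(start + self.batch_size, self.n_samples)
        return list(range(start, stop))

seq = BatchIndexer(10, 4)
print(len(seq))   # 3
print(seq[2])     # [8, 9]
```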

@kevupton

Hmmm, I think I had a stuck scenario once, but I cannot remember how I fixed it. How long is it stuck for before you retry?
What does your model code look like ?

@Schiboni
Author

It can be stuck uninterruptedly for hours, no upper bound.
"What does your model code look like ?" What do you mean?

@kevupton

are you compiling your own model ? Like model.compile() somewhere ?

@Schiboni
Author

Yes, of course:

print("[INFO] training with {} GPUs...".format(ngpus))
with tf.device("/cpu:0"):
    model = build_model(x_shape, class_number, filters, lstm_dims, regularization_rate)
model = multi_gpu_model(model, gpus=ngpus)
model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(),
              metrics=['accuracy'])

@Fordacre

@omalleyt12 I have this problem too. It seems gpu_options.allow_growth doesn't work together with gpu_options.per_process_gpu_memory_fraction. Here is my code:
tf_config = tf.ConfigProto()
tf_config.gpu_options.allow_growth = True
tf_config.log_device_placement = False
tf_config.allow_soft_placement = True
tf_config.gpu_options.per_process_gpu_memory_fraction = 0.9
sess = tf.Session(config=tf_config)
set_session(sess)
Is there something wrong in my code? Can you please help me fix it? Thanks.

@saysx

saysx commented Mar 20, 2019

Has anybody solved this problem?

@buivancuong

You can try clear_session() (from keras.backend import clear_session) before loading your model and after training.
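Roughly like this (a usage sketch; `build_model`, `x_train`, and `y_train` stand in for your own model-construction function and data):

```python
from keras.backend import clear_session

# Drop the previous TF graph/session state so GPU memory held by an
# earlier model can be reclaimed before building a new one.
clear_session()
model = build_model()          # your own model-construction function
model.fit(x_train, y_train)    # your training data
clear_session()                # release graph state again after training
```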

@duplessisaa

duplessisaa commented Jan 28, 2020

@buivancuong Thanks for the suggestion, I tried this too... also not working for me:

sess = tf.keras.backend.get_session()
tf.keras.backend.clear_session()
sess.close()
sess = tf.keras.backend.get_session()

# GPU allow-growth
config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # dynamically grow the memory used on the GPU
config.log_device_placement = True      # log device placement
sess = tf.Session(config=config)
set_session(sess)                       # set this TF session as the default session for Keras

@vasilevskykv

vasilevskykv commented Oct 23, 2020

Hello! I have the same problem

config = tf.compat.v1.ConfigProto(log_device_placement=True)
config.gpu_options.visible_device_list = '0'
config.gpu_options.allow_growth = True
config.gpu_options.per_process_gpu_memory_fraction = 0.9
tf.compat.v1.enable_eager_execution()
tf.compat.v1.reset_default_graph()
with tf.compat.v1.Session(config=config) as sess:
    x_train_multi, y_train_multi = multivariate_data(dataset, dataset[:, 1], 0,
                                                     TRAIN_SPLIT, past_history,
                                                     future_target, STEP)

    a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    c = tf.matmul(a, b)
    print(sess.run(x_train_multi, y_train_multi))
sess.close()

As a result: 40% CPU, 97% Physical memory and 2% GPU

@sd3ntato

Why is the issue closed if the problem is unsolved?
