Cuda memory errors when running pytorch example #828
This is odd. Given that it's GPU memory, I'm not sure it's necessarily from LIT; in particular, LIT doesn't know about CUDA or the GPU at all, and that's handled entirely through the model code. If you just instantiate the model class and call

See https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/notebooks/LIT_Components_Example.ipynb for an example notebook that shows how to use LIT components without starting a server.

Another thing you could try is running the server with

In terms of how the data is handled:
Hi, thanks for the quick reply. It does feel odd, and I can confirm that running outside of the server, in plain Python or a Jupyter notebook, still runs into CUDA out of memory. Using the below in a notebook as an example:
Annoyingly, I just tested on a Google Colab and it worked fine, although the Colab instance has 15 GB of VRAM vs. my roughly 11 GB. It all seems to happen when model.cuda() is called under the hood of the predict_batch function, and the GPU memory usage inflates to around 11 GB vs. the usual 1069 MiB. I can now confirm it is not actually model.cuda() that is the issue; I guess the dataset has been preloaded onto the CUDA device or something? If I call model.cuda() BEFORE creating the dataset, the model's GPU usage is normal, so whatever is happening to the dataset seems to be the problem here. As I mentioned before, batch size changes nothing, and the CUDA memory issues are being caused by the dataset creation/loading via:
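To narrow down which step takes the memory, it may help to log GPU usage before and after each one. The following is only a rough sketch, assuming a reasonably recent PyTorch that provides torch.cuda.mem_get_info; the helper names gpu_memory_report and log_step are hypothetical and not part of LIT or PyTorch.

```python
import importlib.util


def gpu_memory_report(device_index=0):
    """Return GPU memory figures in MiB, or None if PyTorch/CUDA is unavailable."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    if not torch.cuda.is_available():
        return None

    mib = 1024 ** 2
    # Device-wide free/total bytes as seen by the CUDA driver.
    free_bytes, total_bytes = torch.cuda.mem_get_info(device_index)
    return {
        # Memory occupied by tensors that PyTorch allocated in this process.
        "torch_allocated_mib": torch.cuda.memory_allocated(device_index) / mib,
        # Memory held by PyTorch's caching allocator in this process.
        "torch_reserved_mib": torch.cuda.memory_reserved(device_index) / mib,
        # Everything in use on the device, including other libraries and processes.
        "device_used_mib": (total_bytes - free_bytes) / mib,
        "device_total_mib": total_bytes / mib,
    }


def log_step(label, device_index=0):
    """Print and return the report so steps can be compared (hypothetical helper)."""
    report = gpu_memory_report(device_index)
    print(f"{label}: {report}")
    return report
```

Calling log_step("before dataset"), then again after creating the dataset and after model.cuda(), would show where the jump happens. If device_used_mib grows a lot while torch_allocated_mib stays small, the memory is probably being held by something other than PyTorch tensors.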
Any further thoughts? Thanks
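One possibility that is not confirmed anywhere in this thread: if the dataset loading code imports TensorFlow, TensorFlow by default reserves nearly all memory on every visible GPU as soon as it initializes them, which would leave little for PyTorch. A minimal sketch for testing that hypothesis is below; hide_gpus_from_tensorflow is a hypothetical helper, and it has to run before TensorFlow touches the GPU, i.e. before the dataset is created.

```python
import importlib.util


def hide_gpus_from_tensorflow():
    """Ask TensorFlow not to use any GPU. Returns True if the request was applied."""
    if importlib.util.find_spec("tensorflow") is None:
        return False  # TensorFlow is not installed, so there is nothing to do.
    import tensorflow as tf

    try:
        # An empty list makes no GPU visible to TensorFlow in this process.
        tf.config.set_visible_devices([], "GPU")
    except RuntimeError:
        # TensorFlow had already initialized its devices; too late to change.
        return False
    return True
```

If GPU memory stays near the usual level after calling this first and then creating the dataset, that would point at TensorFlow's allocation rather than at the model or the batch size.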
When running the following script: https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/simple_pytorch_demo.py I am getting CUDA out of memory errors, regardless of max_batch_size or the number of GPUs used. I have access to 10 GPUs with around 11 GB of VRAM each, so it should definitely be fine.
I am running the code as it is on the repo, so won't paste here. But here is the error:
I have got this working fine with the standard lit-nlp demo, which I presume uses the TensorFlow backend by default, but my own models and codebases will require PyTorch.
Any thoughts on what may be causing this? I am not an expert on how lit-nlp processes the data behind the scenes, but it's occurring during predict_minibatch(), and I can confirm it doesn't get past moving the model and then the batch to CUDA.
e.g. I added some debugging prints to check what was going on with:
I0812 14:55:26.451673 140234135095104 caching.py:210] Prepared 872 inputs for model
encoded input is: {'input_ids': tensor([[ 101, 2009, 1005, 1055, 1037, 11951, 1998, 2411, 12473, 4990,
1012, 102],
[ 101, 4895, 10258, 2378, 8450, 2135, 21657, 1998, 7143, 102,
0, 0]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]])}
cuda avaialble!
E0812 14:55:26.461915 140234135095104 wsgi_app.py:208] Uncaught error: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
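The last line of the log suggests CUDA_LAUNCH_BLOCKING=1. As a minimal sketch, the variable can be set from Python, provided this happens before the first CUDA call (setting it before importing torch is the simplest way to guarantee that):

```python
import os

# Make CUDA kernel launches synchronous so errors are reported at the call
# that caused them. This must be set before CUDA is initialized.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Import torch and run the failing code only after this point, for example:
# import torch
```

The equivalent from a shell is to prefix the command, e.g. CUDA_LAUNCH_BLOCKING=1 python simple_pytorch_demo.py. This slows execution, so it is only meant for debugging.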
Any thoughts would be much appreciated. The GPU environment I have can handle these models very easily ordinarily.
Thanks in advance!