
Resuming Training #4

Open
snrazavi opened this issue Feb 26, 2017 · 1 comment

Comments


snrazavi commented Feb 26, 2017

Hello,
I am trying to train a seq-att model for translation, but after 3 or 4 epochs the training process always stops unexpectedly (it gets killed). Moreover, when I try to resume training with the --model_in option I receive an out-of-memory error, regardless of how much GPU memory I allocate with --dynet-mem. I have a GTX 980 GPU with 8 gigabytes of graphics RAM. I should also add that the memory problem is even more severe when I use other optimization methods such as adam.
Many thanks in advance.
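
For reference, the resume invocation I use looks roughly like the sketch below; the executable name, model path, and memory figure are placeholders, and only --model_in and --dynet-mem are the actual options I am referring to:

```sh
# Rough sketch of resuming training from a saved model.
# <train-command> and the file path are placeholders;
# --model_in and --dynet-mem are the options mentioned above.
<train-command> \
    --model_in saved_model.out \
    --dynet-mem 6000    # request ~6 GB of GPU memory; reloading still runs out of memory
```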

neubig (Owner) commented Feb 26, 2017

I think these are two separate problems.

  1. The training process is stopping unexpectedly. I'll need more details about the exact error message you get in order to debug this.
  2. When re-loading models you're running out of memory. Unfortunately this is a known DyNet bug ("reallocating memory in dynet_load_model causes double memory consumption", clab/dynet#110), and it will probably not go away until that bug is fixed.
