Adding gradient checkpointing to GPT2 #7446
Conversation
Codecov Report
Coverage Diff

|          | master | #7446  | +/-    |
|----------|--------|--------|--------|
| Coverage | 80.98% | 79.06% | -1.93% |
| Files    | 181    | 181    |        |
| Lines    | 35750  | 35757  | +7     |
| Hits     | 28953  | 28271  | -682   |
| Misses   | 6797   | 7486   | +689   |
Continue to review full report at Codecov.
        use_cache=use_cache,
        output_attentions=output_attentions,
    )
    if getattr(self.config, "gradient_checkpointing", False):
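For context, here is a minimal, self-contained sketch of the pattern this hunk enables: a per-block loop that routes through `torch.utils.checkpoint.checkpoint` when the config flag is set. `TinyConfig` and `TinyStack` are made-up stand-ins for illustration, not the PR's actual code.

```python
import torch
import torch.nn as nn
import torch.utils.checkpoint


class TinyConfig:
    gradient_checkpointing = True  # the flag read by the getattr() check above


class TinyStack(nn.Module):
    """Made-up stand-in for GPT-2's stack of transformer blocks."""

    def __init__(self, config, n_layer=4, d_model=32):
        super().__init__()
        self.config = config
        self.h = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(n_layer)])

    def forward(self, hidden_states):
        for block in self.h:
            if getattr(self.config, "gradient_checkpointing", False):
                # Recompute this block's activations during the backward pass
                # instead of storing them: less memory, more compute.
                hidden_states = torch.utils.checkpoint.checkpoint(block, hidden_states)
            else:
                hidden_states = block(hidden_states)
        return hidden_states


x = torch.randn(2, 32, requires_grad=True)
out = TinyStack(TinyConfig())(x)
out.sum().backward()  # gradients flow through the checkpointed blocks
```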
I think `if self.config.gradient_checkpointing:` is nicer.
Most model configs don't actually have this attribute, only the ones that support checkpointing (AFAIK, Bert and Longformer for now), so it's less risky to do things this way.
This is in `modeling_gpt2.py`, which only works with `configuration_gpt2.py`. So if you add `gradient_checkpointing` to the config with a default of `False`, I don't see why this would be risky.
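To make the trade-off concrete, here is a small illustration (with made-up config classes) of the difference between the two access patterns discussed above:

```python
# Made-up config classes, just to contrast the two access patterns.
class OldConfig:
    pass  # predates the gradient_checkpointing field


class Gpt2LikeConfig:
    gradient_checkpointing = False  # field exists with a default, as this PR adds


old, new = OldConfig(), Gpt2LikeConfig()

# Defensive: works for both configs, silently falling back to False.
print(getattr(old, "gradient_checkpointing", False))  # False
print(getattr(new, "gradient_checkpointing", False))  # False

# Direct access: fine once the config defines a default, but raises otherwise.
print(new.gradient_checkpointing)  # False
try:
    print(old.gradient_checkpointing)
except AttributeError as err:
    print(f"AttributeError: {err}")
```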
    @@ -355,6 +365,10 @@ def test_gpt2_double_lm_head_model(self):
        config_and_inputs = self.model_tester.prepare_config_and_inputs()
        self.model_tester.create_and_check_double_lm_head_model(*config_and_inputs)

    def test_gpt2_gradient_checkpointing(self):
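For readers following along, a hedged sketch of what such a test could exercise; this is not the PR's actual test body, and the tiny config values are arbitrary:

```python
# Hedged sketch, not the PR's test: run a forward/backward pass on a tiny
# GPT-2 with the new gradient_checkpointing flag enabled and make sure it
# trains without errors.
import torch
from transformers import GPT2Config, GPT2LMHeadModel


def check_gpt2_gradient_checkpointing():
    config = GPT2Config(
        vocab_size=99,
        n_positions=64,
        n_embd=32,
        n_layer=2,
        n_head=4,
        gradient_checkpointing=True,  # flag this PR introduces for GPT-2
    )
    model = GPT2LMHeadModel(config)
    model.train()

    input_ids = torch.randint(0, config.vocab_size, (2, 16))
    loss = model(input_ids, labels=input_ids, return_dict=True).loss
    loss.backward()  # backward must succeed through the checkpointed blocks


check_gpt2_gradient_checkpointing()
```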
Awesome that you added a test!
Looks good to me. Left mostly nits. Would be great if you could run `RUN_SLOW=1 pytest tests/test_modeling_gpt2.py` once to be sure.
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
The slow tests are passing - I've also added a test for generation with checkpointing, although of course to be sure, one should also check the contents of the backwards pass.
Very nice! Thanks a lot for adding this.
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
LGTM! Thanks for working on it @TevenLeScao!
This PR adds gradient checkpointing capabilities to GPT-2, imitating the Longformer and Bert checkpointing code. It also disables `find_unused_parameters` in the Trainer if the model is using gradient checkpointing, since per #4659 the two are incompatible.
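A hedged sketch of that interaction, using a plain `DistributedDataParallel` wrapper rather than the Trainer's internals (the helper below is illustrative and assumes an already-initialized process group):

```python
# Illustrative only, not the Trainer's code: with gradient checkpointing the
# forward pass is re-run during backward, and DDP with
# find_unused_parameters=True can then fail because parameters get marked as
# ready more than once (see #4659), so the flag is turned off in that case.
from torch.nn.parallel import DistributedDataParallel


def wrap_for_ddp(model, local_rank):
    uses_checkpointing = getattr(model.config, "gradient_checkpointing", False)
    return DistributedDataParallel(
        model,
        device_ids=[local_rank],
        find_unused_parameters=not uses_checkpointing,
    )
```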