[ckpt-rewr] Get Optim State Dict Util API #3299
@eracah is this a refactor, or is it adding anything new anywhere? It makes it a bit easier to review if I know which parts I need to read through carefully. It seems mostly copy-paste, but as a helper fn?
A lot of it is a refactor, but it adds ignore, include, precision, and explicit cpu_offload control.
LGTM!
What does this PR do?
Adds an API for extracting the optimizer state dict from a model and an optimizer object.
State dict generation is a necessary operation before the save AND load of a checkpoint.
Currently in Composer, state dict generation is coupled with the State, which makes it not very readable, hard to extend, hard to test, and hard for users to harness to do custom things. As such, we present a standalone function that generates the state_dict for the optimizer, decoupled from State. An explicit function for the optimizer is easier to test because we don't have to make a dummy State. Moreover, it's easier to save each state dict as a separate file. Also, an advanced user can just call these functions themselves if they have a custom, advanced script or callback.
This state dict generation function enables:
- specify keys to include
- specify keys to exclude

These are all options that will be useful for save and load. Because save and load require state dict generation, we need these options in state dict generation as well.
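
To illustrate the include/exclude options, here is a minimal sketch of key filtering on a plain dict. It is not the function added in this PR: `filter_optim_state`, `include_keys`, `ignore_keys`, the glob-style matching, and the toy state layout are all hypothetical names and assumptions chosen for illustration, and it uses only the standard library rather than Composer or PyTorch.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, Optional, Sequence


def filter_optim_state(
    optim_state: Dict[str, Any],
    include_keys: Optional[Sequence[str]] = None,
    ignore_keys: Optional[Sequence[str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: keep entries matching include_keys (if given),
    then drop entries matching ignore_keys (if given)."""
    kept: Dict[str, Any] = {}
    for name, value in optim_state.items():
        # With an include list, skip anything that matches none of its patterns.
        if include_keys is not None and not any(
            fnmatchcase(name, pattern) for pattern in include_keys
        ):
            continue
        # With an ignore list, skip anything that matches one of its patterns.
        if ignore_keys is not None and any(
            fnmatchcase(name, pattern) for pattern in ignore_keys
        ):
            continue
        kept[name] = value
    return kept


# Toy optimizer state keyed by parameter name (assumed layout, for illustration).
toy_state = {
    "layer1.weight": {"exp_avg": [0.1], "step": 3},
    "layer1.bias": {"exp_avg": [0.2], "step": 3},
    "head.weight": {"exp_avg": [0.3], "step": 3},
}

only_layer1 = filter_optim_state(toy_state, include_keys=["layer1.*"])
no_bias = filter_optim_state(toy_state, ignore_keys=["*.bias"])
```

Because the helper takes and returns plain dicts, it can be exercised in a unit test without constructing a State, which is the testability point made above.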
GRT-2903