[AIR] Maintain checkpoint type information during serialization #28387
Ah, I am a bit worried about this dict subclass. While it is fine for serialization, I don't think we should expose it to the user, as there are multiple points where it would become a regular dict again. Maybe we can have a reserved key instead of subclassing, or have it be purely internal and only used in setstate?
Looks good to me, just one doc nit
Can this be used beyond just the type of checkpoint?
And not only @krfricke @amogkam @bveeramani For reference, this is the PR: https://github.com/ray-project/ray/pull/28474/files
we should raise an error here if We should prevent users from restoring a TensorflowCheckpoint as a TorchCheckpoint, for example.
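A rough sketch of such a guard follows. The three classes are toy stand-ins named after the ones mentioned above, not the real Ray classes, and `resolve_restore_type` is a hypothetical helper. The rule it applies (allow a restore only when one type is a subclass of the other) is an assumption about the intended behavior, not something settled in this thread.

```python
class Checkpoint:
    """Toy base class standing in for a framework-agnostic checkpoint."""


class TorchCheckpoint(Checkpoint):
    """Toy stand-in, not the real Ray class."""


class TensorflowCheckpoint(Checkpoint):
    """Toy stand-in, not the real Ray class."""


def resolve_restore_type(saved_type: type, requested_type: type) -> type:
    """Pick the class to instantiate on restore, or raise if incompatible.

    Hypothetical helper: the more specific of the two types wins when one
    is a subclass of the other; unrelated types are rejected.
    """
    if issubclass(saved_type, requested_type):
        return saved_type
    if issubclass(requested_type, saved_type):
        return requested_type
    raise TypeError(
        f"Cannot restore a {saved_type.__name__} as a "
        f"{requested_type.__name__}."
    )
```

Under these assumptions, a checkpoint saved as `TorchCheckpoint` and loaded through the base `Checkpoint` stays a `TorchCheckpoint`, while loading a `TensorflowCheckpoint` as a `TorchCheckpoint` raises `TypeError`.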
@xwjiang2010 I don't think you can store arbitrary metadata like
Thanks @bveeramani ! Yes, I am proposing to extend it. I think this is a reasonable developer expectation. Something like
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com> Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Thanks @bveeramani! Looks great so far.
Primary things are:
- Let's add docstrings to all the Developer APIs
- Can we add more test coverage? In particular, what is the type of the checkpoint if you save a `StubCheckpoint` but load back via `Checkpoint.from_*`? Also, can we test that the metadata is saved even in the dict/bytes->dir and dir->dict/bytes workflows?
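As a sketch of what the dict->dir->dict check requested above could look like, the snippet below round-trips a payload and its metadata through a temporary directory. The helper names, the on-disk layout, and the metadata file name are all hypothetical; the real conversion code in Ray may store things differently.

```python
import json
import os
import tempfile
from typing import Any, Dict, Tuple

# Hypothetical file names; assumptions for illustration only.
_DATA_FILE = "data.json"
_METADATA_FILE = ".metadata.json"


def dict_to_dir(data: Dict[str, Any], metadata: Dict[str, Any], path: str) -> None:
    """Write the payload and its metadata as two separate files."""
    with open(os.path.join(path, _DATA_FILE), "w") as f:
        json.dump(data, f)
    with open(os.path.join(path, _METADATA_FILE), "w") as f:
        json.dump(metadata, f)


def dir_to_dict(path: str) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Read the payload and metadata back from the directory."""
    with open(os.path.join(path, _DATA_FILE)) as f:
        data = json.load(f)
    with open(os.path.join(path, _METADATA_FILE)) as f:
        metadata = json.load(f)
    return data, metadata


def roundtrip(
    data: Dict[str, Any], metadata: Dict[str, Any]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """dict -> dir -> dict, using a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        dict_to_dir(data, metadata, tmp)
        return dir_to_dict(tmp)


def test_metadata_survives_dir_roundtrip() -> None:
    data = {"x": 1}
    metadata = {"checkpoint_type": "StubCheckpoint"}
    assert roundtrip(data, metadata) == (data, metadata)
```

A test in the PR would exercise the real conversion methods instead of these stand-in helpers; the point here is only the shape of the assertion, namely that the metadata read back equals the metadata written.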
LGTM, thanks!
Thanks @bveeramani, LGTM! Just some minor comments on private vs. developer vs. public API, based on our offline conversation.
@bveeramani can you merge in master to fix the CI?
Failing tests are unrelated, going to merge.
…project#28387) These changes are needed to fix the errors described in ray-project#28134. Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu> Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com> Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
Signed-off-by: Balaji Veeramani balaji@anyscale.com
Depends on:
- `Checkpoint.to_object_ref` and `Checkpoint.from_object_ref` #28318
Why are these changes needed?
These changes are needed to fix the errors described in #28134.
Related issue number
Closes #28134
Checks
- Signed off every commit (`git commit -s`) in this PR.
- Ran `scripts/format.sh` to lint the changes in this PR.