feat: adds support for detached commits #3028
My original approach was much more complicated and used UUIDs as the version. However, if we keep the version as a u64 but borrow the most significant bit to flag detached vs. normal, we end up with far fewer changes and less overall complexity.
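To make the bit-borrowing idea concrete, here is a minimal sketch. The constant and function names (`DETACHED_VERSION_MASK`, `is_detached_version`, `mark_detached`, `detached_id`) are assumptions for illustration and are not necessarily the names used in this PR.

```rust
/// Hypothetical mask: the most significant bit of the u64 version
/// flags a detached version. Normal versions leave this bit clear.
const DETACHED_VERSION_MASK: u64 = 1 << 63;

/// Returns true if the version has the detached flag set.
fn is_detached_version(version: u64) -> bool {
    version & DETACHED_VERSION_MASK != 0
}

/// Turns a 63-bit identifier into a detached version by setting the flag bit.
fn mark_detached(id: u64) -> u64 {
    id | DETACHED_VERSION_MASK
}

/// Recovers the 63-bit identifier by clearing the flag bit.
fn detached_id(version: u64) -> u64 {
    version & !DETACHED_VERSION_MASK
}
```

Because the version stays a plain u64, existing code that stores or passes versions around would not need a new type. The trade-off is that normal versions are limited to 63 bits.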
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3028      +/-   ##
==========================================
- Coverage   78.38%   78.26%   -0.12%
==========================================
  Files         240      240
  Lines       77122    77284     +162
  Branches    77122    77284     +162
==========================================
+ Hits        60449    60488      +39
- Misses      13565    13697     +132
+ Partials     3108     3099       -9
Flags with carried forward coverage won't be shown.
This seems reasonable. Are there any special considerations we need to make for cleanup?
// Detached versions should never show up first in a list operation which
// means it needs to come lexicographically after all attached manifest
// files and so we add the prefix `d`. There is no need to invert the
// version number since detached versions are not part of the version
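The ordering argument in this comment can be sketched as follows. This assumes attached manifests are named with a zero-padded, inverted version number (`u64::MAX - version`) so that the newest version lists first. The function name `manifest_file_name`, the mask constant, and the exact file name format are illustrative assumptions, not the code from this PR.

```rust
/// Hypothetical mask: the most significant bit flags a detached version.
const DETACHED_VERSION_MASK: u64 = 1 << 63;

/// Builds a manifest file name for a version (illustrative format only).
fn manifest_file_name(version: u64) -> String {
    if version & DETACHED_VERSION_MASK != 0 {
        // 'd' sorts after every ASCII digit, so detached manifests list
        // after all attached ones. No inversion is needed here.
        format!("d{}.manifest", version)
    } else {
        // Inverting the version makes newer versions sort first.
        format!("{:020}.manifest", u64::MAX - version)
    }
}
```

With this scheme, a lexicographic listing returns the latest attached manifest first, and any detached manifests only appear after all attached ones.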
👍
I'm planning on tackling cleanup later. My thinking is that cleanup will be something along the lines of:
For the "detached versions are just temporary versions" case then
For the "this is a secondary database and everything is detached" case then cleanup will be triggered by a cleanup of the primary database. After the cleanup of the primary database we will scan all remaining versions (in the primary database) and collect which secondary versions are still referenced. These will be passed in as
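As a rough sketch of the "collect which secondary versions are still referenced" step in the secondary-database case: the `PrimaryVersion` type and `referenced_secondary_versions` function below are hypothetical stand-ins, since the cleanup design is still only a plan at this point.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a version that survived cleanup in the primary
/// database, along with the detached secondary versions it references.
struct PrimaryVersion {
    referenced_secondary: Vec<u64>,
}

/// Scans the remaining primary versions and collects every secondary
/// version that is still referenced by at least one of them.
fn referenced_secondary_versions(remaining: &[PrimaryVersion]) -> BTreeSet<u64> {
    remaining
        .iter()
        .flat_map(|p| p.referenced_secondary.iter().copied())
        .collect()
}
```

Any secondary version not in the returned set would then be a candidate for removal.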
bb46d90 to 78517ff
Failures are due to a known issue with Ray (#3042). Merging.
A detached commit is a commit that is not part of the regular dataset lineage. It will never show up as the latest commit and is completely separate from the linear history of the dataset.
This can be useful for:
Closes #2889