-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
implement byte range requests for chunks and metadata through zarr interface [EAR-1274] #63
Conversation
EAR-1274 Implement correct `get` behavior when a byte range is supplied
Currently we fake it by just truncating the full response. This should be propagated to storage and metadata layers |
icechunk/src/zarr.rs
Outdated
NodeData::Array(zarr_metadata, _) => { | ||
Ok(ArrayMetadata::new(user_attributes, zarr_metadata).to_bytes()) | ||
} | ||
}?; | ||
|
||
if let Some(range) = Into::<Option<Range<usize>>>::into(range) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this sufficient for metadata? Much harder to control byte ranges for these...
Why is this needed? For sharding? If so, we need to discuss whether to prioritize sharding right now. In my view, Icechunk makes the existing Zarr sharding approach obsolete. |
compatability with |
icechunk/src/format/mod.rs
Outdated
@@ -89,6 +95,58 @@ pub struct ChunkIndices(pub Vec<u64>); | |||
pub type ChunkOffset = u64; | |||
pub type ChunkLength = u64; | |||
|
|||
#[derive(Debug, Clone, PartialEq, Eq)] | |||
pub enum ByteRange { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we create our own type? Why not type ByteRange = (Bound<ChunkOffset>, Bound<ChunkOffset>)
. I learned about Bound
recently, and it seems to have all you need
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for creating our own type is to be able to impl conversions and traits for it. Otherwise it is foreign and the glue code is uglier
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, maybe let's use the newtype pattern:
struct ByteRange(Bound<ChunkOffset>, Bound<ChunkOffset>)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Im ok with that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you prefer Option<ByteRange>
over ByteRange::All?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose ByteRange(Bound::Unbounded, Bound::Unbounded)
is easier to work with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so ByteRange::ALL
could be that constant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A warning, it is ugly though, because ChunkOffset
is u64
and not usize
. So it make implementing into Range<usize>
and RangeBounds<usize>
difficult.
First pass at implementing an interface for this. Not permanently attached to the abstraction