implement byte range requests for chunks and metadata through zarr interface [EAR-1274] #63

Merged
mpiannucci merged 9 commits into main from get-byte-range on Sep 16, 2024

mpiannucci
Contributor

First pass at implementing an interface for this. I'm not permanently attached to the abstraction.

linear bot commented Sep 16, 2024

EAR-1274 Implement correct `get` behavior when a byte range is supplied

Currently we fake it by truncating the full response. The byte range should instead be propagated to the storage and metadata layers.
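
To make the difference concrete, here is a minimal sketch contrasting the two behaviors the issue describes. `MemStore`, `fetch_all`, `fetch_range`, and the `bytes_read` counter are hypothetical names invented for illustration and are not Icechunk's storage API; the sketch also assumes the requested range lies within the object (indexing panics otherwise).

```rust
use std::ops::Range;

/// Hypothetical in-memory storage backend, for illustration only.
pub struct MemStore {
    data: Vec<u8>,
    /// Counts how many bytes the backend had to hand out.
    pub bytes_read: usize,
}

impl MemStore {
    pub fn new(data: Vec<u8>) -> Self {
        MemStore { data, bytes_read: 0 }
    }

    /// Read the whole object.
    fn fetch_all(&mut self) -> Vec<u8> {
        self.bytes_read += self.data.len();
        self.data.clone()
    }

    /// Read only the requested bytes.
    fn fetch_range(&mut self, r: Range<usize>) -> Vec<u8> {
        let out = self.data[r].to_vec();
        self.bytes_read += out.len();
        out
    }
}

/// "Faked" range request: fetch everything, then truncate.
pub fn get_truncating(store: &mut MemStore, r: Range<usize>) -> Vec<u8> {
    store.fetch_all()[r].to_vec()
}

/// Propagated range request: the storage layer sees the range.
pub fn get_propagated(store: &mut MemStore, r: Range<usize>) -> Vec<u8> {
    store.fetch_range(r)
}
```

Both functions return the same bytes; only the amount of data the backend reads differs, which is what matters once the backend is an object store rather than memory.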

        NodeData::Array(zarr_metadata, _) => {
            Ok(ArrayMetadata::new(user_attributes, zarr_metadata).to_bytes())
        }
    }?;

    if let Some(range) = Into::<Option<Range<usize>>>::into(range) {
Contributor Author

Is this sufficient for metadata? Much harder to control byte ranges for these...

@mpiannucci mpiannucci marked this pull request as draft September 16, 2024 02:30
@rabernat
Contributor

Why is this needed? For sharding?

If so, we need to discuss whether to prioritize sharding right now. In my view, Icechunk makes the existing Zarr sharding approach obsolete.

@mpiannucci
Contributor Author

Compatibility with zarr-python. This was just done in a few hours last night because I wanted to fix the hack that was faking it. Not a priority.

@@ -89,6 +95,58 @@ pub struct ChunkIndices(pub Vec<u64>);
pub type ChunkOffset = u64;
pub type ChunkLength = u64;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ByteRange {
Collaborator

Should we create our own type? Why not `type ByteRange = (Bound<ChunkOffset>, Bound<ChunkOffset>)`? I learned about `Bound` recently, and it seems to have all you need.

Contributor Author

The reason for creating our own type is to be able to `impl` conversions and traits for it. Otherwise the type is foreign, and the glue code is uglier.

Collaborator

In that case, maybe let's use the newtype pattern:

struct ByteRange(Bound<ChunkOffset>, Bound<ChunkOffset>)
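
As a rough sketch of what the newtype suggestion buys: because the wrapper is a local type, traits and conversions can be implemented on it directly, which a plain tuple alias would not allow for foreign traits. The specific `From` impls below are illustrative assumptions and not necessarily what was merged.

```rust
use std::ops::{Bound, Range, RangeBounds, RangeFrom};

pub type ChunkOffset = u64;

/// Newtype over a pair of bounds; local to the crate, so we may
/// implement std traits for it.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ByteRange(pub Bound<ChunkOffset>, pub Bound<ChunkOffset>);

impl RangeBounds<ChunkOffset> for ByteRange {
    fn start_bound(&self) -> Bound<&ChunkOffset> {
        self.0.as_ref()
    }

    fn end_bound(&self) -> Bound<&ChunkOffset> {
        self.1.as_ref()
    }
}

/// Illustrative conversion from `a..b`.
impl From<Range<ChunkOffset>> for ByteRange {
    fn from(r: Range<ChunkOffset>) -> Self {
        ByteRange(Bound::Included(r.start), Bound::Excluded(r.end))
    }
}

/// Illustrative conversion from `a..`.
impl From<RangeFrom<ChunkOffset>> for ByteRange {
    fn from(r: RangeFrom<ChunkOffset>) -> Self {
        ByteRange(Bound::Included(r.start), Bound::Unbounded)
    }
}
```

Implementing `RangeBounds` also gives the provided `contains` method for free, so `ByteRange::from(2u64..5u64).contains(&4u64)` works without extra glue.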

Contributor Author

I'm OK with that.

Contributor Author

Would you prefer Option<ByteRange> over ByteRange::All?

Collaborator

I suppose ByteRange(Bound::Unbounded, Bound::Unbounded) is easier to work with.

Collaborator

So `ByteRange::ALL` could be that constant.
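
A small sketch of the constant being discussed, assuming the newtype form from earlier in the thread. `is_all` and `describe` are hypothetical helpers added only to show how a caller might branch without wrapping the range in an `Option`.

```rust
use std::ops::Bound;

pub type ChunkOffset = u64;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ByteRange(pub Bound<ChunkOffset>, pub Bound<ChunkOffset>);

impl ByteRange {
    /// The whole object: no lower or upper limit.
    pub const ALL: Self = ByteRange(Bound::Unbounded, Bound::Unbounded);

    /// Hypothetical helper: true when the range covers everything.
    pub fn is_all(&self) -> bool {
        *self == Self::ALL
    }
}

/// Hypothetical caller that picks a read path from the range alone.
pub fn describe(range: &ByteRange) -> &'static str {
    if range.is_all() {
        "full read"
    } else {
        "partial read"
    }
}
```

With this shape, "no range requested" is just another value of the same type, so function signatures take a `ByteRange` rather than an `Option<ByteRange>`.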

Contributor Author

A warning: it is ugly, though, because `ChunkOffset` is `u64` and not `usize`, so it makes implementing conversions into `Range<usize>` and `RangeBounds<usize>` difficult.
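
One possible way around the `u64`/`usize` mismatch is a fallible conversion that resolves the range against a known buffer length. This is a sketch under assumptions, not the merged code: `resolve` is a hypothetical function name, and it assumes that out-of-bounds or inverted ranges should yield `None` rather than being clamped.

```rust
use std::convert::TryFrom;
use std::ops::{Bound, Range};

pub type ChunkOffset = u64;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ByteRange(pub Bound<ChunkOffset>, pub Bound<ChunkOffset>);

/// Resolve a `ByteRange` against a buffer of `len` bytes.
/// Returns `None` if an offset does not fit in `usize`, or if the
/// resulting range is inverted or extends past `len`.
pub fn resolve(range: &ByteRange, len: usize) -> Option<Range<usize>> {
    let start = match range.0 {
        Bound::Included(s) => usize::try_from(s).ok()?,
        Bound::Excluded(s) => usize::try_from(s).ok()?.checked_add(1)?,
        Bound::Unbounded => 0,
    };
    let end = match range.1 {
        Bound::Included(e) => usize::try_from(e).ok()?.checked_add(1)?,
        Bound::Excluded(e) => usize::try_from(e).ok()?,
        Bound::Unbounded => len,
    };
    if start <= end && end <= len {
        Some(start..end)
    } else {
        None
    }
}
```

The resulting `Range<usize>` can be used to slice an in-memory buffer such as serialized metadata, while the `u64` offsets remain the type used for storage-level requests.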

@mpiannucci mpiannucci marked this pull request as ready for review September 16, 2024 16:14
@mpiannucci mpiannucci requested a review from paraseba September 16, 2024 17:25
@mpiannucci mpiannucci requested a review from paraseba September 16, 2024 19:10
@mpiannucci mpiannucci merged commit a1a28f7 into main Sep 16, 2024
3 checks passed
@mpiannucci mpiannucci deleted the get-byte-range branch September 16, 2024 19:32

3 participants