feat: reconnect block compression #2878
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2878      +/-   ##
==========================================
- Coverage   77.98%   77.84%   -0.15%
==========================================
  Files         231      231
  Lines       70643    70199     -444
  Branches    70643    70199     -444
==========================================
- Hits        55090    54643     -447
- Misses      12424    12685     +261
+ Partials     3129     2871     -258
We should probably expose the compression level. Then users could tune the compression ratio to compression speed tradeoff.
let bytes_encoding = ProtobufUtils::flat_encoding(
    /*bits_per_value=*/ 8,
    bytes_buffer_index,
    self.compression_scheme.clone(),
clippy
-    self.compression_scheme.clone(),
+    self.compression_scheme,
@wjones127 ah, now we run into the question of how we should allow this kind of advanced configuration. We could use a separate metadata key. In general, I don't like exposing too much configuration to the user without good reason (which seems hypocritical to say, since I think I've added half a dozen environment variables in the last month). Why would a user choose something other than the default compression level?
@wjones127 @westonpace Zstd offers 22 compression levels, each yielding different compression ratios. The compressed data size can vary by up to 50% between levels 1 and 22, depending on data distribution (e.g., [1]). Some databases, like AWS Athena, allow users to specify the Zstd compression level via DDL [2]. Similarly, it would be beneficial if Lance provided the ability to customize compression settings, enabling users to balance time and space efficiency based on their use cases. Thanks.
[1] https://www.reddit.com/r/compression/comments/18e524n/zstd_compression_ratios_by_level/
I'm for it, if we can find a sensible place to put it.
I'm ok with exposing it. I'd prefer an entry in the field metadata like
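To make the field-metadata idea concrete, here is a minimal sketch of how a writer might read a compression level from a field's string metadata. The key name `example:compression-level` and the helper `compression_level` are hypothetical (the thread does not settle on a key), and this is not Lance's actual API. The sketch assumes the zstd default of 3 when the key is absent and accepts only the 1 to 22 range mentioned above.

```rust
use std::collections::HashMap;

/// Hypothetical metadata key, for illustration only. The real key name
/// is not decided in this thread.
const COMPRESSION_LEVEL_KEY: &str = "example:compression-level";

/// zstd's default compression level.
const DEFAULT_ZSTD_LEVEL: i32 = 3;

/// Reads the compression level from field metadata (hypothetical helper).
///
/// Returns the default when the key is absent, and an error when the
/// value is not an integer in the range 1..=22.
fn compression_level(metadata: &HashMap<String, String>) -> Result<i32, String> {
    match metadata.get(COMPRESSION_LEVEL_KEY) {
        // No entry: fall back to the default so existing users see no change.
        None => Ok(DEFAULT_ZSTD_LEVEL),
        Some(raw) => {
            let level: i32 = raw
                .trim()
                .parse()
                .map_err(|e| format!("invalid compression level {:?}: {}", raw, e))?;
            if (1..=22).contains(&level) {
                Ok(level)
            } else {
                Err(format!("compression level {} is outside 1..=22", level))
            }
        }
    }
}
```

Keeping the setting in per-field metadata means each column can choose its own ratio/speed tradeoff, while fields without the entry keep the default behavior.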
This allows block compression (zstd) in the narrow case of using it for binary data. A more generalized approach to block compression can be handled as part of 2.1.