Add support for .tar.zst to http_archive #10342
@philwo / @laurentlb can this be targeted anytime soon? Adding zstd support for extraction / packing is pretty low-hanging fruit. Think about it from the point of view of:
Also noting: zstd requires much less CPU to decompress or compress than gzip. This will help all users, assuming they switch to zstd assets and are using parallel jobs.
This would be very nice to have, in particular for situations that require large downloads, such as cross-platform builds that pull in massive toolchains. I'm working with such a build at the moment, and decompression performance leaves much to be desired even with gzip.
What could prevent adoption of zstd is the dependency on
cc @meisterT Do you know how big the dependency is? We'd like to avoid significant increases in Bazel binary size.
It's clocking in at 434k, which is in line with the zstd binary: https://search.maven.org/artifact/com.github.luben/zstd-jni/1.4.4-9/jar. Apache Commons Compress supports it, so it looks like it could fall into similar handling code-wise.
Parallel compression makes me worry about reproducibility. Is that a guarantee? Also, let's decouple the consuming and producing conversations.
Yes, it is well worth pointing out that step one is support for extract/downloadAndExtract to allow for zstd. I don't think anything else has to use the JNI implementation, and other usage on the compressor side is completely separate from this issue. I merely wanted to note that parallel compression is one driving nice-to-have, and that it could be handled at the Starlark/genrule level for easy -T specification, unless the compressor really should live in Bazel's Java land (recalling that the Docker rules call gzip directly...). Given that it is block-based compression, the output should be perfectly reproducible/deterministic: you divide your bytestring up into blocks and then process them. It looks like that is the case now, at least: facebook/zstd#1077
Is it perhaps cheaper to only include the decompression part of the library? |
@meisterT Maybe, though it's a little off the beaten path. Are there any assets you could recompress with zstd to net break-even or better? With libzstd compression enabled:
-Os
Setting ZSTD_LIB_COMPRESSION=0 and building,
-Os:
I'm not sure what you would do on the JNI side for this, though. You may have to trim the library or make sure stubs are available. I think a final stripped result is also smaller, since the zstd binary and precompiled JNI releases are clocking in at the sizes mentioned previously.
I just started working on making @gjasny's change compile for other platforms on the newest codebase, and will try to make it work in the following days: #11968. @gjasny, would you be able to add a comment with "@googlebot I consent" to the above PR so that it can include your commits?
Here's what I did for this: https://github.com/1e100/cloud_archive This does require zstd support for
This is mostly a rebase of bazelbuild#11968 with a few tweaks of my own. Fixes bazelbuild#10342.
Co-authored-by: Grzegorz Lukasik <glukasik@nuro.ai>
RELNOTES[new]: `repository_ctx.extract`, and thus `http_archive`, can now decompress zstandard-compressed archives.
@bazel-io fork 5.1
This is mostly a rebase of bazelbuild#11968 with a few tweaks of my own. Fixes bazelbuild#10342.
Co-authored-by: Grzegorz Lukasik <glukasik@nuro.ai>
RELNOTES[new]: `repository_ctx.extract`, and thus `http_archive`, can now decompress zstandard-compressed archives.
Closes bazelbuild#15026.
PiperOrigin-RevId: 436146779 (cherry picked from commit 00d74ff)
This is mostly a rebase of #11968 with a few tweaks of my own. Fixes #10342.
Co-authored-by: Grzegorz Lukasik <glukasik@nuro.ai>
RELNOTES[new]: `repository_ctx.extract`, and thus `http_archive`, can now decompress zstandard-compressed archives.
Closes #15026.
PiperOrigin-RevId: 436146779 (cherry picked from commit 00d74ff)
Co-authored-by: Benjamin Peterson <benjamin@engflow.com>
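With `repository_ctx.download_and_extract` able to handle zstandard, custom repository rules can pull in `.tar.zst` archives directly. A minimal sketch under stated assumptions (the rule name, attributes, and any concrete URL/sha256 a caller supplies are hypothetical, not part of Bazel):

```starlark
# Sketch of a repository rule that fetches and extracts a .tar.zst archive.
# Relies on repository_ctx.download_and_extract recognizing zstd archives,
# as added by this change. All names here are illustrative placeholders.
def _zstd_archive_impl(ctx):
    ctx.download_and_extract(
        url = ctx.attr.url,
        sha256 = ctx.attr.sha256,
        stripPrefix = ctx.attr.strip_prefix,
    )
    # Expose the extracted files to the rest of the build.
    ctx.file("BUILD.bazel", 'exports_files(glob(["**"]))')

zstd_archive = repository_rule(
    implementation = _zstd_archive_impl,
    attrs = {
        "url": attr.string(mandatory = True),
        "sha256": attr.string(),
        "strip_prefix": attr.string(default = ""),
    },
)
```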
In our project we're using repository rules to download and uncompress prebuilt C++ libraries. Due to the large archive size we were evaluating different compression algorithms. One interesting candidate (which is also supported by CMake) is Zstandard. Right now it is not available within Bazel.
The Apache Commons Compress library now supports zstd through the zstd-jni library. Unfortunately it relies on a native library. I prepared a proof-of-concept branch with some shortcuts in gjasny/zstd-decompression and would like to ask if you'd consider adding support for Zstandard compression, given the icky JNI implementation.
(With aircompressor there is a native implementation available, but it lacks Big Endian and Stream support.)
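For reference, the end goal for the use case described above could look like the following WORKSPACE sketch, once `http_archive` understands zstd (the URL, sha256, and names are hypothetical placeholders, not a real archive):

```starlark
# WORKSPACE sketch: fetching a zstd-compressed prebuilt C++ library.
# The URL and sha256 below are placeholders for illustration only.
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "prebuilt_libs",
    url = "https://example.com/prebuilt-libs-1.0.tar.zst",
    sha256 = "0000000000000000000000000000000000000000000000000000000000000000",
    strip_prefix = "prebuilt-libs-1.0",
)
```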