Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for .tar.zst to http_archive #10342

Closed
gjasny opened this issue Dec 1, 2019 · 13 comments
Closed

Add support for .tar.zst to http_archive #10342

gjasny opened this issue Dec 1, 2019 · 13 comments
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) team-OSS Issues for the Bazel OSS team: installation, release processBazel packaging, website type: feature request

Comments

@gjasny
Copy link
Contributor

gjasny commented Dec 1, 2019

In our project we're using repository rules to download and uncompress prebuilt C++ libraries. Due to the large archive size we were evaluating different compression algorithms. One interesting candidate (which is also supported by CMake) is Zstandard. Right now it is not available within Bazel.

The Apache Commons Compress now supports zstd through the zstd-jni library. Unfortunately it relies on a native library.

I prepared a proof-of-concept branch with some shortcuts in gjasny/zstd-decompression and would like to ask if you'd consider adding support for Zstandard compression given the icky JNI implementation.

(With aircompressor there is native implementation available bit it lacks Big Endian and Stream support)

@laurentlb laurentlb added P3 We're not considering working on this, but happy to review a PR. (No assignee) type: feature request and removed untriaged labels Mar 6, 2020
@nevion
Copy link

nevion commented Apr 24, 2020

@philwo / @laurentlb can this be targeted anytime soon?

Adding zstd support for extraction / packing is pretty low hanging fruit. Think about it from the point of view of:

  • parallel tar/zstd asset downloads+archiving
  • extraction resources used

Also noting:
-zstd has native thread support (-T#) , gzip / tar can also use parallel compressors/settings (e.g. pigz). tar extraction are often critical path issues for fresh builds, a frequent case in some settings.
-The bazel deterministic tar python script is too slow, please find another way to deterministically tar contents... like a patched binary build to always set times 0 or something.

zstd requires much less cpu to decompress or compress than gzip. This will help all users assuming they switch to zstd assets and are using parallel jobs.

@depthwise
Copy link

This would be very nice to have in particular for situations which require large downloads, such as, for example, cross-platform builds that pull in massive toochains. I'm working with such a build at the moment, and decompression perf leaves much to be desired even with gzip.

@gjasny
Copy link
Contributor Author

gjasny commented Apr 27, 2020

What could prevent adoption of zstd is the dependency on zstd-jni but that's for the Bazel maintainers to decide.

@laurentlb
Copy link
Contributor

cc @meisterT

Do you know how big is the dependency? We'd like to avoid significant increases in Bazel binary size.

@nevion
Copy link

nevion commented Apr 27, 2020

clocking in at 434k, which is inline with the zstd binary https://search.maven.org/artifact/com.github.luben/zstd-jni/1.4.4-9/jar . apache commons supports it so it looks like it could fall in to similar handling code wise

@aiuto
Copy link
Contributor

aiuto commented Apr 27, 2020

Parallel compression makes me worry about reproducibility. Is that a guarantee?

Also, let's decouple the using and producing conversations.
This is issue is about http_archive, so we only are talking about adding decompression support to Bazel. For having capability to build zstd objects from Bazel, please file a feature request in github.com/bazelbuild/rules_pkg.

@nevion
Copy link

nevion commented Apr 27, 2020

Yes it is well worth pointing out that step one is support for extract/downloadAndExtract to allow for zstd.

I don't think anything else has to use the jni implementation and other usage on the compressor side are completely seperate from this issue. Merely wanted to let know that getting parallel compression as one driving nice to have and think of that at the skylark/genrule level for implementation for easy -T specification unless the compressor really should live in bazel java land ( recalling docker rules calling gzip directly...). Given that it is block based compression, it should be perfectly reproducable/deterministic output as you div your bytestring up in blocks then process them, looks like it is now at least: facebook/zstd#1077

@meisterT
Copy link
Member

Is it perhaps cheaper to only include the decompression part of the library?

@nevion
Copy link

nevion commented Apr 28, 2020

@meisterT maybe, little off the beaten path. Are there any assets you can recompress with zstd to net break even or better?

w/ libzstd compression enabled:
-O3

1.1M    libzstd.a 
888K    libzstd.so.1.4.5

-Os

604K    libzstd.a
496K    libzstd.so.1.4.5

Setting ZSTD_LIB_COMPRESSION=0 and building,
-O3 (default):

380K    libzstd.a
308K    libzstd.so.1.4.5

-Os:

208K    libzstd.a
164K    libzstd.so.1.4.5

I'm not sure what you would do jni side though for this. You may have to trim the library or make sure there's stubs available. I think a final stripped result is also smaller since the zstd binary and precompiled jni releseases are clocking in at the sizes mentioned previously.

@meisterT
Copy link
Member

@nevion we have thought about it, see #6318 and might reconsider it when we bundle zstd anyway.

@nevion @gjasny can any of you prepare a PR that works on all platforms?

@laurentlb laurentlb added team-OSS Issues for the Bazel OSS team: installation, release processBazel packaging, website and removed team-Front-End labels Jul 10, 2020
@glukasiknuro
Copy link
Contributor

I just started working on making @gjasny change compile for other platforms on newest codebase, will try to make it work in following days: #11968

@gjasny would you be able to add a comment with "@googlebot I consent" to the above PR so that it can include your commits?

@1e100
Copy link

1e100 commented Jun 19, 2021

Here's what I did for this: https://github.com/1e100/cloud_archive

This does require zstd support for tar, however, and does not rely entirely on built in facilities, but it can do authenticated downloads using the various cloud CLIs, validation, patching, and it does support Zstandard. And you don't need to wait for 2 years to submit a patch. :-)

benjaminp added a commit to benjaminp/bazel that referenced this issue Mar 11, 2022
This is mostly a rebase of bazelbuild#11968 with a few tweaks of my own.

Fixes bazelbuild#10342.

Co-authored-by: Grzegorz Lukasik <glukasik@nuro.ai>

RELNOTES[new]: `repository_ctx.extract`, and thus `http_archive`, can now decompress zstandard-compressed archives.
benjaminp added a commit to benjaminp/bazel that referenced this issue Mar 15, 2022
This is mostly a rebase of bazelbuild#11968 with a few tweaks of my own.

Fixes bazelbuild#10342.

Co-authored-by: Grzegorz Lukasik <glukasik@nuro.ai>

RELNOTES[new]: `repository_ctx.extract`, and thus `http_archive`, can now decompress zstandard-compressed archives.
@brentleyjones
Copy link
Contributor

@bazel-io fork 5.1

brentleyjones pushed a commit to brentleyjones/bazel that referenced this issue Mar 21, 2022
This is mostly a rebase of bazelbuild#11968 with a few tweaks of my own.

Fixes bazelbuild#10342.

Co-authored-by: Grzegorz Lukasik <glukasik@nuro.ai>

RELNOTES[new]: `repository_ctx.extract`, and thus `http_archive`, can now decompress zstandard-compressed archives.

Closes bazelbuild#15026.

PiperOrigin-RevId: 436146779
(cherry picked from commit 00d74ff)
Wyverald pushed a commit that referenced this issue Mar 21, 2022
This is mostly a rebase of #11968 with a few tweaks of my own.

Fixes #10342.

Co-authored-by: Grzegorz Lukasik <glukasik@nuro.ai>

RELNOTES[new]: `repository_ctx.extract`, and thus `http_archive`, can now decompress zstandard-compressed archives.

Closes #15026.

PiperOrigin-RevId: 436146779
(cherry picked from commit 00d74ff)

Co-authored-by: Benjamin Peterson <benjamin@engflow.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) team-OSS Issues for the Bazel OSS team: installation, release processBazel packaging, website type: feature request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants