Add support for .tar.zst to http_archive #10342
@philwo / @laurentlb can this be targeted anytime soon? Adding zstd support for extraction / packing is pretty low-hanging fruit. Think about it from the point of view of:
Also noting: zstd requires much less CPU to decompress or compress than gzip. This will help all users, assuming they switch to zstd assets and are using parallel jobs.
This would be very nice to have, in particular for situations that require large downloads, such as cross-platform builds that pull in massive toolchains. I'm working with such a build at the moment, and decompression performance leaves much to be desired even with gzip.
What could prevent adoption of zstd is the dependency on
cc @meisterT Do you know how big the dependency is? We'd like to avoid significant increases in Bazel binary size.
It's clocking in at 434k, which is in line with the zstd binary: https://search.maven.org/artifact/com.github.luben/zstd-jni/1.4.4-9/jar. Apache Commons Compress supports it, so it looks like it could fall into similar handling code-wise.
Parallel compression makes me worry about reproducibility. Is that a guarantee? Also, let's decouple the consuming and producing conversations.
Yes, it is well worth pointing out that step one is support for extract/downloadAndExtract to allow for zstd. I don't think anything else has to use the JNI implementation, and other usage on the compressor side is completely separate from this issue. I merely wanted to note that parallel compression is one driving nice-to-have, and that it could be handled at the Starlark/genrule level for easy -T specification, unless the compressor really should live in Bazel's Java land (recalling that the Docker rules call gzip directly...). Given that it is block-based compression, the output should be perfectly reproducible/deterministic: you divide your bytestring up into blocks and then process them. It looks like that is the case now, at least: facebook/zstd#1077
Is it perhaps cheaper to only include the decompression part of the library? |
@meisterT Maybe, though it's a little off the beaten path. Are there any assets you could recompress with zstd to net break-even or better? With libzstd compression enabled:
-Os
Setting ZSTD_LIB_COMPRESSION=0 and building,
-Os:
I'm not sure what you would do on the JNI side for this, though. You may have to trim the library or make sure stubs are available. I think a final stripped result is also smaller, since the zstd binary and precompiled JNI releases are clocking in at the sizes mentioned previously.
I just started working on making @gjasny's change compile for other platforms on the newest codebase, and will try to make it work in the following days: #11968. @gjasny, would you be able to add a comment with "@googlebot I consent" to the above PR so that it can include your commits?
Here's what I did for this: https://github.com/1e100/cloud_archive This does require zstd support for
This is mostly a rebase of bazelbuild#11968 with a few tweaks of my own. Fixes bazelbuild#10342.
Co-authored-by: Grzegorz Lukasik <glukasik@nuro.ai>
RELNOTES[new]: `repository_ctx.extract`, and thus `http_archive`, can now decompress zstandard-compressed archives.
@bazel-io fork 5.1
This is mostly a rebase of bazelbuild#11968 with a few tweaks of my own. Fixes bazelbuild#10342.
Co-authored-by: Grzegorz Lukasik <glukasik@nuro.ai>
RELNOTES[new]: `repository_ctx.extract`, and thus `http_archive`, can now decompress zstandard-compressed archives.
Closes bazelbuild#15026.
PiperOrigin-RevId: 436146779 (cherry picked from commit 00d74ff)
This is mostly a rebase of #11968 with a few tweaks of my own. Fixes #10342.
Co-authored-by: Grzegorz Lukasik <glukasik@nuro.ai>
RELNOTES[new]: `repository_ctx.extract`, and thus `http_archive`, can now decompress zstandard-compressed archives.
Closes #15026.
PiperOrigin-RevId: 436146779 (cherry picked from commit 00d74ff)
Co-authored-by: Benjamin Peterson <benjamin@engflow.com>
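With `repository_ctx.download_and_extract` able to handle zstandard, custom repository rules can pull in `.tar.zst` archives directly. A minimal sketch under stated assumptions (the rule name, attributes, and any concrete URL/sha256 a caller supplies are hypothetical, not part of Bazel):

```starlark
# Sketch of a repository rule that fetches and extracts a .tar.zst archive.
# Relies on repository_ctx.download_and_extract recognizing zstd archives,
# as added by this change. All names here are illustrative placeholders.
def _zstd_archive_impl(ctx):
    ctx.download_and_extract(
        url = ctx.attr.url,
        sha256 = ctx.attr.sha256,
        stripPrefix = ctx.attr.strip_prefix,
    )
    # Expose the extracted files to the rest of the build.
    ctx.file("BUILD.bazel", 'exports_files(glob(["**"]))')

zstd_archive = repository_rule(
    implementation = _zstd_archive_impl,
    attrs = {
        "url": attr.string(mandatory = True),
        "sha256": attr.string(),
        "strip_prefix": attr.string(default = ""),
    },
)
```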
In our project we're using repository rules to download and uncompress prebuilt C++ libraries. Due to the large archive size we were evaluating different compression algorithms. One interesting candidate (which is also supported by CMake) is Zstandard. Right now it is not available within Bazel.
The Apache Commons Compress library now supports zstd through the zstd-jni library. Unfortunately it relies on a native library. I prepared a proof-of-concept branch with some shortcuts in gjasny/zstd-decompression and would like to ask if you'd consider adding support for Zstandard compression, given the icky JNI implementation.
(With aircompressor there is a native implementation available, but it lacks Big Endian and Stream support.)
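For reference, the end goal for the use case described above could look like the following WORKSPACE sketch, once `http_archive` understands zstd (the URL, sha256, and names are hypothetical placeholders, not a real archive):

```starlark
# WORKSPACE sketch: fetching a zstd-compressed prebuilt C++ library.
# The URL and sha256 below are placeholders for illustration only.
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "prebuilt_libs",
    url = "https://example.com/prebuilt-libs-1.0.tar.zst",
    sha256 = "0000000000000000000000000000000000000000000000000000000000000000",
    strip_prefix = "prebuilt-libs-1.0",
)
```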