Zstandard v1.5.0

@senhuang42 senhuang42 released this 14 May 16:01
a488ba1

v1.5.0 is a major release featuring large performance improvements as well as API changes.

Performance

Improved Middle-Level Compression Speed

1.5.0 introduces a new default match finder for the compression strategies greedy, lazy, and lazy2 (which map to levels 5-12 for inputs larger than 256K). The optimization brings a substantial improvement in compression speed, with slight perturbations in compression ratio (< 0.5%) and equal or decreased memory usage.

Benchmarked with gcc, on an i9-9900K:

level   silesia.tar speed delta   enwik7 speed delta
5       +25%                      +25%
6       +50%                      +50%
7       +40%                      +40%
8       +40%                      +50%
9       +50%                      +65%
10      +65%                      +80%
11      +85%                      +105%
12      +110%                     +140%

On heavily loaded machines with significant cache contention, we have internally measured even larger gains: 2-3x+ speed at levels 5-7. 🚀

The biggest gains are achieved on files typically larger than 128KB. On files smaller than 16KB, by default we revert to the legacy match finder, which is the faster one at that size. This default policy can be overridden manually: the new match finder can be forcibly enabled with the advanced parameter ZSTD_c_useRowMatchFinder, or through the CLI option --[no-]row-match-finder.

Note: only CPUs that support SSE2 realize the full extent of this improvement.

Improved High-Level Compression Ratio

Block splitting, which improves compression ratio, is now enabled by default for high compression levels (16+). The amount of benefit varies depending on the workload. Compressing archives composed of heavily differing files will see more improvement than compressing single files that don’t vary much entropically (like text files/enwik). At levels 16+, we observe no measurable regression to compression speed.

level 22 compression

file          ratio 1.4.9   ratio 1.5.0   ratio % delta
silesia.tar   4.021         4.041         +0.49%
calgary.tar   3.646         3.672         +0.71%
enwik7        3.579         3.579         +0.0%

The block splitter can be forcibly enabled on lower compression levels as well with the advanced parameter ZSTD_c_splitBlocks. When forcibly enabled at lower levels, speed regressions can become more noticeable. Additionally, since more compressed blocks may be produced, decompression speed on the resulting data may also see small regressions.

Faster Decompression Speed

The decompression speed of data compressed with large window settings (such as --long or --ultra) has been significantly improved in this version. The gains vary depending on compiler brand and version, with clang generally benefiting the most.

The following benchmark measures decompression of enwik9 compressed at level --ultra -22 (with a 128 MB window size) on a Core i7-9700K.

compiler version   decompression speed improvement
gcc-7              +15%
gcc-8              +10%
gcc-9              +5%
gcc-10             +1%
clang-6            +21%
clang-7            +16%
clang-8            +16%
clang-9            +18%
clang-10           +16%
clang-11           +15%

Average decompression speed for “normal” payloads is slightly improved too, though the impact is less impressive. Once again, mileage varies depending on exact compiler version, payload, and even compression level. In general, a majority of scenarios see benefits ranging from +1% to +9%. There are also a few outliers, from -4% to +13%. The average gain across all these scenarios stands at ~+4%.

Library Updates

Dynamic Library Supports Multithreading by Default

It was already possible to compile libzstd with multithreading support, but doing so required an explicit opt-in. By default, the make build script would build libzstd as a single-thread-only library.

This changes in v1.5.0.
Now the dynamic library (typically libzstd.so.1 on Linux) supports multi-threaded compression by default.
Note that this property is not extended to the static library (typically libzstd.a on Linux) because doing so would have impacted the build script of existing client applications (requiring them to add -pthread to their recipe), thus potentially breaking their build. In order to avoid this disruption, the static library remains single-threaded by default.
Luckily, this build disruption does not extend to the dynamic library, which can be built with multi-threading support while existing applications linking to libzstd.so and expecting only single-thread capabilities will be none the wiser, and remain completely unaffected.

The idea is that starting from v1.5.0, applications can expect the dynamic library to support multi-threading should they need it, which will progressively lead to increased adoption of this capability over time.
That being said, since the locally deployed dynamic library may or may not support multi-threaded compression, depending on local build configuration, it’s always better to check this capability at runtime. To do so, it’s enough to check the return value when changing the parameter ZSTD_c_nbWorkers: if it results in an error, then multi-threading is not supported.

Q: What if I prefer to keep the libraries in single-thread mode only?
The target make lib-nomt will ensure this outcome.

Q: Actually, I want both static and dynamic library versions to support multi-threading!
The target make lib-mt will generate this outcome.

Promotions to Stable

Moving up to the higher digit 1.5 signals an opportunity to extend the stable portion of the zstd public API.
This update is relatively minor, featuring only a few non-controversial newcomers.

ZSTD_defaultCLevel() indicates which level is the default (applied when selecting level 0). It complements the existing ZSTD_minCLevel() and ZSTD_maxCLevel().
Similarly, ZSTD_getDictID_fromCDict() is a straightforward equivalent to the already promoted ZSTD_getDictID_fromDDict().

Deprecations

Zstd-1.4.0 stabilized a new advanced API which allows users to pass advanced parameters to zstd. We’re now deprecating all the old experimental APIs that are subsumed by the new advanced API. They will be considered for removal in the next zstd major release, zstd-1.6.0. Note that only experimental symbols are impacted. Stable functions, like ZSTD_initCStream(), remain fully supported.

The deprecated functions are listed in the Changelog below; the suggested migration for each is given in its documentation. All the suggested migrations are stable APIs, meaning that once you migrate, the API will be supported forever. See the documentation for the deprecated functions for more details on how to migrate.

Header File Locations

Zstd has slightly re-organized the library layout to move all public headers to the top level lib/ directory. This is for consistency, so all public headers are in lib/ and all private headers are in a sub-directory. If you build zstd from source, this may affect your build system.

  • lib/common/zstd_errors.h has moved to lib/zstd_errors.h.
  • lib/dictBuilder/zdict.h has moved to lib/zdict.h.

Single-File Library

We have moved the scripts in contrib/single_file_libs to build/single_file_libs. These scripts, originally contributed by @cwoffenden, produce a single compilation-unit amalgamation of the zstd library, which can be convenient for integrating Zstandard into other source trees. This move reflects a commitment on our part to support this tool and this pattern of using zstd going forward.

Windows Release Artifact Format

We are slightly changing the format of the Windows release .zip files, to match our other release artifacts. The .zip files now bundle everything in a single folder whose name matches the archive name. The contents of that folder exactly match what was previously included in the root of the archive.

Signed Releases

We have created a signing key for the Zstandard project. This release and all future releases will be signed by this key. See #2520 for discussion.

Changelog

  • api: Various functions promoted from experimental to stable API: (#2579-#2581, @senhuang42)
    • ZSTD_defaultCLevel()
    • ZSTD_getDictID_fromCDict()
  • api: Several experimental functions have been deprecated and will emit a compiler warning (#2582, @senhuang42)
    • ZSTD_compress_advanced()
    • ZSTD_compress_usingCDict_advanced()
    • ZSTD_compressBegin_advanced()
    • ZSTD_compressBegin_usingCDict_advanced()
    • ZSTD_initCStream_srcSize()
    • ZSTD_initCStream_usingDict()
    • ZSTD_initCStream_usingCDict()
    • ZSTD_initCStream_advanced()
    • ZSTD_initCStream_usingCDict_advanced()
    • ZSTD_resetCStream()
  • api: ZSTDMT_NBWORKERS_MAX reduced to 64 for 32-bit environments (#2643, @Cyan4973)
  • perf: Significant speed improvements for middle compression levels (#2494, @senhuang42 & @terrelln)
  • perf: Block splitter to improve compression ratio, enabled by default for high compression levels (#2447, @senhuang42)
  • perf: Decompression loop refactor, speed improvements on clang and for --long modes (#2614 #2630, @Cyan4973)
  • perf: Reduced stack usage during compression and decompression entropy stage (#2522 #2524, @terrelln)
  • bug: Make the number of physical CPU cores detection more robust (#2517, @PaulBone)
  • bug: Improve setting permissions of created files (#2525, @felixhandte)
  • bug: Fix large dictionary non-determinism (#2607, @terrelln)
  • bug: Fix various dedicated dictionary search bugs (#2540 #2586, @senhuang42 @felixhandte)
  • bug: Fix non-determinism test failures on Linux i686 (#2606, @terrelln)
  • bug: Fix UBSAN error in decompression (#2625, @terrelln)
  • bug: Fix superblock compression divide by zero bug (#2592, @senhuang42)
  • bug: Ensure ZSTD_estimateCCtxSize*() monotonically increases with compression level (#2538, @senhuang42)
  • doc: Improve zdict.h dictionary training API documentation (#2622, @terrelln)
  • doc: Note that public ZSTD_free*() functions accept NULL pointers (#2521, @animalize)
  • doc: Add style guide docs for open source contributors (#2626, @Cyan4973)
  • tests: Better regression test coverage for different dictionary modes (#2559, @senhuang42)
  • tests: Better test coverage of index reduction (#2603, @terrelln)
  • tests: OSS-Fuzz coverage for seekable format (#2617, @senhuang42)
  • tests: Test coverage for ZSTD threadpool API (#2604, @senhuang42)
  • build: Dynamic library built multithreaded by default (#2584, @senhuang42)
  • build: Move zstd_errors.h and zdict.h to lib/ root (#2597, @terrelln)
  • build: Single file library build script moved to build/ directory (#2618, @felixhandte)
  • build: Allow ZSTDMT_JOBSIZE_MIN to be configured at compile-time, reduce default to 512KB (#2611, @Cyan4973)
  • build: Fixed Meson build (#2548, @SupervisedThinking & @kloczek)
  • build: ZBUFF_*() is no longer built by default (#2583, @senhuang42)
  • build: Fix excessive compiler warnings with clang-cl and CMake (#2600, @nickhutchinson)
  • build: Detect presence of md5 on Darwin (#2609, @felixhandte)
  • build: Avoid SIGBUS on armv6 (#2633, @bmwiedmann)
  • cli: --progress flag added to always display progress bar (#2595, @senhuang42)
  • cli: Allow reading from block devices with --force (#2613, @felixhandte)
  • cli: Fix CLI filesize display bug (#2550, @Cyan4973)
  • cli: Fix Windows CLI --filelist end-of-line bug (#2620, @Cyan4973)
  • contrib: Various fixes for Linux kernel patch (#2539, @terrelln)
  • contrib: Seekable format - Decompression hanging edge case fix (#2516, @senhuang42)
  • contrib: Seekable format - New seek table-only API (#2113 #2518, @mdittmer @Cyan4973)
  • contrib: Seekable format - Fix seek table descriptor check when loading (#2534, @foxeng)
  • contrib: Seekable format - Decompression fix for large offsets (#2594, @azat)
  • misc: Automatically published release tarballs available on GitHub (#2535, @felixhandte)