-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preparation for release v1.5.5 #3585
Conversation
Somewhat surprisingly, calling `fchmod()` is non-trivially faster than calling `chmod()`, and so on. This commit introduces alternate variants to some common file util functions that take an optional fd. If present, they call the `f`-variant of the underlying function. Otherwise, they fall back to the regular filename-taking version of the function.
Note that the `fd` is only valid while the file is still open. So we need to move the setting calls to before we close the file. However! We cannot do so with the `utime()` call (even though `futimens()` exists) because the follow- ing `close()` call to the `fd` will reset the atime of the file. So it seems the `utime()` call has to happen after the file is closed.
It depends on the zstd program being built, and passes it as an env variable. Just like datagen. But for datagen, we explicitly depend on it, while for zstd, we assume it's built as part of "all". This can be wrong in two cases: - when running individual tests, meson can (re)build just what is needed for that one test - a later patch will handle building zstd but not by default
We need to run it for the tests, even if programs are disabled. So if they are disabled, create a build rule for the program, but don't install it. Just make it available for the test itself.
* fix and test MSVC AVX2 build * treat msbuild warnings as errors * fix incorrect MSVC 2019 compiler warning * fix MSVC error D9035: option 'Gm' has been deprecated and will be removed in a future release
* Fixes zstd-dll build (#3492): - Adds pool.o and threading.o dependency to the zstd-dll target - Moves custom allocation functions into header to avoid needing to add dependency on common.o - Adds test target for zstd-dll - Adds github workflow that buildis zstd-dll
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 2.2.1 to 2.2.4. - [Release notes](https://github.com/github/codeql-action/releases) - [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md) - [Commits](github/codeql-action@3ebbd71...17573ee) --- updated-dependencies: - dependency-name: github/codeql-action dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Github Action to generate Win64 artifacts
* Make Github workflows permissions read-only by default * Pins `skx/github-action-publish-binaries` action to specific hash
Implemented CI workflow for testing compilation with external compressors and without them. This serves as a sanity check to avoid any code dependencies on libraries that may not always be present. (Reference: #3497 for a bug fix related to this issue.)
Publishing release artifacts requires the `contents` permission, as documented by: https://docs.github.com/en/rest/overview/permissions-required-for-github-apps.
Fix cli-tests issues
…-permission Fix Permissions on Publish Release Artifacts Job
Use `f`-variants of `chmod()` and `chown()`
fix #3500 CMake 3.18 or later was required by #3392. Because it uses `CheckLinkerFlag`. But requiring CMake 3.18 or later is a bit aggressive. Because Ubuntu 20.04 LTS still uses CMake 3.16.3: https://packages.ubuntu.com/search?keywords=cmake This change disables `-z noexecstack` check with old CMake. This will not break any existing users. Because users who need `-z noexecstack` must already use CMake 3.18 or later.
meson: always build the zstd binary when tests are enabled
[contrib/pzstd] Detect and Select Maximum Available C++ Standard
Add instructions for building Universal2 on macOS via CMake
Provide an interface for fuzzing sequence producer plugins
Inlining `BIT_reloadDStream` provided >3% decompression speed improvement for clang PGO-optimized zstd binary, measured using the Silesia corpus with compression level 1. The win comes from improved register allocation which leads to fewer spills and reloads. Take a look at this comparison of profile-annotated hot assembly before and after this change: https://www.diffchecker.com/UjDGIyLz/. The diff is a bit messy, but notice three fewer moves after inlining. In general LLVM's register allocator works better when it can see more code. For example, when the register allocator sees a call instruction, it partitions the registers into caller registers and callee registers, and it is not free to do whatever it wants with all the registers for the current function. Inlining the callee lets the register allocation access all registers and use them more flexsibly.
Looking at the __builtin_expect in ZSTD_decodeSequence: { size_t offset; #if defined(__clang__) if (LIKELY(ofBits > 1)) { #else if (ofBits > 1) { #endif ZSTD_STATIC_ASSERT(ZSTD_lo_isLongOffset == 1); From profile-annotated assembly, the probability of ofBits > 1 is about 75% (101k counts out of 135k counts). This is much smaller than the recommended likelihood to use __builtin_expect which is 99%. As a result, clang moved the else block further away which hurts cache locality. Removing this __built_expect along with two others in ZSTD_decodeSequence gave better performance when PGO is enabled. I suggest to remove these branch hints and rely on PGO which leverages runtime profiles from actual workload to calculate branch probability instead.
* mmap for windows * remove enabling mmap for testing * rename FIO dictionary initialization methods + un-const dictionary objects in free functions * remove enabling mmap for testing * initDict returns void, underlying setDictBuffer methods return the size of the set buffer * fix comment
…ub/codeql-action-2.2.8 Bump github/codeql-action from 2.2.6 to 2.2.8
Disable linker flag detection on MSVC/ClangCL.
If I understand correctly, this should trigger the issue notified in #3569.
When decompressing a seekable file, if seeking forward within a frame (by issuing multiple ZSTD_seekable_decompress calls with a small gap between them), the frame will be unnecessarily reread from the beginning. This patch makes it continue using the current frame data and simply skip over the unneeded bytes.
This does the following: 1. Compress test data into multiple frames 2. Perform a series of small decompressions and seeks forward, checking that compressed data wasn't reread unnecessarily. 3. Perform some seeks forward and backward to ensure correctness.
decompression features automatic support of sparse files, aka a form of "compression" where entire blocks consists only of zeroes. This only works for some compatible file systems (like ext4), others simply ignore it (like afs). Triggering this feature relies of `fseek()`. But `fseek()` is not compatible with non-seekable devices, such as pipes. Therefore it's disabled for pipes. However, there are other objects which are not compatible with `fseek()`, such as block devices. Changed the logic, so that `fseek()` (and therefore sparse write) is only automatically enabled on regular files. Note that this automatic behavior can always be overridden by explicit commands `--sparse` and `--no-sparse`. fix #3583
Couple tweaks to improve decompression speed with clang PGO compilation
Increase tests timeout
added a Clang-CL Windows test to CI
Seekable format read optimization
* add check for valid dest buffer and fuzz on random dest ptr when malloc 0 * add uptrval to linux-kernel * remove bin files * get rid of uptrval * restrict max pointer value check to platforms where sizeof(size_t) == sizeof(void*)
fix decompression with -o writing into a block device
also : updated man pages
in preparation for v1.5.5
You have successfully added a new Scorecard configuration |
You have successfully added a new Scorecard configuration |
You have successfully added a new Scorecard configuration |
fix: fix rare corruption bug affecting the high compression mode, reported by @danlark1 (#3517, @terrelln)
perf: improve mid-level compression speed (#3529, #3533, #3543, @yoniko and #3552, @terrelln)
lib: deprecated bufferless block-level API (#3534) by @terrelln
cli: mmap large dictionaries to save memory, by @daniellerozenblit
cli: improve speed of --patch-from mode (+50%) (#3545) by @daniellerozenblit
cli: improve i/o speed (+10%) when processing lots of small files (#3479) by @felixhandte
cli: zstd no longer crashes when requested to write into write-protected directory (#3541) by @felixhandte
cli: fix decompression into block device using -o (#3584, @Cyan4973) reported by @georgmu
build: fix zstd CLI compiled with lzma support but not zlib support (#3494) by @Hello71
build: fix cmake does no longer require 3.18 as minimum version (#3510) by @kou
build: fix MSVC+ClangCL linking issue (#3569) by @tru
build: fix zstd-dll, version of zstd CLI that links to the dynamic library (#3496) by @yoniko
build: fix MSVC warnings (#3495) by @embg
doc: updated zstd specification to clarify corner cases, by @Cyan4973
doc: document how to create fat binaries for macos (#3568) by @rickmark
misc: improve seekable format ingestion speed (~+100%) for very small chunk sizes (#3544) by @Cyan4973
misc: tests/fullbench can benchmark multiple files (#3516) by @dloidolt