forked from facebook/zstd
[huf] Improve fast C & ASM performance on small data
* Rename `ilimit` to `ilowest` and set it equal to `src` instead of `src + 6 + 8`. This is safe because the fast decoding loops already guarantee that they never read below `ilowest`. This allows the fast decoder to run for at least two more iterations, because it consumes at most 7 bytes per iteration.
* Continue the fast loop all the way until the number of safe iterations is 0. Initially, I thought that towards the end, computing how many iterations are safe might become expensive. But it ends up being slower to have to decode each of the 4 streams individually, which makes sense.

This drastically speeds up the Huffman decoder on the `github` dataset for the issue raised in facebook#3762, measured with `zstd -b1e1r github/`.

| Decoder  | Speed before | Speed after |
|----------|--------------|-------------|
| Fallback | 477 MB/s     | 477 MB/s    |
| Fast C   | 384 MB/s     | 492 MB/s    |
| Assembly | 385 MB/s     | 501 MB/s    |

We can also look at the speed delta for different block sizes of silesia using `zstd -b1e1r silesia.tar -B#`.

| Decoder  | -B1K ∆ | -B2K ∆ | -B4K ∆ | -B8K ∆ | -B16K ∆ | -B32K ∆ | -B64K ∆ | -B128K ∆ |
|----------|--------|--------|--------|--------|---------|---------|---------|----------|
| Fast C   | +11.2% | +8.2%  | +6.1%  | +4.4%  | +2.7%   | +1.5%   | +0.6%   | +0.2%    |
| Assembly | +12.5% | +9.0%  | +6.2%  | +3.6%  | +1.5%   | +0.7%   | +0.2%   | +0.03%   |
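The arithmetic behind the two bullets in the commit message can be sketched as a toy model. This is not the code from the commit: `safe_iters`, `total_fast_iters`, and both constants are hypothetical names chosen for illustration. The 7-byte bound comes from the commit message; the figure of 5 symbols per iteration and the fixed per-iteration consumption are assumptions made only to keep the model small.

```c
#include <stddef.h>

/* Upper bound on input bytes consumed per stream per iteration
 * (stated in the commit message). */
#define BYTES_PER_ITER 7
/* ASSUMPTION for this sketch: output symbols produced per iteration. */
#define SYMBOLS_PER_ITER 5

/* Hypothetical helper: number of iterations that can run without any
 * bounds check, given the input bytes available above the lowest
 * readable address and the output bytes still free. */
static size_t safe_iters(size_t in_avail, size_t out_avail)
{
    size_t const iiters = in_avail / BYTES_PER_ITER;
    size_t const oiters = out_avail / SYMBOLS_PER_ITER;
    return iiters < oiters ? iiters : oiters;
}

/* Hypothetical toy model of the outer loop: recompute the safe count,
 * run that many iterations, and repeat until the safe count is 0.
 * `actual_bytes` is the number of input bytes each iteration really
 * consumes in this model; it must be in [1, BYTES_PER_ITER]. */
static size_t total_fast_iters(size_t in_avail, size_t out_avail,
                               size_t actual_bytes)
{
    size_t total = 0;
    if (actual_bytes == 0 || actual_bytes > BYTES_PER_ITER)
        return 0; /* outside the model's assumptions */
    for (;;) {
        size_t const iters = safe_iters(in_avail, out_avail);
        if (iters == 0)
            break; /* nothing is provably safe: leave the fast loop */
        in_avail  -= iters * actual_bytes;
        out_avail -= iters * SYMBOLS_PER_ITER;
        total     += iters;
    }
    return total;
}
```

In this model, with 100 input bytes and ample output space, `safe_iters(100 - (6 + 8), 1000)` gives 12 while `safe_iters(100, 1000)` gives 14: lowering the floor by 14 bytes buys two more 7-byte iterations. When each iteration consumes fewer bytes than the worst case, recomputing the safe count after each batch keeps the fast loop going for further rounds, which is the behaviour the second bullet describes.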
Nick Terrell committed Nov 20, 2023
1 parent c7269ad, commit 6385862
Showing 2 changed files with 61 additions and 57 deletions.