Support for Huffman only strategy in pigz #79
@smikkelsendk your code creates a very compact executable that provides nice performance for some datasets and should work on a huge range of architectures. You may also want to look at the level 1 compression strategy in the Intel IPP zlib, which trades compressed file size for extreme speed. An Intel-tuned implementation is here, while zlib-ng uses this strategy (and other optimizations) for many architectures.

Below is my test of your code versus the minigzip executable generated when you statically compile zlib-ng. I wonder if one reason zlib-ng is so fast is that it uses SIMD instructions to compute the CRC, which might end up being the rate-limiting factor at the fastest compression levels.

You can also build pigz with zlib-ng; my pigz-bench-python implements this with CMake. However, be aware that the current zlib-ng has an issue with pigz at level 1, so until this is resolved you should probably compile pigz against CloudFlare zlib (which will give you SIMD CRC and other optimizations, but does not use the Intel level 1 strategy).

pigz is able to compute the compression in parallel, but the CRC must be computed serially, so a SIMD CRC is a huge benefit for pigz. If you want a fast zlib implementation, I think you really want to build with zlib-ng or CloudFlare zlib.
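The serial nature of the checksum can be illustrated with zlib's running-value CRC interface: each chunk's CRC folds into the checksum of everything seen so far, so chunk N cannot start until chunk N−1 has finished. A minimal Python sketch using the standard `zlib` module (the sample payload is a placeholder):

```python
import zlib

data = b"example payload " * 1024  # placeholder data

# crc32 takes a running value: each call depends on the CRC of all
# previous bytes, which is why a plain streaming CRC is sequential.
crc = 0
for i in range(0, len(data), 4096):
    crc = zlib.crc32(data[i:i + 4096], crc)

# Chained per-chunk CRCs match the CRC of the whole buffer.
print(crc == zlib.crc32(data))  # → True
```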
Added in be56dba.
I work in the bioinformatics industry. I had a case where I wanted to use gzip compression between two programs in a bioinformatics pipeline, as gzip is one of the most widely supported formats.

However, the performance overhead of the compression, even at the lowest compression level, was simply too large to justify enabling it.

Almost by accident I tried setting the deflater strategy to HUFFMAN_ONLY, and to my surprise it not only doubled the compression speed compared to gzip at the lowest compression level, it also compressed almost as well as gzip at the default level.
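For reference, the same strategy can be selected from Python, where `zlib.compressobj` accepts a `strategy` argument. A minimal sketch (the helper name and sample payload are mine; `wbits=31` selects the gzip container):

```python
import zlib

def gzip_huffman_only(data: bytes, level: int = 6) -> bytes:
    # Z_HUFFMAN_ONLY skips the LZ77 string-matching stage entirely,
    # so deflate only Huffman-codes the literal bytes.
    co = zlib.compressobj(level, zlib.DEFLATED, 31,  # 31 = gzip wrapper
                          zlib.DEF_MEM_LEVEL, zlib.Z_HUFFMAN_ONLY)
    return co.compress(data) + co.flush()

# Placeholder payload with skewed byte frequencies, as in sequence data.
sample = b"GATTACA" * 10_000
out = gzip_huffman_only(sample)

# The result is a standard gzip stream, readable by any gzip tool.
print(len(out) < len(sample))              # → True
print(zlib.decompress(out, 31) == sample)  # → True
```

Because the output is ordinary gzip, downstream tools in the pipeline need no changes to read it.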
I have done some more benchmarks in my github repository: https://github.com/smikkelsendk/gziphm
I wanted to let you know in case you think it could be a useful option to add to pigz.