Do not allocate buffer size before necessary #22
Conversation
We have had performance issues when parsing smaller OpenCTM files using lzma-rs. Profiling showed that the allocation of the LZCircularBuffer was the issue, where it will eventually call `alloc_zeroed` for the full dictionary size.
(force-pushed from 35eef9d to 0117fcd)
And I am not sure if this is the cleanest way of achieving dynamic resizing. Let me know if you have any suggestions :)
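As a rough illustration of the approach, here is a minimal sketch of an output buffer that grows on demand instead of pre-allocating the full dictionary size. The type and method names are illustrative only and do not reflect the actual lzma-rs internals.

```rust
// Minimal sketch of a grow-on-demand output buffer (illustrative only;
// not the actual lzma-rs LZCircularBuffer implementation).
struct GrowOnDemandBuffer {
    buf: Vec<u8>, // backing storage, grown lazily instead of pre-allocated
}

impl GrowOnDemandBuffer {
    fn new() -> Self {
        // Start empty instead of zero-allocating the full dictionary size.
        Self { buf: Vec::new() }
    }

    fn set(&mut self, index: usize, value: u8) {
        // Grow only as far as this write requires.
        if index >= self.buf.len() {
            self.buf.resize(index + 1, 0);
        }
        self.buf[index] = value;
    }

    fn get(&self, index: usize) -> u8 {
        // Reads beyond the grown region behave as if the buffer were
        // fully pre-allocated and zeroed.
        self.buf.get(index).copied().unwrap_or(0)
    }
}
```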
I ran the benchmarks, which indeed suggest faster decompression of empty or small contents. However, `decompress_big_file` went from 9.1ms to 9.6ms on my machine, which should be fine given that there are other areas for performance improvements. I'm not sure about the correctness of the implementation at this point, however, due to `resize` potentially shrinking the buffer - see the comments in the code.
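For context on the shrinking concern: `Vec::resize` truncates when the requested length is smaller than the current one, which would silently discard dictionary data. A guarded helper along these lines (a sketch, not the actual patch) only ever grows the buffer:

```rust
// Illustrative guard: grow the backing Vec to at least `required` bytes,
// but never shrink it. A bare `buf.resize(required, 0)` would truncate
// the buffer whenever `required` is smaller than the current length.
fn ensure_len(buf: &mut Vec<u8>, required: usize) {
    if required > buf.len() {
        buf.resize(required, 0);
    }
}
```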
Thanks for the comments and feedback! And sorry about the late reply; I did not have time to get back to this until now. I agree with your input and will make sure we do not risk shrinking the buffer.
This took a while, but I have now adapted the code to address your comments.
bors r+
22: Do not allocate buffer size before necessary r=gendx a=dragly

This changes the way the LZCircularBuffer is allocated. Instead of allocating the full dict size immediately, the buffer is grown on demand. This helps when parsing smaller files with large dict sizes, especially when compiling to WebAssembly and running in the browser, where `alloc_zeroed` is slow.

Co-authored-by: Svenn-Arne Dragly <dragly@cognite.com>
Timed out
I can see around a 5% regression in speed after this update. Is it possible to use a feature or a runtime option? Or to have several buffer implementations, and then pass the one you like from outside?
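One possible shape for the "several buffer implementations" idea is a small trait that both a pre-allocated and a grow-on-demand buffer could implement, with the caller choosing which one to pass in. This is purely a hypothetical sketch; lzma-rs does not expose such a trait.

```rust
// Hypothetical sketch only: lzma-rs does not expose this trait.
// The point is to let the caller choose the buffer strategy.
trait OutputBuffer {
    fn get(&self, index: usize) -> u8;
    fn set(&mut self, index: usize, value: u8);
}

// Pre-allocated variant: pays the full allocation (and zeroing) cost up
// front, but avoids any resize checks during decompression.
struct Preallocated(Vec<u8>);

impl OutputBuffer for Preallocated {
    fn get(&self, index: usize) -> u8 {
        self.0[index]
    }
    fn set(&mut self, index: usize, value: u8) {
        self.0[index] = value;
    }
}
```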
Good point. I also noticed it in #22 (review), but I think 5% could be acceptable. However, I've now filed #27 to track this regression; feel free to send a pull request so that it can be fixed before the next release.
Just to mention, this is similar in intent to what I have done here: for resizing, I would suggest it is important to keep track of the dictionary size mentioned in the header and not go over it.
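A capped-growth helper in that spirit might look like the following sketch (illustrative only, not the code linked above): any index beyond the dictionary size declared in the header is treated as corrupt input instead of a reason to keep growing.

```rust
// Illustrative sketch: grow the buffer on demand, but never beyond the
// dictionary size declared in the LZMA header. An index past dict_size
// is reported as an error instead of triggering further growth.
fn grow_capped(buf: &mut Vec<u8>, required: usize, dict_size: usize) -> Result<(), &'static str> {
    if required > dict_size {
        return Err("index exceeds the dictionary size declared in the header");
    }
    if required > buf.len() {
        buf.resize(required, 0);
    }
    Ok(())
}
```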
@catenacyber interesting! Thanks for sharing :) Why is it important not to go over it? Can the decoding fail if we do so, or is there a different reason to stay within the size? |
I am no LZMA expert, so I might be mistaken. But what is the behavior of other lzma tools when the header specifies a dictionary size of, for instance, 4096 bytes, and the decompression somehow tries to access item 4097 of the dictionary?
From what I see, the indices passed to
This changes the way the LZCircularBuffer is allocated. Instead of allocating the full dict size immediately, the buffer is grown on demand. This helps when parsing smaller files with large dict sizes, especially when compiling to WebAssembly and running in the browser, where `alloc_zeroed` is slow.