
Do not allocate buffer size before necessary #22

Merged: 2 commits into gendx:master from dragly/dynamic-buffer-size, Mar 4, 2020

Conversation

@dragly (Contributor) commented Dec 19, 2019

This changes the way the `LZCircularBuffer` is allocated. Instead of allocating the full dict size immediately, the buffer is grown on demand. This helps when parsing smaller files with large dict sizes, especially when compiling to WebAssembly and running in the browser, where `alloc_zeroed` is slow.
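For readers skimming the thread, here is a minimal sketch of the approach, with illustrative names rather than the actual lzma-rs internals: grow the backing `Vec` lazily instead of zero-filling `dict_size` bytes up front.

```rust
// Illustrative sketch, not the actual lzma-rs implementation.
struct LazyBuffer {
    buf: Vec<u8>, // grows on demand instead of being pre-allocated
}

impl LazyBuffer {
    fn new() -> Self {
        // No big allocation here: small files never pay for the full
        // dictionary up front.
        LazyBuffer { buf: Vec::new() }
    }

    fn set(&mut self, index: usize, value: u8) {
        if index >= self.buf.len() {
            // Grow just enough to cover `index`, zero-filling the gap.
            self.buf.resize(index + 1, 0);
        }
        self.buf[index] = value;
    }

    fn get(&self, index: usize) -> u8 {
        // Reads past the written region return 0, matching the semantics
        // of the previously fully zero-initialized buffer.
        self.buf.get(index).copied().unwrap_or(0)
    }
}
```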

@dragly (Contributor, Author) commented Dec 19, 2019

We have had performance issues when parsing smaller OpenCTM files using lzma-rs. Profiling showed that the allocation of the `LZCircularBuffer` was the culprit, because it eventually calls `alloc_zeroed` in the allocator. This slowdown is apparently caused by memset being quite slow when running WebAssembly in browsers. In particular, files that take less than a few milliseconds to actually parse spend an additional 100 ms just allocating this buffer if the dict size is large. In our case, these files declare a dict size of 67 MB even though the file itself may be just a few hundred bytes.

@dragly force-pushed the dragly/dynamic-buffer-size branch from 35eef9d to 0117fcd on December 19, 2019 at 20:56
@dragly (Contributor, Author) commented Dec 19, 2019

I am not sure this is the cleanest way of achieving dynamic resizing, though. Let me know if you have any suggestions :)

@gendx (Owner) left a comment

I ran the benchmarks, which indeed suggest faster decompression of empty or small contents. However, decompress_big_file went from 9.1 ms to 9.6 ms on my machine, which should be fine given that there are other areas for performance improvement.

However, I'm not sure about the correctness of the implementation at this point, since `resize` could potentially shrink the buffer; see the comments in the code.
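For illustration, one way to rule out shrinking is to make growth monotonic, only resizing when the required length exceeds the current one; a hedged sketch, with a hypothetical helper name:

```rust
// Hypothetical helper: only ever resize upward. A bare `Vec::resize`
// would truncate the buffer if `required` were smaller than the current
// length, which is exactly the correctness hazard mentioned above.
fn ensure_len(buf: &mut Vec<u8>, required: usize) {
    if required > buf.len() {
        buf.resize(required, 0);
    }
}
```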

@dragly (Contributor, Author) commented Feb 4, 2020

Thanks for the comments and feedback, and sorry about the late reply; I did not have time to get back to this until now. I agree with your input and will make sure we do not risk shrinking the buffer.

@dragly (Contributor, Author) commented Mar 4, 2020

This took a while, but I have now adapted the code to address your comments.

@gendx (Owner) commented Mar 4, 2020

bors r+

bors bot added a commit that referenced this pull request Mar 4, 2020
22: Do not allocate buffer size before necessary r=gendx a=dragly

Co-authored-by: Svenn-Arne Dragly <dragly@cognite.com>
@gendx merged commit 94553d6 into gendx:master on Mar 4, 2020
@bors bot commented Mar 4, 2020

Timed out

@ibaryshnikov (Contributor) commented
I see around a 5% regression in speed after this update. Would it be possible to use a feature flag or a runtime option? Or to have several buffer implementations and pass the one you want from outside?

@gendx (Owner) commented Mar 5, 2020

> I see around a 5% regression in speed after this update. Would it be possible to use a feature flag or a runtime option? Or to have several buffer implementations and pass the one you want from outside?

Good point. I also noticed it in #22 (review), but I think 5% could be acceptable. However, I've now filed #27 to track this regression; feel free to send a pull request so that it can be fixed before the next release.

@dragly deleted the dragly/dynamic-buffer-size branch on March 9, 2020 at 12:18
@Shnatsel mentioned this pull request on Apr 16, 2020
@catenacyber commented
Just to mention that this is similar in intent to what I have done here:
OISF/libhtp@fe16fa7#diff-523a8b9d16e6d9f006c824e158eeb4b7

For resizing, I would suggest keeping track of the dictionary size declared in the header, and not growing beyond it.
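A hedged sketch of that cap, with an illustrative function name and error type: growth stays bounded by the dict size from the header, and anything beyond it is treated as a malformed stream.

```rust
// Illustrative only: bound on-demand growth by the dictionary size
// declared in the LZMA header.
fn grow_to(buf: &mut Vec<u8>, index: usize, dict_size: usize) -> Result<(), &'static str> {
    if index >= dict_size {
        // The header promised at most `dict_size` bytes of history; an
        // index beyond that suggests a corrupt or malicious stream.
        return Err("index exceeds dictionary size declared in header");
    }
    if index >= buf.len() {
        buf.resize(index + 1, 0);
    }
    Ok(())
}
```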

@dragly (Contributor, Author) commented Jun 11, 2020

@catenacyber interesting! Thanks for sharing :)

Why is it important not to go over it? Can the decoding fail if we do so, or is there a different reason to stay within the size?

@catenacyber commented
I am no LZMA expert, so I might be mistaken.

But what is the behavior of other LZMA tools when the header specifies a dictionary size of, for instance, 4096 bytes, and the decompression somehow tries to access item 4097 of the dictionary?
Let me know if I am not clear.
cf. line 1041 here: OISF/libhtp@2fb0091#diff-1143601df653d71da0a4ff7c68de9e29R1041

@gendx (Owner) commented Jul 13, 2020

> Just to mention that this is similar in intent to what I have done here:
> OISF/libhtp@fe16fa7#diff-523a8b9d16e6d9f006c824e158eeb4b7
>
> For resizing, I would suggest keeping track of the dictionary size declared in the header, and not growing beyond it.

From what I see, the indices passed to `get` are currently always taken modulo the dict size, so we shouldn't reach past the dictionary size declared in the header. That said:

  • We could add checks in `debug_assertions` mode inside of `get` so that this indeed doesn't happen (see the sketch after this list).
  • We should have unit tests for it. Implementing #24 (add a test encoder taking LZ matches as input) would greatly help to write these kinds of unit tests (and such tests could be helpful for comparing against other LZMA parsers as well).
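A sketch of the first bullet, using a free-standing `get` for illustration rather than the actual lzma-rs signature; `debug_assert!` runs only when `debug_assertions` is enabled and compiles away in release builds:

```rust
// Debug-only bounds check against the header's dict size.
fn get(buf: &[u8], index: usize, dict_size: usize) -> u8 {
    debug_assert!(
        index < dict_size,
        "index {} reaches past dict size {}",
        index, dict_size
    );
    buf.get(index).copied().unwrap_or(0)
}
```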
