
Do not allocate buffer size before necessary #22

Merged: 2 commits into gendx:master from dragly/dynamic-buffer-size, Mar 4, 2020

Conversation

@dragly (Contributor) commented Dec 19, 2019

This changes the way the `LZCircularBuffer` is allocated. Instead of allocating the full dict size immediately, the buffer is grown on demand. This helps when parsing smaller files with large dict sizes, especially when compiling to WebAssembly and running in the browser, where `alloc_zeroed` is slow.
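For readers skimming the thread, here is a minimal sketch of the approach, with illustrative names rather than the actual lzma-rs internals: grow the backing `Vec` lazily instead of zero-filling `dict_size` bytes up front.

```rust
// Illustrative sketch, not the actual lzma-rs implementation.
struct LazyBuffer {
    buf: Vec<u8>, // grows on demand instead of being pre-allocated
}

impl LazyBuffer {
    fn new() -> Self {
        // No big allocation here: small files never pay for the full
        // dictionary up front.
        LazyBuffer { buf: Vec::new() }
    }

    fn set(&mut self, index: usize, value: u8) {
        if index >= self.buf.len() {
            // Grow just enough to cover `index`, zero-filling the gap.
            self.buf.resize(index + 1, 0);
        }
        self.buf[index] = value;
    }

    fn get(&self, index: usize) -> u8 {
        // Reads past the written region return 0, matching the semantics
        // of the previously fully zero-initialized buffer.
        self.buf.get(index).copied().unwrap_or(0)
    }
}
```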

@dragly (Contributor, Author) commented Dec 19, 2019

We have had performance issues when parsing smaller OpenCTM files using lzma-rs. Profiling showed that the allocation of the `LZCircularBuffer` was the culprit, because it eventually calls `alloc_zeroed` in the allocator. This slowdown is apparently caused by memset being quite slow when running WebAssembly in browsers. In particular, files that take less than a few milliseconds to actually parse spend an additional 100 ms just allocating this buffer if the dict size is large. In our case, these files declare a dict size of 67 MB even though the file itself may be just a few hundred bytes.

@dragly force-pushed the dragly/dynamic-buffer-size branch from 35eef9d to 0117fcd on December 19, 2019 at 20:56
@dragly (Contributor, Author) commented Dec 19, 2019

I am not sure this is the cleanest way of achieving dynamic resizing, though. Let me know if you have any suggestions :)

@gendx (Owner) left a comment

I ran the benchmarks, which indeed suggest faster decompression of empty or small contents. However, decompress_big_file went from 9.1 ms to 9.6 ms on my machine, which should be fine given that there are other areas for performance improvement.

However, I'm not sure about the correctness of the implementation at this point, since `resize` could potentially shrink the buffer; see the comments in the code.
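For illustration, one way to rule out shrinking is to make growth monotonic, only resizing when the required length exceeds the current one; a hedged sketch, with a hypothetical helper name:

```rust
// Hypothetical helper: only ever resize upward. A bare `Vec::resize`
// would truncate the buffer if `required` were smaller than the current
// length, which is exactly the correctness hazard mentioned above.
fn ensure_len(buf: &mut Vec<u8>, required: usize) {
    if required > buf.len() {
        buf.resize(required, 0);
    }
}
```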

@dragly (Contributor, Author) commented Feb 4, 2020

Thanks for the comments and feedback, and sorry about the late reply; I did not have time to get back to this until now. I agree with your input and will make sure we do not risk shrinking the buffer.

@dragly (Contributor, Author) commented Mar 4, 2020

This took a while, but I have now adapted the code to address your comments.

@gendx (Owner) commented Mar 4, 2020

bors r+

bors bot added a commit that referenced this pull request Mar 4, 2020
22: Do not allocate buffer size before necessary r=gendx a=dragly

Co-authored-by: Svenn-Arne Dragly <dragly@cognite.com>
@gendx merged commit 94553d6 into gendx:master on Mar 4, 2020
@bors bot commented Mar 4, 2020

Timed out

@ibaryshnikov (Contributor) commented
I see around a 5% regression in speed after this update. Would it be possible to use a feature flag or a runtime option? Or to have several buffer implementations and pass the one you want from outside?

@gendx (Owner) commented Mar 5, 2020

> I see around a 5% regression in speed after this update. Would it be possible to use a feature flag or a runtime option? Or to have several buffer implementations and pass the one you want from outside?

Good point. I also noticed it in #22 (review), but I think 5% could be acceptable. However, I've now filed #27 to track this regression; feel free to send a pull request so that it can be fixed before the next release.

@dragly deleted the dragly/dynamic-buffer-size branch on March 9, 2020 at 12:18
@Shnatsel mentioned this pull request on Apr 16, 2020
@catenacyber commented
Just to mention that this is similar in intent to what I have done here:
OISF/libhtp@fe16fa7#diff-523a8b9d16e6d9f006c824e158eeb4b7

For resizing, I would suggest keeping track of the dictionary size declared in the header, and not growing beyond it.
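A hedged sketch of that cap, with an illustrative function name and error type: growth stays bounded by the dict size from the header, and anything beyond it is treated as a malformed stream.

```rust
// Illustrative only: bound on-demand growth by the dictionary size
// declared in the LZMA header.
fn grow_to(buf: &mut Vec<u8>, index: usize, dict_size: usize) -> Result<(), &'static str> {
    if index >= dict_size {
        // The header promised at most `dict_size` bytes of history; an
        // index beyond that suggests a corrupt or malicious stream.
        return Err("index exceeds dictionary size declared in header");
    }
    if index >= buf.len() {
        buf.resize(index + 1, 0);
    }
    Ok(())
}
```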

@dragly (Contributor, Author) commented Jun 11, 2020

@catenacyber interesting! Thanks for sharing :)

Why is it important not to go over it? Can the decoding fail if we do so, or is there a different reason to stay within the size?

@catenacyber commented
I am no LZMA expert, so I might be mistaken.

But what is the behavior of other LZMA tools when the header specifies a dictionary size of, for instance, 4096 bytes, and the decompression somehow tries to access item 4097 of the dictionary?
Let me know if I am not clear.
cf. line 1041 here: OISF/libhtp@2fb0091#diff-1143601df653d71da0a4ff7c68de9e29R1041

@gendx (Owner) commented Jul 13, 2020

> Just to mention that this is similar in intent to what I have done here:
> OISF/libhtp@fe16fa7#diff-523a8b9d16e6d9f006c824e158eeb4b7
>
> For resizing, I would suggest keeping track of the dictionary size declared in the header, and not growing beyond it.

From what I see, the indices passed to `get` are currently always taken modulo the dict size, so we shouldn't reach past the dictionary size declared in the header. That said:

  • We could add checks in `debug_assertions` mode inside of `get` so that this indeed doesn't happen (see the sketch after this list).
  • We should have unit tests for it. Implementing #24 (add a test encoder taking LZ matches as input) would greatly help to write these kinds of unit tests (and such tests could be helpful for comparing against other LZMA parsers as well).
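A sketch of the first bullet, using a free-standing `get` for illustration rather than the actual lzma-rs signature; `debug_assert!` runs only when `debug_assertions` is enabled and compiles away in release builds:

```rust
// Debug-only bounds check against the header's dict size.
fn get(buf: &[u8], index: usize, dict_size: usize) -> u8 {
    debug_assert!(
        index < dict_size,
        "index {} reaches past dict size {}",
        index, dict_size
    );
    buf.get(index).copied().unwrap_or(0)
}
```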
