[zstd_stream] Don't block in reader.Read if a zstd block is available #96
Conversation
It could be interesting to run some benchmarks to see if that impacts performance and, if so, decide whether that should be the default or flag-dependent.
Force-pushed from cdb27d7 to 67c6b8e
Benchmarks:
After:
No significant change for compression (as expected), ~0.7% slowdown for decompression. IMO no need to add a flag for that; the performance difference is acceptable.
Force-pushed from 67c6b8e to 25758a4
Ready for review @Viq111
Gentle ping 😃
Hi! Sorry, I dropped the ball on this one; that's totally on me.
Thanks for the PR! It definitely makes a lot of sense.
Rereading the stream decompression path, I think we can handle and document all the EOF cases a bit better.
	return 0, fmt.Errorf("failed to read from underlying reader: %s", err)
}
if n == 0 {
	return 0, io.EOF
So I think you are correct that if r.decompSize == len(r.decompressionBuffer), then zstd should output a decompressed block without additional input (except if its estimation was not correct, but in that case the next loop will ask for more data).
But I don't think the opposite holds: r.decompSize < len(r.decompressionBuffer) does not mean that no output will be produced without additional input. From the documentation (https://github.com/DataDog/zstd/blob/1.x/zstd.h#L777), a previous input could actually produce many output blocks even if no additional input is given (and retCode is only a hint), so I think it would be safer to defer returning io.EOF until after the actual call to C zstd, if err == EOF and compressionLeft == 0 (no compressed input left).
You might have more background than me on non-EOFed pipes, but my understanding is that we should get (and return) EOF only when the pipe will never provide data again.
I'm initially thinking of changing the condition to:
remainingOut := r.decompSize - r.decompOff
if err == io.EOF && r.compressionLeft == 0 && remainingOut == 0 {
My understanding is that if r.decompSize < len(r.decompressionBuffer), i.e. if output.pos < output.size, the decoder flushed all it could (see https://github.com/DataDog/zstd/blob/1.x/zstd.h#L774); in other words, the decoder needs more input in order to output anything else.
The line you linked seems to refer to a special condition when len(r.decompressionBuffer) > ZSTD_BLOCKSIZE_MAX. However, this will never be the case, because the decompression buffer is allocated by a pool to a size of ZSTD_DStreamOutSize, which is equal to ZSTD_BLOCKSIZE_MAX. (That is the purpose of using ZSTD_DStreamOutSize().)
So the condition looks correct to me as is. What do you think?
Indeed! I hadn't read that ZSTD_DStreamOutSize == ZSTD_BLOCKSIZE_MAX, but yes, you are correct.
err = errShortRead
if r.decompOff > 0 {
	return r.decompOff, nil
}
Thinking also about an input stream that got accidentally cut off (so it starts returning EOF but we still have some partial zstd data), we could return an io.ErrUnexpectedEOF: https://golang.org/pkg/io/#pkg-variables
Functions should return EOF only to signal a graceful end of input. If the EOF occurs unexpectedly in a structured data stream, the appropriate error is either ErrUnexpectedEOF or some other error giving more detail.
Added some code & comment about that. https://github.com/DataDog/zstd/pull/96/files#diff-ff3e40bda515a4af0936733c5415920eb09cc903913a7243bdec1074266784b0R470 Thoughts?
For benchmarking, I think those changes are immaterial, so I would not worry about them.
Gentle ping 😀
reader.Read used to try to fully read an internal buffer until EOF or
the buffer was filled. That buffer was set to ZSTD_DStreamInSize,
which is larger than any zstd block.
This means that reader.Read could try to buffer much more data than
what was needed to process and return a single block from the Read
method.
This was an issue because we could miss an urgent Flush from a
corresponding Writer by blocking. (A typical use case is instant
messaging.) It was also against the general convention of io.Read that a
single call should return any immediately available data without
blocking, if any.
Interestingly enough, the test case should have caught this, but
because we used a bytes.Buffer, the Read returned io.EOF after reading
the entirety of the buffer, even if we appended to the buffer later on.
The test case is also fixed by this commit.
Fixes: #95