[zstd_stream] Reader.Read can block even if a zstd block is available #95

delthas · 2021-03-02T01:06:08Z

Suppose a zstd.reader reads from a source which emits (and flushes) 1 small zstd block, then keeps the connection open (a typical use case would be compressed instant messaging).

If the block is too small, then the zstd.reader will hang forever trying to read a minimum amount of data from its source before actually processing it and passing it to zstd.

This is due to zstd.reader.Read calling TryReadFull to fill its compression buffer of initial size ZSTD_DStreamInSize (which is ZSTD_BLOCKSIZE_MAX + ZSTD_blockHeaderSize, larger than any zstd block). TryReadFull will keep trying to read to fill that buffer without trying to pass what it got so far on each Read to zstd.
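The difference between the two read strategies can be reproduced without zstd at all. The following is a minimal sketch, not code from this library: `firstMessage` is a hypothetical helper, and an `io.Pipe` stands in for a connection that delivers one small message and then stays open.

```go
package main

import (
	"fmt"
	"io"
	"time"
)

// firstMessage (hypothetical helper) writes one small message into a pipe
// that stays open, then reads it back either with io.ReadFull or with a
// single Read. It reports the bytes received and whether they arrived
// before the timeout.
func firstMessage(useReadFull bool, timeout time.Duration) (string, bool) {
	pr, pw := io.Pipe()
	defer pw.Close() // unblocks a reader that is still waiting when we return

	// The peer sends 5 bytes and then keeps the connection open.
	go pw.Write([]byte("hello"))

	got := make(chan string, 1)
	go func() {
		buf := make([]byte, 1024) // much larger than the message
		var n int
		if useReadFull {
			n, _ = io.ReadFull(pr, buf) // waits for 1024 bytes or an error
		} else {
			n, _ = pr.Read(buf) // returns as soon as some bytes are available
		}
		got <- string(buf[:n])
	}()

	select {
	case s := <-got:
		return s, true
	case <-time.After(timeout):
		return "", false
	}
}

func main() {
	fmt.Println(firstMessage(false, 200*time.Millisecond))
	fmt.Println(firstMessage(true, 200*time.Millisecond))
}
```

With a single `Read` the message should come back right away; with `io.ReadFull` the call should still be waiting when the timeout fires, which mirrors the hang described above.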

The way to fix this issue would be to stop calling TryReadFull and instead do a single Read on the underlying reader, then pass the intermediate result to zstd to check whether a block is ready. If not, Read again and retry, until zstd can decode a block.
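The proposed loop could look roughly like the sketch below. It is an illustration under stated assumptions, not the library's implementation: `frameDecoder` is a hypothetical stand-in for the zstd streaming decoder (it decodes trivial length-prefixed frames so the example runs without cgo), and `lowLatencyReader`, `next` and `readFrames` are made-up names.

```go
package main

import (
	"fmt"
	"io"
)

// frameDecoder is a hypothetical stand-in for the zstd streaming decoder.
// It decodes frames made of one length byte followed by the payload, and
// buffers partial input until a whole frame is available.
type frameDecoder struct{ pending []byte }

// feed appends in to the pending input and returns the payload of the first
// complete frame, or false if more input is needed.
func (d *frameDecoder) feed(in []byte) ([]byte, bool) {
	d.pending = append(d.pending, in...)
	if len(d.pending) == 0 {
		return nil, false
	}
	n := int(d.pending[0])
	if len(d.pending) < 1+n {
		return nil, false
	}
	out := append([]byte(nil), d.pending[1:1+n]...)
	d.pending = d.pending[1+n:]
	return out, true
}

// lowLatencyReader hands every chunk it reads to the decoder immediately
// instead of waiting for a full input buffer.
type lowLatencyReader struct {
	src io.Reader
	dec frameDecoder
	buf []byte
}

// next returns the next decoded frame. It blocks only while the decoder
// does not yet have enough input for a complete frame.
func (r *lowLatencyReader) next() ([]byte, error) {
	// A frame may already be buffered from an earlier Read.
	if out, ok := r.dec.feed(nil); ok {
		return out, nil
	}
	for {
		n, err := r.src.Read(r.buf) // one Read, not ReadFull
		if out, ok := r.dec.feed(r.buf[:n]); ok {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
	}
}

// readFrames writes chunks into a pipe that is never closed by the writer
// (like an idle connection) and decodes want frames from it. It blocks
// forever if the chunks contain fewer than want complete frames.
func readFrames(chunks [][]byte, want int) ([]string, error) {
	pr, pw := io.Pipe()
	defer pr.Close() // releases the writer goroutine if it is still blocked
	go func() {
		for _, c := range chunks {
			if _, err := pw.Write(c); err != nil {
				return
			}
		}
	}()
	r := &lowLatencyReader{src: pr, buf: make([]byte, 64)}
	var frames []string
	for len(frames) < want {
		f, err := r.next()
		if err != nil {
			return frames, err
		}
		frames = append(frames, string(f))
	}
	return frames, nil
}

func main() {
	frames, err := readFrames([][]byte{{5, 'h', 'e'}, {'l', 'l', 'o', 2, 'o', 'k'}}, 2)
	fmt.Println(frames, err)
}
```

The first frame here is split across two writes and the second arrives in the same chunk as the tail of the first, so the sketch covers both "not enough input yet" and "a frame is already buffered". In the real reader each `feed` would be a call across the Go/C boundary, which is the cost discussed next.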

There could be a slight performance impact to calling zstd multiple times (crossing the Go/C boundary) so we could make this behaviour dependent on some new "low-latency" flag or something if it proves to hurt performance too much.

reader.Read used to try to fully read an internal buffer until EOF or until the buffer was filled. That buffer was sized to ZSTD_DStreamInSize, which is larger than any zstd block. This means that reader.Read could try to buffer much more data than was needed to process and return a single block from the Read method. This was an issue because, by blocking, we could miss an urgent Flush from a corresponding Writer. (A typical use case is instant messaging.) It also went against the general convention of io.Reader that a single Read call should return any immediately available data without blocking. Interestingly enough, the test case should have caught this, but because we used a bytes.Buffer, Read returned io.EOF after reading the entirety of the buffer, even if we appended to the buffer later on. The test case is also fixed by this commit. Fixes: DataDog#95
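The bytes.Buffer behaviour that hid the bug in the test can be seen in isolation. This is a small sketch using only the standard library; the helper names `readFullFromBuffer` and `drainThenAppend` are made up for illustration and are not part of the test suite.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// readFullFromBuffer asks io.ReadFull for more bytes than a bytes.Buffer
// holds and returns the byte count and error it gets back. The call does
// not block: the drained buffer reports io.EOF, which ends ReadFull early.
func readFullFromBuffer(content string, want int) (int, error) {
	src := bytes.NewBufferString(content)
	return io.ReadFull(src, make([]byte, want))
}

// drainThenAppend drains a bytes.Buffer, records the error returned by the
// next Read, then appends more data and reads again.
func drainThenAppend() (later string, drainedErr error) {
	var b bytes.Buffer
	p := make([]byte, 16)

	b.WriteString("first")
	b.Read(p)                 // consumes "first"
	_, drainedErr = b.Read(p) // empty buffer: reports io.EOF

	b.WriteString("second") // data appended after EOF was reported
	n, _ := b.Read(p)
	return string(p[:n]), drainedErr
}

func main() {
	fmt.Println(readFullFromBuffer("hello", 1024))
	fmt.Println(drainThenAppend())
}
```

Because an empty bytes.Buffer answers with io.EOF instead of waiting, a ReadFull-based reader returns with a short count and never exhibits the hang. A source that blocks while idle, such as an io.Pipe or a network connection, is presumably what a test needs in order to catch it.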

Viq111 · 2021-03-08T20:53:34Z

Thanks for the detailed writeup! It's indeed a good idea. Looking back at the history, we never had a strong argument for using ReadFull, except the one you mentioned: it avoids crossing the cgo barrier too often for small buffers.

I'm generally not a fan of compile flags (if that's what you meant), as they usually expose functionality poorly to new users.
Since we have an object here (reader), we could start thinking about adding options without breaking the existing API for current users.

Something like:

type readerConfig struct {
  allowSmallBuffers bool
  // enableChecksum // in the future to support https://github.com/DataDog/zstd/issues/43
}

// ReaderAllowSmallBuffers returns an option that turns on allowSmallBuffers.
func ReaderAllowSmallBuffers() func(c *readerConfig) {
  return func(c *readerConfig) {
    c.allowSmallBuffers = true
  }
}

func (r *reader) ApplyOptions(options ...func(*readerConfig)) {
  // Set different params
}

This could then be used as:

r := NewReader(myReader)
defer r.Close()
r.ApplyOptions(ReaderAllowSmallBuffers(), ReaderEnableChecksum(), ...)

// Use as before

What do you think?

delthas mentioned this issue Mar 2, 2021

[zstd_stream] Don't block in reader.Read if a zstd block is available #96

Merged

Viq111 added the enhancement label Mar 8, 2021

Viq111 closed this as completed in #96 Nov 24, 2021
