Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[zstd_stream] Reader.Read can block even if a zstd block is available #95

Closed
delthas opened this issue Mar 2, 2021 · 1 comment · Fixed by #96
Closed

[zstd_stream] Reader.Read can block even if a zstd block is available #95

delthas opened this issue Mar 2, 2021 · 1 comment · Fixed by #96

Comments

@delthas
Copy link
Contributor

delthas commented Mar 2, 2021

Suppose a zstd.reader reads from a source which emits (and flushes) 1 small zstd block, then keeps the connection open (a typical use case would be compressed instant messaging).

If the block is too small, then the zstd.reader will hang forever trying to read a minimum amount of data from its source before actually processing it and passing it to zstd.

This is due to zstd.reader.Read calling TryReadFull to fill its compression buffer of initial size ZSTD_DStreamInSize (which is ZSTD_BLOCKSIZE_MAX + ZSTD_blockHeaderSize, larger than any zstd block). TryReadFull will keep trying to read to fill that buffer without trying to pass what it got so far on each Read to zstd.

The way to fix this issue would be to not call TryReadFull but instead simply Read on the underlying reader, and pass the intermediary result to zstd to check if a block is ready, if not try again, until zstd can read a block.

There could be a slight performance impact to calling zstd multiple times (crossing the Go/C boundary) so we could make this behaviour dependent on some new "low-latency" flag or something if it proves to hurt performance too much.

delthas added a commit to delthas/zstd that referenced this issue Mar 2, 2021
reader.Read used to try to fully read an internal buffer until EOF or
the buffer was filled. That was buffer was set to ZSTD_DStreamInSize,
which is larger than any zstd block.

This means that reader.Read could try to buffer much more data than
what was needed to process and return a single block from the Read
method.

This was an issue because we could miss an urgent Flush from a
corresponding Writer by blocking. (A typical use case is instant
messaging.) It was also against the general convention of io.Read that a
single call should return any immediately available data without
blocking, if any.

Interestingly enough, the test case should have caught this up, but
because we used a bytes.Buffer, the Read returned io.EOF after reading
the entirety of the buffer, even if we appended to the buffer later on.
The test case is also fixed by this commit.

Fixes: DataDog#95
@Viq111
Copy link
Collaborator

Viq111 commented Mar 8, 2021

Thanks for the detailed writeup! It's indeed a good idea. Looking back at the history, we never had a strong argument for using ReadFull except the fact that you mentioned that we won't cross the cgo barrier too often for small buffers.

I'm not a fan in general of having compile flag (if that what you meant) as this usually poorly expose functionality for new users.
Since we have an object here (reader), we could start thinking about adding options without breaking the existing API for current users.

Something like:

type readerConfig  struct {
  allowSmallBuffers bool
  // enableChecksum // in the future to support https://github.com/DataDog/zstd/issues/43
}

func ReaderAllowSmallBuffers() func(c *readerConfig) {
  c.allowSmallBuffers = true
}

func (r *reader) ApplyOptions(options ...func(*readerConfig)) {
  // Set different params
}

This could then be used as:

r := NewReader(myReader)
defer r.Close()
r.ApplyOptions(ReaderAllowSmallBuffers(), ReaderEnableChecksum(), ...)

// Use as before

What do you think ?

delthas added a commit to delthas/zstd that referenced this issue Mar 30, 2021
reader.Read used to try to fully read an internal buffer until EOF or
the buffer was filled. That was buffer was set to ZSTD_DStreamInSize,
which is larger than any zstd block.

This means that reader.Read could try to buffer much more data than
what was needed to process and return a single block from the Read
method.

This was an issue because we could miss an urgent Flush from a
corresponding Writer by blocking. (A typical use case is instant
messaging.) It was also against the general convention of io.Read that a
single call should return any immediately available data without
blocking, if any.

Interestingly enough, the test case should have caught this up, but
because we used a bytes.Buffer, the Read returned io.EOF after reading
the entirety of the buffer, even if we appended to the buffer later on.
The test case is also fixed by this commit.

Fixes: DataDog#95
delthas added a commit to delthas/zstd that referenced this issue Mar 30, 2021
reader.Read used to try to fully read an internal buffer until EOF or
the buffer was filled. That was buffer was set to ZSTD_DStreamInSize,
which is larger than any zstd block.

This means that reader.Read could try to buffer much more data than
what was needed to process and return a single block from the Read
method.

This was an issue because we could miss an urgent Flush from a
corresponding Writer by blocking. (A typical use case is instant
messaging.) It was also against the general convention of io.Read that a
single call should return any immediately available data without
blocking, if any.

Interestingly enough, the test case should have caught this up, but
because we used a bytes.Buffer, the Read returned io.EOF after reading
the entirety of the buffer, even if we appended to the buffer later on.
The test case is also fixed by this commit.

Fixes: DataDog#95
delthas added a commit to delthas/zstd that referenced this issue Jul 9, 2021
reader.Read used to try to fully read an internal buffer until EOF or
the buffer was filled. That was buffer was set to ZSTD_DStreamInSize,
which is larger than any zstd block.

This means that reader.Read could try to buffer much more data than
what was needed to process and return a single block from the Read
method.

This was an issue because we could miss an urgent Flush from a
corresponding Writer by blocking. (A typical use case is instant
messaging.) It was also against the general convention of io.Read that a
single call should return any immediately available data without
blocking, if any.

Interestingly enough, the test case should have caught this up, but
because we used a bytes.Buffer, the Read returned io.EOF after reading
the entirety of the buffer, even if we appended to the buffer later on.
The test case is also fixed by this commit.

Fixes: DataDog#95
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants