-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[zstd_stream] Reader.Read can block even if a zstd block is available #95
Comments
reader.Read used to try to fully read an internal buffer until EOF or the buffer was filled. That was buffer was set to ZSTD_DStreamInSize, which is larger than any zstd block. This means that reader.Read could try to buffer much more data than what was needed to process and return a single block from the Read method. This was an issue because we could miss an urgent Flush from a corresponding Writer by blocking. (A typical use case is instant messaging.) It was also against the general convention of io.Read that a single call should return any immediately available data without blocking, if any. Interestingly enough, the test case should have caught this up, but because we used a bytes.Buffer, the Read returned io.EOF after reading the entirety of the buffer, even if we appended to the buffer later on. The test case is also fixed by this commit. Fixes: DataDog#95
Thanks for the detailed writeup! It's indeed a good idea. Looking back at the history, we never had a strong argument for using I'm not a fan in general of having compile flag (if that what you meant) as this usually poorly expose functionality for new users. Something like: type readerConfig struct {
allowSmallBuffers bool
// enableChecksum // in the future to support https://github.com/DataDog/zstd/issues/43
}
func ReaderAllowSmallBuffers() func(c *readerConfig) {
c.allowSmallBuffers = true
}
func (r *reader) ApplyOptions(options ...func(*readerConfig)) {
// Set different params
} This could then be used as: r := NewReader(myReader)
defer r.Close()
r.ApplyOptions(ReaderAllowSmallBuffers(), ReaderEnableChecksum(), ...)
// Use as before What do you think ? |
reader.Read used to try to fully read an internal buffer until EOF or the buffer was filled. That was buffer was set to ZSTD_DStreamInSize, which is larger than any zstd block. This means that reader.Read could try to buffer much more data than what was needed to process and return a single block from the Read method. This was an issue because we could miss an urgent Flush from a corresponding Writer by blocking. (A typical use case is instant messaging.) It was also against the general convention of io.Read that a single call should return any immediately available data without blocking, if any. Interestingly enough, the test case should have caught this up, but because we used a bytes.Buffer, the Read returned io.EOF after reading the entirety of the buffer, even if we appended to the buffer later on. The test case is also fixed by this commit. Fixes: DataDog#95
reader.Read used to try to fully read an internal buffer until EOF or the buffer was filled. That was buffer was set to ZSTD_DStreamInSize, which is larger than any zstd block. This means that reader.Read could try to buffer much more data than what was needed to process and return a single block from the Read method. This was an issue because we could miss an urgent Flush from a corresponding Writer by blocking. (A typical use case is instant messaging.) It was also against the general convention of io.Read that a single call should return any immediately available data without blocking, if any. Interestingly enough, the test case should have caught this up, but because we used a bytes.Buffer, the Read returned io.EOF after reading the entirety of the buffer, even if we appended to the buffer later on. The test case is also fixed by this commit. Fixes: DataDog#95
reader.Read used to try to fully read an internal buffer until EOF or the buffer was filled. That was buffer was set to ZSTD_DStreamInSize, which is larger than any zstd block. This means that reader.Read could try to buffer much more data than what was needed to process and return a single block from the Read method. This was an issue because we could miss an urgent Flush from a corresponding Writer by blocking. (A typical use case is instant messaging.) It was also against the general convention of io.Read that a single call should return any immediately available data without blocking, if any. Interestingly enough, the test case should have caught this up, but because we used a bytes.Buffer, the Read returned io.EOF after reading the entirety of the buffer, even if we appended to the buffer later on. The test case is also fixed by this commit. Fixes: DataDog#95
Suppose a zstd.reader reads from a source which emits (and flushes) 1 small zstd block, then keeps the connection open (a typical use case would be compressed instant messaging).
If the block is too small, then the zstd.reader will hang forever trying to read a minimum amount of data from its source before actually processing it and passing it to zstd.
This is due to zstd.reader.Read calling TryReadFull to fill its compression buffer of initial size
ZSTD_DStreamInSize
(which isZSTD_BLOCKSIZE_MAX + ZSTD_blockHeaderSize
, larger than any zstd block). TryReadFull will keep trying to read to fill that buffer without trying to pass what it got so far on each Read to zstd.The way to fix this issue would be to not call TryReadFull but instead simply Read on the underlying reader, and pass the intermediary result to zstd to check if a block is ready, if not try again, until zstd can read a block.
There could be a slight performance impact to calling zstd multiple times (crossing the Go/C boundary) so we could make this behaviour dependent on some new "low-latency" flag or something if it proves to hurt performance too much.
The text was updated successfully, but these errors were encountered: