Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spec: 0-sized compressed blocks permitted by spec #3090

Closed
klauspost opened this issue Mar 9, 2022 · 9 comments
Closed

spec: 0-sized compressed blocks permitted by spec #3090

klauspost opened this issue Mar 9, 2022 · 9 comments

Comments

@klauspost
Copy link

klauspost commented Mar 9, 2022

Describe the bug

The current decompression code fails if the remainder of a block is sent to the literal decoder with only 2 bytes.

With 0 literals and 0 sequences that condition can be met.

While it is easy to argue that a block with such encoding is pointless (it should just be a raw block), I don't see any mention of it in the spec, and as far as I can tell it is conforming to spec. Ofc. I might have missed it.

Example: 2274d31e0d569fe9e31bedc4f9fddd9c9f114c2f.zst.gz

@Cyan4973
Copy link
Contributor

Cyan4973 commented Mar 9, 2022

Interesting, we have a harness test program which is supposed to generate all sorts of valid blocks, and is a good tool to validate compliance of decoders.
It should have been able to generate such empty block.
We probably need to update this test program if it doesn't.

@terrelln
Copy link
Contributor

terrelln commented Mar 9, 2022

Thanks for the report!

We'll fix this bug in the next release.

We've had other bugs in the decoder that rejected valid frames. I will put up a PR shortly with a document that lists all the known cases where we've rejected a valid frame, an example frame, and the version we fixed the issue.

@terrelln
Copy link
Contributor

terrelln commented Mar 9, 2022

Actually, this block used to be invalid. Older versions of the spec had this clause:

A Compressed_Block has the extra restriction that Block_Size is always strictly less than the decompressed size. If this condition cannot be respected, the block must be sent uncompressed instead (Raw_Block).

We removed that clause because the decoder didn't enforce it. And because it made certain topics (namely block splitting) a lot simpler. Since there are cases where you'd want to emit a small block that had large headers, but you'd then re-use those headers in later blocks to amortize the cost.

However, it looks like we missed the edge case where compressed block is 0 sized.

@Cyan4973
Copy link
Contributor

Cyan4973 commented Mar 9, 2022

Actually, this block used to be invalid.

That explains why it wasn't part of harness.
Indeed, subtle consequence of a small change !

@terrelln
Copy link
Contributor

terrelln commented Mar 9, 2022

This frame is the simplest test case. The only difference from the one you provided is that yours has the Unused_Bit of the frame header set to 1.

28b5 2ffd 2000 1500 0000 00

@terrelln
Copy link
Contributor

terrelln commented Mar 9, 2022

The one liner fix is to change MIN_CBLOCK_SIZE from 3 to 2.

However, that definitely breaks some bounds checks in ZSTD_decodeLiteralsBlock(), and maybe elsewhere. So more care is needed.

@terrelln
Copy link
Contributor

terrelln commented Mar 9, 2022

See #3092 for the decompressor errata document.

@klauspost
Copy link
Author

klauspost commented Mar 10, 2022

@terrelln MIN_CBLOCK_SIZE = 3 is correct for blocks, but for a compressed literal section it isn't, since at least one byte has already been read from the block.

So I think MIN_CBLOCK_SIZE should be kept at 3, and either MIN_CBLOCK_SIZE - 1 or a new MIN_LITSEC_SIZE = 2 should be added.

@Cyan4973 Yeah, the input is from fuzz testing, so I am not surprised there are some wonky bits in there :)

@terrelln
Copy link
Contributor

@klauspost MIN_CBLOCK_SIZE isn't supposed to be including the block header (at least in the way it is used in the literals section decoding). In this case the block size is 2 bytes. 1 byte for the literals section, and 1 byte for the sequences section.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants