spec: 0-sized compressed blocks in frames with window_size = 0 #3482
Comments
When you mention "The Spec" I presume you refer to https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md , current version ( (Specification has evolved a bit over time, and while the main parts have remained stable, little edge details (like this one) could have moved or the exact wording could have been updated, it's an easy source of difference in interpretation.) Now, according to the wording of this specification, you are right, an empty frame with a known decompressed size has a window size of Note that this is different from sending an empty compressed block in any other context: as soon as the frame is non empty, We shall have a look at the test case at The golden sample name
Yes, by spec I mean the compression format docs.
I agree.
It's worth noting that the reference implementation is inconsistent on checking the block size against
Here's an example of a file that will succeed when run through block-size-exceeds-max-input.zip (it's a zip file with only the .zst input inside it) |
Yeah, you are correct that the frame is invalid. It is only intending to test the empty compressed block example. I will update the frame header to use a
That is interesting, thanks for the example file, that is very useful! I don't think I want to change that behavior right now, as we're about to make a release, and I'd rather make it right after the release, to make sure our fuzzers get a chance to test that change, and verify that we can still round-trip successfully. The libzstd decoder is definitely more forgiving than the spec (for example it allows any offset that is within its history buffer, even if it is beyond the window size), especially where we gain performance for being more lax. But we should strive to be consistent with the spec where it is easy to do so.
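To make the "more forgiving" offset behavior concrete, here is a minimal sketch of the two policies side by side. It is not libzstd's code: the function name `offset_is_valid`, its parameters, and the decision to ignore dictionaries are all assumptions made for illustration.

```python
def offset_is_valid(offset: int, bytes_decoded: int, window_size: int,
                    history_size: int, strict: bool) -> bool:
    """Decide whether a match offset may be used (hypothetical helper).

    strict=True:  spec-style check, the match must start inside the window.
    strict=False: lax check, anything still held in the history buffer is accepted.
    Dictionaries are ignored here, which is a simplification of this sketch.
    """
    if offset < 1:
        return False
    if strict:
        # Never reach further back than the window, nor before the frame start.
        return offset <= min(window_size, bytes_decoded)
    # Lax mode: the only limit is what the decoder still has in memory.
    return offset <= min(history_size, bytes_decoded)


# An offset of 2000 with a 1024-byte window but 4096 bytes of retained history:
strict_result = offset_is_valid(2000, 4096, 1024, 4096, strict=True)
lax_result = offset_is_valid(2000, 4096, 1024, 4096, strict=False)
```

With these inputs the strict check rejects the offset and the lax check accepts it, which is the kind of payload a permissive decoder decodes but a validator should flag.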
This frame is invalid because the `Window_Size = 0`, and the `Block_Maximum_Size = min(128 KB, Window_Size) = 0`. But the empty compressed block has a `Block_Content` size of 2, which is invalid. The fix is to switch to using a `Window_Descriptor` instead of the `Single_Segment_Flag`. This sets the `Window_Size = 1024`. Hexdump before this PR: `28b5 2ffd 2000 1500 0000 00` Hexdump after this PR: `28b5 2ffd 0000 1500 0000 00` For issue facebook#3482.
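The window-size arithmetic in this description can be checked with a small frame-header parser. This is a sketch based on the frame header layout in the compression format document; `parse_frame_header` and `block_maximum_size` are hypothetical names, and the sketch ignores the reserved bit and skippable frames.

```python
ZSTD_MAGIC = 0xFD2FB528        # stored little-endian as 28 b5 2f fd
BLOCK_SIZE_CAP = 128 * 1024    # 128 KB


def parse_frame_header(data: bytes):
    """Return (window_size, content_size, header_length) for one frame."""
    if int.from_bytes(data[:4], "little") != ZSTD_MAGIC:
        raise ValueError("not a Zstandard frame")
    descriptor = data[4]
    fcs_flag = descriptor >> 6
    single_segment = (descriptor >> 5) & 1
    dict_id_flag = descriptor & 3
    pos = 5

    window_size = None
    if not single_segment:
        # Window_Descriptor: 5-bit exponent, 3-bit mantissa.
        window_descriptor = data[pos]
        pos += 1
        exponent, mantissa = window_descriptor >> 3, window_descriptor & 7
        window_base = 1 << (10 + exponent)
        window_size = window_base + (window_base // 8) * mantissa

    pos += (0, 1, 2, 4)[dict_id_flag]  # skip Dictionary_ID if present

    fcs_size = (1 if single_segment else 0, 2, 4, 8)[fcs_flag]
    content_size = None
    if fcs_size:
        content_size = int.from_bytes(data[pos:pos + fcs_size], "little")
        if fcs_size == 2:
            content_size += 256  # the 2-byte form stores the value minus 256
        pos += fcs_size

    if single_segment:
        # With Single_Segment_Flag set, Window_Size is the content size.
        window_size = content_size
    return window_size, content_size, pos


def block_maximum_size(window_size: int) -> int:
    return min(BLOCK_SIZE_CAP, window_size)


before = bytes.fromhex("28b5 2ffd 2000 1500 0000 00")
after = bytes.fromhex("28b5 2ffd 0000 1500 0000 00")
for frame in (before, after):
    window_size, content_size, header_length = parse_frame_header(frame)
    print(window_size, block_maximum_size(window_size))
# prints "0 0" for the old file and "1024 1024" for the new one
```

In the new file the frame content size is no longer stored in the header, so the parser returns `None` for it. The block bytes that follow the header are identical in both files.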
There is indeed a subtle distinction between a conformant decoder, and a validation tool. A conformant decoder must be able to decode all combinations defined by the spec, but it might also be more permissive. This can be justified by other considerations, such as speed, binary size, memory usage or code complexity. As a side note, the situation is reversed for the compression side: a conformant compressor must only send combinations of features allowed by the spec, but it doesn't have to support them all. This direction is typically easier to understand and accept. But it follows that employing either a conformant decoder or encoder as validators is partially flawed, because the encoder may never generate some combinations, which are nonetheless allowed by the spec, and the decoder might permissively accept more combinations than what is strictly defined by the spec. Bridging that gap requires dedicated tools, laser-focused on spec compliance. There are already a few tools of this kind in this repository, to help library implementers, and probably more could be added. But of course, the main issue is that such efforts cost time, and that's the scarcest resource there is.
Fixed by @terrelln
Just for clarification, what's the status of this beyond the specific |
Updating the golden file was important, because it represents a use case that a conformant decoder must support, so the older file was effectively requesting conformant decoders to support a behavior which was not in the spec. Now, as mentioned earlier, the reference implementation is just that, an implementation. It must be able to decode what is defined in the spec, but it may also accept interpreting a few payloads which go beyond the spec, as long as they are not dangerous for system safety, and still seem to represent something valid (i.e. not grossly corrupted). I haven't checked recently about the old Now, to be more complete, it's also not expensive to add such a filter, so if it's deemed preferable, that's certainly something that could be done. More generally, if the point is to have a tool that would actively reject any zstd payload which goes even mildly beyond the spec, then I think we would need a dedicated tool for that, something separate from the reference decoder, and focused on this one mission.
The spec (according to my interpretation) would suggest that it is not possible to have a compressed block in a frame that has `Window_Size` equal to 0. A compressed block always needs at least two bytes of `Block_Content` (the literals and sequences headers).

The test case at `test/golden-decompression/empty-block.zst` is the bytes `28b5 2ffd 2000 1500 0000 00`; the only bit set in the frame header is the single segment flag, which according to the RFC means that its `Window_Size` is the content size, which is 0. The spec then says that the `Block_Content` is limited by `min(1<<17, Window_Size)`, which is 0 in this case. However, `empty-block.zst` has a compressed block with block content of size 2. It seems to me that this frame isn't valid according to the spec and would require a window descriptor to increase the window size while keeping the content size at 0.

We came across this issue while implementing a decompressor for Zig and were unsure what the correct behaviour is supposed to be. I've just found #3090 and #3118 which make it look like the current behaviour (this frame decompressing successfully) is intended, but in this case I think the spec needs a tweak to cover this edge case.
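For readers following along, the block header bytes `15 00 00` from the test file can be decoded by hand or with a few lines of code. The sketch below assumes the 3-byte little-endian block header layout from the format document; `parse_block_header` and `block_fits` are hypothetical names used only for this example.

```python
BLOCK_TYPES = ("Raw_Block", "RLE_Block", "Compressed_Block", "Reserved")


def parse_block_header(header: bytes):
    """Return (last_block, block_type, block_size) from a 3-byte block header."""
    value = int.from_bytes(header[:3], "little")
    last_block = bool(value & 1)               # bit 0
    block_type = BLOCK_TYPES[(value >> 1) & 3]  # bits 1-2
    block_size = value >> 3                     # remaining 21 bits
    return last_block, block_type, block_size


def block_fits(block_size: int, window_size: int) -> bool:
    """Apply the limit quoted above: min(1<<17, Window_Size)."""
    return block_size <= min(1 << 17, window_size)


last_block, block_type, block_size = parse_block_header(bytes.fromhex("150000"))
print(last_block, block_type, block_size)  # True Compressed_Block 2
print(block_fits(block_size, 0))           # False: Window_Size of 0 leaves no room
print(block_fits(block_size, 1024))        # True once a window descriptor is used
```

The two content bytes that follow (`00 00`) are the literals section header and the sequences count, both encoding zero, which is why a compressed block cannot be shorter than 2 bytes.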