Streaming Decompressor v3 #58
Create a new struct `Stream` that uses the `std::io::Write` interface to read chunks of compressed data and write the decompressed output to a sink. Add a streaming mode so that processing can work on chunks of data as they arrive; this is required because `process()` assumed the input reader contained a complete stream. Update flags and `try_process_next()` were added to handle the case where the decompressor requests more input bytes than are available. When more input bytes are required to make progress, the data is temporarily buffered in the `DecoderState`. This commit also adds utility functions to the `rangecoder` for working with streaming data. Finally, it adds an `allow_incomplete` option that disables end-of-stream checks when calling `finish()` on a stream, because some users may want to retrieve partially decompressed data.
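To illustrate the buffering idea described above, here is a minimal sketch under stated assumptions. It does not use lzma-rs: `RleStream` and its run-length format are hypothetical stand-ins for the real decoder, chosen because each decoding step needs more than one input byte, so a chunk that ends mid-step must be held back until the next `write`. The `allow_incomplete` flag mirrors the option described above: `finish()` only rejects leftover bytes when it is unset.

```rust
use std::io::{self, Write};

/// Hypothetical toy format, NOT LZMA: the input is a sequence of
/// (count, byte) pairs, each expanded to `count` copies of `byte`.
/// Every decoding step needs 2 input bytes, so a chunk that ends
/// mid-pair has to be buffered until the next `write`.
pub struct RleStream<W: Write> {
    output: W,
    pending: Vec<u8>, // input bytes held back until a full step is available
    allow_incomplete: bool,
}

impl<W: Write> RleStream<W> {
    pub fn new(output: W, allow_incomplete: bool) -> Self {
        RleStream { output, pending: Vec::new(), allow_incomplete }
    }

    /// Returns the sink. Fails if a partial step is still buffered,
    /// unless `allow_incomplete` was set.
    pub fn finish(mut self) -> io::Result<W> {
        if !self.pending.is_empty() && !self.allow_incomplete {
            return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "truncated input"));
        }
        self.output.flush()?;
        Ok(self.output)
    }
}

impl<W: Write> Write for RleStream<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.pending.extend_from_slice(buf);
        // Decode only the complete pairs; an odd trailing byte stays pending.
        let complete = self.pending.len() - self.pending.len() % 2;
        for pair in self.pending[..complete].chunks_exact(2) {
            self.output.write_all(&vec![pair[1]; pair[0] as usize])?;
        }
        self.pending.drain(..complete);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        self.output.flush()
    }
}
```

Feeding `[3, b'a', 2]` and then `[b'b']` yields `aaabb`: the lone `2` from the first chunk waits in `pending` until its partner arrives. A real decoder would buffer into a fixed-size array (as in the `DecoderState`) rather than a growing `Vec`; the `Vec` just keeps the sketch short.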
Benchmarks are: master
streaming-decompressor-v3
@gendx, do you have an update on when you will be able to review this PR?
Hello,
Sorry again for the delay in reviewing. A lot of things came up, which meant I didn't have much time in the past couple of months, and this change is non-trivial to review.
I think we're getting there; most comments are for minor changes.
```rust
if let Err(error) = stream.write_all(&compressed) {
    // WriteZero could indicate that the unpacked_size was reached before the
    // end of the stream
    if std::io::ErrorKind::WriteZero != error.kind() {
```
Nit: `error.kind() != std::io::ErrorKind::WriteZero` is more readable.
```rust
{
    let mut stream = lzma_rs::decompress::Stream::new_with_options(decode_options, Vec::new());

    if let Err(error) = stream.write_all(&compressed) {
```
Why is there an error check here but not in `round_trip_no_options`?
You are correct. A simple `.unwrap()` will do. The check was there because a test case would previously fail with `WriteZero` for some `Options` when `unpacked_size` was reached.
I take that back. Running the tests with `--all-features` shows that `unpacked_size_write_none_to_header_and_use_provided_on_read` can return a `WriteZero` I/O error:

```text
thread 'unpacked_size_write_none_to_header_and_use_provided_on_read' panicked
at 'called `Result::unwrap()` on an `Err` value:
Custom { kind: WriteZero, error: "failed to write whole buffer" }', tests/lzma.rs:88:39
```

This is expected behavior. We are encoding `0xFFFF_FFFF_FFFF` in the header but using the provided unpacked size during decoding, so decoding stops before the 5- to 6-byte end-of-stream marker is read.

The check is here but not in `round_trip_no_options` because it only applies when certain options are provided, as in `unpacked_size_write_none_to_header_and_use_provided_on_read`. The `round_trip_no_options` function only uses default options.

I added more documentation and a `match` statement to make this clearer.
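A sketch of what such a `match` could look like, assuming the test knows whether an unpacked size was provided. `LimitedSink` and `write_input` are hypothetical names, not part of lzma-rs: the sink stops accepting bytes after a limit, which makes `Write::write_all` fail with `ErrorKind::WriteZero` and the message "failed to write whole buffer", the same error shown in the panic above. The comparison is written variable-first, per the earlier nit.

```rust
use std::io::{self, ErrorKind, Write};

/// Hypothetical sink that accepts at most `limit` bytes and then returns Ok(0),
/// mimicking a decoder that stops once a provided unpacked size is reached.
struct LimitedSink {
    data: Vec<u8>,
    limit: usize,
}

impl Write for LimitedSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = buf.len().min(self.limit - self.data.len());
        self.data.extend_from_slice(&buf[..n]);
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Writes all of `input`. When the unpacked size was supplied out of band,
/// WriteZero means "output complete" rather than a failure.
fn write_input<W: Write>(sink: &mut W, input: &[u8], size_known_in_advance: bool) -> io::Result<()> {
    match sink.write_all(input) {
        Ok(()) => Ok(()),
        // write_all reports WriteZero when write() returns Ok(0) before the buffer is consumed.
        Err(e) if e.kind() == ErrorKind::WriteZero && size_known_in_advance => Ok(()),
        Err(e) => Err(e),
    }
}
```

With `size_known_in_advance == false` the same `WriteZero` is propagated, so the default-options round trip still fails loudly on a short write.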
```rust
let mut input = Vec::new();
std::fs::File::open(compfile)
    .unwrap()
    .read_to_end(&mut input)
    .unwrap();
input
```
Since this pattern appears twice already in this function, it would make sense to create a utility function for it.
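One possible shape for that helper, as a sketch: `read_all_file` is a hypothetical name, and it relies on `std::fs::read`, which opens the file and reads it to the end in one call, so it replaces the `File::open` + `read_to_end` chain.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical test helper: read a whole file into memory, panicking with
/// the offending path if it cannot be read (acceptable in test code).
fn read_all_file<P: AsRef<Path>>(path: P) -> Vec<u8> {
    let path = path.as_ref();
    fs::read(path).unwrap_or_else(|e| panic!("failed to read {}: {}", path.display(), e))
}
```

Including the path in the panic message makes a missing test fixture easier to diagnose than a bare `unwrap()`.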
```rust
stream.write_all(&tmp[0..n]).unwrap();

n > 0
} {}
```
This logic is not readable. Can you use `if` together with `break` for the exit condition?
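A sketch of the same chunked feeding loop written with `loop`, `if` and `break`, assuming generic `Read`/`Write` endpoints. `copy_in_chunks` is a hypothetical helper name, and the chunk size is left as a parameter so tests can force small writes.

```rust
use std::io::{self, Read, Write};

/// Feeds `reader` into `writer` in chunks of at most `chunk_size` bytes,
/// returning the total number of bytes copied.
fn copy_in_chunks<R: Read, W: Write>(reader: &mut R, writer: &mut W, chunk_size: usize) -> io::Result<u64> {
    let mut tmp = vec![0u8; chunk_size];
    let mut total = 0u64;
    loop {
        let n = reader.read(&mut tmp)?;
        if n == 0 {
            break; // end of input
        }
        writer.write_all(&tmp[..n])?;
        total += n as u64;
    }
    Ok(total)
}
```

`std::io::copy` would be shorter, but it chooses its own buffer size, whereas a streaming test usually wants small chunks to exercise the input-buffering path.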
```rust
assert_eq!(decomp, b"")
assert_decomp_eq(
    b"\x5d\x00\x00\x80\x00\xff\xff\xff\xff\xff\xff\xff\xff\x00\x83\xff\
\xfb\xff\xff\xc0\x00\x00\x00",
```
Nit: the indentation should be better aligned here.
The rust-analyzer VS Code plugin displays inlay hints, and these were offsetting the first line in my IDE. Thankfully this can be disabled :).
```rust
let tmp = *self.tmp.get_ref();

// Check if we need more data to advance the decompressor
if Mode::Run == mode
```
Same comment about `if variable == Constant` being more readable than the reverse. There are a few more in the code.
```rust
break;
};
} else {
if (Mode::Run == mode) && (rangecoder.remaining()? < MAX_REQUIRED_INPUT) {
```
Same comment about `if variable == Constant` being more readable than the reverse. There are a few more in the code.
```rust
break;
};
} else {
if (Mode::Run == mode) && (rangecoder.remaining()? < MAX_REQUIRED_INPUT) {
```
The parentheses are not necessary.
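For reference, a sketch of the condition rewritten per both comments (variable first, no redundant parentheses). `Mode` and the value of `MAX_REQUIRED_INPUT` here are stand-ins; the real enum and constant in the PR may differ. The parentheses can go because `==` and `<` bind more tightly than `&&`.

```rust
/// Hypothetical stand-in for the decoder's processing mode.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Mode {
    Run,
    Finish,
}

// Assumed value, for illustration only.
const MAX_REQUIRED_INPUT: usize = 20;

/// Variable on the left, constant on the right, no extra parentheses.
fn needs_more_input(mode: Mode, remaining: usize) -> bool {
    mode == Mode::Run && remaining < MAX_REQUIRED_INPUT
}
```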
```rust
let range = rangecoder.range();
let code = rangecoder.code();
let buf = rangecoder.buf()?;

if self.try_process_next(buf, range, code).is_err() {
    let bytes_read = rangecoder.read_into(&mut self.tmp.get_mut()[..])?;
    let bytes_read = if bytes_read < std::u64::MAX as usize {
        bytes_read as u64
    } else {
        return Err(error::Error::LZMAError(
            "Failed to convert integer to u64.".to_string(),
        ));
    };
    self.tmp.set_position(bytes_read);
    return Ok(());
}
```
This code looks similar to the above. Can you separate it into a utility function to avoid code duplication?
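A possible shape for that utility function, as a sketch only: `stash_pending` is a hypothetical name, it copies already-available bytes into any `Cursor` over a byte buffer (rather than calling the PR's `rangecoder.read_into`), and it returns a `String` error instead of the crate's `error::Error::LZMAError`. It also swaps the manual `bytes_read < std::u64::MAX as usize` comparison for `u64::try_from`, which states the intent directly.

```rust
use std::convert::TryFrom;
use std::io::Cursor;

/// Hypothetical helper: copy input bytes the decoder could not consume yet
/// into the temporary buffer and record how many are pending via the
/// cursor position.
fn stash_pending<B: AsMut<[u8]>>(tmp: &mut Cursor<B>, pending: &[u8]) -> Result<(), String> {
    let buf = tmp.get_mut().as_mut();
    if pending.len() > buf.len() {
        return Err("pending input does not fit in the temporary buffer".to_string());
    }
    buf[..pending.len()].copy_from_slice(pending);
    // Checked conversion instead of comparing against u64::MAX by hand.
    let len = u64::try_from(pending.len())
        .map_err(|_| "failed to convert integer to u64".to_string())?;
    tmp.set_position(len);
    Ok(())
}
```

On today's targets `usize` is at most 64 bits, so the conversion does not fail in practice; `try_from` simply makes that assumption explicit. Both duplicated branches could then call the helper and `return Ok(())`.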
```diff
@@ -0,0 +1,510 @@
+use crate::decode::lzbuffer::{LZBuffer, LZCircularBuffer};
```
Note: I didn't look at this file in detail (yet). But it is restricted to the `stream` feature.
Thanks for the review :). I published a new PR #59 with the requested changes. I had some issues with the suggested changes for the …
Pull Request Overview

Changes since last PR:
- … `"stream"` …
- … `._check()` functions, using an `update` flag instead.
- … `tmp`, `tmp_len` buffers to use a `Cursor`.

Testing Strategy

This pull request was tested by...
- … `.lzma`, `.lzma2`, `.xz` files).

Supporting Documentation and References

Link to previous PR: #55