Streaming Decompressor #51

Closed
24 changes: 23 additions & 1 deletion benches/lzma.rs
@@ -2,7 +2,7 @@

extern crate test;

use std::io::Read;
use std::io::{Read, Write};
use test::Bencher;

fn compress_bench(x: &[u8], b: &mut Bencher) {
@@ -31,13 +31,28 @@ fn decompress_bench(compressed: &[u8], b: &mut Bencher) {
});
}

fn decompress_stream_bench(compressed: &[u8], b: &mut Bencher) {
b.iter(|| {
let mut stream = lzma_rs::decompress::Stream::new(Vec::new());
stream.write_all(compressed).unwrap();
stream.finish().unwrap();
Owner

I believe the decompressed output should be returned from the closure passed to Bencher::iter, to make sure the work is not optimized away. The other benchmarks don't currently do that either; I can send a separate pull request to fix them.

Contributor Author

Good catch.

});
}
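
The review note above, about returning the decompressed output from the closure, can be illustrated with a minimal sketch on stable Rust. `Bencher` itself needs the nightly `test` crate, so the `iter` and `copy_to_sink` functions below are hypothetical stand-ins (not part of lzma-rs or `test`); only `std::hint::black_box` is a real standard-library call.

```rust
use std::hint::black_box;
use std::io::Write;

// Hypothetical stand-in for `Bencher::iter`: runs the closure a few times and
// passes each result through `black_box`, so the optimizer has to assume the
// value is used and cannot discard the work that produced it.
fn iter<T>(mut routine: impl FnMut() -> T) -> T {
    let mut last = black_box(routine());
    for _ in 0..2 {
        last = black_box(routine());
    }
    last
}

// Hypothetical stand-in for the streaming decompressor: copies the input
// into a fresh sink and hands the sink back to the caller.
fn copy_to_sink(input: &[u8]) -> Vec<u8> {
    let mut sink = Vec::new();
    sink.write_all(input).unwrap();
    sink
}

fn main() {
    let data: &[u8] = b"hello";
    // The closure *returns* the output instead of dropping it.
    let out = iter(|| copy_to_sink(black_box(data)));
    assert_eq!(out, b"hello".to_vec());
}
```

In the benchmark from this diff, the equivalent change would presumably be to end the closure with the value returned by `stream.finish().unwrap()` rather than a statement ending in a semicolon.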

fn decompress_bench_file(compfile: &str, b: &mut Bencher) {
let mut f = std::fs::File::open(compfile).unwrap();
let mut compressed = Vec::new();
f.read_to_end(&mut compressed).unwrap();
decompress_bench(&compressed, b);
}

fn decompress_stream_bench_file(compfile: &str, b: &mut Bencher) {
let mut f = std::fs::File::open(compfile).unwrap();
let mut compressed = Vec::new();
f.read_to_end(&mut compressed).unwrap();
decompress_stream_bench(&compressed, b);
}

#[bench]
fn compress_empty(b: &mut Bencher) {
#[cfg(feature = "enable_logging")]
@@ -87,6 +102,13 @@ fn decompress_big_file(b: &mut Bencher) {
decompress_bench_file("tests/files/foo.txt.lzma", b);
}

#[bench]
fn decompress_stream_big_file(b: &mut Bencher) {
#[cfg(feature = "enable_logging")]
let _ = env_logger::try_init();
decompress_stream_bench_file("tests/files/foo.txt.lzma", b);
}

#[bench]
fn decompress_huge_dict(b: &mut Bencher) {
#[cfg(feature = "enable_logging")]
122 changes: 88 additions & 34 deletions src/decode/lzbuffer.rs
@@ -1,38 +1,51 @@
use crate::error;
use std::io;

pub trait LZBuffer {
pub trait LZBuffer<W>
where
W: io::Write,
{
fn len(&self) -> usize;
// Retrieve the last byte or return a default
fn last_or(&self, lit: u8) -> u8;
// Retrieve the n-th last byte
fn last_n(&self, dist: usize) -> error::Result<u8>;
// Append a literal
fn append_literal(&mut self, lit: u8) -> io::Result<()>;
fn append_literal(&mut self, lit: u8) -> error::Result<()>;
// Fetch an LZ sequence (length, distance) from inside the buffer
fn append_lz(&mut self, len: usize, dist: usize) -> error::Result<()>;
// Get a reference to the output sink
fn get_ref(&self) -> &W;
// Get a mutable reference to the output sink
fn get_mut(&mut self) -> &mut W;
Comment on lines +17 to +20
Owner

Can you provide more meaningful names, such as get_output, get_writer, or get_sink, with get_foo_mut for the mutable variant?

Contributor Author

I went with get_output and get_output_mut.

// Flush the buffer to the output
fn finish(self) -> io::Result<()>;
fn finish(self) -> io::Result<W>;
}
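
As a rough illustration of the naming agreed in the thread above (get_output and get_output_mut), here is a simplified sketch. `OutputBuffer` and `AccumBuffer` are hypothetical, cut-down stand-ins for the trait and struct in this diff, not the crate's actual definitions.

```rust
use std::io::{self, Write};

// Hypothetical, reduced version of the buffer trait using the renamed accessors.
trait OutputBuffer<W: Write> {
    // Get a reference to the output sink
    fn get_output(&self) -> &W;
    // Get a mutable reference to the output sink
    fn get_output_mut(&mut self) -> &mut W;
    // Flush the buffer to the output and hand the sink back
    fn finish(self) -> io::Result<W>;
}

// Hypothetical accumulating buffer that owns its sink.
struct AccumBuffer<W: Write> {
    stream: W,
    buf: Vec<u8>,
}

impl<W: Write> OutputBuffer<W> for AccumBuffer<W> {
    fn get_output(&self) -> &W {
        &self.stream
    }

    fn get_output_mut(&mut self) -> &mut W {
        &mut self.stream
    }

    fn finish(mut self) -> io::Result<W> {
        self.stream.write_all(&self.buf)?;
        self.stream.flush()?;
        Ok(self.stream)
    }
}

fn demo() -> io::Result<Vec<u8>> {
    let mut lz = AccumBuffer { stream: Vec::<u8>::new(), buf: b"abc".to_vec() };
    assert!(lz.get_output().is_empty()); // nothing flushed yet
    lz.get_output_mut().push(b'>'); // the caller may touch the sink directly
    lz.finish() // buffered bytes are written after what the sink already holds
}

fn main() {
    assert_eq!(demo().unwrap(), b">abc".to_vec());
}
```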

// An accumulating buffer for LZ sequences
pub struct LZAccumBuffer<'a, W>
pub struct LZAccumBuffer<W>
where
W: 'a + io::Write,
W: io::Write,
{
stream: &'a mut W, // Output sink
Owner

These changes to the interface, taking a Write rather than a &'a mut Write, make sense. However, they seem decoupled from this pull request and would be worth submitting as a separate pull request, so that their effect on the benchmarks can be studied.

Contributor Author

Good idea. See PR #54.

buf: Vec<u8>, // Buffer
len: usize, // Total number of bytes sent through the buffer
stream: W, // Output sink
buf: Vec<u8>, // Buffer
memlimit: usize, // Buffer memory limit
len: usize, // Total number of bytes sent through the buffer
}
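
One reason the by-value interface discussed above stays compatible with callers that only have a borrow: the standard library implements `Write` for `&mut W` whenever `W: Write`. The sketch below shows both call styles; `Sink` is a hypothetical stand-in for the buffer types in this diff.

```rust
use std::io::{self, Write};

// Hypothetical wrapper that takes its sink by value, as in the new interface.
struct Sink<W: Write> {
    stream: W,
}

impl<W: Write> Sink<W> {
    fn from_stream(stream: W) -> Self {
        Self { stream }
    }

    // Flush and return the sink to the caller.
    fn finish(mut self) -> io::Result<W> {
        self.stream.flush()?;
        Ok(self.stream)
    }
}

// Caller hands over ownership and gets the sink back from `finish`.
fn write_owned() -> io::Result<Vec<u8>> {
    let mut sink = Sink::from_stream(Vec::<u8>::new());
    sink.stream.write_all(b"owned")?;
    sink.finish()
}

// Caller keeps ownership and passes `&mut Vec<u8>`, which is itself a `Write`.
fn write_borrowed() -> io::Result<Vec<u8>> {
    let mut out: Vec<u8> = Vec::new();
    {
        let mut sink = Sink::from_stream(&mut out);
        sink.stream.write_all(b"borrowed")?;
        sink.finish()?;
    }
    Ok(out)
}

fn main() {
    assert_eq!(write_owned().unwrap(), b"owned".to_vec());
    assert_eq!(write_borrowed().unwrap(), b"borrowed".to_vec());
}
```

Whether the extra indirection of `&mut W` shows up in the benchmarks is the open question raised in the review; this sketch only demonstrates that both forms compile against the same signature.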

impl<'a, W> LZAccumBuffer<'a, W>
impl<W> LZAccumBuffer<W>
where
W: io::Write,
{
pub fn from_stream(stream: &'a mut W) -> Self {
pub fn from_stream(stream: W) -> Self {
Self::from_stream_with_memlimit(stream, std::usize::MAX)
}

pub fn from_stream_with_memlimit(stream: W, memlimit: usize) -> Self {
Self {
stream,
buf: Vec::new(),
memlimit,
len: 0,
}
}
@@ -52,7 +65,7 @@ where
}
}

impl<'a, W> LZBuffer for LZAccumBuffer<'a, W>
impl<W> LZBuffer<W> for LZAccumBuffer<W>
where
W: io::Write,
{
@@ -84,10 +97,19 @@ where
}

// Append a literal
fn append_literal(&mut self, lit: u8) -> io::Result<()> {
self.buf.push(lit);
self.len += 1;
Ok(())
fn append_literal(&mut self, lit: u8) -> error::Result<()> {
let new_len = self.len + 1;

if new_len > self.memlimit {
Err(error::Error::LZMAError(format!(
"exceeded memory limit of {}",
self.memlimit
)))
} else {
self.buf.push(lit);
self.len = new_len;
Ok(())
}
}
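
The memory-limit check in append_literal above can be exercised in isolation with a small sketch. `BoundedBuf` and `MemLimitExceeded` are hypothetical names introduced for illustration; the crate itself reports the failure through error::Error::LZMAError, as shown in the diff.

```rust
// Hypothetical error type standing in for the crate's LZMAError variant.
#[derive(Debug, PartialEq)]
struct MemLimitExceeded {
    memlimit: usize,
}

// Hypothetical accumulating buffer with an upper bound on its size.
struct BoundedBuf {
    buf: Vec<u8>,
    memlimit: usize,
}

impl BoundedBuf {
    fn with_memlimit(memlimit: usize) -> Self {
        Self { buf: Vec::new(), memlimit }
    }

    // Refuse the append if it would push the buffer past the limit;
    // otherwise store the literal.
    fn append_literal(&mut self, lit: u8) -> Result<(), MemLimitExceeded> {
        let new_len = self.buf.len() + 1;
        if new_len > self.memlimit {
            return Err(MemLimitExceeded { memlimit: self.memlimit });
        }
        self.buf.push(lit);
        Ok(())
    }
}

fn main() {
    let mut b = BoundedBuf::with_memlimit(2);
    assert_eq!(b.append_literal(b'a'), Ok(()));
    assert_eq!(b.append_literal(b'b'), Ok(()));
    // The third byte would exceed the limit, so the buffer is left unchanged.
    assert_eq!(b.append_literal(b'c'), Err(MemLimitExceeded { memlimit: 2 }));
    assert_eq!(b.buf, b"ab".to_vec());
}
```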

// Fetch an LZ sequence (length, distance) from inside the buffer
@@ -111,36 +133,48 @@ where
Ok(())
}

// Get a reference to the output sink
fn get_ref(&self) -> &W {
&self.stream
}

// Get a mutable reference to the output sink
fn get_mut(&mut self) -> &mut W {
&mut self.stream
}

// Flush the buffer to the output
fn finish(self) -> io::Result<()> {
fn finish(mut self) -> io::Result<W> {
self.stream.write_all(self.buf.as_slice())?;
self.stream.flush()?;
Ok(())
Ok(self.stream)
}
}

// A circular buffer for LZ sequences
pub struct LZCircularBuffer<'a, W>
pub struct LZCircularBuffer<W>
where
W: 'a + io::Write,
W: io::Write,
{
stream: &'a mut W, // Output sink
buf: Vec<u8>, // Circular buffer
dict_size: usize, // Length of the buffer
cursor: usize, // Current position
len: usize, // Total number of bytes sent through the buffer
stream: W, // Output sink
buf: Vec<u8>, // Circular buffer
dict_size: usize, // Length of the buffer
memlimit: usize, // Buffer memory limit
cursor: usize, // Current position
len: usize, // Total number of bytes sent through the buffer
}

impl<'a, W> LZCircularBuffer<'a, W>
impl<W> LZCircularBuffer<W>
where
W: io::Write,
{
pub fn from_stream(stream: &'a mut W, dict_size: usize) -> Self {
pub fn from_stream_with_memlimit(stream: W, dict_size: usize, memlimit: usize) -> Self {
lzma_info!("Dict size in LZ buffer: {}", dict_size);
Self {
stream,
buf: Vec::new(),
dict_size,
memlimit,
cursor: 0,
len: 0,
}
@@ -150,15 +184,25 @@ where
*self.buf.get(index).unwrap_or(&0)
}

fn set(&mut self, index: usize, value: u8) {
if self.buf.len() < index + 1 {
self.buf.resize(index + 1, 0);
fn set(&mut self, index: usize, value: u8) -> error::Result<()> {
let new_len = index + 1;

if self.buf.len() < new_len {
if new_len <= self.memlimit {
self.buf.resize(new_len, 0);
} else {
return Err(error::Error::LZMAError(format!(
"exceeded memory limit of {}",
self.memlimit
)));
}
}
self.buf[index] = value;
Ok(())
}
}
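
The get/set pair above grows the circular buffer lazily: reads past the allocated length yield 0, and writes extend the vector with zeros up to the written index unless that would exceed the memory limit. A minimal sketch of that behaviour follows; `LazyBuf` is a hypothetical type, and the `String` error is a simplification of the crate's error type.

```rust
// Hypothetical lazily grown buffer mirroring the get/set logic in the diff.
struct LazyBuf {
    buf: Vec<u8>,
    memlimit: usize,
}

impl LazyBuf {
    // Unallocated positions read as zero.
    fn get(&self, index: usize) -> u8 {
        *self.buf.get(index).unwrap_or(&0)
    }

    // Grow (zero-filled) only as far as needed, and only within the limit.
    fn set(&mut self, index: usize, value: u8) -> Result<(), String> {
        let new_len = index + 1;
        if self.buf.len() < new_len {
            if new_len > self.memlimit {
                return Err(format!("exceeded memory limit of {}", self.memlimit));
            }
            self.buf.resize(new_len, 0);
        }
        self.buf[index] = value;
        Ok(())
    }
}

fn main() {
    let mut b = LazyBuf { buf: Vec::new(), memlimit: 4 };
    assert!(b.set(3, 7).is_ok());
    assert_eq!(b.buf.len(), 4); // grown to index + 1
    assert_eq!(b.get(0), 0); // zero-filled gap
    assert_eq!(b.get(3), 7);
    assert_eq!(b.get(10), 0); // never allocated
    assert!(b.set(4, 1).is_err()); // would need 5 bytes, limit is 4
}
```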

impl<'a, W> LZBuffer for LZCircularBuffer<'a, W>
impl<W> LZBuffer<W> for LZCircularBuffer<W>
where
W: io::Write,
{
@@ -195,8 +239,8 @@ where
}

// Append a literal
fn append_literal(&mut self, lit: u8) -> io::Result<()> {
self.set(self.cursor, lit);
fn append_literal(&mut self, lit: u8) -> error::Result<()> {
self.set(self.cursor, lit)?;
self.cursor += 1;
self.len += 1;

@@ -237,12 +281,22 @@ where
Ok(())
}

// Get a reference to the output sink
fn get_ref(&self) -> &W {
&self.stream
}

// Get a mutable reference to the output sink
fn get_mut(&mut self) -> &mut W {
&mut self.stream
}

// Flush the buffer to the output
fn finish(self) -> io::Result<()> {
fn finish(mut self) -> io::Result<W> {
if self.cursor > 0 {
self.stream.write_all(&self.buf[0..self.cursor])?;
self.stream.flush()?;
}
Ok(())
Ok(self.stream)
}
}