Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Audit reqwest #5

Open
Shnatsel opened this issue Jul 21, 2019 · 5 comments
Open

Audit reqwest #5

Shnatsel opened this issue Jul 21, 2019 · 5 comments

Comments

@Shnatsel
Copy link
Member

https://crates.io/crates/reqwest

Popular HTTP client library. 5000 downloads/day. High-risk due to handling untrusted input from the network. Uses unsafe.

@Shnatsel
Copy link
Member Author

It also has a ton of transitive dependencies with unsafe code in them. We should open an issue for auditing each of them.

@64
Copy link

64 commented Jul 22, 2019

Minimal unsafe here.

There are 3 areas where it’s used. The first one is:

fn poll(&mut self) -> Poll<Option<Self::Item>, Self::Error> {
        if self.buf.remaining_mut() == 0 {
            self.buf.reserve(INIT_BUFFER_SIZE);
        }

        // The buffer contains uninitialised memory so getting a readable slice is unsafe.
        // We trust the `flate2` and `miniz` writer not to read from the memory given.
        //
        // To be safe, this memory could be zeroed before passing to `flate2`.
        // Otherwise we might need to deal with the case where `flate2` panics.
        let read = try_io!(self.inner.read(unsafe { self.buf.bytes_mut() }));

        if read == 0 {
            // If GzDecoder reports EOF, it doesn't necessarily mean the
            // underlying stream reached EOF (such as the `0\r\n\r\n`
            // header meaning a chunked transfer has completed). If it
            // isn't polled till EOF, the connection may not be able
            // to be re-used.
            //
            // See https://github.com/seanmonstar/reqwest/issues/508.
            let inner_read = try_io!(self.inner.get_mut().read(&mut [0]));
            if inner_read == 0 {
                Ok(Async::Ready(None))
            } else {
                Err(error::from(io::Error::new(
                    io::ErrorKind::InvalidData,
                    "unexpected data after gzip decoder signaled end-of-file",
                )))
            }
        } else {
            unsafe { self.buf.advance_mut(read) };
            let chunk = Chunk::from_chunk(self.buf.split_to(read).freeze());

            Ok(Async::Ready(Some(chunk)))
        }
    }

As you can see, uninitialised memory is being passed to the reader which is in general quite a bad idea, but should be harmless in the case that the reader type is known not to look at the memory. The second area that uses unsafe does a similar thing, except it’s slightly worse because it’s doing it with a generic reader.

The solution would be to zero out the memory before passing to the reader. However, this needs to be benchmarked because it’s quite possible that it would have a significant impact. The impact could be mitigated somewhat by using something which lowers to alloc_zeroed inside bytes, but there is no abstraction for this yet (See tokio-rs/bytes#270).

This seems like a pretty common issue in the ecosystem as well so it would be good to think of longer term solutions, e.g some kind of Read shim which accepts a kind of mutable slice with variable length so that uninitialised memory is never exposed to safe code.


The third usage of unsafe is here:

    fn poll(&mut self) -> Poll<Self::Item, Self::Error> {
        let bytes = try_ready!(self.concat.poll());
        // a block because of borrow checker
        {
            let (text, _, _) = self.encoding.decode(&bytes);
            match text {
                Cow::Owned(s) => return Ok(Async::Ready(s)),
                _ => (),
            }
        }
        unsafe {
            // decoding returned Cow::Borrowed, meaning these bytes
            // are already valid utf8
            Ok(Async::Ready(String::from_utf8_unchecked(bytes.to_vec())))
        }
    }

This is a tricky one. Cow::into_owned is suitable here but potentially regresses performance. The current unsafe version avoids a copy (of a possibly very large buffer) when the Cow is Borrowed. Don’t see an easy solution - any ideas?

@Nemo157
Copy link

Nemo157 commented Jul 22, 2019

This seems like a pretty common issue in the ecosystem as well so it would be good to think of longer term solutions, e.g some kind of Read shim which accepts a kind of mutable slice with variable length so that uninitialised memory is never exposed to safe code.

There is an existing experiment into providing an abstraction at the IO level on whether a specific instance can deal with an uninitialized output slice correctly: https://doc.rust-lang.org/nightly/std/io/trait.Read.html#method.initializer (although the comments on the tracking issue make it sound like that API is going to change a lot).

@Shnatsel
Copy link
Member Author

Shnatsel commented Jul 22, 2019

Yeah, initializer sounds like it did not pan out.

Some of the safety issues stem from the fact that read_to_end() writes to a Vec and so does not require the underlying memory to be initialized, but read_exact() writes to a slice, requiring initialization. If we had a Vec-like fixed-capacity view of contiguous memory, it would be possible to use it here instead and avoid the initialization while preserving memory safety. https://crates.io/crates/buffer provides an inkling of what that would look like.

We should make a tracking issue for this and/or put it prominently on the repo itself as an unresolved issue.

There is actually a cool trick for reading from iterators that sidesteps the initialization issue and manages to use read_to_end() as if it were read_exact(), see sile/libflate#39

@Shnatsel
Copy link
Member Author

reqwest is currently under a lot of churn due to transitioning to async/await, so let's postpone this one for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants