
Channel lacks high watermark flow control #392

Closed
vdudouyt opened this issue Nov 24, 2024 · 3 comments · Fixed by #412
Comments

@vdudouyt

vdudouyt commented Nov 24, 2024

I discovered that russh continues reading into an internal buffer even if my application can't process those bytes in time. This can be a big problem for some applications (like mine), causing unacceptable RAM usage or even termination by the OOM killer.

Steps to reproduce:

  • Prepare a 1 GB test file:
$ dd if=/dev/urandom of=/tmp/1.bin bs=64M count=16
  • Start a minimal reproducible example that just creates a reader and a writer and then calls tokio::time::sleep().await (the full main.rs is in the attachment below):
    async fn channel_open_session(
        &mut self,
        channel: Channel<Msg>,
        _session: &mut Session,
    ) -> Result<bool, Self::Error> {
        let mut channel = channel;
        tokio::spawn(async move {
            {
                let _cin = channel.make_writer();
                let _cout = channel.make_reader();
                tokio::time::sleep(Duration::from_secs(3600)).await;
            }
            // For some reason, russh requires me to call channel.close() explicitly,
            // although I have no problem with that.
            let _ = channel.close().await;
        });

        info!("channel_open_session");
        Ok(true)
    }
  • Send the 1 GB file with pv and ssh. You can use any password, or an empty password:
$ pv /tmp/1.bin|ssh root@127.0.0.1 -p 2222 foo

Expected behavior: russh stops reading bytes into its internal buffer once a high watermark is reached (say, 16 MB by default), until my app consumes some bytes from the channel, making room in that internal buffer to accept new bytes again.

That's how it normally works for OpenSSH server sessions, as well as for TCP streams, UNIX domain sockets, the plain UNIX pipes you know from the shell, and so on.

Observed behavior: russh just reads the whole 1 GB file into RAM.

Is there any way to overcome this issue? (Apart from implementing some flow control at the application protocol level, which isn't possible if I have to stay compatible with existing protocols such as rsync.)

As I haven't tested whether a similar issue exists for the writer, may I also ask what happens if my app starts writing into a channel faster than the client can receive those bytes? Will it wait for the client once a buffer high watermark is reached, or just keep storing extra bytes in memory until it's terminated by the OOM killer?
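
To make the writer-side question concrete, here is a minimal sketch of what I mean, written as a replacement for the spawned task in the snippet above. Only make_writer() and close() come from that snippet; everything else is illustrative, and it assumes the value returned by make_writer() implements tokio's AsyncWrite:

    use tokio::io::AsyncWriteExt;

    tokio::spawn(async move {
        {
            let mut cin = channel.make_writer();
            let payload = vec![0u8; 64 * 1024];
            // Does this .await once some internal buffer or the SSH window
            // fills up, or does every chunk get queued in memory until the
            // OOM killer steps in? (16_384 * 64 KiB is roughly 1 GiB.)
            for _ in 0..16_384 {
                if cin.write_all(&payload).await.is_err() {
                    break;
                }
            }
        }
        let _ = channel.close().await;
    });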

main.rs.zip

P.S. I'm using the latest russh release, which happens to be 0.46.0.

@Eugeny Eugeny added the bug Something isn't working label Nov 24, 2024
@vdudouyt
Copy link
Author

vdudouyt commented Nov 28, 2024

Any chance of having this fixed in the next release? It happens to be the last showstopper for me :(

EpicEric added a commit to EpicEric/russh that referenced this issue Dec 7, 2024
Fixes Eugeny#392.

Simply put, this replaces each channel's `UnboundedReceiver` from
`unbounded_channel` with a `Receiver` from `channel(...)`, using a
hard-coded `CHANNEL_BUFFER_SIZE` of 100. This means that each channel
that's created can hold up to 100 `Msg` objects, each at most as big as
the window size plus metadata/indirection pointers.

This is enough to force messages to be read from the underlying
`TcpStream` only as quickly as the server/client is able to handle them.
This has been tested with the example given in Eugeny#392, including a
modification to slowly read into a non-expanding buffer, and appears to
fix the issue that's been pointed out.

Some other considerations:

- It is still possible to make memory blow up by opening multiple
  channels, although that might be intentional on the user's part. In
  this case, it's up to the user to control whether and when channels
  get created.
- It might be desirable to let users pick how big the `mpsc::channel`s
  can be via some configuration, instead of the hardcoded value. It
  should be simple to modify the code to support this.
- The limited buffering also means that control `Msg`s (e.g. `Eof`) will
  also be delayed by the transport layer backpressure, although this
  might be expected by the SSH standard. Reading through RFC 4254, I
  couldn't find any mention of non-`Data`/`ExtendedData` messages having
  to be dealt with differently, but I don't think it's even possible to
  do so over TCP. If this assumption is wrong, this might require a
  separate `mpsc::unbounded_channel` just for control messages.

**BREAKING CHANGE**: This removes `fn send_channel_msg`, which doesn't seem
to be used anywhere in this library, but is part of the publicly exposed
API nonetheless.
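
For reference, the backpressure described above is just what tokio's bounded mpsc channel provides out of the box. Here is a standalone sketch (plain tokio, no russh involved) of the difference between `channel(...)` and `unbounded_channel()`:

    use std::time::Duration;
    use tokio::sync::mpsc;

    #[tokio::main]
    async fn main() {
        // Bounded channel: `send().await` parks the producer once the buffer
        // holds 100 in-flight messages -- the backpressure this change relies on.
        let (tx, mut rx) = mpsc::channel::<Vec<u8>>(100);

        tokio::spawn(async move {
            for _ in 0..1_000 {
                // With `mpsc::unbounded_channel()` this send would never wait,
                // and memory could grow as fast as the producer allocates.
                tx.send(vec![0u8; 32 * 1024]).await.unwrap();
            }
            // Dropping `tx` here closes the channel, ending the loop below.
        });

        // A deliberately slow consumer: the producer above is throttled to
        // this pace instead of buffering everything in RAM.
        while let Some(_chunk) = rx.recv().await {
            tokio::time::sleep(Duration::from_millis(1)).await;
        }
    }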
EpicEric added a commit to EpicEric/russh that referenced this issue Dec 8, 2024
Fixes Eugeny#392.

Simply put, this replaces each channel's `UnboundedReceiver` from
`unbounded_channel` with a `Receiver` from `channel(...)`, using a
configurable `channel_buffer_size` that defaults to 100. This means
that each channel that's created can hold up to 100 `Msg` objects, each
at most as big as the window size plus metadata/indirection pointers.

This is enough to force messages to be read from the underlying
`TcpStream` only as quickly as the server/client is able to handle them.
This has been tested with the example given in Eugeny#392, including a
modification to slowly read into a non-expanding buffer, and appears to
fix the issue that's been pointed out.

Some other considerations:

- It is still possible to make memory blow up by opening multiple
  channels, although that might be intentional on the user's part. In
  this case, it's up to the user to control whether and when channels
  get created.
- The limited buffering also means that control `Msg`s (e.g. `Eof`) will
  also be delayed by the transport layer backpressure, although this
  might be expected by the SSH standard. Reading through RFC 4254, I
  couldn't find any mention of non-`Data`/`ExtendedData` messages having
  to be dealt with differently, but I don't think it's even possible to
  do so over TCP. If this assumption is wrong, this might require a
  separate `mpsc::unbounded_channel` just for control messages.

**BREAKING CHANGES**:

- This removes `fn send_channel_msg`, which doesn't seem
  to be used anywhere in this library, but is part of the publicly
  exposed API nonetheless.
- This adds the configuration value `channel_buffer_size` to both
  servers and clients, allowing users to set how big the buffer should
  be.
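
A minimal sketch of using the second change, assuming the field is exposed on the server `Config` under exactly the name given above (`channel_buffer_size`) and that `Config` keeps a `Default` implementation:

    use std::sync::Arc;
    use russh::server::Config;

    fn make_config() -> Arc<Config> {
        let mut config = Config::default();
        // Assumed field name, taken from the commit message above: lower values
        // give a tighter per-channel memory bound at the cost of throughput.
        config.channel_buffer_size = 10;
        Arc::new(config)
    }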
EpicEric added a commit to EpicEric/russh that referenced this issue Dec 9, 2024
@Eugeny
Owner

Eugeny commented Dec 12, 2024

I've published v0.50.0-beta.1, which is just 0.49 with this fix on top ✌️

@vdudouyt
Author

Thank you so much!
