
Channel lacks high watermark flow control #392

Closed
vdudouyt opened this issue Nov 24, 2024 · 3 comments · Fixed by #412
Comments

@vdudouyt

vdudouyt commented Nov 24, 2024

I discovered that russh continues reading into an internal buffer even if my application can't process those bytes in time. This can be a big problem for some applications (like mine), causing unacceptable RAM usage or even termination by the OOM killer.

Steps to reproduce:

  • Prepare a 1 GB test file:
$ dd if=/dev/urandom of=/tmp/1.bin bs=64M count=16
  • Start a minimal reproducible example that just creates a reader and a writer and then calls tokio::time::sleep().await (the full main.rs is in the attachment below):
    async fn channel_open_session(
        &mut self,
        channel: Channel<Msg>,
        _session: &mut Session,
    ) -> Result<bool, Self::Error> {
        let mut channel = channel;
        tokio::spawn(async move {
            {
                let _cin = channel.make_writer();
                let _cout = channel.make_reader();
                tokio::time::sleep(Duration::from_secs(3600)).await;
            }
            // For some reason, russh requires me to call channel.close() explicitly,
            // although I have no problem with that.
            let _ = channel.close().await;
        });

        info!("channel_open_session");
        Ok(true)
    }
  • Send the 1 GB file with pv and ssh. You can use any password, or an empty password:
$ pv /tmp/1.bin|ssh root@127.0.0.1 -p 2222 foo

Expected behavior: russh stops reading bytes into its internal buffer once a high watermark is reached (say, 16 MB by default), until my app consumes some bytes from the channel, making room in that internal buffer to accept new bytes again.

That's how it normally works for OpenSSH server sessions, as well as for TCP streams, UNIX domain sockets, the plain UNIX pipes you know from the shell, and so on.

Observed behavior: russh just reads the whole 1 GB file into RAM.

Is there any way to overcome this issue? (Apart from implementing some flow control at the application protocol level, which isn't possible if I have to stay compatible with existing protocols such as rsync.)

As I haven't tested whether a similar issue exists for the writer, may I also ask what happens if my app starts writing into a channel faster than the client can receive those bytes? Will it wait for the client once a buffer high watermark is reached, or just keep storing extra bytes in memory until it's terminated by the OOM killer?
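
To make the writer-side question concrete, here is a minimal sketch of what I mean, written as a replacement for the spawned task in the snippet above. Only make_writer() and close() come from that snippet; everything else is illustrative, and it assumes the value returned by make_writer() implements tokio's AsyncWrite:

    use tokio::io::AsyncWriteExt;

    tokio::spawn(async move {
        {
            let mut cin = channel.make_writer();
            let payload = vec![0u8; 64 * 1024];
            // Does this .await once some internal buffer or the SSH window
            // fills up, or does every chunk get queued in memory until the
            // OOM killer steps in? (16_384 * 64 KiB is roughly 1 GiB.)
            for _ in 0..16_384 {
                if cin.write_all(&payload).await.is_err() {
                    break;
                }
            }
        }
        let _ = channel.close().await;
    });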

main.rs.zip

P.S. I'm using the latest russh release, which happens to be 0.46.0.

@Eugeny Eugeny added the bug Something isn't working label Nov 24, 2024
@vdudouyt
Copy link
Author

vdudouyt commented Nov 28, 2024

Any chance of having this fixed in the next release? It happens to be the last showstopper for me :(

EpicEric added a commit to EpicEric/russh that referenced this issue Dec 7, 2024
Fixes Eugeny#392.

Simply put, this replaces each channel's `UnboundedReceiver` from
`unbounded_channel` with a `Receiver` from `channel(...)`, using a
hard-coded `CHANNEL_BUFFER_SIZE` of 100. This means that each channel
that's created can hold up to 100 `Msg` objects, each at most as big as
the window size plus metadata/indirection pointers.

This is enough to force messages to be read from the underlying
`TcpStream` only as quickly as the server/client is able to handle them.
This has been tested with the example given in Eugeny#392, including a
modification to slowly read into a non-expanding buffer, and appears to
fix the issue that's been pointed out.

Some other considerations:

- It is still possible to make memory blow up by opening multiple
  channels, although that might be intentional on the user's part. In
  this case, it's up to the user to control whether and when channels
  get created.
- It might be desirable to let users pick how big the `mpsc::channel`s
  can be via some configuration, instead of the hardcoded value. It
  should be simple to modify the code to support this.
- The limited buffering also means that control `Msg`s (e.g. `Eof`) will
  also be delayed by the transport layer backpressure, although this
  might be expected by the SSH standard. Reading through RFC 4254, I
  couldn't find any mention of non-`Data`/`ExtendedData` messages having
  to be dealt with differently, but I don't think it's even possible to
  do so over TCP. If this assumption is wrong, this might require a
  separate `mpsc::unbounded_channel` just for control messages.

**BREAKING CHANGE**: This removes `fn send_channel_msg`, which doesn't seem
to be used anywhere in this library, but is part of the publicly exposed
API nonetheless.
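
For reference, the backpressure described above is just what tokio's bounded mpsc channel provides out of the box. Here is a standalone sketch (plain tokio, no russh involved) of the difference between `channel(...)` and `unbounded_channel()`:

    use std::time::Duration;
    use tokio::sync::mpsc;

    #[tokio::main]
    async fn main() {
        // Bounded channel: `send().await` parks the producer once the buffer
        // holds 100 in-flight messages -- the backpressure this change relies on.
        let (tx, mut rx) = mpsc::channel::<Vec<u8>>(100);

        tokio::spawn(async move {
            for _ in 0..1_000 {
                // With `mpsc::unbounded_channel()` this send would never wait,
                // and memory could grow as fast as the producer allocates.
                tx.send(vec![0u8; 32 * 1024]).await.unwrap();
            }
            // Dropping `tx` here closes the channel, ending the loop below.
        });

        // A deliberately slow consumer: the producer above is throttled to
        // this pace instead of buffering everything in RAM.
        while let Some(_chunk) = rx.recv().await {
            tokio::time::sleep(Duration::from_millis(1)).await;
        }
    }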
EpicEric added a commit to EpicEric/russh that referenced this issue Dec 8, 2024
Fixes Eugeny#392.

Simply put, this replaces each channel's `UnboundedReceiver` from
`unbounded_channel` with a `Receiver` from `channel(...)`, using a
configurable `channel_buffer_size` that defaults to 100. This means
that each channel that's created can hold up to 100 `Msg` objects, each
at most as big as the window size plus metadata/indirection pointers.

This is enough to force messages to be read from the underlying
`TcpStream` only as quickly as the server/client is able to handle them.
This has been tested with the example given in Eugeny#392, including a
modification to slowly read into a non-expanding buffer, and appears to
fix the issue that's been pointed out.

Some other considerations:

- It is still possible to make memory blow up by opening multiple
  channels, although that might be intentional on the user's part. In
  this case, it's up to the user to control whether and when channels
  get created.
- The limited buffering also means that control `Msg`s (e.g. `Eof`) will
  also be delayed by the transport layer backpressure, although this
  might be expected by the SSH standard. Reading through RFC 4254, I
  couldn't find any mention of non-`Data`/`ExtendedData` messages having
  to be dealt with differently, but I don't think it's even possible to
  do so over TCP. If this assumption is wrong, this might require a
  separate `mpsc::unbounded_channel` just for control messages.

**BREAKING CHANGES**:

- This removes `fn send_channel_msg`, which doesn't seem
  to be used anywhere in this library, but is part of the publicly
  exposed API nonetheless.
- This adds the configuration value `channel_buffer_size` to both
  servers and clients, allowing users to set how big the buffer should
  be.
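
A minimal sketch of using the second change, assuming the field is exposed on the server `Config` under exactly the name given above (`channel_buffer_size`) and that `Config` keeps a `Default` implementation:

    use std::sync::Arc;
    use russh::server::Config;

    fn make_config() -> Arc<Config> {
        let mut config = Config::default();
        // Assumed field name, taken from the commit message above: lower values
        // give a tighter per-channel memory bound at the cost of throughput.
        config.channel_buffer_size = 10;
        Arc::new(config)
    }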
EpicEric added a commit to EpicEric/russh that referenced this issue Dec 9, 2024
@Eugeny
Owner

Eugeny commented Dec 12, 2024

I've published v0.50.0-beta.1, which is just 0.49 with this fix on top ✌️

@vdudouyt
Author

Thank you so much!
