Improve typed socket connection robustness (#810)
What we know:

- Sometimes, when there are network issues, some (but not all) drones fail to reconnect. In the logs, it shows up as `failed to send heartbeat` (from [heartbeat.rs](https://github.com/jamsocket/plane/blob/afc9b7f0786f69770fb9fe4b9731fde566dc793d/plane/src/drone/heartbeat.rs#L19)), with the `err` value of `Disconnected` ([typed_socket](https://github.com/jamsocket/plane/blob/afc9b7f0786f69770fb9fe4b9731fde566dc793d/plane/src/typed_socket/mod.rs#L45)).
- When we “send” websocket messages, we are actually sending messages to a queue that gets picked up by the websocket event loop asynchronously. `Disconnected` [actually means that the message queue is full](https://github.com/jamsocket/plane/blob/afc9b7f0786f69770fb9fe4b9731fde566dc793d/plane/src/typed_socket/mod.rs#L51), which *implies* we are disconnected, but is not immediate.
- On a `TypedSocket`, `send()` [uses Sender::send](https://github.com/jamsocket/plane/blob/afc9b7f0786f69770fb9fe4b9731fde566dc793d/plane/src/typed_socket/mod.rs#L70-L76), while `TypedSocketSender`'s `send()` uses [try_send](https://github.com/jamsocket/plane/blob/afc9b7f0786f69770fb9fe4b9731fde566dc793d/plane/src/typed_socket/mod.rs#L92).
- In tokio, [`send`](https://docs.rs/tokio/latest/tokio/sync/mpsc/struct.Sender.html#method.send) on a full channel blocks until the channel has capacity. `try_send` returns immediately if the channel is full.
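The difference between a waiting send and a non-blocking send on a bounded queue can be illustrated with a minimal sketch. This is not the code from the repository: it uses the standard library's `sync_channel` as a stand-in for tokio's bounded `mpsc` channel, and `SendError` and `send_nonblocking` are hypothetical names introduced only for this example.

```rust
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};

/// Hypothetical error type for this sketch (not the real `TypedSocketError`).
#[derive(Debug, PartialEq)]
enum SendError {
    /// The outgoing queue is full and nothing is draining it.
    Clogged,
    /// The receiving half of the queue has been dropped.
    Closed,
}

/// Non-blocking send: returns immediately instead of waiting for capacity.
/// Assumption: std's `try_send` stands in for tokio's `Sender::try_send`.
fn send_nonblocking<T>(tx: &SyncSender<T>, msg: T) -> Result<(), SendError> {
    tx.try_send(msg).map_err(|e| match e {
        TrySendError::Full(_) => SendError::Clogged,
        TrySendError::Disconnected(_) => SendError::Closed,
    })
}

fn main() {
    // A bounded queue with room for two messages and no active consumer.
    let (tx, rx) = sync_channel::<u32>(2);

    assert_eq!(send_nonblocking(&tx, 1), Ok(()));
    assert_eq!(send_nonblocking(&tx, 2), Ok(()));

    // The queue is full. A blocking `tx.send(3)` would hang here forever,
    // because nothing is receiving; `try_send` reports the condition instead.
    assert_eq!(send_nonblocking(&tx, 3), Err(SendError::Clogged));

    // Once the consumer drains a message, sending succeeds again.
    assert_eq!(rx.recv().unwrap(), 1);
    assert_eq!(send_nonblocking(&tx, 3), Ok(()));
}
```

A full queue only suggests that the peer has stopped reading; it does not prove the connection is gone, which is why the sketch keeps the "full" and "receiver dropped" cases as separate variants.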
My leading theory is that when the network is interrupted abruptly, upon reconnecting to the controller, the controller sends a bunch of messages, causing a deadlock:

- the `new_client` loop is stalled waiting for capacity in `send_to_client`, which won't happen until `socket.recv()` [is called](https://github.com/jamsocket/plane/blob/afc9b7f0786f69770fb9fe4b9731fde566dc793d/plane/src/drone/mod.rs#L96-L99) in the main drone event loop
- the main drone event loop is waiting on a call to `socket.send()` when [acking an action](https://github.com/jamsocket/plane/blob/afc9b7f0786f69770fb9fe4b9731fde566dc793d/plane/src/drone/mod.rs#L142)

This PR introduces several changes, which should improve the robustness of reconnects:

- Instead of handling messages from the controller directly in the drone event loop, they are sent to separate tasks. This means that nothing can get in the way of the drone event loop's ability to call `socket.recv()`.
- `TypedSocket::send` now uses `Sender::try_send`, for consistency with `TypedSocketSender::send`. Note that as a result of moving handling out of the drone event loop, messages sent from the drone loop now use `TypedSocketSender` instead of `TypedSocket` anyway, so those messages would now use `try_send` regardless.
- As a result of using `try_send`, `TypedSocket::send` no longer needs to be async, so that's removed.
- `TypedSocketError::Disconnected` is renamed to `TypedSocketError::Clogged`, to better reflect what the issue is.
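The shape of the fix (keep the receive loop free, do the handling elsewhere, and never wait on the outgoing queue) can be sketched as follows. This is an illustration under stated assumptions rather than the drone's actual code: a standard-library thread and channels stand in for tokio tasks and the typed socket, and `run_event_loop` is a hypothetical name.

```rust
use std::sync::mpsc::{channel, sync_channel, Receiver, SyncSender, TrySendError};
use std::thread;

/// Drains `inbound` without ever waiting on `outbound`.
/// Handling happens on a separate worker thread (a stand-in for a spawned
/// task), and acks are sent with `try_send`.
/// Returns (messages received, acks dropped because the outbound queue was full).
fn run_event_loop(inbound: Receiver<u32>, outbound: SyncSender<u32>) -> (usize, usize) {
    // Unbounded handoff from the event loop to the handler.
    let (work_tx, work_rx) = channel::<u32>();

    let handler = thread::spawn(move || {
        let mut clogged = 0;
        for action in work_rx {
            // Ack the action; a full queue is reported, not waited on.
            match outbound.try_send(action) {
                Ok(()) => {}
                Err(TrySendError::Full(_)) => clogged += 1,
                Err(TrySendError::Disconnected(_)) => break,
            }
        }
        clogged
    });

    // The event loop only receives and hands off, so it can always keep
    // calling `recv` no matter how backed up the outbound side is.
    let mut received = 0;
    for action in inbound {
        received += 1;
        if work_tx.send(action).is_err() {
            break; // handler has exited
        }
    }

    drop(work_tx); // lets the handler's loop finish
    let clogged = handler.join().expect("handler thread panicked");
    (received, clogged)
}

fn main() {
    let (in_tx, in_rx) = channel();
    // Outbound queue with capacity 2 that nobody drains during the run.
    let (out_tx, out_rx) = sync_channel(2);

    // Simulate a burst of messages from the controller after a reconnect.
    for i in 0..10 {
        in_tx.send(i).unwrap();
    }
    drop(in_tx);

    let (received, clogged) = run_event_loop(in_rx, out_tx);
    assert_eq!(received, 10); // the loop drained the whole burst
    assert_eq!(clogged, 8); // two acks fit in the queue, the rest were rejected
    assert_eq!(out_rx.try_iter().count(), 2);
}
```

With a blocking send in the handler's place and the handling done inline in the receive loop, the same burst would stall on the third ack, which is the deadlock the theory above describes.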
Showing 13 changed files with 103 additions and 126 deletions.