Race condition in futures::sync::mpsc #909

Closed
simmons opened this issue Mar 25, 2018 · 1 comment · Fixed by #2304
Labels: A-channel (Area: futures::channel), bug

Comments


simmons commented Mar 25, 2018

It looks like there's a race condition in the MPSC implementation in futures 0.1.19, unless my understanding of the intended behavior is incorrect. If a Sender::try_send() happens concurrently with a Receiver close/drop, it's possible for try_send() to return Ok(()) even though the item can never be received and will not be dropped until the Sender is dropped. I expected to receive an error matching Err(ref e) if e.is_disconnected(). The Receiver's Drop implementation closes the channel and drains any items present, but this can apparently happen just before the Sender believes it has successfully enqueued the item.
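
A minimal sketch of the expected behavior (illustrative only; the function and names below are placeholders, not code from the failing program):

```rust
extern crate futures; // futures 0.1

use futures::sync::mpsc::Sender;

// Illustrative sketch: what a caller expects from try_send() when the
// Receiver side has been closed or dropped.
fn send_or_report(tx: &mut Sender<u32>, item: u32) -> Result<(), &'static str> {
    match tx.try_send(item) {
        // Enqueued: the item should be received, or at worst dropped when
        // the Receiver drains the channel on close.
        Ok(()) => Ok(()),
        // Receiver closed or dropped: the error expected in this scenario.
        Err(ref e) if e.is_disconnected() => Err("disconnected"),
        // Otherwise the bounded channel is currently full.
        Err(_) => Err("full"),
    }
}
```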

I wrote a small program to stress-test this scenario and demonstrate the bug:
https://github.com/simmons/mpsc-stress

I discovered this behavior while stress testing a program which implements a synchronous API by sending a command plus a oneshot::Sender over a futures MPSC channel to a task that processes the command and completes the oneshot when done. When the described race occurs, the sending thread deadlocks, waiting forever for the oneshot::Receiver to either yield a value or indicate disconnection.
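
The pattern looks roughly like this (a hedged sketch; Command, call_sync, and the error handling are illustrative names, not the actual program):

```rust
extern crate futures; // futures 0.1

use futures::sync::{mpsc, oneshot};
use futures::Future;

// Illustrative command type: the worker task completes `done` when finished.
struct Command {
    // ... request payload elided ...
    done: oneshot::Sender<()>,
}

// Synchronous front-end: enqueue a command, then block on the oneshot.
fn call_sync(tx: &mut mpsc::Sender<Command>) -> Result<(), ()> {
    let (done_tx, done_rx) = oneshot::channel();
    match tx.try_send(Command { done: done_tx }) {
        // try_send() reported success, so we block until the worker answers.
        // If the Receiver was closed in the race window, the Command is
        // stranded inside the channel: its oneshot::Sender is neither
        // completed nor dropped, so this wait() never returns.
        Ok(()) => done_rx.wait().map_err(|_| ()),
        // Disconnected (or full): report the failure instead of blocking.
        Err(_) => Err(()),
    }
}
```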

If I get a chance in the next few days, I'll see if I can root-cause the problem. (Unless I'm told that my expectation is incorrect.)


simmons commented Mar 27, 2018

I looked through the futures::sync::mpsc code and confirmed what is happening. When a new item is submitted via try_send(), the Sender checks that the channel is open, then enqueues the message -- but the channel can actually be closed between these two steps. Specifically (a simplified sketch of this window follows the list):

  1. On Thread 1, do_send() (via try_send()) is called to submit a message.
  2. inc_num_messages() determines that the channel is open and a new message can be enqueued.
  3. On Thread 2, the Receiver is closed, which marks the channel as closed, and all remaining messages are drained. This could be done explicitly by the application (e.g. the "clean shutdown" described in the documentation), or by way of the Drop implementation.
  4. Back on Thread 1, do_send() adds the new message to the queue and returns Ok(()) to indicate that the message has successfully been sent.
  5. That message is now stuck in a channel with no Receiver. It can never be received and will not be dropped (unless the Sender is dropped, of course). Because try_send() returns Ok(()) instead of Err(TrySendError { kind: TrySendErrorKind::Disconnected(_) }), the caller may reasonably assume that the message is deliverable or will be properly dropped should the Receiver close.
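
A simplified model of that window (this is not the actual futures::sync::mpsc implementation, which is lock-free rather than mutex-based; it only illustrates the check-then-enqueue gap described above):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

// Toy model of the race, not the real channel internals.
struct Channel<T> {
    open: AtomicBool,
    queue: Mutex<Vec<T>>, // stand-in for the real lock-free queue
}

impl<T> Channel<T> {
    fn try_send(&self, msg: T) -> Result<(), T> {
        // Step 2 above: observe that the channel is still open
        // (inc_num_messages() plays this role in the real code).
        if !self.open.load(Ordering::SeqCst) {
            return Err(msg);
        }
        // <-- Step 3 can run right here: close() flips the flag and drains.
        // Step 4: enqueue anyway and report success; the message is stranded.
        self.queue.lock().unwrap().push(msg);
        Ok(())
    }

    fn close(&self) {
        self.open.store(false, Ordering::SeqCst);
        self.queue.lock().unwrap().clear(); // drain remaining messages
    }
}
```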

Just to test, I added a Mutex to prevent Sender::do_send() and Receiver::close() from executing concurrently, and confirmed that this fixes the problem (a sketch of that experiment, applied to the toy model above, is below). Obviously, that runs counter to the lock-free design goal of the MPSC code.
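
For illustration only, here is the same toy model with the send path and the close path serialized by a single lock; this closes the window but gives up the lock-free property:

```rust
use std::sync::Mutex;

// Toy model again, not the real internals: one lock guards both the
// open flag and the queue, so close-and-drain cannot interleave with
// the open-check and the enqueue.
struct LockedChannel<T> {
    inner: Mutex<(bool /* open */, Vec<T> /* queue */)>,
}

impl<T> LockedChannel<T> {
    fn try_send(&self, msg: T) -> Result<(), T> {
        let mut inner = self.inner.lock().unwrap();
        if !inner.0 {
            return Err(msg); // now reliably reports "disconnected"
        }
        inner.1.push(msg);
        Ok(())
    }

    fn close(&self) {
        let mut inner = self.inner.lock().unwrap();
        inner.0 = false;
        inner.1.clear(); // drain; nothing can be enqueued after this point
    }
}
```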
