-
Notifications
You must be signed in to change notification settings - Fork 990
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Do not publish to connections without context #3873
Conversation
This is a rare case where a closed connection is kept alive while the handling fiber yields, therefore leaving `cc_` (the connection context) pointing to null for other fibers to see. As far as I can see, this can only happen during server shutdown, but there could be other cases that I have missed. The test on its own does _not_ reproduce the crash, however with added `ThisFiber::SleepFor()`s I could reproduce the crash: * Right before `DispatchBrief()` [here](https://github.com/dragonflydb/dragonfly/blob/e3214cb603fce42c8bb52a2c220c3115b7f8cecf/src/server/channel_store.cc#L154) * Right after connection context `reset()` [here](https://github.com/dragonflydb/dragonfly/blob/2ab480e160dece516b908584f1933b25b760abfb/src/facade/dragonfly_connection.cc#L750) In any case, calling `SendPubMessageAsync()` to a connection where `cc_` is null is a bug, and we fix that here.
src/server/channel_store.cc
Outdated
@@ -146,7 +146,8 @@ unsigned ChannelStore::SendMessages(std::string_view channel, facade::ArgRange m | |||
auto it = lower_bound(subscribers_ptr->begin(), subscribers_ptr->end(), idx, | |||
ChannelStore::Subscriber::ByThreadId); | |||
while (it != subscribers_ptr->end() && it->Thread() == idx) { | |||
if (auto* ptr = it->Get(); ptr) | |||
// A connection might have closed or be in the process of closing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// A connection might have closed or be in the process of closing | |
// if ptr->cntx() is null, a connection might have closed or be in the process of closing |
@@ -146,7 +146,8 @@ unsigned ChannelStore::SendMessages(std::string_view channel, facade::ArgRange m | |||
auto it = lower_bound(subscribers_ptr->begin(), subscribers_ptr->end(), idx, | |||
ChannelStore::Subscriber::ByThreadId); | |||
while (it != subscribers_ptr->end() && it->Thread() == idx) { | |||
if (auto* ptr = it->Get(); ptr) | |||
// if ptr->cntx() is null, a connection might have closed or be in the process of closing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can it be not null
yet the state is_closing
? If thast's the case shouldn't we also cover this here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm not sure which is_closing
state you are referring to, there doesn't seem to be such a state in facade::Connection
nor in util::Connection
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when we close a connection we set the member field of conn_context
called conn_closing
. I am asking if we also need to poll that here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ptr->cntx()->conn_closing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha, conn_closing
case is already supported and handled inside SendAsync
, so we're good with that regards
This is a rare condition, which we know can happen during shutdown (see [here](#3873 (comment)))
* fix: Do not use `cc_` in connection if it's null This is a rare condition, which we know can happen during shutdown (see [here](#3873 (comment))) * add comment
This is a rare case where a closed connection is kept alive while the handling fiber yields, therefore leaving
cc_
(the connection context) pointing to null for other fibers to see.As far as I can see, this can only happen during server shutdown, but there could be other cases that I have missed.
The test on its own does not reproduce the crash, however with added
ThisFiber::SleepFor()
s I could reproduce the crash:DispatchBrief()
herereset()
hereIn any case, calling
SendPubMessageAsync()
to a connection wherecc_
is null is a bug, and we fix that here.