-
Notifications
You must be signed in to change notification settings - Fork 166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reconnect policy #266
Reconnect policy #266
Conversation
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the contribution. I have a few suggestions and questions about parts of it.
src/room/IReconnectPolicy.ts
Outdated
nextRetryDelayInMs(context: IReconnectContext): number | null; | ||
} | ||
|
||
export interface IReconnectContext { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would remove this layer of indirection. If the class is supposed to be a storage class. instead, it would be great to make ReconnectContext
an interface. Interfaces in Typescript are more like a "data class" in other languages.
src/room/DefaultReconnectPolicy.ts
Outdated
7 * 7 * 300, | ||
8 * 8 * 300, | ||
9 * 9 * 300, | ||
null, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this is a bit confusing to have nulls in arrays. would be preferable to perform a length check and exit if out of attempts.
src/room/RTCEngine.ts
Outdated
if (this.reconnectAttempts >= maxReconnectRetries || duration > maxReconnectDuration) { | ||
recoverable = false; | ||
} | ||
duration = Date.now() - this.reconnectStart; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this logic is now repeated from the top. can you think of a way to avoid this duplication?
if (reconnectRequired && !this.fullReconnectOnNext) { | ||
this.fullReconnectOnNext = true; | ||
this.reconnectAttempts = 0; | ||
let requireSignalEvents = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if this logic is correct. we should only emit a reconnecting/resuming event on the very first attempt. I think this alters the behavior quite a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously code set reconnectAttempts
to 0 to force re-emiting the events if fullReconnectOnNext
is false and reconnectRequired
is true.
I was wondering about that as well as this can emit the events twice. First & second reconnect attempt.
I decided to keep that behavior as I'm not aware of the original reason.
But instead of setting reconnectAttempts
to 0 after the first attempt I found it cleaner to signal the events in a different way because it's the 2nd reconnect attempt we should not reset the counter as this alters the logic in the reconnect policy by re-evaluating retryCount
with "0" another time.
I don't see an alteration of the previous behavior regarding events emitted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I double checked the listeners and can confirm that we want to fire the signal again going from resume to reconnect. So your implementation is right.
|
||
this.engine.client.signalLatency = this.options.expSignalLatency; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is moving this required by this PR? It would be good to keep PRs focused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is related to the PR but not a functional requirement. I passed the RoomOptions
to the RTCEngine
constructor to get hold of the reconnectPolicy
.
As RTCEngine
is now aware of the RoomOptions
it makes more sense in my opinion to apply the configuration setting of the SignalClient
where it's created.
thanks for your clarifications, I'll give it a try tomorrow and will keep you posted! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great progress here, added a few notes.
if (reconnectRequired && !this.fullReconnectOnNext) { | ||
this.fullReconnectOnNext = true; | ||
this.reconnectAttempts = 0; | ||
let requireSignalEvents = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I double checked the listeners and can confirm that we want to fire the signal again going from resume to reconnect. So your implementation is right.
src/room/RTCEngine.ts
Outdated
.then(() => { | ||
waitForPcCompleted = true; | ||
}) | ||
.catch(() => Error('failed to wait for peer connection')); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this intending to throw? I'm unfamiliar with this syntax.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks good catch, there is throw missing.
src/room/RTCEngine.ts
Outdated
.catch(() => Error('failed to wait for peer connection')); | ||
|
||
while (!waitForPcCompleted) { | ||
// TODO: this is a workaround to check if the socket was closed while we are reconnecting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you describe a repro step to trigger a socket closure while reconnecting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to revert as it's not related to the changes in the PR, I would classify it as a separate bug.
src/room/RTCEngine.ts
Outdated
// TODO: this is a workaround to check if the socket was closed while we are reconnecting | ||
// if signal client reconnects with ?reconnect=1 ws.onclose is almost immediately called after ws.onopen | ||
if (!this.client.isConnected) { | ||
this.fullReconnectOnNext = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it need to be set here again? throwing should be sufficient to move this into the right state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct no need to set here again.
@taskbit just checking in. do you think you have cycles to get this one across the finish line? I'd love to get this merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fantastic work here! this is a meaty PR and really improves the flexibility of reconnection. Thank you for making LiveKit better!
@davidzhao thank you for working on this awesome project and the detailed code review, appreciate it! |
My motivation was that the existing reconnection logic gave up reconnecting after the LiveKit node was forcefully restarted.
Also in response to #261 I revisited the reconnection handling.
I run into an issue where the websocket connection is being closed after trying to resume with
?reconnect=1
probably initiated on the server side.
I added a workaround here