Some proposed fixes for "Poor connection never attaches and reports connection to be connected #195" #218
Conversation
ok, but I assumed the HTTP request timeout applies to normal REST HTTP requests as opposed to comet-based HTTP requests. Is that not the case?

Looks good 👍
Is there no way to simulate the problem we had with two transports by adding a test to prevent a regression? I would prefer to see this if possible.
Currently, no. At the moment in ably-js, the two timeouts work at different levels of the lib: the httpRequestTimeout determines how long an HTTP request in SEND mode lasts before giving up. At that point the request object doesn't know or care whether it was created as part of a comet transport, a REST request, an authUrl call to a client server, or anything else; it just knows its URL and request mode. The realtimeRequestTimeout is at a much higher level, and determines how long the Connection stays in the connecting state before giving up.

There's an open issue which discusses changing how the timeouts interact in the case of opening a realtime connection specifically. We should probably discuss how timeouts should work more generally.
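To make the distinction concrete, here's a minimal TypeScript sketch of the two levels. The helper names (requestWithTimeout, withRealtimeTimeout) are made up for illustration; they are not ably-js internals.

```typescript
// Low level: a single HTTP request gives up after httpRequestTimeout,
// regardless of whether it belongs to a comet transport, a REST call,
// an authUrl call, or anything else.
function requestWithTimeout(url: string, httpRequestTimeout: number): Promise<Response> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), httpRequestTimeout);
  return fetch(url, { signal: controller.signal }).finally(() => clearTimeout(timer));
}

// High level: the connection attempt as a whole gives up after
// realtimeRequestTimeout, however many requests happen underneath.
function withRealtimeTimeout<T>(attempt: Promise<T>, realtimeRequestTimeout: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error('realtime request timed out')), realtimeRequestTimeout);
  });
  return Promise.race([attempt, timeout]).finally(() => clearTimeout(timer));
}
```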
I had a play around with trying to simulate the problem, but couldn't see any nice way of doing it from within ably-js. The failure is an HTTP request that never responds, so we'd need to figure out the HTTP transport being used, then shim XHRRequest#exec with a version that works out whether a request is the right one and replaces the xhr.onreadystatechange handler with a noop if so; and we'd need to do an equivalent thing for all the other comet transports so the test works everywhere. So, yes, it's possible. But I figured there's not much point if we're shortly going to remove the upgrade mechanism anyway...

(Also: if we do want to do general realtime request failure tests, then per Paddy's suggestion here, might it be better -- more flexible, and not so tied to the implementation of each transport -- to do those by making realtime fail to respond, rather than faking it with complicated shims? We could even maybe have a set of bad-realtime endpoints that fail in fixed ways simulating network conditions, and test all the client libraries against them.)
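For illustration, a rough sketch of that shim. XHRRequestLike is a stand-in for the transport's request class; the real ably-js shape may differ.

```typescript
// Stand-in for the transport's request class (not the real ably-js type).
interface XHRRequestLike {
  uri: string;
  xhr: XMLHttpRequest;
  exec(): void;
}

// Wrap exec() so that, for matching URLs, the response handler is replaced
// with a noop and the request appears to hang forever.
function makeRequestsHang(req: XHRRequestLike, isTarget: (uri: string) => boolean): void {
  const originalExec = req.exec.bind(req);
  req.exec = () => {
    originalExec();
    if (isTarget(req.uri)) {
      req.xhr.onreadystatechange = null; // drop the response: it "never responds"
    }
  };
}
```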
Didn't think we had decided on that; I thought we would see the impact of the new router changes first.
Sure, that would be useful -- perhaps even a header that can be passed in. An alternative would be to simply start up a local WS & HTTP server that proxies to realtime and achieves the same thing, thus keeping this complexity in the test suite itself, where it belongs.
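As a sketch of that alternative (HTTP only; the WebSocket side would be handled similarly), here's a local server that forwards requests to realtime but can be switched into a mode that never answers. The upstream host and port are made up for illustration.

```typescript
// A local proxy that forwards HTTP requests upstream, but can be told to
// swallow responses, simulating a network that accepts requests and never
// answers. UPSTREAM and the port are illustrative only.
import * as http from 'node:http';

const UPSTREAM = 'sandbox-rest.ably.io'; // assumed endpoint, for illustration
let blackhole = false; // flip from the test to simulate the bad network

const proxy = http.createServer((clientReq, clientRes) => {
  const upstream = http.request(
    {
      host: UPSTREAM,
      path: clientReq.url,
      method: clientReq.method,
      headers: { ...clientReq.headers, host: UPSTREAM },
    },
    (upstreamRes) => {
      if (blackhole) return; // accept the request, never respond to the client
      clientRes.writeHead(upstreamRes.statusCode ?? 502, upstreamRes.headers);
      upstreamRes.pipe(clientRes);
    }
  );
  clientReq.pipe(upstream);
});

proxy.listen(8080);
```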
(force-pushed from 505e37b to 632d9c3)
@paddybyers can you confirm whether you agree with the other changes in this? In particular, whether you think changing the attaching timeout to revert the channel to its previous state is the right thing to do.
@paddybyers see previous comment
There's no real reason for them to be different timeouts. However, it doesn't really introduce any new case that needs to be handled, because of course the operation might have succeeded in the first 10s but the indication of that failed to reach us. The 10s elapses, so we indicate failure and fail the pending ops; then in the next few seconds we find out that the operation did in fact succeed and we are attached after all. That's just too bad; the server response always takes precedence in determining our state, so we are attached, and it's just a shame that we failed those messages.
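A sketch of that precedence rule. All names here are illustrative, not the real ably-js channel implementation.

```typescript
// "The server response always takes precedence": the timeout fails the
// pending ops and reverts the state, but a late ATTACHED from the server
// still wins.
type ChannelState = 'attaching' | 'attached' | 'detached';

class ChannelSketch {
  state: ChannelState = 'detached';
  private pending: Array<(err?: Error) => void> = [];
  private timer?: ReturnType<typeof setTimeout>;

  attach(callback: (err?: Error) => void, realtimeRequestTimeout: number): void {
    this.state = 'attaching';
    this.pending.push(callback);
    this.timer = setTimeout(() => {
      // Timeout elapsed with no response: indicate failure, fail the
      // pending ops, and revert to the previous state.
      for (const cb of this.pending.splice(0)) cb(new Error('attach timed out'));
      this.state = 'detached';
    }, realtimeRequestTimeout);
  }

  // If ATTACHED arrives from the server -- even after the timeout has
  // already fired -- we are attached after all; any callbacks that were
  // already failed stay failed.
  onServerAttached(): void {
    clearTimeout(this.timer);
    for (const cb of this.pending.splice(0)) cb();
    this.state = 'attached';
  }
}
```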
This looks ok but it would be even better if we could test it :)
Is there no way to avoid this? We're synchronously deciding what the state should be, so why can't we make it so? I think I've not understood the problem here.
Correct. Otherwise LGTM, thanks.
Rather than retrying indefinitely
To match xhrrequest, and so that the connectionManager isFatalErr helper responds appropriately
… state change. If there is a pending transport which is connected and scheduled for activation, just waiting for the active transport to finish what it's doing, then when that transport fails, we can just let the pending one seamlessly take over, rather than going into the disconnected state
The only difference between a disconnect and a close is that a close tells realtime to discard the connection state, which should be avoided unless the user requests it
(force-pushed from 632d9c3 to f82f559)
So: when this happens, this closure is registered as a listener for the … event. Would there be any bad consequences? Once the new transport is active, Channels#onTransportActive calls …
What's asynchronous though?
@paddybyers (reiterating slack discussion): Sorry, you're right, we could do it just by emitting …. But, again: that would bypass the nopending mechanism. Which I don't object to, except that I don't understand why it exists at all, given that Channels#onTransportActive deals with pending channel operations across transport changes. (I'm obviously a bit wary of bypassing something whose purpose I don't understand...)
Addresses #195.
The questions from that issue, with discussion:
Already addressed by #197
Because the previous strategy on timeout was just to retry. 96e65de changes that to fail channel ops (leaving the channel in the state it was in beforehand) if the request fails.
I'm not certain this is the correct thing to do (in particular, since realtimeRequestTimeout is 10s and httpRequestTimeout is 15s, this leaves a period of 5s after which the channel considers the request failed, but during which it could still succeed, which is kinda weird). Opinions?
96f83a6 changes the deactivateTransport logic such that if there is a pending transport which is connected and scheduled for activation -- just waiting for the active transport to finish what it's doing -- then, if the current active transport fails, it just lets the pending one take over, avoiding going into the disconnected state. If that's not the case, we continue as we do at the moment.
(This technically leaves a period in which the state is connected but activeProtocol is null -- I think this should be at most one cycle of the event loop -- not sure if this is a problem?)
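In rough pseudocode, the new deactivateTransport behaviour looks something like this; the names mirror the description above rather than the actual ably-js code.

```typescript
// Hedged sketch of the 96f83a6 behaviour; types and names are illustrative.
interface TransportSketch {
  isConnected: boolean;
  scheduledForActivation: boolean;
}

function deactivateTransport(
  pendingTransports: TransportSketch[],
  activate: (t: TransportSketch) => void,
  setState: (s: 'disconnected') => void
): void {
  const successor = pendingTransports.find((t) => t.isConnected && t.scheduledForActivation);
  if (successor) {
    // A connected pending transport was only waiting for the active one to
    // finish: let it take over seamlessly; the connection never leaves the
    // connected state (though activeProtocol is briefly null).
    activate(successor);
  } else {
    // No viable successor: continue as before and drop to disconnected.
    setState('disconnected');
  }
}
```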
Because transport#abort was calling requestClose(true). I don't understand why that was like that -- AFAIK the only reason to do a /close (rather than a /disconnect) is if we want realtime to discard the connection state, which surely isn't something we should be doing unless the user requests a close. @paddybyers am I missing something? 7a593e2 changes it to requestClose(false).

In that case I think it was right -- realtime lost the connection state (due to the previous point), so Matt was no longer attached.
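For clarity, the distinction under discussion as a sketch. The boolean argument to requestClose is taken from the comments above; everything else here is a made-up stand-in.

```typescript
// A close tells realtime to discard the connection state (so the client
// cannot resume, and attachments are lost); a disconnect preserves it.
// sendToRealtime is a stub for the transport send path.
function sendToRealtime(action: '/close' | '/disconnect'): void {
  console.log(`sending ${action}`); // stub
}

function requestClose(closing: boolean): void {
  // Only request a /close (discarding connection state) when the user has
  // actually asked to close; otherwise just /disconnect.
  sendToRealtime(closing ? '/close' : '/disconnect');
}
```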