handling disconnect/reconnect #9
@dominictarr i don't think you need to worry about the case where the source data of the range query is updated at the end or updated in the middle based on application logic. A stream will never be "real time" consistent. Doesn't levelup/down use read-consistent views (a transient snapshot) for all streams? Even if you put a value that would sort after the current seek position in the stream, it will not be returned (https://github.com/rvagg/node-levelup/blob/master/test/snapshot-test.js). This means that no one could have coded against an api behaviour that depends on the update-in-the-middle case for streams. On the other hand, with levelup it is possible to have constructed an api, however unlikely, that expects to get no values written after the start time of the readStream (as in time series data). But over multilevel the stream start time is non-deterministic, and there is no existing api that sends the stream start time to the client, so you can't code against that. A library should either give a disconnect error or start a reconnect-and-continue-the-range algorithm. anyway just some thoughts. Obviously i'm just starting to get more involved so your thoughts would be highly appreciated. |
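For reference, a rough sketch of that snapshot behaviour against a plain local levelup (the path and keys here are made up, nothing multilevel-specific):

```js
// rough sketch: values written after a read stream is created are not
// visible to that stream, because it iterates over a snapshot
var levelup = require('levelup')
var db = levelup('./snapshot-demo')

db.put('a', '1', function (err) {
  if (err) throw err

  var seen = []
  db.createReadStream()
    .on('data', function (kv) { seen.push(kv.key) })
    .on('end', function () {
      // 'z' was written after the stream (and its implicit snapshot)
      // was created, so it should not show up here
      console.log(seen) // expected: [ 'a' ]
    })

  // write a key that sorts after the stream's current position
  db.put('z', '2', function (err) {
    if (err) throw err
  })
})
```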
Sounds fantastic, I look forward to hearing of your experiences! I am mostly referring to plugins such as https://github.com/dominictarr/level-live-stream/ which do allow real time views! In other cases, you are correct: reconnecting and resuming from the last key is the right thing. |
First thing we could introduce is … A hash per reconnect isn't really necessary I think. When a reconnect occurs the client just tells the server what operations it had going on before (e.g. …).

So, on buffering: both client and server need an in-memory cache for pending operations/results. The client can have a cache that grows without bound, with timeouts that flush entries when they just take too long. The server on the other side should have a cache that doesn't grow beyond a certain size, so that it can't be taken down by clients just firing one request and quickly disconnecting.

@soldair LevelDB can be made to use a Snapshot for iteration but can also iterate without one. See http://leveldb.googlecode.com/svn/trunk/doc/index.html (the snapshot section). |
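A rough sketch of the bounded server-side cache idea (the size and names are made up, just to show the shape):

```js
// a pending-results cache capped in size, so a client that fires a
// request and immediately disconnects cannot grow server memory forever
function BoundedCache (maxEntries) {
  this.max = maxEntries
  this.keys = []       // insertion order, oldest first
  this.entries = {}
}

BoundedCache.prototype.set = function (id, result) {
  if (!(id in this.entries)) {
    this.keys.push(id)
    // evict the oldest entry once the cap is hit
    if (this.keys.length > this.max) {
      delete this.entries[this.keys.shift()]
    }
  }
  this.entries[id] = result
}

BoundedCache.prototype.take = function (id) {
  var result = this.entries[id]
  delete this.entries[id]
  return result
}

// e.g. keep at most 1000 results waiting for reconnecting clients
var pending = new BoundedCache(1000)
```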
What I said about snapshots was wrong, they're created implicitly when iterating. |
how each method should be handled depends greatly on the plugins - for example, scuttlebutt has its own ways to handle reconnect, or if data is append only it's quite easy to reconnect... so, what would be needed is a way to inject a function to handle reconnection for a particular plugin. |
I'm wrapping multilevel methods with this now: https://github.com/soldair/node-when-connected Within the many layers of streams we have I'm sure it's doing some duplicate work, but i don't know if any of this has a place in the multilevel api. This solves issues like the one where multilevel should emit an error, rather than just end, when a stream is cut off by a disconnection. |
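I don't know node-when-connected's exact API, but the general idea of deferring calls until the connection is up could look roughly like this (everything here is made up for illustration):

```js
// rough idea only: defer a call until the connection is up, with a
// timeout so a call issued while disconnected doesn't hang forever
function whenConnected (state, timeoutMs, fn, cb) {
  if (state.connected) return fn(cb)

  function run () {
    clearTimeout(timer)
    fn(cb)
  }

  var timer = setTimeout(function () {
    var i = state.waiting.indexOf(run)
    if (i !== -1) state.waiting.splice(i, 1)
    cb(new Error('not connected within ' + timeoutMs + 'ms'))
  }, timeoutMs)

  state.waiting.push(run)
}

// on every (re)connect:
//   state.connected = true
//   state.waiting.splice(0).forEach(function (run) { run() })

// usage: whenConnected(state, 5000, function (cb) { db.put('a', '1', cb) }, done)
```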
hey guys. to solve resuming disconnected streams on top of the module i mentioned earlier i've published this. it handles a number of retries, resumes and a timeout for any call through to multilevel. You can write programs that are largely unaffected by database process restarts. I don't know if any of these features should be in multilevel considering all things should be moar smaller, but i'm doing a lot of hacking to scale processing for my production app (not there yet but close). i'm probably bothering you guys; i know my input was not exactly solicited and i'm all talk so far with no pull requests to the project, but if anything i'm thinking about sparks an idea of how to make this api better i would love to contribute. |
this should def. happen in multilevel itself, maybe still using a library just for that. It's great that you experiment with approaches already as I don't have time for that at the moment. I'll have a look at your libs. |
@soldair your input is absolutely welcome. Please don't hesitate to suggest things, or show how you are using this module. This is good work. Two questions on async callbacks: …
we could probably break this into two parts, one for handling streams and one for handling async calls. That will make it pretty easy to plug into multilevel. |
thanks =)
accessing any reconnecting stream is really an async operation (it's async sometimes, so i prefer to have one code path that always presents the callback interface), so i found that stream support was 90% overlap
in multilevel-reconnected i wrapped stream methods one additional time if "resume" was specified as an option.
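Roughly the shape of it, as a sketch only (the `conn.connected` flag and the option handling are guesses for illustration, not the actual multilevel-reconnected code):

```js
// wrap createReadStream so that when "resume" is requested, a dropped
// connection restarts the range just after the last key seen,
// instead of silently ending the stream
var through = require('through')

function resumableReadStream (db, conn, opts) {
  var out = through()
  var lastKey = null

  function start () {
    var o = lastKey == null ? opts : { start: lastKey, end: opts && opts.end }
    db.createReadStream(o)
      .on('data', function (kv) {
        // 'start' is inclusive, so skip anything we already emitted
        if (lastKey != null && kv.key <= lastKey) return
        lastKey = kv.key
        out.queue(kv)
      })
      .on('end', function () {
        // only treat this as the real end if we are still connected;
        // otherwise wait for the next 'connect' and pick the range up again
        if (conn.connected) out.queue(null)
      })
  }

  conn.on('connect', start)
  start()
  return out
}
```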
It made perfect sense for this to be separate because of the last-key tracking logic and such. i did not get close to trying to tangle with the createLiveStream business =) anyway. super sleepy, almost 5am here. |
hey guys. i mentioned this to @dominictarr in irc today but here is a test case that shows what i think is a pretty big issue.
Basically, if you don't have an on-disconnect handler that deals with this, your program has a bug (not with live streams, of course). Mostly i would love feedback to clarify the best course of action and to submit pull requests that fix this. If this is not a bug i would love to hear how you are handling it so i can build with your pattern. |
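To make the failure mode concrete, roughly (the setup is as in the multilevel readme; the rest is illustrative):

```js
// nothing here is ever told that the connection went away
var net = require('net')
var multilevel = require('multilevel')

var db = multilevel.client()
var con = net.connect(3000)
con.pipe(db.createRpcStream()).pipe(con)

db.put('key', 'value', function (err) {
  // if the server dies before replying, this callback may simply
  // never fire -- no error, no timeout, the app just waits forever
  console.log('put done', err)
})

db.createReadStream()
  .on('data', function (kv) { console.log(kv) })
  .on('end', function () {
    // a mid-range disconnect looks the same as a normal, complete end
    console.log('stream ended, but was the range complete?')
  })

con.on('close', function () {
  // without handling this yourself, the calls above are left hanging
})
```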
in irc:
|
i'm playing the bind-all-of-the-errors game to make sure it's good for my …
|
posting this mostly as notes, and thinking out loud.
handling disconnections well is fairly important.
when I'm building apps, I often end up with this pattern:
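Roughly this shape (a sketch; `setupStreams` / `teardownStreams` stand in for whatever the app actually wires over the connection):

```js
// every (re)connect means tearing down and re-creating all the streams
// that were piped over the old connection
var net = require('net')

function connect () {
  var con = net.connect(3000)

  con.on('connect', function () {
    // wire up everything that lives on top of the connection:
    // mux-demux channels, scuttlebutt replication streams, etc.
    setupStreams(con)
  })

  con.on('close', function () {
    // tear down, wait a bit, then do it all again
    teardownStreams()
    setTimeout(connect, 1000)
  })

  con.on('error', function () { /* 'close' will follow */ })
}

connect()
```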
This is quite awkward, because whenever the app changes modes/state
there are streams/mux-demux/scuttlebutts to disconnect, and new things to connect.
in rumours, I managed to pave over this problem for scuttlebutts.
https://github.com/dominictarr/rumours#example-using-defaults
Basically, the challenge here is creating an interface to each type of abstraction that works well over reconnections.
Just buffering all requests is not efficient - you don't know if a disconnection is temporary or not.
there are various approaches that could be used for different streaming abstractions.
For example, take a range query over data that is append only - like in twitter, where new tweets are always added to the end of the feed - after disconnecting, you just ask for the changes since the last thing you received.
That only works for cases that are append only - if updates may come into the middle of a range, then you'd have to keep a buffer of, say, the last N updates; then if the disconnection is brief it will be fine, but if it's long, just resend the whole range, or use something like level-master instead.
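A sketch of that last-N buffer idea (N and the shapes here are arbitrary):

```js
// brief disconnects get replayed from memory; longer ones fall back
// to resending the whole range
var N = 1000
var recent = [] // newest last, each entry { key: ..., value: ... }

function record (update) {
  recent.push(update)
  if (recent.length > N) recent.shift()
}

// returns false if the gap is too big and the caller should resend everything
function replaySince (lastSeenKey, send) {
  for (var i = recent.length - 1; i >= 0; i--) {
    if (recent[i].key === lastSeenKey) {
      for (var j = i + 1; j < recent.length; j++) send(recent[j])
      return true
    }
  }
  return false
}
```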
get, put, batch can be buffered like in levelup.
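A sketch of that kind of buffering (assuming `db` is the multilevel client and connection state is tracked elsewhere):

```js
// queue get/put/batch issued while disconnected and replay them once the
// connection is back, similar to how levelup queues operations made
// before the underlying store has opened
var queue = []
var connected = false

function call (method, args) {
  if (connected) return db[method].apply(db, args)
  queue.push([method, args])
}

function onConnect () {
  connected = true
  queue.splice(0).forEach(function (op) {
    db[op[0]].apply(db, op[1])
  })
}

// usage: call('put', ['a', '1', cb]); call('batch', [ops, cb]); etc.
```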
The problem here is that it depends on the particular way that data is updated and replicated.