Connection retry logic for a momentarily unavailable peer #311

kilrau · 2018-08-09T01:48:18Z

I don't think we got this yet: #185 (comment)

Do this now or as part of #178?

sangaman · 2018-08-09T14:11:36Z

Yeah, we had discussed an exponential backoff on connection attempts, and I think that would be a pretty decent and simple way to implement this functionality, at least at first. We'll keep trying to connect, waiting longer between attempts after each failure.

kilrau · 2018-08-15T20:36:14Z

Proposal:
Minimum Delay: the minimum amount of time, in seconds, that xud will wait before it tries to reconnect to a peer after a failed connection attempt.

This attribute works together with the Incremental Delay and Maximum Delay attributes (all in seconds). After the first failure to connect to a peer, xud will wait for the number of seconds defined by Minimum Delay.

If the next attempt also fails, xud increases its waiting time as defined by Incremental Delay. The waiting time is capped at Maximum Delay. Once the waiting time is increased to the maximum value, xud will not increase its waiting time anymore.

Delete the node from the nodes db if no successful connection can be established within DeleteNodes Delay.
Once xud successfully connects to a peer, its waiting time is reset to the initial value defined by Minimum Delay.

Minimum Delay: 5
Incremental Delay: Current Delay*2
Maximum Delay: 3600
DeleteNodes Delay: 604800
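A minimal TypeScript sketch of the delay rule proposed above, assuming the delay is simply doubled after each failure ("Current Delay*2") and capped at Maximum Delay. The names `nextRetryDelay`, `MIN_DELAY`, `MAX_DELAY` and `BACKOFF_FACTOR` are hypothetical and not taken from xud's codebase.

```typescript
// Hypothetical constants mirroring the proposal (all values in seconds).
const MIN_DELAY = 5;
const MAX_DELAY = 3600;
const BACKOFF_FACTOR = 2; // "Incremental Delay: Current Delay*2"

/**
 * Returns the wait before the next reconnection attempt.
 * `previousDelay` is undefined after the first failure, or after a
 * successful connection has reset the schedule back to Minimum Delay.
 */
function nextRetryDelay(previousDelay?: number): number {
  if (previousDelay === undefined) {
    return MIN_DELAY;
  }
  // Double the previous delay, but never exceed Maximum Delay.
  return Math.min(previousDelay * BACKOFF_FACTOR, MAX_DELAY);
}

// Delays used after 12 consecutive failures.
const delays: number[] = [];
let delay: number | undefined = undefined;
for (let i = 0; i < 12; i++) {
  delay = nextRetryDelay(delay);
  delays.push(delay);
}
console.log(delays.join(", "));
// 5, 10, 20, 40, 80, 160, 320, 640, 1280, 2560, 3600, 3600
```

Resetting on success is just a matter of passing `undefined` again for the next failure.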

Source: https://docs.oracle.com/cd/E13222_01/wls/docs81/ConsoleHelp/domain_messagingbridge_config_connectionretry.html

sangaman · 2018-08-15T22:03:53Z

I think a week isn't that long if/when we target non-exchanges that don't expect to be online 24/7. It would be good to make this configurable, but something around that value may be fine for now. Mainly, though, I think a fixed 5-second backoff is too short and would result in far too many connection attempts; an exponential backoff (5, 10, 20, 40, 80, 160, etc.) makes a lot more sense.

Also, I don't think we should delete the node from the db per se. Maybe we just drop its listening addresses from the db, indicating that we no longer believe they are valid, and stop trying to connect to them.
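One way to picture this suggestion, as a hypothetical sketch: the node record stays in the db, but its address list is emptied, and the reconnect logic only retries nodes that still have addresses. The `NodeRecord`/`Address` shapes and both helper functions are assumptions for illustration, not xud's actual db models; the pubkey and address are placeholders.

```typescript
// Hypothetical record shapes; xud's real db models may look different.
interface Address {
  host: string;
  port: number;
}

interface NodeRecord {
  nodePubKey: string;
  addresses: Address[];
}

/** Keep the node, but forget the listening addresses we no longer trust. */
function dropAddresses(node: NodeRecord): NodeRecord {
  return { ...node, addresses: [] };
}

/** Only nodes with at least one known address are candidates for reconnection. */
function shouldRetry(node: NodeRecord): boolean {
  return node.addresses.length > 0;
}

// Usage with placeholder values.
const node: NodeRecord = {
  nodePubKey: "02placeholderpubkey",
  addresses: [{ host: "203.0.113.10", port: 8885 }],
};
const expired = dropAddresses(node);
console.log(shouldRetry(node), shouldRetry(expired)); // true false
console.log(expired.nodePubKey === node.nodePubKey); // true
```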

kilrau · 2018-08-16T18:36:03Z

adjusted

moshababo · 2018-08-22T16:02:55Z

@kilrau

Once the waiting time is increased to the maximum value, xud will not increase its waiting time anymore.

It will not increase, or reconnection attempts should stop?

Another thing: Maximum Delay is smaller than DeleteNodes Delay, so how would the latter ever be reached?
We could instead stop trying to connect to a node if its last successful connection was too long ago.
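A sketch of this alternative check, assuming each stored node carries a timestamp of its last successful connection (a hypothetical field here) and reusing the one-week DeleteNodes Delay from the proposal as the limit. Unlike a cap on the retry interval, this condition does not depend on where the delay currently is in its backoff schedule.

```typescript
// Hypothetical limit: the proposal's one-week DeleteNodes Delay, in milliseconds.
const STALE_AFTER_MS = 604800 * 1000;

/**
 * Returns true once the node's last successful connection is older than the
 * limit. `lastConnected` is a hypothetical per-node field.
 */
function shouldStopRetrying(
  lastConnected: Date,
  now: Date = new Date(),
  limitMs: number = STALE_AFTER_MS,
): boolean {
  return now.getTime() - lastConnected.getTime() > limitMs;
}

// Usage: a node last seen eight days ago is given up on; one seen yesterday is not.
const DAY_MS = 24 * 60 * 60 * 1000;
const now = new Date();
console.log(shouldStopRetrying(new Date(now.getTime() - 8 * DAY_MS), now)); // true
console.log(shouldStopRetrying(new Date(now.getTime() - 1 * DAY_MS), now)); // false
```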

At the moment we don't add a node to the db before our first successful connection to it.
So when trying to connect to a new node, if reconnection attempts would stop at some point, the node will be forgotten.

kilrau · 2018-08-22T22:11:23Z

Once the waiting time is increased to the maximum value, xud will not increase its waiting time anymore.
It will not increase, or reconnection attempts should stop?

It will not increase. Reconnection attempts continue until DeleteNodes Delay (total) has elapsed, then the node gets deleted. In plain language: reconnection attempts start with a 5s delay, which doubles after each failure until a 3600s interval is reached. The 3600s interval is then kept until a total time of 604800s (1 week) is reached, after which the node gets deleted. Maybe DeleteNodeTime is a better name.
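To make these numbers concrete, the schedule can be simulated. This sketch assumes the one-week limit is measured from the first failure and counts only attempts that fall within it; the constant and function names are hypothetical.

```typescript
// Values from the proposal, in seconds (names are hypothetical).
const MIN_DELAY = 5;
const MAX_DELAY = 3600;
const DELETE_NODE_TIME = 604800; // 1 week

/** Returns the time (seconds since the first failure) of every retry attempt. */
function retrySchedule(): number[] {
  const attemptTimes: number[] = [];
  let delay = MIN_DELAY;
  let elapsed = 0;
  // Keep retrying as long as the next attempt still falls within the limit.
  while (elapsed + delay <= DELETE_NODE_TIME) {
    elapsed += delay;
    attemptTimes.push(elapsed);
    delay = Math.min(delay * 2, MAX_DELAY); // double, capped at 3600s
  }
  return attemptTimes;
}

const schedule = retrySchedule();
console.log(schedule.length, schedule[schedule.length - 1]); // 176 602715
```

Under these assumptions, the first 10 attempts use up 5115 s (5 + 10 + ... + 2560), and the remaining 166 attempts run at the 3600 s cap, so a node that never comes back is tried 176 times before it is deleted.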

So when trying to connect to a new node, if reconnection attempts would stop at some point, the node will be forgotten.

With #386, we'll learn about the node again once it comes back online.

kilrau · 2018-09-05T16:29:15Z

add automatic test

kilrau assigned sangaman and moshababo Aug 9, 2018

kilrau added this to the 1.0.0-alpha milestone Aug 14, 2018

kilrau unassigned sangaman and moshababo Aug 14, 2018

kilrau modified the milestones: 1.0.0-alpha.2, 1.0.0-alpha.1 Aug 15, 2018

kilrau assigned moshababo Aug 15, 2018

kilrau added the p2p Peer to peer networking label Aug 15, 2018

kilrau mentioned this issue Aug 15, 2018

hosts table (db) should include only valid entries #305

Closed

moshababo mentioned this issue Aug 23, 2018

connection retries #392

Merged

kilrau added the to do label Aug 27, 2018

kilrau mentioned this issue Sep 6, 2018

Peers/Node Data Structures #185

Closed

kilrau closed this as completed Sep 8, 2018

ghost removed the to do label Sep 8, 2018

kilrau mentioned this issue Oct 30, 2018

not connected to peer after disconnect #616

Closed

This was referenced Dec 3, 2018

[Overview] Reconnect to peer after disconnection (Improvements) #721

Closed

Implement behavior for #152 disconnect package "banned" #693

Open

moshababo mentioned this issue Dec 7, 2018

Reconnect only for outgoing peers #739

Closed

kilrau mentioned this issue Dec 11, 2018

GET_NODES periodically #402

Closed

kilrau mentioned this issue Jun 10, 2019

test2 (seed node) didn't connect to other seed nodes (on startup) #1014

Closed

kilrau mentioned this issue Jun 17, 2019

xud doesn't try to connect to lnd if it's syncing #1039

Closed
