This repository has been archived by the owner on May 26, 2022. It is now read-only.

Shorter Timeout #22

Closed
Stebalien opened this issue Feb 19, 2018 · 5 comments

Comments

@Stebalien
Member

IMO, it's reasonable to assume that we don't really want to talk to a node with an RTT of more than a few seconds.

@Kubuxu
Member

Kubuxu commented Mar 10, 2018

I disagree; sometimes this is the only node we might be able to talk to, as in low-bandwidth, high-ping situations.

@Stebalien
Member Author

The issue is that file descriptors are a scarce resource, and hung dials prevent us from trying other addresses. Ideally we'd have some way to determine if we're running low on file descriptors and start killing long dials, but that gets really messy.

Another thing to note is that we currently have a global timeout of 1m on fully establishing a connection. Given that it takes us (at least) 6 round trips (!) to establish a connection, we only really have 10s (at most) for establishing the TCP connection. So, how about setting the TCP timeout to timeout/6?


In general, I wonder if a rolling average timeout would work. That is, the TCP transport can track how long dials usually take (possibly ignoring dials to private addresses) and pick a reasonable timeout based on that.

@ajbouh

ajbouh commented May 3, 2018

Since @Kubuxu's concern applies only in situations where our only option is a high-latency link, perhaps an escalating schedule of TCP timeouts is the simplest solution? That is, if we can't connect to anyone with a timeout of 4 seconds, try 8, then 32, etc. Some basic statistics on TCP connection times would probably yield a reasonable schedule that improves median, 80th, and maybe even 90th percentile connection times at the expense of 99th percentile times.

Thoughts?

@vyzo

vyzo commented May 3, 2018

cc myself

@marten-seemann
Contributor

> Given that it takes us (at least) 6 round trips (!) to establish a connection, we only really have 10s (at most) for establishing the TCP connection. So, how about setting the TCP timeout to timeout/6?

Not sure why it takes 6 round trips; I guess this number is still from secio times? I count 1 for the TCP 3-way handshake, 1 for security protocol negotiation, 1 for the security handshake, and 1 for muxer negotiation. So that's 4 in total.
By moving the security protocol into the multiaddr, we'll further reduce this, so (very soon) we'll be down to 3 round trips. Now the 5s timeout doesn't look as unreasonable any more.
