-
Notifications
You must be signed in to change notification settings - Fork 769
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Detect closed notification substreams instead of evicting all peers #3983
Conversation
/// | ||
/// To prevent this from happening, define a threshold for how long `SyncingEngine` should wait | ||
/// before it starts evicting peers. | ||
const INITIAL_EVICTION_WAIT_PERIOD: Duration = Duration::from_secs(2 * 60); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dq: The worst that can happen now is to stall indefinitely? And we presume that previously, we evicted peers because they closed their notification substreams but we didn't notice it on time (ie before the eviction timer triggered all peers to be dropped)? 🤔
I think this is related to: #3346
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order for this to happen, all peers have to keep the notification channel open and misbehave by not providing any useful data. This seems unlikely to me.
Yes, the peer eviction kicked in only if we had all notification streams half-closed, but didn't notice it.
I think we can revisit peer eviction and may be come up with a solution tracking individual peers (what has been done at some point i the past).
Also, people complained that suddenly the number of connected peers dropped to zero due to eviction. Now it should work on an individual peer level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We just added this to work around the bug that this pr is fixing, thus I think it is fine to remove this. Especially as we already have seen issues with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…aritytech#3983) This PR brings the fix paritytech/substrate#13396 to polkadot-sdk. In the past, due to insufficient inbound slot count on polkadot & kusama, this fix led to low peer count. The situation has improved since then after changing the default ratio between `--in-peers` & `--out-peers`. Nevertheless, it's expected that the reported total peer count with this fix is going to be lower than without it. This should be seen as the correct number of working connections reported, as opposed to also reporting already closed connections, and not as lower count of working connections with peers. This PR also removes the peer eviction mechanism, as closed substream detection is a more granular way of detecting peers that stopped syncing with us. The burn-in has been already performed as part of testing these changes in paritytech#3426. --------- Co-authored-by: Aaro Altonen <a.altonen@hotmail.com>
…aritytech#3983) This PR brings the fix paritytech/substrate#13396 to polkadot-sdk. In the past, due to insufficient inbound slot count on polkadot & kusama, this fix led to low peer count. The situation has improved since then after changing the default ratio between `--in-peers` & `--out-peers`. Nevertheless, it's expected that the reported total peer count with this fix is going to be lower than without it. This should be seen as the correct number of working connections reported, as opposed to also reporting already closed connections, and not as lower count of working connections with peers. This PR also removes the peer eviction mechanism, as closed substream detection is a more granular way of detecting peers that stopped syncing with us. The burn-in has been already performed as part of testing these changes in paritytech#3426. --------- Co-authored-by: Aaro Altonen <a.altonen@hotmail.com>
This PR brings the fix paritytech/substrate#13396 to polkadot-sdk.
In the past, due to insufficient inbound slot count on polkadot & kusama, this fix led to low peer count. The situation has improved since then after changing the default ratio between
--in-peers
&--out-peers
.Nevertheless, it's expected that the reported total peer count with this fix is going to be lower than without it. This should be seen as the correct number of working connections reported, as opposed to also reporting already closed connections, and not as lower count of working connections with peers.
This PR also removes the peer eviction mechanism, as closed substream detection is a more granular way of detecting peers that stopped syncing with us.
The burn-in has been already performed as part of testing these changes in #3426.