This repository has been archived by the owner on Oct 28, 2021. It is now read-only.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Garbage collect incompatible peers in Host::run() #5624
Garbage collect incompatible peers in Host::run() #5624
Changes from 5 commits
d7741b1
06b00ce
a587575
64f120c
e65197a
71d2ea6
b0d5485
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe better
onHandshakeFailed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I would make it public and declare close to
startPeerSession
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gumb0 : Why make this public, does it make sense to expose the concept of a handshake to consumers of Host?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well it looks to me like a callback similar to
startPeerSession
, the callback called byRLPXHandshake
when handshake is finished. One is for success another one is for failure.It works as private, because
RLPXHandshake
is a friend ofHost
, but idealluy we should get rid of this friend declarations at some point.Exposing it to the clients of
Host
is of course not great, but the proper way to deal with it could be to create a separate interface with these callbacks only, don't expose it toHost
clients, but pass it only toRLPXHandshake
. That's a bit complicated change, at least it's not for this PR.(In other words, we won't make it much worse, because
startPeerSession
is already public)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah that makes sense, thank you for clarifying! 😄 I'll make the change before merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about
UserReason
- this could in theory be any kind of reason specific to subprotocol, including some temporary reasons, maybe it makes sense to try to reconnectThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I chose to treat UserReason as terminal because we only use it in 2 cases - if status validation fails:
aleth/libethereum/BlockChainSync.cpp
Lines 190 to 202 in 505aead
Here's where we perform the actual validation in EthereumPeer::validate:
aleth/libethereum/EthereumPeer.cpp
Lines 58 to 68 in 505aead
aleth/libp2p/Session.cpp
Lines 409 to 418 in 505aead
You bring up a good point though, technically it can be any reason specific to a subprotocol and other clients can also choose to send it to us for reasons specific to their implementation. I think that rather than treating it as immediately critical we should use it in some sort of reconnection count threshold.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gumb0 I've decided to change
UserReason
->UselessPeer
when status message validation fails for a peer (since if that happens the peer is effectively useless to us i.e. we can't sync with it). I've also added in logic inPeer::isUseless
to take the number of failed connection attempts into account when the last disconnect isn't critical.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
m_failedAttempts
seems to affect only the value of fallbackSeconds currently. Maybe we should use it now to retry several times and then go to "critical error, disconnect" state.(at least for some cases of failures)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gumb0 Good idea - what do you think of this being taken care of in another PR? I'd like to limit the amount of changes I make to the peer gc logic in this PR so if something ends up breaking it will be easier to debug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok for another PR, but it seems to be the matter of only adding condition like
m_failedAttempts >= 20
touselessPeer
function.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additional thought is that we could make all this change more conservative if we leave
fallbackSeconds()
as it was before (so that we do the same reconnects as before) and have just this check for failed attempts count inuselessPeer
(plus the new check for handshake failures)This way it would reconnect with the same intervals for each case as before, but stop after limited number of attempts.
But I'm fine with it if you think it's better to immediately stop in some cases.