lib/grandpa: ensure grandpa catch-up logic works #2983
- Ensure grandpa catch-up logic works when interacting with substrate nodes.
- If the round in the neighbour message is ahead of our current round by a threshold, send a catch-up request.
- Process the catch-up response; if we can't process it at the moment, store it to process later.

Closes #1531
return fmt.Errorf("failed to send catch up request: %w", err)
}

logger.Debugf("successfully sent a catch up request to node %s, for round number %d and set ID %d",
Suggested change:
- logger.Debugf("successfully sent a catch up request to node %s, for round number %d and set ID %d",
+ logger.Debugf("successfully sent a catch up request to peer %s, for round %d and set ID %d",
case <-c.catchUpResponseCh:
return nil
case <-timer.C:
return errors.New("timeout")
It would be nice to have this as a sentinel error, e.g. ErrCatchUpTimeout.
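For reference, a minimal sketch of what the sentinel-error version of this select could look like. ErrCatchUpTimeout is the name proposed in the comment above; waitForResponse, isTimeout, the channel element type, and the error text are hypothetical stand-ins, not the PR's code.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrCatchUpTimeout is the sentinel error name proposed in the review.
// The message text is an assumption.
var ErrCatchUpTimeout = errors.New("timeout waiting for catch up response")

// waitForResponse is a hypothetical stand-in for the select in the diff:
// it returns nil once something arrives on respCh, or ErrCatchUpTimeout
// if the timer fires first.
func waitForResponse(respCh <-chan struct{}, timeout time.Duration) error {
	timer := time.NewTimer(timeout)
	defer timer.Stop()

	select {
	case <-respCh:
		return nil
	case <-timer.C:
		return ErrCatchUpTimeout
	}
}

// isTimeout reports whether err is, or wraps, ErrCatchUpTimeout.
func isTimeout(err error) bool {
	return errors.Is(err, ErrCatchUpTimeout)
}

func main() {
	respCh := make(chan struct{}) // never written to, so the wait times out
	err := waitForResponse(respCh, 10*time.Millisecond)

	// Wrapping with %w keeps the sentinel visible to errors.Is.
	wrapped := fmt.Errorf("catch up failed: %w", err)
	fmt.Println(isTimeout(wrapped))
}
```

Callers can then branch on errors.Is instead of comparing error strings, and tests can assert on the exact failure mode.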
lib/grandpa/catch-up.go
Outdated
logger.Debugf("successfully sent a catch up request to node %s, for round number %d and set ID %d",
to, round, setID)

c.waitingOnResponse.Store(true)
You already call c.waitingOnResponse.Store(true) in sendCatchUpRequest, and it is called here as well. Should this one be removed?
}

logger.Debugf(
"processing catch up response with hash %s for round %d and set id %d",
Suggested change:
- "processing catch up response with hash %s for round %d and set id %d",
+ "processing catch up response with hash %s for round %d and set ID %d",
msg.Hash, msg.Round, msg.SetID)

// if we aren't currently expecting a catch up response, return
if !c.waitingOnResponse.Load().(bool) {
Suggested change:
- if !c.waitingOnResponse.Load().(bool) {
+ // TODO: decrease the peer reputation
+ if !c.waitingOnResponse.Load().(bool) {
if msg.Hash.IsEmpty() || msg.Number == 0 {
return ErrGHOSTlessCatchUp
}
This could be the second verification step in this function, placed right after the if !c.grandpa.authority check.
c.requestsSent = make(map[peer.ID]CatchUpRequest)
c.lock.Unlock()

logger.Debugf("caught up to round; unpaused service and grandpa state round is %d", c.grandpa.state.round)
Suggested change:
- logger.Debugf("caught up to round; unpaused service and grandpa state round is %d", c.grandpa.state.round)
+ logger.Debugf("caught up to round; starting at round %d", c.grandpa.state.round)
return nil
}

func (c *catchUp) verifyPreCommitJustification(msg *CatchUpResponse) error {
This function already exists in lib/grandpa/message_handler.go. Why not use it?
return nil
}

func (c *catchUp) verifyPreVoteJustification(msg *CatchUpResponse) (common.Hash, error) {
This function already exists in lib/grandpa/message_handler.go. Why not use it?
// TODO: Clean up all request sent before 5 min / (neighbour message interval)
c.lock.Lock()
_, ok := c.requestsSent[to]
c.lock.Unlock()
if ok {
logger.Debugf("ignoring neighbour message since we already sent a catch-up request to this peer: %s", to)
return nil
}
Can we implement a TTL on this map?
Are there cases where you would need to send multiple catch-up requests to the same peer?
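One possible shape for the TTL, as a rough sketch only. The sentRequests type, its methods, the epoch variable, and the string peer key are all hypothetical; the 5-minute value is taken from the TODO in the diff, and the real map is keyed by peer.ID and stores CatchUpRequest values.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// epoch is a fixed reference time used only to make the example deterministic.
var epoch = time.Unix(0, 0)

// sentRequests is a hypothetical TTL-backed replacement for the
// requestsSent map: entries older than ttl are treated as absent.
type sentRequests struct {
	mu     sync.Mutex
	ttl    time.Duration
	sentAt map[string]time.Time // keyed by peer ID (a plain string here)
}

func newSentRequests(ttl time.Duration) *sentRequests {
	return &sentRequests{ttl: ttl, sentAt: make(map[string]time.Time)}
}

// markSent records that a request was sent to peer at time now.
func (s *sentRequests) markSent(peer string, now time.Time) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.sentAt[peer] = now
}

// recentlySent reports whether a request to peer is still within the TTL.
// Expired entries are deleted lazily, on lookup.
func (s *sentRequests) recentlySent(peer string, now time.Time) bool {
	s.mu.Lock()
	defer s.mu.Unlock()

	at, ok := s.sentAt[peer]
	if !ok {
		return false
	}
	if now.Sub(at) >= s.ttl {
		delete(s.sentAt, peer)
		return false
	}
	return true
}

func main() {
	reqs := newSentRequests(5 * time.Minute)
	reqs.markSent("peerA", epoch)

	fmt.Println(reqs.recentlySent("peerA", epoch.Add(time.Minute)))   // inside the TTL
	fmt.Println(reqs.recentlySent("peerA", epoch.Add(6*time.Minute))) // past the TTL
}
```

Lazy deletion only cleans up peers that show up again, so a periodic sweep would still be needed for peers that never send another neighbour message. Passing now explicitly keeps the expiry logic testable without sleeping.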
@@ -0,0 +1,268 @@
// Copyright 2021 ChainSafe Systems (ON)
Can you rename this file to catch_up.go?
}

c.lock.Lock()
c.requestsSent[to] = *req
Is there any reason we need to copy req here? Can this map hold *CatchUpRequest instead?
requestsSent map[peer.ID]CatchUpRequest

catchUpResponseCh chan *CatchUpResponse
waitingOnResponse *atomic.Value
Suggested change:
- waitingOnResponse *atomic.Value
+ waitingOnResponse *atomic.Bool
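A small sketch of why a typed boolean is more comfortable here, assuming the standard library's sync/atomic Bool (Go 1.19+); the suggestion may equally refer to another atomic package. With atomic.Value, Load returns nil until the first Store, so the `.(bool)` assertion in the diff panics if nothing was stored yet. The responseTracker type and its method names are hypothetical.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// responseTracker is a hypothetical stand-in for the catchUp struct.
type responseTracker struct {
	// The zero value is false, so no initial Store is required and
	// Load never needs a type assertion.
	waitingOnResponse atomic.Bool
}

// tryBegin flips the flag from false to true and reports whether this
// call was the one that flipped it.
func (r *responseTracker) tryBegin() bool {
	return r.waitingOnResponse.CompareAndSwap(false, true)
}

// finish marks that no response is pending any more.
func (r *responseTracker) finish() {
	r.waitingOnResponse.Store(false)
}

// waiting reports whether a response is currently expected.
func (r *responseTracker) waiting() bool {
	return r.waitingOnResponse.Load()
}

func main() {
	var tr responseTracker

	fmt.Println(tr.waiting())  // safe on the zero value
	fmt.Println(tr.tryBegin()) // first caller wins
	fmt.Println(tr.tryBegin()) // already waiting, so this call does not flip it
	tr.finish()
	fmt.Println(tr.waiting())
}
```

Using CompareAndSwap in one place would also avoid setting the flag twice, which relates to the earlier comment about the duplicate Store(true) call.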
case VoteMessageType, CommitMessageType:
s.network.GossipMessage(msg)
return true, nil
case NeighborMessageType, CatchUpRequestMessageType, CatchUpResponseMessageType:
Should we have a default case that returns an error for unsupported messages?
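Something along these lines, as a sketch. The messageType type, its constants, shouldGossip, and errUnsupportedMessageType are hypothetical; in particular, returning (false, nil) for the neighbour and catch-up cases is an assumption, since the diff does not show what that branch returns.

```go
package main

import (
	"errors"
	"fmt"
)

// messageType and its constants are hypothetical stand-ins for the
// message type constants used in the diff.
type messageType byte

const (
	voteMessageType messageType = iota
	commitMessageType
	neighbourMessageType
	catchUpRequestMessageType
	catchUpResponseMessageType
)

// errUnsupportedMessageType is a hypothetical sentinel error.
var errUnsupportedMessageType = errors.New("unsupported message type")

// shouldGossip reports whether a message of the given type is gossiped.
// Unknown types fall through to the default case and return an error
// instead of being silently ignored.
func shouldGossip(mt messageType) (bool, error) {
	switch mt {
	case voteMessageType, commitMessageType:
		return true, nil
	case neighbourMessageType, catchUpRequestMessageType, catchUpResponseMessageType:
		// Assumption: these are handled directly and not gossiped.
		return false, nil
	default:
		return false, fmt.Errorf("%w: %d", errUnsupportedMessageType, mt)
	}
}

func main() {
	_, err := shouldGossip(messageType(42))
	fmt.Println(errors.Is(err, errUnsupportedMessageType))
}
```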
It seems that polkadot sends the current round in its neighbor message instead of the last finalised round, so I had to send the catch-up request for the previous round. Otherwise, polkadot would discard our request.
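To make the behaviour described above concrete, here is a rough sketch of the decision, not the PR's actual code. The roundToRequest helper and the threshold value of 2 are hypothetical (the PR does not state the threshold), and "ahead by a threshold" is read here as "ahead by at least the threshold".

```go
package main

import "fmt"

// catchUpThreshold is a hypothetical value; the PR does not state it.
const catchUpThreshold uint64 = 2

// roundToRequest decides whether a neighbour message should trigger a
// catch-up request and, if so, for which round. Because the peer is
// observed to advertise its current round rather than its last finalised
// one, the request targets the round before the advertised one.
func roundToRequest(ourRound, neighbourRound uint64) (uint64, bool) {
	// Not ahead of us at all, or not far enough ahead.
	if neighbourRound <= ourRound || neighbourRound-ourRound < catchUpThreshold {
		return 0, false
	}
	// neighbourRound > ourRound >= 0, so the subtraction cannot underflow.
	return neighbourRound - 1, true
}

func main() {
	if round, ok := roundToRequest(10, 15); ok {
		fmt.Printf("requesting catch up for round %d\n", round)
	}
	if _, ok := roundToRequest(10, 11); !ok {
		fmt.Println("peer is within the threshold, no request sent")
	}
}
```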
Can you mark this as draft or close it, @kishansagathiya?
Shall we merge this some day or put the code somewhere else?
When I tried to do this, I could not finish the task completely. I was able to send a catch-up request to polkadot, and I could see the request in polkadot's logs. I could also see in the polkadot logs that it was sending a response back, but on the gossamer side I was not able to see the response. I worked on this for quite a few days and got some other folks to help as well, but we could not get any results, and I eventually gave up. In gossamer, a grandpa refactor is currently in progress (in branch
Closing because this is too old.
Primary Reviewer
@timwu20