Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate implementing grandpa catch up request and response #2950

Closed
jimjbrettj opened this issue Nov 14, 2022 · 6 comments
Closed

Investigate implementing grandpa catch up request and response #2950

jimjbrettj opened this issue Nov 14, 2022 · 6 comments

Comments

@jimjbrettj
Copy link
Contributor

Issue summary

  • We need to implement the logic to handle grandpa catch up requests and responses for grandpa.
  • This issue is to investigate what this would involve.

Other information and links

@kishansagathiya
Copy link
Contributor

kishansagathiya commented Nov 15, 2022

There is already an issue about this. Planning to revive my old PR (#2152) on this after eclesio's Grandpa PR is merged

@EclesioMeloJunior
Copy link
Member

EclesioMeloJunior commented Nov 16, 2022

@kishansagathiya I believe the PR you mentioned can be broken in separate work streams with smaller and consise PR's, but I would like to discuss some points you have in your PR:

ensure catch-up logic works when interacting with other gossamer nodes

The catch-up logic covers from the node bootstrap until checking neighbor messages, so it contains a very large scope and I believe we should identify and isolate the scopes to avoid big PR's

If the round in the neighbor message is ahead of our current round by a threshold, send a catch-up request

This point should be addressed in its own workstream

process the catch-up response, if we can't process it at the moment, store it to process later

Every catch-up request that gossamer receives from other node must be answered in the same stream, what do you mean by can't process it at the moment I don't think we should implement a catch-up tracker

It is important to notice that a catch-up request that we receive from another node should decrease the sender's reputation (as substrate does), so we charge a node's reputation in order to prevent spamming

Another work stream is to do the very first catch-up, currently Gossamer does not execute a catch-up request before starting the first round, this behavior in a running network can cause problems, so we should address this behavior to ensure Gossamer starts in the right round (when a grandpa authority)

I believe we can put this issue as part of epic #1531
@timwu20 @jimjbrettj

@kishansagathiya
Copy link
Contributor

kishansagathiya commented Nov 17, 2022

@EclesioMeloJunior @timwu20

The catch-up logic covers from the node bootstrap until checking neighbor messages, so it contains a very large scope and I believe we should identify and isolate the scopes to avoid big PR's

Do you mean the node bootstrap and checking neighbor messages?
In that PR, I have implemented catch up in case of neighbor messages. As far as I remember, I have not implemented catch in case of bootstrap. In case, we want to do that, it is not a huge change, since all we have to do is use that catch up function once more.

Also, even if we don't do deliberate catch up on bootstrap, we would still catch up, because we would be receiving neighbour messages rather often.

Every catch-up request that gossamer receives from other node must be answered in the same stream, what do you mean by can't process it at the moment I don't think we should implement a catch-up tracker

I have talked about processing catch up response later, not the request. We are responding to catch-up request in the same stream. Having a catch up response tracker is appropriate and safe.

It is important to notice that a catch-up request that we receive from another node should decrease the sender's reputation (as substrate does), so we charge a node's reputation in order to prevent spamming

This being a separate task makes sense to me.

Another work stream is to do the very first catch-up,

Isn't this same as your first point of catching up on node bootstrap?

Also, even if we don't do deliberate catch up on bootstrap, we would still catch up, because we would be receiving neighbour messages rather often.

@EclesioMeloJunior
Copy link
Member

In that PR, I have implemented catch up in case of neighbor messages

I mean, your PR name is feat(lib/grandpa): ensure catch-up logic works and in the PR body description is not explicit you are implementing only the catch up in case of neighbor messages, so I thought that your PR was doing all catch-up related work, but if your PR only addresses the catch up in case of neighbor messages so I would suggest to rename it + point it out in the description.

Having a catch up response tracker is appropriate and safe

Oh I see, right! I think we should address this in its own PR, what do you think? @kishansagathiya

Also, even if we don't do deliberate catch up on bootstrap, we would still catch up, because we would be receiving neighbour messages rather often

Right, and the point is basically once the node is in sync with the net so we should ensure the node is the right round/set id of the other authorities (this should not be done in the node bootstrap as I said earlier 😅 )

@kishansagathiya
Copy link
Contributor

Oh I see, right! I think we should address this in its own PR, what do you think? @kishansagathiya

I don't think that is necessary. @EclesioMeloJunior Let your grandpa refactor get merged. Let me revive that PR, I am happy to break up extra task if I find more things.

I thought that your PR was doing all catch-up related work, but if your PR only addresses the catch up in case of neighbor messages so I would suggest to rename it + point it out in the description.

We should catch up when we find that we are lagging behind. Neighbour messages are the best way to find that we are lagging behind. We receive them rather often. I don't see any other instance when we should catch up. Last I read, spec/rust impl also recommended the same thing. If you have any other instance in mind let me know.

@jimjbrettj
Copy link
Contributor Author

stale issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants