Bug: PoS bootstrap desynchro #3058
Comments
Saw another one today:
I suspected this could be related to nodes trying to enter the network through other participants with lower configs than required, which would result in a PoS bootstrap delay. But #3058 (comment) bootstrapped from our nodes.
Note that this is a marginal case and does not seem to happen often.
#3058 (comment) does not remember which nodes he bootstrapped from, but said it happened to him only once and that he was able to connect through non-initial nodes. #3058 (comment) said that it happens "sometimes", so I assume more than once for him.
This one, however, is towards the middle of a cycle.
I was able to replicate it myself multiple times (had to try bootstrapping 3 times for it to work):
Was able to reproduce with enhanced logs, investigating now:
And when it succeeded:
It appears the main bootstrap process ends too soon in some cases and starts streaming changes before the principal cycles have finished streaming.
Enhanced logs of a node acting as bootstrap server; the first two bootstraps succeeded and the last one failed (on the client side):
Corresponding logs of the client side:
Found where #3058 (comment) is coming from. When a node bootstraps, if final state changes are streamed because of a midway ledger update while PoS cycles are still streaming, the client also tries to apply PoS changes, deducing the cycle from the ledger update slot. Because cycle streaming is not over, this cycle is not sequential to the ones streamed previously, which triggers the discontinuity error.
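To make the mechanism concrete, here is a minimal sketch of how such a discontinuity can arise. It is not the actual Massa code and does not claim to be what #3079 does: `PosClientState`, `PosError`, `slot_to_cycle` and the two `apply_changes_*` methods are hypothetical names, and deriving the cycle as `period / periods_per_cycle` is an assumption made for illustration.

```rust
/// Hypothetical, simplified view of the PoS state on a bootstrapping client.
#[derive(Debug, Default)]
pub struct PosClientState {
    /// Last cycle known to the client, `None` before the first one arrives.
    last_cycle: Option<u64>,
    /// True once the server has signalled the end of cycle streaming.
    cycles_complete: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum PosError {
    Discontinuity { received: u64, expected: u64 },
}

/// Assumption: a slot's cycle is its period divided by the cycle length.
pub fn slot_to_cycle(period: u64, periods_per_cycle: u64) -> u64 {
    period / periods_per_cycle
}

impl PosClientState {
    /// A new cycle must directly follow the last known one.
    fn check_next(&self, cycle: u64) -> Result<(), PosError> {
        match self.last_cycle {
            Some(last) if cycle != last + 1 => Err(PosError::Discontinuity {
                received: cycle,
                expected: last + 1,
            }),
            _ => Ok(()),
        }
    }

    /// Called for each cycle received through the cycle stream.
    pub fn push_streamed_cycle(&mut self, cycle: u64, is_last: bool) -> Result<(), PosError> {
        self.check_next(cycle)?;
        self.last_cycle = Some(cycle);
        self.cycles_complete = is_last;
        Ok(())
    }

    /// Naive path: apply PoS changes for the cycle deduced from the ledger
    /// update slot, whether or not cycle streaming has finished.
    pub fn apply_changes_naive(&mut self, period: u64, periods_per_cycle: u64) -> Result<(), PosError> {
        let cycle = slot_to_cycle(period, periods_per_cycle);
        if self.last_cycle == Some(cycle) {
            return Ok(()); // changes target a cycle the client already has
        }
        self.check_next(cycle)?;
        self.last_cycle = Some(cycle);
        Ok(())
    }

    /// Guarded path: defer PoS changes until all cycles have been streamed.
    /// Returns `Ok(false)` when the changes were deferred.
    pub fn apply_changes_guarded(&mut self, period: u64, periods_per_cycle: u64) -> Result<bool, PosError> {
        if !self.cycles_complete {
            return Ok(false);
        }
        self.apply_changes_naive(period, periods_per_cycle).map(|()| true)
    }
}
```

With an assumed cycle length of 128 periods, a client that has only received cycle 30 and then gets a ledger update at period 4229 (cycle 33) fails in the naive path with `Discontinuity { received: 33, expected: 31 }`. The guarded path defers those changes until cycles 31 and 32 have arrived and the stream is marked complete.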
#3058 (comment) is fixed in #3079. I couldn't reproduce #3058 (comment), but I added logs around what I think the problem might be.
There is a chance that they were both related, but I will keep the issue open until we're sure this is fixed.
We have not been able to reproduce this error since.
A user had the following error with version 14.7:
thread 'main' panicked at 'PoS received cycle (31) should be equal to the next expected cycle (34)', massa-pos-exports/src/types.rs:405:17
This is the first time I have seen this error in production; I will have to confirm it is actually an issue before investigating.