Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: PoS bootstrap desynchro #3058

Closed
Eitu33 opened this issue Sep 22, 2022 · 15 comments
Closed

Bug: PoS bootstrap desynchro #3058

Eitu33 opened this issue Sep 22, 2022 · 15 comments
Labels
bug Something isn't working

Comments

@Eitu33
Copy link
Contributor

Eitu33 commented Sep 22, 2022

A user had the following error with version 14.7: thread 'main' panicked at 'PoS received cycle (31) should be equal to the next expected cycle (34)', massa-pos-exports/src/types.rs:405:17

It's the first time I see this error in production, will have to confirm it is actually an issue before investigating.

@Eitu33
Copy link
Contributor Author

Eitu33 commented Sep 23, 2022

Saw another one today:

2022-09-23T09:33:14.448528Z  INFO massa_bootstrap::client: Bootstrap from server 51.75.60.228:31245 failed. Your node will try to bootstrap from another server in 60s.          
2022-09-23T09:34:14.450434Z  INFO massa_bootstrap::client: Start bootstrapping from 158.69.23.120:31245                                                         
thread 'main' panicked at 'PoS received cycle (79) should be equal to the next expected cycle (81)', massa-pos-exports/src/types.rs:405:17                                                   
stack backtrace: 

@Eitu33
Copy link
Contributor Author

Eitu33 commented Sep 23, 2022

I suspected that this could be related to nodes trying to enter the network through other participants with lower configs than required which would result in a PoS bootstrap delay. But #3058 (comment) bootstraped from our nodes.

@Eitu33 Eitu33 changed the title PoS bootstrap desynchro (not confirmed) Bug: PoS bootstrap desynchro Sep 23, 2022
@Eitu33
Copy link
Contributor Author

Eitu33 commented Sep 23, 2022

Note that this is a marginal case and seems to not happen often.

@Eitu33
Copy link
Contributor Author

Eitu33 commented Sep 23, 2022

#3058 (comment) does not remember from what nodes he bootstrapped, but said it happened to him only once and was able to connect through non-initial nodes. #3058 (comment) said that it happens "sometimes" so I assume more than once for him.

@Eitu33
Copy link
Contributor Author

Eitu33 commented Sep 23, 2022

Third one:
image

@damip
Copy link
Member

damip commented Sep 23, 2022

Third one: image

this one towards the end of cycle 92 (period 119/128)

@damip
Copy link
Member

damip commented Sep 23, 2022

Saw another one today:

2022-09-23T09:33:14.448528Z  INFO massa_bootstrap::client: Bootstrap from server 51.75.60.228:31245 failed. Your node will try to bootstrap from another server in 60s.          
2022-09-23T09:34:14.450434Z  INFO massa_bootstrap::client: Start bootstrapping from 158.69.23.120:31245                                                         
thread 'main' panicked at 'PoS received cycle (79) should be equal to the next expected cycle (81)', massa-pos-exports/src/types.rs:405:17                                                   
stack backtrace: 

this one however is towards the middle of a cycle

@altai0
Copy link
Contributor

altai0 commented Sep 25, 2022

image
image
image

similar errors

@damip
Copy link
Member

damip commented Sep 25, 2022

I was able to replicate it myself multiple times (had to try bootstrapping 3 times for it to work):

�[2m2022-09-25T12:21:54.256133Z�[0m �[33m WARN�[0m �[2mmassa_bootstrap::client�[0m�[2m:�[0m Error while bootstrapping: pos error: Overflow error: invalid cycle sequence in PoS final state


�[2m2022-09-25T12:23:02.514495Z�[0m �[33m WARN�[0m �[2mmassa_bootstrap::client�[0m�[2m:�[0m Error while bootstrapping: pos error: Overflow error: invalid cycle sequence in PoS final state

@Eitu33
Copy link
Contributor Author

Eitu33 commented Sep 26, 2022

Was able to reproduce with enhanced logs, investigating now:

2022-09-26T09:51:31.800776Z  INFO massa_node: Node version : TEST.14.7
2022-09-26T09:51:31.808357Z  INFO massa_bootstrap::client: Start bootstrapping from 149.202.89.125:31245
2022-09-26T09:51:33.044330Z  WARN massa_pos_exports::types: [main bootstrap] last cycle: None | expected cycle: None | received cycle: 206 | received cyle completion: true    
2022-09-26T09:51:33.186826Z  WARN massa_pos_exports::types: [main bootstrap] last cycle: Some(206) | expected cycle: Some(207) | received cycle: 207 | received cyle completion: true    
2022-09-26T09:51:33.190638Z  WARN massa_pos_exports::pos_final_state_impl: [changes bootstrap] last cycle: 207 | expected cycle: 208 | received cycle (from given slot): 210    
2022-09-26T09:51:33.190683Z  WARN massa_bootstrap::client: Error while bootstrapping: pos error: Overflow error: invalid cycle sequence in PoS final state
2022-09-26T09:51:33.191083Z  INFO massa_bootstrap::client: Bootstrap from server 149.202.89.125:31245 failed. Your node will try to bootstrap from another server in 60s.
^C2022-09-26T09:51:54.488847Z  INFO massa_node: interrupt signal received in bootstrap loop

And when it succeeded:

2022-09-26T09:50:38.385554Z  INFO massa_node: Node version : TEST.14.7
2022-09-26T09:50:38.397363Z  INFO massa_bootstrap::client: Start bootstrapping from 198.27.74.5:31245
2022-09-26T09:50:40.650272Z  WARN massa_pos_exports::types: [main bootstrap] last cycle: None | expected cycle: None | received cycle: 206 | received cyle completion: true    
2022-09-26T09:50:40.844005Z  WARN massa_pos_exports::types: [main bootstrap] last cycle: Some(206) | expected cycle: Some(207) | received cycle: 207 | received cyle completion: true    
2022-09-26T09:50:41.120563Z  WARN massa_pos_exports::types: [main bootstrap] last cycle: Some(207) | expected cycle: Some(208) | received cycle: 208 | received cyle completion: true    
2022-09-26T09:50:41.279269Z  WARN massa_pos_exports::types: [main bootstrap] last cycle: Some(208) | expected cycle: Some(209) | received cycle: 209 | received cyle completion: true    
2022-09-26T09:50:41.481300Z  WARN massa_pos_exports::types: [main bootstrap] last cycle: Some(209) | expected cycle: Some(210) | received cycle: 210 | received cyle completion: false    
2022-09-26T09:50:41.485609Z  INFO massa_bootstrap::client: State bootstrap complete
2022-09-26T09:50:43.393284Z  INFO massa_bootstrap::client: Successful bootstrap

It appears the main bootstrap process is ending too soon in some cases, and starts streaming changes when the principal cycles have not finished streaming.

@Eitu33
Copy link
Contributor Author

Eitu33 commented Sep 26, 2022

Enhanced logs of a node acting as bootstrap server, first 2 succeeded last one failed (on client side):

2022-09-26T12:48:44.215232Z  WARN massa_bootstrap::server: (main bootstrap (server)) last streaming step: Started | new streaming step: Ongoing(211)
2022-09-26T12:48:44.990682Z  WARN massa_pos_exports::types: (main bootstrap (server)) cursor last cycle: 211 | state last cycle Some(215)
2022-09-26T12:48:44.992565Z  WARN massa_bootstrap::server: (main bootstrap (server)) last streaming step: Ongoing(211) | new streaming step: Ongoing(212)
2022-09-26T12:48:44.992816Z  WARN massa_pos_exports::types: (main bootstrap (server)) cursor last cycle: 212 | state last cycle Some(215)
2022-09-26T12:48:44.994748Z  WARN massa_bootstrap::server: (main bootstrap (server)) last streaming step: Ongoing(212) | new streaming step: Ongoing(213)
2022-09-26T12:48:45.175307Z  WARN massa_pos_exports::types: (main bootstrap (server)) cursor last cycle: 213 | state last cycle Some(215)
2022-09-26T12:48:45.180218Z  WARN massa_bootstrap::server: (main bootstrap (server)) last streaming step: Ongoing(213) | new streaming step: Ongoing(214)
2022-09-26T12:48:45.394616Z  WARN massa_pos_exports::types: (main bootstrap (server)) cursor last cycle: 214 | state last cycle Some(215)
2022-09-26T12:48:45.396425Z  WARN massa_bootstrap::server: (main bootstrap (server)) last streaming step: Ongoing(214) | new streaming step: Ongoing(215)
2022-09-26T12:48:45.515074Z  WARN massa_pos_exports::types: (main bootstrap (server)) cursor last cycle: 215 | state last cycle Some(215)
2022-09-26T12:48:45.515118Z  WARN massa_bootstrap::server: (main bootstrap (server)) last streaming step: Ongoing(215) | new streaming step: Finished
2022-09-26T12:53:19.189166Z  WARN massa_bootstrap::server: (main bootstrap (server)) last streaming step: Started | new streaming step: Ongoing(211)
2022-09-26T12:53:19.895155Z  WARN massa_pos_exports::types: (main bootstrap (server)) cursor last cycle: 211 | state last cycle Some(215)
2022-09-26T12:53:19.898724Z  WARN massa_bootstrap::server: (main bootstrap (server)) last streaming step: Ongoing(211) | new streaming step: Ongoing(212)
2022-09-26T12:53:20.102816Z  WARN massa_pos_exports::types: (main bootstrap (server)) cursor last cycle: 212 | state last cycle Some(215)
2022-09-26T12:53:20.109084Z  WARN massa_bootstrap::server: (main bootstrap (server)) last streaming step: Ongoing(212) | new streaming step: Ongoing(213)
2022-09-26T12:53:20.192451Z  WARN massa_pos_exports::types: (main bootstrap (server)) cursor last cycle: 213 | state last cycle Some(215)
2022-09-26T12:53:20.198992Z  WARN massa_bootstrap::server: (main bootstrap (server)) last streaming step: Ongoing(213) | new streaming step: Ongoing(214)
2022-09-26T12:53:20.302722Z  WARN massa_pos_exports::types: (main bootstrap (server)) cursor last cycle: 214 | state last cycle Some(215)
2022-09-26T12:53:20.309339Z  WARN massa_bootstrap::server: (main bootstrap (server)) last streaming step: Ongoing(214) | new streaming step: Ongoing(215)
2022-09-26T12:53:20.364090Z  WARN massa_pos_exports::types: (main bootstrap (server)) cursor last cycle: 215 | state last cycle Some(215)
2022-09-26T12:53:20.364184Z  WARN massa_bootstrap::server: (main bootstrap (server)) last streaming step: Ongoing(215) | new streaming step: Finished
2022-09-26T12:58:21.318667Z  WARN massa_bootstrap::server: (main bootstrap (server)) last streaming step: Started | new streaming step: Ongoing(212)
2022-09-26T12:58:22.559715Z  WARN massa_pos_exports::types: (main bootstrap (server)) cursor last cycle: 212 | state last cycle Some(216)
2022-09-26T12:58:22.566268Z  WARN massa_bootstrap::server: (main bootstrap (server)) last streaming step: Ongoing(212) | new streaming step: Ongoing(213)
2022-09-26T12:58:22.759936Z  WARN massa_pos_exports::types: (main bootstrap (server)) cursor last cycle: 213 | state last cycle Some(216)
2022-09-26T12:58:22.766511Z  WARN massa_bootstrap::server: (main bootstrap (server)) last streaming step: Ongoing(213) | new streaming step: Ongoing(214)
2022-09-26T12:58:23.057092Z  WARN massa_pos_exports::types: (main bootstrap (server)) cursor last cycle: 214 | state last cycle Some(216)
2022-09-26T12:58:23.063640Z  WARN massa_bootstrap::server: (main bootstrap (server)) last streaming step: Ongoing(214) | new streaming step: Ongoing(215)
2022-09-26T12:58:23.170641Z  WARN massa_pos_exports::types: (main bootstrap (server)) cursor last cycle: 215 | state last cycle Some(216)
2022-09-26T12:58:23.175446Z  WARN massa_bootstrap::server: (main bootstrap (server)) last streaming step: Ongoing(215) | new streaming step: Ongoing(216)
2022-09-26T12:58:23.265119Z  WARN massa_pos_exports::types: (main bootstrap (server)) cursor last cycle: 216 | state last cycle Some(216)
2022-09-26T12:58:23.265180Z  WARN massa_bootstrap::server: (main bootstrap (server)) last streaming step: Ongoing(216) | new streaming step: Finished

Corresponding logs of the client side:

2022-09-26T12:58:20.575511Z  INFO massa_node: Node version : TEST.14.7
2022-09-26T12:58:20.583072Z  INFO massa_bootstrap::client: Start bootstrapping from 195.134.167.226:31245
2022-09-26T12:58:22.690755Z  WARN massa_pos_exports::types: (main bootstrap) last cycle: None | expected cycle: None | received cycle: 212 | received cyle completion: true
2022-09-26T12:58:22.693924Z  WARN massa_bootstrap::client: (main bootstrap) streaming step: Ongoing(212)
2022-09-26T12:58:22.928078Z  WARN massa_pos_exports::types: (main bootstrap) last cycle: Some(212) | expected cycle: Some(213) | received cycle: 213 | received cyle completion: true
2022-09-26T12:58:22.932093Z  WARN massa_bootstrap::client: (main bootstrap) streaming step: Ongoing(213)
2022-09-26T12:58:22.932344Z  WARN massa_bootstrap::client: (main bootstrap) trying to apply PoS changes on an unfinished bootstrap
2022-09-26T12:58:22.932433Z  WARN massa_bootstrap::client: (main bootstrap) trying to apply PoS changes on an unfinished bootstrap
2022-09-26T12:58:22.932505Z  WARN massa_bootstrap::client: (main bootstrap) trying to apply PoS changes on an unfinished bootstrap
2022-09-26T12:58:22.932574Z  WARN massa_bootstrap::client: (main bootstrap) trying to apply PoS changes on an unfinished bootstrap
2022-09-26T12:58:23.093902Z  WARN massa_pos_exports::types: (main bootstrap) last cycle: Some(213) | expected cycle: Some(214) | received cycle: 214 | received cyle completion: true
2022-09-26T12:58:23.098003Z  WARN massa_bootstrap::client: (main bootstrap) streaming step: Ongoing(214)
2022-09-26T12:58:23.250929Z  WARN massa_pos_exports::types: (main bootstrap) last cycle: Some(214) | expected cycle: Some(215) | received cycle: 215 | received cyle completion: true
2022-09-26T12:58:23.255583Z  WARN massa_bootstrap::client: (main bootstrap) streaming step: Ongoing(215)
2022-09-26T12:58:23.526092Z  WARN massa_pos_exports::types: (main bootstrap) last cycle: Some(215) | expected cycle: Some(216) | received cycle: 216 | received cyle completion: false
2022-09-26T12:58:23.529823Z  WARN massa_bootstrap::client: (main bootstrap) streaming step: Ongoing(216)

@Eitu33
Copy link
Contributor Author

Eitu33 commented Sep 26, 2022

Found where #3058 (comment) is coming from.

When a node tries to bootstrap, if final state changes are streamed because of a mid way ledger update, and PoS cycles are still streaming, the client will also try to apply PoS changes deducing the cycle from the ledger update slot. This cycle is not sequential to the ones streamed previously because cycle streaming is not over, therefore creating the discontinuity error.

@Eitu33
Copy link
Contributor Author

Eitu33 commented Sep 26, 2022

#3058 (comment) is fixed in #3079, couldn't reproduce #3058 (comment) but added logs on what I think the problem might be.

@Eitu33
Copy link
Contributor Author

Eitu33 commented Sep 26, 2022

There is a chance that they were both related but I will keep the issue opened until we're sure this is fixed.

@Eitu33 Eitu33 added the bug Something isn't working label Sep 26, 2022
@AurelienFT
Copy link
Contributor

We didn't reproduce this error anymore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants