refactor: forester: add retry logic for epoch registration #1160
Conversation
Further changes are needed so that recovered epochs handle not only the active-phase work but also the subsequent epoch steps correctly.
Consolidate the repeated active-phase work processing into a new function `process_epoch_work`. Introduce `register_for_epoch_with_retry` to handle registration retries with a specified maximum number of attempts and delay duration. Remove an unnecessary info log statement when setting epoch flags.
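As a rough illustration of the retry shape described above (a minimal sketch only: the `try_register` helper, the error type, and the logging are assumptions, not the PR's actual code):

```rust
use std::time::Duration;

// Hypothetical stand-ins for the forester's real types.
struct EpochRegistration {
    epoch: u64,
}

#[derive(Debug)]
struct RegistrationError(String);

async fn try_register(epoch: u64) -> Result<EpochRegistration, RegistrationError> {
    // Placeholder for the real on-chain registration call.
    Err(RegistrationError(format!("epoch {epoch} not ready")))
}

/// Retry registration up to `max_retries` times, sleeping `retry_delay`
/// between attempts, and return the last error if all attempts fail.
async fn register_for_epoch_with_retry(
    epoch: u64,
    max_retries: u32,
    retry_delay: Duration,
) -> Result<EpochRegistration, RegistrationError> {
    let mut last_err = None;
    for attempt in 1..=max_retries {
        match try_register(epoch).await {
            Ok(registration) => return Ok(registration),
            Err(e) => {
                eprintln!("registration attempt {attempt}/{max_retries} failed: {e:?}");
                last_err = Some(e);
                // Back off before the next attempt, but not after the last one.
                if attempt < max_retries {
                    tokio::time::sleep(retry_delay).await;
                }
            }
        }
    }
    Err(last_err.unwrap_or_else(|| RegistrationError("no attempts made".into())))
}
```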
Force-pushed from ec1cd25 to 7213266.
```rust
}

// Attempt to recover registration info
let mut registration_info = match self.recover_registration_info(epoch).await {
```
What does this do if there is no registration that can be recovered?
If there is no registration, we'll try to register here: https://github.com/Lightprotocol/light-protocol/pull/1160/files/7213266f76690b8931a73e01055af9123b07b22a#diff-1e165bd3189acab768bdb1cd6cf4ec33528400511ec68498004b73a25d0646c3R280-R281
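A hedged sketch of the recover-then-register flow discussed here: `recover_registration_info` and `register_for_epoch_with_retry` follow the PR description, but the `Forester` struct, `RegistrationInfo`, the retry defaults, and the combining method name `ensure_registration` are illustrative stand-ins.

```rust
use std::time::Duration;

// Illustrative stand-ins for the forester's real types.
struct RegistrationInfo {
    epoch: u64,
}

#[derive(Debug)]
struct ForesterError(String);

struct Forester;

// Assumed defaults; the real code passes its own max attempts and delay.
const MAX_RETRIES: u32 = 3;
const RETRY_DELAY: Duration = Duration::from_secs(10);

impl Forester {
    async fn recover_registration_info(
        &self,
        epoch: u64,
    ) -> Result<RegistrationInfo, ForesterError> {
        // Placeholder: the real code looks up the existing epoch registration on chain.
        Err(ForesterError(format!("no registration found for epoch {epoch}")))
    }

    async fn register_for_epoch_with_retry(
        &self,
        epoch: u64,
        _max_retries: u32,
        _retry_delay: Duration,
    ) -> Result<RegistrationInfo, ForesterError> {
        // Placeholder for the retry wrapper sketched earlier.
        Ok(RegistrationInfo { epoch })
    }

    /// Try to recover an existing registration first; if nothing can be
    /// recovered, fall back to a fresh registration with retries.
    async fn ensure_registration(&self, epoch: u64) -> Result<RegistrationInfo, ForesterError> {
        match self.recover_registration_info(epoch).await {
            Ok(info) => Ok(info),
            Err(_) => {
                self.register_for_epoch_with_retry(epoch, MAX_RETRIES, RETRY_DELAY)
                    .await
            }
        }
    }
}
```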
```rust
return Err(ForesterError::Custom(format!(
    "Too late to register for epoch {}. Current slot: {}, Registration end: {}",
    epoch, slot, phases.registration.end
)));
```
Do we really want to throw in this case?
An alternative behavior could be to wait for the next registration period.
Or just return if we have logic to wait for the next epoch in a different place.
I think this behaviour is semantically correct because it's part of the `process_epoch` flow: this case really is an error, it shouldn't happen in a normal situation, and we log it as an error in our log files: https://github.com/Lightprotocol/light-protocol/pull/1160/files/7213266f76690b8931a73e01055af9123b07b22a#diff-1e165bd3189acab768bdb1cd6cf4ec33528400511ec68498004b73a25d0646c3R147
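To make the trade-off concrete, here is a small sketch of the guard and of a caller that logs the error and moves on rather than crashing; the `Phases` layout, the `>=` comparison, and the logging call are assumptions, and only the error message comes from the diff above.

```rust
// Illustrative types; the real phases come from the protocol's epoch schedule.
struct Phase {
    end: u64,
}

struct Phases {
    registration: Phase,
}

#[derive(Debug)]
enum ForesterError {
    Custom(String),
}

/// Fail if the registration window for `epoch` has already closed
/// (assuming `registration.end` is the first slot past the window).
fn check_registration_window(epoch: u64, slot: u64, phases: &Phases) -> Result<(), ForesterError> {
    if slot >= phases.registration.end {
        return Err(ForesterError::Custom(format!(
            "Too late to register for epoch {}. Current slot: {}, Registration end: {}",
            epoch, slot, phases.registration.end
        )));
    }
    Ok(())
}

fn process_epoch(epoch: u64, slot: u64, phases: &Phases) {
    // The caller treats a closed window as an error to log, not a reason to crash.
    if let Err(e) = check_registration_window(epoch, slot, phases) {
        eprintln!("error while processing epoch {epoch}: {e:?}");
    }
}
```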
Replaced multiple calls to `slot_tracker.estimated_current_slot` with `sync_slot` to ensure accurate slot synchronization. Updated `sync_slot` to return the current slot and adjusted the calling logic accordingly to maintain correct epoch phase handling.
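A rough sketch of why returning the slot from `sync_slot` helps; the `SlotTracker` shape and the RPC fetch below are assumptions about the surrounding code, not the crate's actual API.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Illustrative tracker; the real SlotTracker lives in the forester crate.
struct SlotTracker {
    last_known_slot: AtomicU64,
}

impl SlotTracker {
    fn update(&self, slot: u64) {
        self.last_known_slot.store(slot, Ordering::Relaxed);
    }

    // The read that callers previously combined with a separate sync call.
    fn estimated_current_slot(&self) -> u64 {
        self.last_known_slot.load(Ordering::Relaxed)
    }
}

async fn fetch_slot_from_rpc() -> u64 {
    // Placeholder for the real RPC call that returns the chain's current slot.
    42
}

/// Sync the tracker and hand the freshly fetched slot back to the caller,
/// so callers no longer need a separate `estimated_current_slot()` read
/// that could race with other updates.
async fn sync_slot(tracker: &SlotTracker) -> u64 {
    let slot = fetch_slot_from_rpc().await;
    tracker.update(slot);
    slot
}
```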