Retry & backoff mechanism in worker loop #913
Merged
ancazamfir reviewed May 14, 2021
Looks good. I tested by making the packet worker fail on the send_packet event while still succeeding at clearing packets. It switches nicely from event-based relaying (passive mode) to packet clearing (active mode) and then back to event-based relaying. Should be good for now.
ancazamfir approved these changes May 14, 2021
hu55a1n1 pushed a commit to hu55a1n1/hermes that referenced this pull request Sep 13, 2022
* Retry strategy in uni-directional worker
* Better retry in worker. Default
* Re-add retries in uni_chan_path worker
* Cleanup
* Ensure total duration of worker retries does not exceed 2 seconds
* Notify supervisor when unichan worker stops after exceeding max retries
* Increase max total retry delay in event monitor to 10min
* Make event monitor less verbose in log level higher than trace
* Fix bug where worker would retry with the next event batch for every retry
* Do not stop the worker after exhausting retry attempts
* Cleanup
* Fix compilation error
* Set clear packets again after failing to retry
* Update changelog
* Add unit test for clamp_delay and fix case where it would overshoot the max delay

Co-authored-by: Romain Ruetschi <romain@informal.systems>
Closes: #943
Follow-up to: #903
Description
Adds the Fibonacci-based retry mechanism to method run_uni_chan_path()
Ensure we don't retry for more than 2 seconds
Fix the "sending on a disconnected channel" dangling error (details in the trace below)
Ensure we can properly resume relaying. Previously, the relayer did not receive events from a chain that went down, even after that chain was restarted. Fixed: the cause was that the event monitor stopped retrying before the node came back up.
Do not stop the workers when the node goes down. Otherwise, packets sent after a worker has stopped are not picked up until more packets are sent, at which point the worker starts again and clears them.
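
For illustration only, here is a minimal sketch of a Fibonacci-spaced retry schedule whose total duration is capped, in the spirit of the first two points above. It uses only the standard library. The names FibonacciDelay and clamp_total, and the 100 ms base delay, are assumptions made for this example and are not taken from the PR's code.

```rust
use std::time::Duration;

/// Hypothetical helper: yields Fibonacci-spaced delays
/// (base, base, 2*base, 3*base, 5*base, ...).
struct FibonacciDelay {
    curr: u64,
    next: u64,
}

impl FibonacciDelay {
    fn from_millis(base: u64) -> Self {
        Self { curr: base, next: base }
    }
}

impl Iterator for FibonacciDelay {
    type Item = Duration;

    fn next(&mut self) -> Option<Duration> {
        let delay = Duration::from_millis(self.curr);
        // Advance the pair (curr, next) -> (next, curr + next) without overflowing.
        let new_next = self.curr.saturating_add(self.next);
        self.curr = self.next;
        self.next = new_next;
        Some(delay)
    }
}

/// Hypothetical helper: truncates a delay sequence so that the sum of the
/// yielded delays never exceeds `max_total`. The last delay is shortened
/// if needed, and the sequence ends once the budget is used up.
fn clamp_total(
    delays: impl Iterator<Item = Duration>,
    max_total: Duration,
) -> impl Iterator<Item = Duration> {
    let mut elapsed = Duration::ZERO;
    delays.map_while(move |delay| {
        let remaining = max_total.saturating_sub(elapsed);
        if remaining.is_zero() {
            return None; // budget exhausted: no more retries
        }
        let clamped = delay.min(remaining);
        elapsed += clamped;
        Some(clamped)
    })
}

fn main() {
    // Assumed values: 100 ms base delay, 2 s total retry budget.
    let delays: Vec<Duration> =
        clamp_total(FibonacciDelay::from_millis(100), Duration::from_secs(2)).collect();
    let total: Duration = delays.iter().sum();
    println!("{delays:?} (total {total:?})");
}
```

With these assumed values the schedule is 100, 100, 200, 300, 500 and 800 ms, which adds up to exactly the 2-second budget. A retry loop would sleep for each delay in turn and give up on the current attempt once the iterator is exhausted.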