feat: node recovery strategies #2076

Open
5 tasks
weboko opened this issue Jul 18, 2024 · 6 comments

@weboko
Collaborator

weboko commented Jul 18, 2024

This is a feature request

Problem

Prerequisite: #2070

Once we can determine which health state a light node is in, we should provide clear strategies for consumers to recover from undesirable states.

Proposed Solutions

For each node health state we should develop a method that can be triggered to recover from it.
Additionally, we should introduce an option, networkRecovery: boolean, that makes the triggering of these recovery methods automatic.

This behavior should be off by default and tested before it is turned on by default.

Node health state:

  • sufficiently healthy: no action needed;
  • minimally healthy:
    • find and establish connections to new peers to meet the needed requirements;
  • unhealthy:
    • if possible, reuse the previous operation (check in practice whether it works);
    • if not, implement a hard reset operation that re-establishes connections to bootstrap nodes and starts over;
  • expose the API mentioned above;
  • implement automatic triggering of these operations if the networkRecovery option is provided.

Note: the hard reset operation won't help if the node is offline (has no Internet connection), and we should be clear about this in our API, documentation, and behavior.
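The mapping from health state to recovery step could be sketched roughly as below. This is only an illustration under assumptions: `HealthState`, `RecoveryStep`, `RecoveryContext`, and `planRecovery` are hypothetical names, not existing js-waku API, and the `online` flag is assumed to be supplied by the caller.

```typescript
// Hypothetical sketch: these types and names are illustrative, not js-waku API.
type HealthState = "sufficientlyHealthy" | "minimallyHealthy" | "unhealthy";
type RecoveryStep = "findNewPeers" | "hardReset";

interface RecoveryContext {
  // Whether the device has Internet access at all (assumed to be known by the caller).
  online: boolean;
  // Whether a previous findNewPeers attempt already failed to restore health.
  peerSearchFailed: boolean;
}

// Decide which recovery steps to run for a given health state.
function planRecovery(state: HealthState, ctx: RecoveryContext): RecoveryStep[] {
  // No recovery step helps while the node is offline.
  if (!ctx.online) return [];

  switch (state) {
    case "sufficientlyHealthy":
      return [];
    case "minimallyHealthy":
      return ["findNewPeers"];
    case "unhealthy":
      // Try the lighter operation first; fall back to a hard reset if it already failed.
      return ctx.peerSearchFailed ? ["hardReset"] : ["findNewPeers"];
  }
}
```

Keeping the decision separate from the network calls makes the offline case explicit: the plan is empty, so the caller can report that recovery is not possible instead of resetting in a loop.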

@weboko
Collaborator Author

weboko commented Jul 18, 2024

From discussion:

  • partially healthy strategy:

@weboko
Collaborator Author

weboko commented Jul 18, 2024

ping @vpavlin @hackyguru for your perspective on this issue, as we are not sure whether it is needed

@weboko weboko moved this from Triage to Blocked in Waku Jul 18, 2024
@vpavlin
Member

vpavlin commented Jul 18, 2024

I am a bit confused by

For each node health state we should develop a method

I'd expect there are basically 2 methods - "find new nodes" and "hard reset" which can potentially be used in any health state by the app dev in case their app notices anything weird?

And then automating it at the Waku level by setting networkRecovery: true

Or maybe this is what has been said? :)
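The two-method shape described in this comment might look something like the following. It is a sketch only: `RecoveryController`, its method names, and the `calls` log are hypothetical stand-ins, not js-waku API, and the methods record a call instead of touching the network.

```typescript
// Hypothetical sketch; these names are not existing js-waku API.
type HealthState = "sufficientlyHealthy" | "minimallyHealthy" | "unhealthy";
type RecoveryMethod = "findNewPeers" | "hardReset";

interface RecoveryOptions {
  // Off by default, as proposed in the issue.
  networkRecovery?: boolean;
}

class RecoveryController {
  // Records which methods were invoked, standing in for real network calls.
  public readonly calls: RecoveryMethod[] = [];
  private readonly auto: boolean;

  constructor(options: RecoveryOptions = {}) {
    this.auto = options.networkRecovery ?? false;
  }

  // Both methods can be called by the app in any health state.
  findNewPeers(): void {
    this.calls.push("findNewPeers");
  }

  hardReset(): void {
    this.calls.push("hardReset");
  }

  // Called on each health change; only acts when networkRecovery is true.
  onHealthChange(state: HealthState): void {
    if (!this.auto) return;
    if (state === "minimallyHealthy") this.findNewPeers();
    if (state === "unhealthy") this.hardReset();
  }
}
```

With `networkRecovery` left unset, the app keeps full control and calls `findNewPeers()` or `hardReset()` itself; with `networkRecovery: true`, the same two methods are triggered from health changes.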

@weboko
Collaborator Author

weboko commented Jul 22, 2024

@vpavlin there could be two methods or just one, depending on which we find works better.

But the real question here is: do we need it? Have you noticed such a need before when talking to people who use js-waku?

@weboko
Collaborator Author

weboko commented Aug 20, 2024

As we don't have enough evidence that this would be a significant improvement for developers, iceboxing for now.

@weboko weboko moved this from Blocked to Icebox in Waku Aug 20, 2024
@weboko
Collaborator Author

weboko commented Oct 16, 2024

From https://github.com/waku-org/support/issues/2

Maybe we need a full network wipe feature (zy0n)

Ideally, js-waku should be able to recover from bad situations.
Keeping this iceboxed for now; we need more feedback after fixing the original problem with Filter (#2158).
