Make it easier to keep nodes synced on block info #2480

Closed
stronk-dev opened this issue Jun 29, 2022 · 2 comments · Fixed by #2489

@stronk-dev
Contributor

Is your feature request related to a problem? Please describe.
Lately I have been having issues with nodes not updating block info (usually Singapore, sometimes other locations). The orchestrator service stays available, so automatic health checks don't pick up on these issues. This means I have to restart my nodes at random times to backfill block events, which is a PITA.

Describe the solution you'd like
Assuming that having fairly recent block info is important for being able to receive streams, I would expect the node to:

  • Keep retrying the defined RPC endpoint to get recent info. In some situations it seems to just stop trying to pull new block info, sometimes after an RPC error, but sometimes without any error at all.
  • Throw an error if it hasn't been able to update block info for a while.

An additional solution would be adding the option to define multiple RPC endpoints, so the node can rotate between them. This way I can safely use the community node and have Alchemy and offchain RPC endpoints as backup. Since they all have their stability issues, it would be nice if we could just use them all.

github-actions bot added the status: triage label on Jun 29, 2022
@leszko
Contributor

leszko commented Jun 29, 2022

Related to #1959

I think we should implement the failover RPC endpoints.

leszko self-assigned this on Jul 1, 2022
leszko added the type: bug, area: orchestrator QoL, and area: blockchain labels and removed the status: triage label on Jul 1, 2022
@leszko
Contributor

leszko commented Jul 4, 2022

I dug a little into the issue and I think it's related to how the current block polling mechanism works. Livepeer polls every block from the chain to update its internal caches. Arbitrum mines a lot of blocks in a short period of time. So if your block polling interval is long, e.g., 60s, then it needs to poll ~60-100 blocks one by one. What's worse, if the RPC endpoint is unreachable for some time, Livepeer may never be able to catch up with the missing blocks. This results in the inability to transcode.
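
To illustrate the catch-up problem, here is one possible direction, not necessarily the fix that will be adopted: instead of polling the missing blocks one by one, work out a bounded block range per polling tick and handle it in a single step (assuming events for a range can be fetched in one request, e.g. with `eth_getLogs` and its `fromBlock`/`toBlock` filter). `nextRange` and `maxBatch` are hypothetical names.

```go
package main

import "fmt"

// nextRange picks the block range to process on one polling tick.
// lastSeen is the last block already processed, head is the latest block on chain,
// and maxBatch caps the range size so a single request stays small.
// ok is false when there is nothing new to process.
func nextRange(lastSeen, head, maxBatch uint64) (from, to uint64, ok bool) {
	if head <= lastSeen || maxBatch == 0 {
		return 0, 0, false
	}
	from = lastSeen + 1
	to = head
	if to-from+1 > maxBatch {
		to = from + maxBatch - 1
	}
	return from, to, true
}

func main() {
	// Example numbers only: the node is 80 blocks behind, with batches of at most 50.
	lastSeen, head := uint64(1000), uint64(1080)
	for {
		from, to, ok := nextRange(lastSeen, head, 50)
		if !ok {
			break
		}
		fmt.Printf("process blocks %d..%d in one step\n", from, to)
		lastSeen = to
	}
}
```

With ranges, a backlog of ~60-100 blocks after a 60s interval takes one or two steps instead of 60-100 sequential polls, so the node can catch up after an RPC outage as long as each step covers more blocks than the chain produces between ticks.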

I'm thinking now about the fix for this. Will keep you posted.
