Make it easier to keep nodes synced on block info #2480

stronk-dev · 2022-06-29T08:25:07Z

Is your feature request related to a problem? Please describe.
Lately I have been having issues with nodes not updating block info (usually Singapore, sometimes other locations). The orchestrator service stays available, so automatic health checks won't pick up on these issues. This means that I have to restart my nodes at random times in order to backfill block events, which is a PITA.

Describe the solution you'd like
Assuming that having fairly recent block info is important for being able to receive streams, I would expect the node to:

- Keep retrying the defined RPC endpoint to get recent info. It seems that in some situations it just stops trying to pull new block info: sometimes after an RPC error, but sometimes without any error at all.
- Throw an error if it hasn't been able to update block info for a while.

An additional solution would be adding the option to define multiple RPC endpoints, so the node can rotate between them. This way I can safely use the community node and have Alchemy and offchain RPC endpoints as backup. Since they all have their stability issues, it would be nice if we could just use them all.

leszko · 2022-06-29T12:55:01Z

Related to #1959

I think we should implement the failover RPC endpoints.

leszko · 2022-07-04T14:16:29Z

I dug a little into the issue and I think it's related to how the current block polling mechanism works. Livepeer polls every block from the chain to update its internal caches. Arbitrum mines a lot of blocks in a short period of time, so if your block polling interval is long, e.g. 60s, the node needs to poll ~60-100 blocks one by one. What's worse, if the RPC endpoint is unreachable for some time, Livepeer may never be able to catch up with the missing blocks. This results in the inability to transcode.

I'm thinking now about the fix for this. Will keep you posted.
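To illustrate why one-by-one polling falls behind and how batching helps, here is a small sketch that splits a range of missed blocks into fixed-size batches. It is an assumption-laden illustration only, not the code from any linked PR; `BlockRange`, `SplitRange`, and the batch size are hypothetical.

```go
package main

// BlockRange is an inclusive range of block numbers.
type BlockRange struct {
	From, To uint64
}

// SplitRange divides the inclusive range [from, to] into consecutive batches
// of at most batchSize blocks. It returns nil for an empty range or a zero
// batch size. Hypothetical helper for illustration.
func SplitRange(from, to, batchSize uint64) []BlockRange {
	if batchSize == 0 || from > to {
		return nil
	}
	var out []BlockRange
	start := from
	for {
		end := start + batchSize - 1
		// Clamp to the end of the range; end < start catches uint64 overflow.
		if end > to || end < start {
			end = to
		}
		out = append(out, BlockRange{From: start, To: end})
		if end == to {
			return out
		}
		start = end + 1
	}
}
```

With these assumptions, a node that is 100 blocks behind and uses a batch size of 40 would process three ranges instead of making 100 separate single-block polls.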

github-actions bot added the "status: triage" label Jun 29, 2022

leszko self-assigned this Jul 1, 2022

leszko added the "type: bug", "area: orchestrator QoL", and "area: blockchain" labels and removed the "status: triage" label Jul 1, 2022

leszko mentioned this issue Jul 5, 2022

eth: Backfill blocks in batches #2489

Merged

leszko closed this as completed in #2489 Jul 11, 2022
