
Ingestion engine does not handle EN falling far behind #5298

Closed
Tracked by #5270
j1010001 opened this issue Jan 25, 2024 · 2 comments

@j1010001
Member

j1010001 commented Jan 25, 2024

The problem:
When an Execution Node (EN) falls far behind the latest block (for example, after being stopped for a long period), it can fail to start and enter a crash loop, which puts extra load on the network. The root cause is the ingestion engine, which attempts to request collections for all unexecuted blocks. This floods the collection nodes with requests and creates heavy network traffic. The EN then cannot fit all the collections in memory, crashes, and the cycle repeats.

Goal: refactor the ingestion engine to handle this edge case.

@zhangchiqing
Member

zhangchiqing commented Jan 29, 2024

Two approaches for implementation

Goal: implement a stateful ingestion module that keeps track of the next X blocks to prepare for execution.

Requirements:

  • Blocks can be on different forks, especially when execution is ahead of finalization; we need to make sure all valid blocks are ingested.
  • When execution is far behind finalization, the module should not ingest all finalized blocks, only the next X unexecuted blocks; otherwise it risks running out of memory (OOM).
  • It must not block the BlockExecuted event; otherwise it would slow down block execution.
  • It must not block the BlockProcessable event; otherwise it would slow down the consensus algorithm's fetching and processing of blocks.
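The two non-blocking requirements suggest keeping the event callbacks trivial: record the event, wake a worker, and return. Below is a minimal Python sketch of that pattern, assuming an in-process callback model; `NonBlockingIngestion`, `on_block_processable`, `on_block_executed`, and the `handle` function are hypothetical names, not flow-go's actual API.

```python
import threading
from collections import deque


class NonBlockingIngestion:
    """Hypothetical sketch (not the flow-go API): callbacks only enqueue and
    signal; a single worker thread does the slow ingestion work."""

    def __init__(self, handle):
        self._handle = handle               # slow work, e.g. loading blocks, requesting collections
        self._events = deque()              # append/popleft are thread-safe in CPython
        self._wakeup = threading.Event()    # repeated set() calls coalesce into one wake-up
        self._stopped = False
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    # Called by consensus / execution; both return immediately.
    def on_block_processable(self, block_id):
        self._events.append(("processable", block_id))
        self._wakeup.set()

    def on_block_executed(self, block_id):
        self._events.append(("executed", block_id))
        self._wakeup.set()

    def _drain(self):
        # Single consumer, so checking then popping is safe.
        while self._events:
            kind, block_id = self._events.popleft()
            self._handle(kind, block_id)

    def _run(self):
        while True:
            self._wakeup.wait()
            self._wakeup.clear()            # clear before draining so a concurrent set() is not lost
            stopping = self._stopped        # read the flag before draining
            self._drain()
            if stopping:
                return

    def stop(self):
        self._stopped = True
        self._wakeup.set()
        self._worker.join()
```

Because `threading.Event` coalesces repeated `set()` calls, a burst of events costs the callers only an append, and the single worker handles them in arrival order.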

Challenges:

  • The ingestion module needs to handle both block executed events and block processable events, and check if they would trigger more blocks to be loaded. However, the two events happen concurrently.
  • The ingestion module must guarantee the parent block is always ingested before its child block.
  • The ingestion module must guarantee that no block is missed, even across restarts.
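One way to satisfy the parent-before-child guarantee, including for blocks on different forks, is to buffer a block until its parent has been ingested and then release its waiting descendants. A minimal in-memory Python sketch, assuming each notification carries the block ID and its parent ID; `ParentFirstIngester` is a hypothetical name. It does not address the restart guarantee, which needs persisted state.

```python
from collections import defaultdict


class ParentFirstIngester:
    """Hypothetical sketch: a block is ingested only after its parent;
    children that arrive early wait in a buffer keyed by parent ID."""

    def __init__(self, ingested_root_id):
        self.ingested = {ingested_root_id}   # e.g. the last executed block
        self.waiting = defaultdict(list)     # parent_id -> block IDs waiting on it
        self.order = []                      # ingestion order, for illustration

    def add(self, block_id, parent_id):
        if block_id in self.ingested:
            return                           # duplicate notification
        if parent_id not in self.ingested:
            self.waiting[parent_id].append(block_id)
            return
        self._ingest(block_id)

    def _ingest(self, block_id):
        # Iterative, so releasing a long chain does not hit the recursion limit.
        stack = [block_id]
        while stack:
            bid = stack.pop()
            if bid in self.ingested:
                continue                     # buffered more than once
            self.ingested.add(bid)
            self.order.append(bid)
            # Every fork child of bid is now safe to ingest.
            stack.extend(self.waiting.pop(bid, []))
```

Siblings on different forks are released together once their common parent is ingested, so no valid fork is dropped.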

Approach 1: In-memory state machine

  • Subscribe to the BlockProcessable and BlockExecuted events.
  • Keep track of the highest executed height, and ingest all blocks below height (highest_executed_height + 500), i.e., ingest at most 500 heights ahead of execution.
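Approach 1 could look roughly like the following Python sketch, assuming the block height arrives with the BlockProcessable notification and the window is 500 heights. `WindowedIngestion` and its methods are hypothetical names. Blocks at or above `highest_executed_height + window` are deferred as (height, ID) pairs only, so their collections are not fetched until the window moves.

```python
import heapq
import threading


class WindowedIngestion:
    """Hypothetical sketch of Approach 1: only blocks below
    highest_executed_height + window are ingested; the rest are deferred."""

    def __init__(self, highest_executed_height, ingest, window=500):
        self._lock = threading.Lock()
        self._highest_executed = highest_executed_height
        self._window = window
        self._ingest = ingest          # slow work, e.g. requesting collections
        self._deferred = []            # min-heap of (height, block_id) beyond the window

    def _limit(self):
        return self._highest_executed + self._window

    def on_block_processable(self, block_id, height):
        with self._lock:
            if height >= self._limit():
                heapq.heappush(self._deferred, (height, block_id))
                return
        self._ingest(block_id)         # ingest outside the lock

    def on_block_executed(self, height):
        ready = []
        with self._lock:
            if height <= self._highest_executed:
                return                 # e.g. a lower block on another fork; the window never moves back
            self._highest_executed = height
            # Release deferred blocks (all forks) that now fall inside the window.
            while self._deferred and self._deferred[0][0] < self._limit():
                ready.append(heapq.heappop(self._deferred)[1])
        for block_id in ready:         # lowest heights first, so parents precede children
            self._ingest(block_id)
```

The deferred set lives only in memory, so after a restart it would have to be rebuilt from storage before the window can advance correctly.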

Approach 2: Build a BlockProcessable history on disk

  • Record the history of BlockProcessable events as jobs and save it to disk.
  • The ingestion module could create a single worker that loads at most 500 jobs from the job queue.
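Approach 2 could be sketched as a persisted job list plus a persisted "processed" mark, with a single worker that never has more than `max_in_flight` (500 in the proposal) jobs loaded past that mark. This is a hypothetical Python sketch: a JSON file stands in for the node's on-disk storage, and `DiskJobQueue`, `load_jobs`, and `finish` are illustrative names, not flow-go's job queue API.

```python
import json
import os


class DiskJobQueue:
    """Hypothetical sketch of Approach 2: each BlockProcessable event becomes a
    job with a consecutive index; the job list and a 'processed' mark are
    persisted so a restarted node resumes where it left off.
    Not thread-safe: assumes calls are serialized."""

    def __init__(self, path, max_in_flight=500):
        self._path = path
        self._max_in_flight = max_in_flight
        if os.path.exists(path):                 # restart: reload persisted state
            with open(path) as f:
                state = json.load(f)
        else:
            state = {"jobs": [], "processed": 0}
        self._jobs = state["jobs"]               # job i is the block ID self._jobs[i]
        self._processed = state["processed"]     # jobs [0, processed) are all finished
        self._done = set()                       # finished indexes above the processed mark
        self._loaded = self._processed           # next job index to hand to the worker

    def _save(self):
        tmp = self._path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"jobs": self._jobs, "processed": self._processed}, f)
        os.replace(tmp, self._path)              # rename over the old file: no half-written state

    def on_block_processable(self, block_id):
        self._jobs.append(block_id)
        self._save()

    def load_jobs(self):
        """Called by the single worker. Returns new (index, block_id) jobs, never
        loading more than max_in_flight jobs past the processed mark."""
        limit = min(len(self._jobs), self._processed + self._max_in_flight)
        batch = [(i, self._jobs[i]) for i in range(self._loaded, limit)]
        self._loaded = max(self._loaded, limit)
        return batch

    def finish(self, index):
        """Marks a job as executed and advances the processed mark over the
        contiguous run of finished jobs."""
        self._done.add(index)
        while self._processed in self._done:
            self._done.remove(self._processed)
            self._processed += 1
        self._save()
```

Finished jobs above the contiguous processed mark are not persisted, so after a restart they may run again; delivery is at least once, but no job is skipped. Rewriting the whole file on every call is a simplification; a production store would more likely append jobs incrementally.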

j1010001 changed the title from "Refactor ingestion engine to support catching up from far behind" to "Ingestion engine does not handle EN falling far behind" on Jan 30, 2024
@zhangchiqing
Member

Related to #5161
