-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EventIndexing: add fast path for multiple new blocks without reorg #3012
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logic LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG. Even though we now have two places with reorg detection, this approach probably adds less complexity than combining everything in a single place.
I agree but I was kind of scared to even attempt combining these code paths. 😅 🙈 |
Description
Recently we moved some event indexing onto the hot path for auction building. This turned out to be surprisingly slow (500ms - 750ms) quite often [1]. Judging by the logs the prod mainnet autopilot entered the event indexing logic ~24K times [2]. Around half the time we hit the code path were do not add 0 or 1 blocks to the index (those are the fast paths) [3].
It looks like the indexing logic assumes that it runs on every block. And if the current block is not the last handled block or the next block after it suspects a reorg happened somewhere (this is the slow path). In that case it fetches the entire range of blocks we consider not finalized to check for the reorg. This effectively means on 50% of indexing runs we fetch the last 64 blocks which takes a lot of time and spams the RPC node.
Changes
I handled the case where multiple new blocks got added without any reorg. For that I fetch the range
[last_handled_block_number, current_block_number]
. If the block hash returned forlast_handled_block_number
is equal to the hash of the last block we handled it means all those blocks are just new but not reorged.This special handling should make it such that for the vast majority of cases we only fetch the number of blocks we missed (+1) instead of >64 blocks.
How to test
I copied one existing ignored test and adjusted it slightly. By waiting for ~25s between 2 updates we know that we hit the problematic code path. While running the test I also added a log
"fetch block"
which prints how many blocks the code would try to fetch. With the original code we can see 3 fetches (127, 64, 66) where 66 is the problematic fetch.With the new changes the same test logged 127, 64 and 3 fetches instead.
Given that the test is ignored (i.e. manual) anyway I'll remove it before merging this PR.
@sunce86 could you please take a close look at this PR? I believe you were very involved in this logic and I find it very hard to understand so I would appreciate a thorough review to not blow up the indexing logic by accident. I tried to make the change as side effect free as possible but TBH I still have only moderate confidence that this is actually correct.
run without change
run with change