Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PROPOSAL] Event indexer counters #2965

Closed
wants to merge 1 commit into from
Closed

Conversation

squadgazzz
Copy link
Contributor

@squadgazzz squadgazzz commented Sep 10, 2024

Description

During one of the bugfixes(#2931) it was noticed that some of the event handlers might spend a lot of time and resources on each restart due to the absence of the corresponding items in the DB. In that case, preconfigured start block numbers are used that, in most cases, are too old. As a result, on each restart, a background task might fetch millions of blocks on networks with a short round time(such as arbitrum one).

Changes

As a proposal, store indexer counters in a separate table each time a portion of new blocks are processed. It is currently impossible to use existing append_events function since it operates with pre-filtered events only that can never happen if we don't have a single event for the specified type on the chain.

Driver crate doesn't have access to the DB currently, which will be required in case this proposal is accepted.

Copy link

Reminder: Please update the DB Readme.


Caused by:

@squadgazzz squadgazzz marked this pull request as ready for review September 10, 2024 12:23
@squadgazzz squadgazzz requested a review from a team as a code owner September 10, 2024 12:23
@MartinquaXD
Copy link
Contributor

I think the idea of storing which block was indexed (even if it didn't contain an event we are interested in) makes a lot of sense.
However, instead of solving this problem ourselves we should consider researching if this was not already solved better by other people.
Also the driver requiring access to the DB would not be great. But there are ways around this which we might have to implement sooner rather than later anyway.

@squadgazzz
Copy link
Contributor Author

However, instead of solving this problem ourselves we should consider researching if this was not already solved better by other people.

We can use other indexers, such as The Graph, but introducing a new dependency and integrating it into the system seems like an overkill for the output of the whole activity.

Also the driver requiring access to the DB would not be great. But there are ways around this which we might have to implement sooner rather than later anyway.

The same applies to the persistence layer. Bringing more dependencies, even with lightweight/embedded solutions, seems like a much bigger change than adding a small table. For the driver specifically, either add a URL to the existing PG DB or a completely new embedded DB dependency with k8s volumes within will be storing a few counter, where I would still be more inclined with the former.

We can probably start with the PG table and, if we have time, integrate The Graph or an embedded DB.

Wdyt?

Copy link

This pull request has been marked as stale because it has been inactive a while. Please update this pull request or it will be automatically closed.

Copy link

github-actions bot commented Oct 2, 2024

This pull request has been marked as stale because it has been inactive a while. Please update this pull request or it will be automatically closed.

@github-actions github-actions bot added the stale label Oct 2, 2024
@squadgazzz squadgazzz removed the stale label Oct 2, 2024
Copy link

This pull request has been marked as stale because it has been inactive a while. Please update this pull request or it will be automatically closed.

@github-actions github-actions bot added the stale label Oct 10, 2024
@github-actions github-actions bot closed this Oct 17, 2024
MartinquaXD added a commit that referenced this pull request Oct 23, 2024
# Description
Our event indexing logic currently can't handle some cases well during
restarts. Let's say a contract we index doesn't emit an event for 1
month our current logic would fetch the last indexed event from that
contract and continue indexing from there. That means on restarts we
might index ~1 month (theoretically unbounded) of blocks before we can
start building new auctions (we recently started to require the indexing
logic to have seen the tip of the chain before we can start building
auctions).
Especially on fast chains like `arbitrum` this is a huge issue.

Dusts off @squadgazzz's previous
[PR](#2965)

# Changes
* introduced a new table `last_processed_blocks` which associates a
block number for an arbitrary string
* populated table with data for `settlements`, `ethflow_refunds`, and
`onchain_orders`

## How to test
- 1 database specific test
- actually not sure how to best test this e2e. We could spawn an
autopilot, index an event, mint a ton of empty blocks, and restart the
autopilot to trigger the code we are interested in. But it's not super
clear to me what would be a good assertion that can be tested with
reasonable effort. 🤔

---------

Co-authored-by: ilya <ilya@cow.fi>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants