
Track event indexing progress #3075

Merged
merged 10 commits into from
Oct 23, 2024
Conversation

MartinquaXD
Contributor

Description

Our event indexing logic currently handles some restart scenarios poorly. Suppose a contract we index hasn't emitted an event for a month: the current logic fetches the last indexed event from that contract and resumes indexing from there. That means on restart we might have to index roughly a month's worth of blocks (theoretically an unbounded number) before we can start building new auctions (we recently started requiring the indexing logic to have seen the tip of the chain before auctions can be built).
On fast chains like Arbitrum this is a huge issue.

Dusts off @squadgazzz's previous PR

Changes

  • introduced a new table, last_processed_blocks, which associates a block number with an arbitrary string key
  • populated the table with data for settlements, ethflow_refunds, and onchain_orders
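A sketch of what the new table might look like (hypothetical schema; the actual definition lives in database/sql/V073__track_event_indexing_progress.sql):

```sql
-- One row per indexer, keyed by an arbitrary string name.
-- bigint is used because Postgres has no unsigned 64-bit integer type.
CREATE TABLE last_processed_blocks (
    index_name   text PRIMARY KEY,
    block_number bigint NOT NULL
);

-- Seed rows for the existing indexers (values here are placeholders; the
-- migration would derive them from the already-indexed events).
INSERT INTO last_processed_blocks (index_name, block_number)
VALUES ('settlements', 0), ('ethflow_refunds', 0), ('onchain_orders', 0);
```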

How to test

  • 1 database-specific test
  • I'm actually not sure how best to test this e2e. We could spawn an autopilot, index an event, mine a ton of empty blocks, and restart the autopilot to trigger the code we are interested in. But it's not clear to me what would be a good assertion that can be tested with reasonable effort. 🤔

@MartinquaXD MartinquaXD requested a review from a team as a code owner October 21, 2024 11:09

Reminder: Please update the DB Readme.


Caused by:

database::last_processed_blocks::update(
    &mut ex,
    index_name,
    i64::try_from(last_block).context("new value of counter is not i64")?,
)
Contributor


Wouldn't it be better to store the last block as a u64 in the database instead of converting it to i64?

Contributor Author


Unfortunately Postgres doesn't support u64 natively. The alternative would be to use a bignumber type or something like that.

Contributor

@sunce86 sunce86 left a comment


Nothing blocking.

Although, personally, I wouldn't introduce the persist_last_processed_block function; I would rather adjust append_events and replace_events to save the last block atomically together with the events.

Mostly because of reorgs that could lower the block height, after which an autopilot restart could happen without last_processed_blocks having been updated.

So, bottom line: at the point in time when the new events are stored (via append or replace), those functions should also have the responsibility of updating last_processed_blocks to keep the db in a consistent state; this responsibility shouldn't be moved to the client.

self.update_events_from_latest_blocks(&event_range.latest_blocks, event_range.is_reorg)
    .await?;
self.store_mut()
    .persist_last_processed_block(last_block.0)
Contributor


nit: I'd move this call inside update_last_handled_blocks, since that function is where we update the local cache, so aligning the local cache and the db there makes sense.

database/sql/V073__track_event_indexing_progress.sql (outdated, resolved)
crates/database/src/lib.rs (outdated, resolved)
@MartinquaXD
Contributor Author

> Ah, I see now that we don't even save anything when we don't have events, too bad. So in order to do it my way, a non-trivial change would be needed in the EventStoring logic. Then better to leave it as is.

Yeah, that is why I continued on that path rather than adjusting append_events.

@sunce86
Contributor

sunce86 commented Oct 22, 2024

> Ah, I see now that we don't even save anything when we don't have events, too bad. So in order to do it my way, a non-trivial change would be needed in the EventStoring logic. Then better to leave it as is.

> Yeah, that is why I continued on that path rather than adjusting append_events.

I double-checked, and it appears we do after all have a list of checked blocks (not just blocks with events): https://github.com/cowprotocol/services/blob/main/crates/shared/src/event_handling.rs#L373-L377

Meaning, past_events_by_block_hashes is expected to return the full list of blocks from the input. So it should be safe to use range.last() as the last processed block.

@MartinquaXD
Contributor Author

MartinquaXD commented Oct 22, 2024

There is a code path here where we wouldn't know which latest_block to pass into append_events(). At that point we know the total range over which we index blocks, but the events come in chunks, so we can't just take the last block of the total range. We could read the block number of the last event of each chunk, but that field is optional, so we run into type issues.

Not sure if it's okay to make the meta field of the event non-optional. I assume it is probably never None, but that would require a change in ethcontract-rs, which probably has far-reaching consequences. :/

Edit: never mind, we wouldn't be able to store the underlying event anyway if we didn't know at which block it happened, so it makes sense to err if meta is None.

Edit 2: there seem to be quite a few places where replace_events is technically implemented incorrectly. Often it deletes all the events starting from the first block of the range. Also, replace_events() often calls append_events(), and then all bets are off in terms of only storing latest_block in append_events() either.
Given that the indexing logic appears quite complicated, I think I would prefer introducing the new function and simply calling it after the entire range of latest blocks has been processed, to avoid these weird issues.

@MartinquaXD MartinquaXD merged commit fad84f1 into main Oct 23, 2024
11 checks passed
@MartinquaXD MartinquaXD deleted the track-event-indexing-progress branch October 23, 2024 12:45
@github-actions github-actions bot locked and limited conversation to collaborators Oct 23, 2024