feat: a new ChainIndexer to index tipsets, messages and events #12421

Merged: aarshkshah1992 merged 128 commits into master from feat/msg-eth-tx-index on Oct 31, 2024

@aarshkshah1992 (Contributor) commented Aug 29, 2024

For #12453.
Subsumes #12388 to implement the ChainIndexer.

TODO

  • Profile SQLite index usage and read queries on the index
  • Clean up Config
  • Smoke testing on a calibnet node (test GC, snapshot hydration, automated backfilling, etc)
  • Review from @rvagg
  • Write unit & itests for the ChainIndexer (tests are at feat: tests for the chain indexer #12521)

Migration

Index "migration" effectively means re-indexing the chain state. The data in the old indices is not reliable (we're already aware of multiple bugs in them, which motivated this PR in the first place), and the old indices have never been GC'd or kept in sync with the chain state on nodes with splitstore enabled.

For snapshot-synced nodes with splitstore enabled

  • Most splitstore-enabled nodes only keep 2-3 days of chain state around. The best option here is to simply backfill/re-index the chain state on node startup by setting ReConcileEmptyIndex to true and setting MaxReconcileTipsets as needed.

For archival nodes

  • If archival nodes want to backfill the index as part of daemon startup, they can boot up the node with ReConcileEmptyIndex set to true and MaxReconcileTipsets set to infinity. This will drastically slow down node startup but will ensure that the node starts up with its entire history indexed.

  • The other option for users is to use the Backfill RPC API with the range of heights they want backfilled. However, we need to handle the race where backfilling runs concurrently with new tipsets coming in, and ensure that backfilling does not overwrite updates caused by new tipsets being indexed.
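
One way to avoid the overwrite described above is to make the backfill path insert-only, while the live path stays free to update existing rows. The following is a minimal sketch of that idea, not the approach this PR takes: the table is a simplified, hypothetical stand-in for tipset_message, and the helper names (index_live, backfill, open_index) are made up for illustration.

```python
import sqlite3

# Hypothetical, simplified stand-in for the tipset_message table.
SCHEMA = """
CREATE TABLE tipset_message (
    message_id     INTEGER PRIMARY KEY,
    tipset_key_cid BLOB NOT NULL,
    height         INTEGER NOT NULL,
    reverted       INTEGER NOT NULL,
    message_cid    BLOB NOT NULL,
    UNIQUE (tipset_key_cid, message_cid)
);
"""

# INSERT OR IGNORE skips the row when the UNIQUE constraint would be violated.
INSERT_SQL = (
    "INSERT OR IGNORE INTO tipset_message "
    "(tipset_key_cid, height, reverted, message_cid) VALUES (?, ?, ?, ?)"
)


def index_live(db, tipset_key_cid, height, message_cid, reverted):
    """Live path: the latest apply/revert decides the reverted flag."""
    cur = db.execute(
        "UPDATE tipset_message SET reverted = ? "
        "WHERE tipset_key_cid = ? AND message_cid = ?",
        (reverted, tipset_key_cid, message_cid),
    )
    if cur.rowcount == 0:  # row was not indexed yet
        db.execute(INSERT_SQL, (tipset_key_cid, height, reverted, message_cid))


def backfill(db, tipset_key_cid, height, message_cid):
    """Backfill path: insert-only, so rows written by the live path are untouched."""
    db.execute(INSERT_SQL, (tipset_key_cid, height, 0, message_cid))


def open_index():
    """Create an in-memory database with the hypothetical schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db
```

With this split, a backfill that arrives after the live path has already marked a row as reverted leaves that row as it is. A real implementation would also need to wrap each path in a transaction and decide how the two writers are serialized, which this sketch does not cover.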

SQLite Query Plans

For looking up MsgInfo

~/.lotus-calibnet/sqlite% sqlite3 chainindex.db "EXPLAIN QUERY PLAN SELECT tipset_key_cid, height FROM tipset_message WHERE message_cid='a' AND reverted=0"
QUERY PLAN
`--SEARCH tipset_message USING INDEX idx_message_cid (message_cid=?)
  • There should be a handful of rows with reverted != 0 and only one with reverted=0 for a given message_cid so no need to index by reverted as well here.
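
The plan above can be reproduced without a synced node by running the same query against an in-memory database. This is a sketch under assumptions: the schema below is a simplified guess at tipset_message that reuses the index name idx_message_cid from the output above, and query_plan and lookup_message are hypothetical helpers.

```python
import sqlite3

# Simplified, assumed schema; only the columns the lookup needs.
SCHEMA = """
CREATE TABLE tipset_message (
    message_id     INTEGER PRIMARY KEY,
    tipset_key_cid BLOB NOT NULL,
    height         INTEGER NOT NULL,
    reverted       INTEGER NOT NULL,
    message_cid    BLOB,
    UNIQUE (tipset_key_cid, message_cid)
);
CREATE INDEX idx_message_cid ON tipset_message (message_cid);
"""

LOOKUP_SQL = (
    "SELECT tipset_key_cid, height FROM tipset_message "
    "WHERE message_cid = ? AND reverted = 0"
)


def query_plan(db, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = db.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # detail is the last column


def lookup_message(db, message_cid):
    """Return (tipset_key_cid, height) of the non-reverted row, or None."""
    return db.execute(LOOKUP_SQL, (message_cid,)).fetchone()


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    for line in query_plan(db, LOOKUP_SQL, (b"a",)):
        print(line)
```

The index narrows the search to the few rows sharing a message_cid, and the reverted = 0 filter is then applied to those rows, which is why a separate index on reverted adds little here.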

For looking up an eth tx hash

~/.lotus-calibnet/sqlite% sqlite3 chainindex.db "EXPLAIN QUERY PLAN SELECT message_cid FROM eth_tx_hash WHERE tx_hash = ?"
QUERY PLAN
`--SEARCH eth_tx_hash USING INDEX sqlite_autoindex_eth_tx_hash_1 (tx_hash=?)

For checking if a tipset exists

~/.lotus-calibnet/sqlite% sqlite3 chainindex.db "EXPLAIN QUERY PLAN SELECT EXISTS(SELECT 1 FROM tipset_message WHERE tipset_key_cid = ?)"
QUERY PLAN
|--SCAN CONSTANT ROW
`--SCALAR SUBQUERY 1
   `--SEARCH tipset_message USING COVERING INDEX idx_tipset_key_cid (tipset_key_cid=?)

For getting the minimum non-reverted height for reconciliation

~/.lotus-calibnet/sqlite% sqlite3 chainindex.db "EXPLAIN QUERY PLAN SELECT MIN(height) FROM tipset_message WHERE reverted = 0"
QUERY PLAN
`--SEARCH tipset_message USING INDEX idx_height
  • Given that for a given height, there should be very few entries with reverted != 0 and only one entry with reverted=0, this should be all right.
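
To make the semantics of this query concrete, the sketch below runs it against a small in-memory table. The schema is a simplified assumption and min_non_reverted_height is a hypothetical helper; how reconciliation consumes the value is not shown in this thread.

```python
import sqlite3

# Simplified, assumed schema with the idx_height index named in the plan above.
SCHEMA = """
CREATE TABLE tipset_message (
    message_id     INTEGER PRIMARY KEY,
    tipset_key_cid BLOB NOT NULL,
    height         INTEGER NOT NULL,
    reverted       INTEGER NOT NULL,
    message_cid    BLOB
);
CREATE INDEX idx_height ON tipset_message (height);
"""


def min_non_reverted_height(db):
    """Return the lowest height that has a non-reverted row, or None if there is none."""
    # MIN() over zero rows yields a single row containing NULL, so fetchone()
    # always returns a 1-tuple here.
    (height,) = db.execute(
        "SELECT MIN(height) FROM tipset_message WHERE reverted = 0"
    ).fetchone()
    return height


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany(
        "INSERT INTO tipset_message (tipset_key_cid, height, reverted) VALUES (?, ?, ?)",
        [(b"a", 5, 1), (b"b", 7, 0), (b"c", 9, 0)],
    )
    print(min_non_reverted_height(db))
```

Reverted rows at lower heights are skipped, so the result is the first height the index can be trusted from. An empty index returns None, which a caller would need to handle separately.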

Looking up a message_id to insert an event

~/.lotus-calibnet/sqlite% sqlite3 chainindex.db "EXPLAIN QUERY PLAN SELECT message_id FROM tipset_message WHERE message_cid = ? AND tipset_key_cid = ?"
QUERY PLAN
`--SEARCH tipset_message USING COVERING INDEX sqlite_autoindex_tipset_message_1 (tipset_key_cid=? AND message_cid=?)

Looking up events at a specific height

~/.lotus-calibnet/sqlite% sqlite3 chainindex.db "EXPLAIN QUERY PLAN SELECT
   e.event_id,
   tm.height,
   hex(tm.tipset_key_cid),
   hex(e.emitter_addr),
   e.event_index,
   hex(tm.message_cid),
   tm.message_index,
   e.reverted,
   ee.flags,
   ee.key,
   ee.codec,
   ee.value
FROM event e
JOIN tipset_message tm ON e.message_id = tm.message_id
JOIN event_entry ee ON e.event_id = ee.event_id
WHERE tm.height=1954494 AND e.reverted = false
ORDER BY tm.height DESC, ee.rowid ASC"
QUERY PLAN
|--SEARCH tm USING INDEX idx_height (height=?)
|--SEARCH e USING INDEX idx_event_message_id (message_id=?)
|--SEARCH ee USING INDEX event_entry_event_id (event_id=?)
`--USE TEMP B-TREE FOR RIGHT PART OF ORDER BY

Looking up events in a height range

~/.lotus-calibnet/sqlite% sqlite3 chainindex.db "EXPLAIN QUERY PLAN SELECT
   e.event_id,
   tm.height,
   tm.tipset_key_cid,
   e.emitter_addr,
   e.event_index,
   tm.message_cid,
   tm.message_index,
   e.reverted,
   ee.flags,
   ee.key,
   ee.codec,
   ee.value
FROM event e
JOIN tipset_message tm ON e.message_id = tm.message_id
JOIN event_entry ee ON e.event_id = ee.event_id
WHERE tm.height BETWEEN 1954481 AND 1954494
   AND e.reverted = false
ORDER BY tm.height DESC, ee.rowid ASC"
QUERY PLAN
|--SEARCH tm USING INDEX idx_height (height>? AND height<?)
|--SEARCH e USING INDEX idx_event_message_id (message_id=?)
|--SEARCH ee USING INDEX event_entry_event_id (event_id=?)
`--USE TEMP B-TREE FOR RIGHT PART OF ORDER BY
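
The three-table join used by the event queries can be exercised on a toy dataset to see what the ORDER BY produces. This is a sketch only: the schemas are trimmed-down assumptions that keep the table and index names from the plans above, events_in_range is a hypothetical helper, and reverted is compared with 0 rather than false so the query also runs on older SQLite builds.

```python
import sqlite3

# Trimmed-down, assumed schemas; column sets are illustrative only.
SCHEMA = """
CREATE TABLE tipset_message (
    message_id     INTEGER PRIMARY KEY,
    tipset_key_cid BLOB NOT NULL,
    height         INTEGER NOT NULL,
    reverted       INTEGER NOT NULL,
    message_cid    BLOB
);
CREATE INDEX idx_height ON tipset_message (height);

CREATE TABLE event (
    event_id    INTEGER PRIMARY KEY,
    message_id  INTEGER NOT NULL,
    event_index INTEGER NOT NULL,
    reverted    INTEGER NOT NULL
);
CREATE INDEX idx_event_message_id ON event (message_id);

CREATE TABLE event_entry (
    event_id INTEGER NOT NULL,
    key      TEXT NOT NULL,
    value    BLOB
);
CREATE INDEX event_entry_event_id ON event_entry (event_id);
"""

RANGE_SQL = """
SELECT tm.height, e.event_id, ee.key
FROM event e
JOIN tipset_message tm ON e.message_id = tm.message_id
JOIN event_entry ee ON e.event_id = ee.event_id
WHERE tm.height BETWEEN ? AND ?
  AND e.reverted = 0
ORDER BY tm.height DESC, ee.rowid ASC
"""


def events_in_range(db, min_height, max_height):
    """Return (height, event_id, key) for non-reverted events in the inclusive range."""
    return db.execute(RANGE_SQL, (min_height, max_height)).fetchall()


def open_index():
    """Create an in-memory database with the assumed schemas."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db
```

Rows come back newest height first, and within a height the entries follow their insertion order in event_entry (its implicit rowid). That second sort key comes from a different table than the one driving the search, which is consistent with the temp B-tree step reported for the right part of the ORDER BY.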

Looking up events for a specific tipset

~/.lotus-calibnet/sqlite% sqlite3 chainindex.db "EXPLAIN QUERY PLAN SELECT
   e.event_id,
   tm.height,
   tm.tipset_key_cid,
   e.emitter_addr,
   e.event_index,
   tm.message_cid,
   tm.message_index,
   e.reverted,
   ee.flags,
   ee.key,
   ee.codec,
   ee.value
FROM event e
JOIN tipset_message tm ON e.message_id = tm.message_id
JOIN event_entry ee ON e.event_id = ee.event_id
WHERE tm.tipset_key_cid = 'a' AND e.reverted = false
ORDER BY tm.height DESC, ee.rowid ASC"
QUERY PLAN
|--SEARCH tm USING INDEX idx_tipset_key_cid (tipset_key_cid=?)
|--SEARCH e USING INDEX idx_event_message_id (message_id=?)
|--SEARCH ee USING INDEX event_entry_event_id (event_id=?)
`--USE TEMP B-TREE FOR ORDER BY

@github-actions github-actions bot left a comment


@aarshkshah1992 aarshkshah1992 marked this pull request as draft August 29, 2024 14:38
@aarshkshah1992 aarshkshah1992 changed the title Chain index complete for msgs and txns [WIP ]Chain index complete for msgs and txns Aug 29, 2024
@aarshkshah1992 aarshkshah1992 changed the title [WIP ]Chain index complete for msgs and txns Chain index complete for msgs and txns Aug 30, 2024
Resolved review threads (now outdated):

  • chain/index/msgindex.go
  • chainindex/ddls.go (5 threads)
  • itests/eth_deploy_test.go (2 threads)
  • chainindex/gc.go (5 threads)
  • chainindex/helpers.go
  • chainindex/indexer.go
  • CHANGELOG.md
  • node/config/types.go

aarshkshah1992 and others added 4 commits October 21, 2024 13:52
* Followup to PR #12450 for doc updates

This is being used to resolve the unresolved items in #12450 since that PR is unwieldy at this point.

* Incorporated some items and added TODOs based on unresolved items from #12450

* Incorporating more feedback

* Pointing to issue to learn about benefits

* Formatting fixes

* Apply most of the suggestions from @rvagg code review

Co-authored-by: Rod Vagg <rod@vagg.org>

* Incorporating feedback from #12600 (comment)

* Addressing #12600 (comment) and more

* Moved chain-indexer docs to documentation
Renamed
Added ToC

We can move to lotus-docs later

* Update documentation/en/chain-indexer-overview-for-operators.md

Co-authored-by: Rod Vagg <rod@vagg.org>

* Update documentation/en/chain-indexer-overview-for-operators.md

Co-authored-by: Rod Vagg <rod@vagg.org>

* Added upgrade path when importing chain state from a snapshot.

* Typo fixes

* Update documentation/en/chain-indexer-overview-for-operators.md

Co-authored-by: Rod Vagg <rod@vagg.org>

* chore(doc): "regular checks" section for chainindexer docs (#12612)

* Apply suggestions from @rvagg code review

Co-authored-by: Rod Vagg <rod@vagg.org>

* Incorporating @aarshkshah1992 feedback

* Update documentation/en/chain-indexer-overview-for-operators.md

Co-authored-by: Rod Vagg <rod@vagg.org>

---------

Co-authored-by: Rod Vagg <rod@vagg.org>
Co-authored-by: Aarsh Shah <aarshkshah1992@gmail.com>
@rvagg (Member) left a comment

@aarshkshah1992 failure in CI is legitimate - itest-gateway:

Error Trace:    /home/runner/work/lotus/lotus/itests/kit/ensemble.go:486
                /home/runner/work/lotus/lotus/itests/gateway_test.go:318
                /home/runner/work/lotus/lotus/itests/gateway_test.go:56
Error:          Received unexpected error:
                starting node:
                    github.com/filecoin-project/lotus/node.New
                        /home/runner/work/lotus/lotus/node/builder.go:359
                  - missing dependencies for function "reflect".makeFuncStub
                        /home/runner/work/_tool/go/1.22.8/x64/src/reflect/asm_amd64.s:28:
                    missing type:
                        - full.ChainIndexerAPI (did you mean to use index.Indexer?)

@aarshkshah1992 (Contributor, Author) commented

@rvagg Checking.

@aarshkshah1992 (Contributor, Author) commented

@rvagg Fixed the gateway test in 55bbe7a.

rvagg previously approved these changes Oct 31, 2024
@aarshkshah1992 aarshkshah1992 dismissed stale reviews from rvagg and github-actions[bot] October 31, 2024 08:49

bot is stuck

@rvagg rvagg mentioned this pull request Oct 31, 2024
@aarshkshah1992 aarshkshah1992 enabled auto-merge (squash) October 31, 2024 09:52
@aarshkshah1992 aarshkshah1992 merged commit dcc903c into master Oct 31, 2024
80 checks passed
@aarshkshah1992 aarshkshah1992 deleted the feat/msg-eth-tx-index branch October 31, 2024 09:58
Labels: none yet
Project status: ☑️ Done (Archive)
9 participants