Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Execution] Ingestion Block Queue #5248

Merged
merged 23 commits into from
Apr 1, 2024
Merged

Conversation

zhangchiqing
Copy link
Member

@zhangchiqing zhangchiqing commented Jan 18, 2024

Working towards #5297

This PR simplifies the ingestion engine's mempool module with a block queue module.

@codecov-commenter
Copy link

codecov-commenter commented Jan 18, 2024

Codecov Report

Attention: Patch coverage is 63.32046% with 95 lines in your changes are missing coverage. Please review.

Project coverage is 55.64%. Comparing base (1726b7b) to head (63671f8).
Report is 561 commits behind head on master.

Files Patch % Lines
engine/execution/ingestion/block_queue/queue.go 71.61% 60 Missing and 5 partials ⚠️
utils/unittest/fixtures.go 0.00% 30 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5248      +/-   ##
==========================================
- Coverage   55.97%   55.64%   -0.33%     
==========================================
  Files        1022     1038      +16     
  Lines       99705   101598    +1893     
==========================================
+ Hits        55807    56539     +732     
- Misses      39598    40715    +1117     
- Partials     4300     4344      +44     
Flag Coverage Δ
unittests 55.64% <63.32%> (-0.33%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

"github.com/onflow/flow-go/module/mempool/entity"
)

// BlockQueue keeps track of state of blocks and determines which blocks are executable
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if there is a better name than this. Suggestions are welcome.

@zhangchiqing zhangchiqing force-pushed the leo/ingestion-block-queue branch from 0b6a5ba to 0ed04d1 Compare February 1, 2024 16:14
@zhangchiqing zhangchiqing marked this pull request as ready for review February 1, 2024 16:15
@zhangchiqing zhangchiqing requested review from janezpodhostnik, sideninja and peterargue and removed request for ramtinms February 1, 2024 16:15
Copy link
Contributor

@janezpodhostnik janezpodhostnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I'm concerned how this will be used singe a lot of different methods (that are presumably called from different places) are returning a []*entity.ExecutableBlock.
    This might be messy if that information needs to then be forwarded to another component.

Maybe having a channel where *entity.ExecutableBlock would be pushed instead of always returning them would be better.

  1. All of the On* methods are blocking. Is this ok?

  2. A lot of locking is going on here. Will this be a problem? Would a syncmap be better in this case?

Comment on lines 94 to 134
// we have already received this block, and its parent still has not been executed yet
if executable.StartState == nil && parentFinalState == nil {
return nil, nil, nil
}

// this is an edge case where parentFinalState is provided, and its parent block exists
// in the queue but has not been marked as executed yet (OnBlockExecuted(parent) is not called),
// in this case, we will internally call OnBlockExecuted(parentBlockID, parentFinalState).
// there is no need to create the executable block again, since it's already created.
if executable.StartState == nil && parentFinalState != nil {
executables, err := q.onBlockExecuted(block.Header.ParentID, *parentFinalState)
if err != nil {
return nil, nil, fmt.Errorf("receiving block %v with parent commitment %v, but parent block %v already exists with no commitment, fail to call mark parent as executed: %w",
blockID, *parentFinalState, block.Header.ParentID, err)
}

// we already have this block, its collection must have been fetched, so we only return the
// executables from marking its parent as executed.
return nil, executables, nil
}

// this is an edge case could be ignored
if executable.StartState != nil && parentFinalState == nil {
q.log.Warn().
Str("blockID", blockID.String()).
Hex("parentID", block.Header.ParentID[:]).
Msg("edge case: receiving block with no parent commitment, but its parent block actually has been executed")
return nil, nil, nil
}

// this is an exception that should not happen
if *executable.StartState != *parentFinalState {
return nil, nil,
fmt.Errorf("block %s has already been executed with a different parent final state, %v != %v",
blockID, *executable.StartState, parentFinalState)
}

return nil, nil, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be packaged into a new function to reduce the complexity of this one

Comment on lines 355 to 359
func (q *BlockQueue) GetMissingCollections(blockID flow.Identifier) (
[]*MissingCollection, *flow.StateCommitment, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

formatting here is a bit weird.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean? Can you be more specific?

Comment on lines +280 to +284
return nil, fmt.Errorf("parent block %s of block %s is in the queue",
block.Block.Header.ParentID, blockID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sanity checks like this make me think that this should use the irrecoverable.SignalerContext to throw this error and shutdown the component entirely.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could leave the decision to the caller. The caller can decide how to handle this exception, either throw as a irrecoverable error or handle it differently.

return nil, executables, nil
}

// this is an edge case could be ignored
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you elaborate in this comment about how this case would happen and why it's safe to ignore

// update collection
colInfo.Collection.Transactions = collection.Transactions

// check if any block, which includes this collection, become executable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// check if any block, which includes this collection, become executable
// check if any block, which includes this collection, became executable

// 2. if a block's parent is not executed, then the parent block must be passed in first
// 3. if a block's parent is executed, then the parent's finalState must be passed in
// It returns (nil, nil, nil) if this block is a duplication
func (q *BlockQueue) OnBlock(block *flow.Block, parentFinalState *flow.StateCommitment) (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The On* naming is a little confusing since we typically use that for non-blocking signal functions. Here they are blocking and return some data. How about renaming them to something like HandleBlock?

@zhangchiqing zhangchiqing force-pushed the leo/ingestion-block-queue branch from 2e1f681 to 2ddeaf7 Compare February 29, 2024 23:34
@zhangchiqing
Copy link
Member Author

  1. I'm concerned how this will be used singe a lot of different methods (that are presumably called from different places) are returning a []*entity.ExecutableBlock.

Yes, they are called in multiple places, but all in the same file (the ingestion core module), and the logic to handle them are reused, such as here the executeConcurrently method is used to handle all executable blocks.

Regarding the ingestion core, it is a stateless module, the point of the separating the ingestion core and block queue is so that so a stateful machine, the block queue handles all state transition, and uses lock to protect its consistency, and ingestion core wraps the block queue, and connect all external dependencies and run side effects, such as reading and writing to database, listening to stop control.

And similarly, there is this fetch method to fetch all missing collections

All of the On* methods are blocking. Is this ok?

blocking should be OK. Because the BlockQueue's data are all hold in memory, and its methods are only doing updating internal state in memory, such be very fast.

A lot of locking is going on here. Will this be a problem? Would a syncmap be better in this case?

The lock is useful to guarantee

  1. state transition is safe and correct
  2. after the call is returned the data is processed. This is useful in the case I need to guarantee an executed block has been saved in the queue before saving the results in database.

@zhangchiqing zhangchiqing added this pull request to the merge queue Apr 1, 2024
Merged via the queue into master with commit 5429925 Apr 1, 2024
55 checks passed
@zhangchiqing zhangchiqing deleted the leo/ingestion-block-queue branch April 1, 2024 22:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants