Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Execution] Fix stop control #5327

Merged
merged 1 commit into from
Feb 1, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 10 additions & 3 deletions engine/execution/ingestion/engine.go
Original file line number Diff line number Diff line change
Expand Up @@ -205,7 +205,8 @@ func (e *Engine) BlockProcessable(b *flow.Header, _ *flow.QuorumCertificate) {

// TODO: this should not be blocking: https://github.com/onflow/flow-go/issues/4400

// skip if stopControl tells to skip
// skip if stopControl tells to skip, so that we can avoid fetching collections
// for this block
if !e.stopControl.ShouldExecuteBlock(b) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, this is the only place we check.

return
}
Expand Down Expand Up @@ -363,6 +364,12 @@ func (e *Engine) executeBlock(
ctx context.Context,
executableBlock *entity.ExecutableBlock,
) {

// don't execute the block if the stop control says no
if !e.stopControl.ShouldExecuteBlock(executableBlock.Block.Header) {
Copy link
Member Author

@zhangchiqing zhangchiqing Jan 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stop control should check again here, because otherwise there is an edge case that might execute blocks above the stop height. See the comment on L457

The edge case is that, when a block becomes processable, the stop height might not be set, so it's added to the queue, however, when it becomes executable and is able to execute, the stop height is set, and should not be executed. Without this check, it will be executed, but should not.

return
}

lg := e.log.With().
Hex("block_id", logging.Entity(executableBlock)).
Uint64("height", executableBlock.Block.Header.Height).
Expand Down Expand Up @@ -445,6 +452,8 @@ func (e *Engine) executeBlock(
Int64("timeSpentInMS", time.Since(startedAt).Milliseconds()).
Msg("block executed")

e.stopControl.OnBlockExecuted(executableBlock.Block.Header)

err = e.onBlockExecuted(executableBlock, finalEndState)
if err != nil {
lg.Err(err).Msg("failed in process block's children")
Expand All @@ -454,8 +463,6 @@ func (e *Engine) executeBlock(
e.executionDataPruner.NotifyFulfilledHeight(executableBlock.Height())
}

e.stopControl.OnBlockExecuted(executableBlock.Block.Header)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stopControl.OnBlockExecuted is to report a block has been executed and triggers stop control to check whether it should stop/crash or not.

However, this is currently called after e.onBlockExecuted is called, which checks if the child block (next block) can be executed and executes if completed. So imaging the following case:

Given two blocks: A <- B and we set stop height as B, meaning stop as soon as A is executed, and don't execute B. OK, now both block A and B's collections have been received, after A is executed, B becomes ready to be executed, if onBlockExecued(A) is called before stopControl.OnBlockExecuted, then it will execute block B in another go routine (which shouldn't). But the reason we never ran into this, is probably because before B is executed, stop.OnBlockExecuted might have been called and crashed the node. However, B technically still have a chance to be executed, I didn't notice this until looking into a flakey test TestStopAtHeightWhenExecutedBeforeFinalized which captured this edge case.

So my solution here is to 1) move this line up before the onBlockExecuted, so that we let stop control check stop condition first. And 2) add ShouldExecuteBlock into the executeBlock function, so that even if stop control decides not to crash, and the function onBlockExecuted found the next block has received its collection and can be executed, and we can still stop the next block from being executed.


e.unit.Ctx()

}
Expand Down
7 changes: 5 additions & 2 deletions engine/execution/state/unittest/fixtures.go
Original file line number Diff line number Diff line change
Expand Up @@ -46,16 +46,19 @@ func ComputationResultForBlockFixture(

numberOfChunks := len(collections) + 1
ceds := make([]*execution_data.ChunkExecutionData, numberOfChunks)
startState := *completeBlock.StartState
for i := 0; i < numberOfChunks; i++ {
ceds[i] = unittest.ChunkExecutionDataFixture(t, 1024)
endState := unittest.StateCommitmentFixture()
computationResult.CollectionExecutionResultAt(i).UpdateExecutionSnapshot(StateInteractionsFixture())
computationResult.AppendCollectionAttestationResult(
*completeBlock.StartState,
*completeBlock.StartState,
startState,
endState,
nil,
unittest.IdentifierFixture(),
ceds[i],
)
startState = endState
}
bed := unittest.BlockExecutionDataFixture(
unittest.WithBlockExecutionDataBlockID(completeBlock.Block.ID()),
Expand Down
Loading