fix: miner: dead loop on removing sector #8386

Merged
merged 1 commit into from
Mar 28, 2022

@zl03jsj zl03jsj commented Mar 28, 2022

Proposed Changes

Removing a sector can leave the sector FSM (state machine) stuck in a dead loop.
A sector is removed with the following command:

./lotus-miner sectors remove --really-do-it <sid> 

The sector state then changes from any state to 'Removed', and fsmPlanners contains a 'plan' for the 'Removed' state:

Removing: planOneOrIgnore(
	on(SectorRemoved{}, Removed),
	on(SectorRemoveFailed{}, RemoveFailed),
),

Removed: final,

Here, final is a function defined as follows:
func final(events []statemachine.Event, state *SectorInfo) (uint64, error) {
	if len(events) > 0 {
		if gm, ok := events[0].User.(globalMutator); ok {
			gm.applyGlobal(state)
			return 1, nil
		}
	}
	return 0, xerrors.Errorf("didn't expect any events in state %s, got %+v", state.State, events)
}
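
To make the behaviour of final concrete, here is a minimal, self-contained sketch. The types Event, SectorInfo and exampleGlobalEvent are simplified stand-ins invented for illustration, the applyGlobal signature is an assumption, and fmt.Errorf replaces xerrors.Errorf so the snippet needs only the standard library. It is not the lotus code itself.

```go
package main

import "fmt"

// Event is a stand-in for statemachine.Event; only the User field matters here.
type Event struct{ User interface{} }

// SectorInfo is a stand-in that carries only the state name.
type SectorInfo struct{ State string }

// globalMutator mirrors the interface that final checks for.
// The exact signature is an assumption made for this sketch.
type globalMutator interface{ applyGlobal(*SectorInfo) bool }

// SectorRemoved has no applyGlobal method, so it is not a globalMutator.
type SectorRemoved struct{}

// exampleGlobalEvent is a hypothetical event that does implement globalMutator.
type exampleGlobalEvent struct{ State string }

func (e exampleGlobalEvent) applyGlobal(s *SectorInfo) bool {
	s.State = e.State
	return true
}

// final follows the same logic as the function quoted above.
func final(events []Event, state *SectorInfo) (uint64, error) {
	if len(events) > 0 {
		if gm, ok := events[0].User.(globalMutator); ok {
			gm.applyGlobal(state)
			return 1, nil
		}
	}
	return 0, fmt.Errorf("didn't expect any events in state %s, got %+v", state.State, events)
}

func main() {
	s := &SectorInfo{State: "Removed"}

	// Not a globalMutator: nothing is processed and an error comes back.
	n, err := final([]Event{{User: SectorRemoved{}}}, s)
	fmt.Println(n, err)

	// A globalMutator: one event is consumed and no error is returned.
	n, err = final([]Event{{User: exampleGlobalEvent{State: "Removing"}}}, s)
	fmt.Println(n, err, s.State)
}
```

The only events a final state accepts are those implementing globalMutator; any other event is rejected with processed equal to 0.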

When this plan (final) receives a SectorRemoved{} event, it returns an error,
because SectorRemoved does not implement globalMutator.
However, Sealing.Plan never propagates this error to its caller, and processed is 0:
func (m *Sealing) Plan(events []statemachine.Event, user interface{}) (interface{}, uint64, error) {
	next, processed, err := m.plan(events, user.(*SectorInfo))
	if err != nil || next == nil {
		l := Log{
			Timestamp: uint64(time.Now().Unix()),
			Message:   fmt.Sprintf("state machine error: %s", err),
			Kind:      fmt.Sprintf("error;%T", err),
		}
		user.(*SectorInfo).logAppend(l)
		return nil, processed, nil
	}
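
The effect of that error-swallowing branch can be shown in isolation. The sketch below is a simplified model, not the lotus implementation: the inner plan function is a hypothetical stand-in that always fails the way final does, events are plain interface{} values, and the log is a slice on SectorInfo instead of logAppend.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Log and SectorInfo are reduced stand-ins for the lotus types.
type Log struct {
	Timestamp uint64
	Message   string
	Kind      string
}

type SectorInfo struct {
	State string
	Log   []Log
}

// plan is a hypothetical stand-in for m.plan in the failing case:
// no next step, zero events processed, and an error.
func plan(events []interface{}, s *SectorInfo) (func() error, uint64, error) {
	return nil, 0, errors.New("didn't expect any events in state " + s.State)
}

// Plan follows the shape of the excerpt above: the error is written to the
// sector log, but the caller receives a nil error.
func Plan(events []interface{}, s *SectorInfo) (interface{}, uint64, error) {
	next, processed, err := plan(events, s)
	if err != nil || next == nil {
		s.Log = append(s.Log, Log{
			Timestamp: uint64(time.Now().Unix()),
			Message:   fmt.Sprintf("state machine error: %s", err),
			Kind:      fmt.Sprintf("error;%T", err),
		})
		return nil, processed, nil
	}
	return next, processed, nil
}

func main() {
	s := &SectorInfo{State: "Removed"}
	next, processed, err := Plan([]interface{}{"SectorRemoved"}, s)
	// The caller sees: no next step, 0 processed events, and no error.
	fmt.Println(next == nil, processed, err == nil, len(s.Log))
}
```

From the caller's point of view, a failed plan with processed equal to 0 looks like a successful call that simply consumed nothing.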

Finally, in statemachine.run,
https://github.com/filecoin-project/go-statemachine/blob/27f8fbb86dfde4cff15c0a936fe95bf4c90be168/machine.go#L79-L87
fsm.mutateUser does not receive an error, and processed is 0,
so pendingEvents is never reduced. A goroutine is then started, which calls:

	fsm.stageDone <- struct{}{}

As described above, this signal triggers the same steps again with the same pending event, so the cycle never ends: a dead loop.
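
The cycle can be modelled with a bounded, single-goroutine simulation. This is only a sketch under stated assumptions: simulateRun, stuckPlanner and consumingPlanner are hypothetical names, the real loop in go-statemachine uses channels and goroutines, and the round limit exists only so the example terminates.

```go
package main

import "fmt"

// stuckPlanner models Sealing.Plan in the failing case described above:
// no next step, nothing processed, and no error.
func stuckPlanner(pending []string) (func(), uint64, error) {
	return nil, 0, nil
}

// consumingPlanner is a contrasting planner that consumes one event per call.
func consumingPlanner(pending []string) (func(), uint64, error) {
	return nil, 1, nil
}

// simulateRun is a simplified model of the run loop. It records the number
// of pending events after each round and stops after maxRounds.
func simulateRun(pending []string, planner func([]string) (func(), uint64, error), maxRounds int) []int {
	var sizes []int
	for round := 0; round < maxRounds && len(pending) > 0; round++ {
		_, processed, err := planner(pending)
		if err != nil {
			break // never reached by stuckPlanner: its error was swallowed
		}
		// Drop the events the planner reported as processed.
		if processed < uint64(len(pending)) {
			pending = pending[processed:]
		} else {
			pending = nil
		}
		sizes = append(sizes, len(pending))
		// In the real machine, a goroutine would now signal stageDone,
		// which starts the next round while events are still pending.
	}
	return sizes
}

func main() {
	// The pending count never shrinks, so only the round limit stops the loop.
	fmt.Println(simulateRun([]string{"SectorRemoved"}, stuckPlanner, 5))
	// With a planner that consumes the event, the loop ends after one round.
	fmt.Println(simulateRun([]string{"SectorRemoved"}, consumingPlanner, 5))
}
```

With stuckPlanner every round ends with the same single pending event, which is the dead loop; with consumingPlanner the queue empties and the loop stops on its own.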

@zl03jsj zl03jsj marked this pull request as ready for review March 28, 2022 12:59
@zl03jsj zl03jsj requested a review from a team as a code owner March 28, 2022 12:59
codecov bot commented Mar 28, 2022

Codecov Report

Merging #8386 (1646edf) into master (b8b33c4) will decrease coverage by 0.02%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##           master    #8386      +/-   ##
==========================================
- Coverage   40.60%   40.57%   -0.03%     
==========================================
  Files         686      686              
  Lines       75260    75262       +2     
==========================================
- Hits        30556    30538      -18     
- Misses      39399    39420      +21     
+ Partials     5305     5304       -1     
Impacted Files                           Coverage Δ
extern/storage-sealing/fsm.go            60.33% <0.00%> (-0.41%) ⬇️
chain/actors/builtin/miner/diff.go       48.52% <0.00%> (-10.30%) ⬇️
cli/util.go                              70.83% <0.00%> (-8.34%) ⬇️
extern/sector-storage/worker_tracked.go  79.46% <0.00%> (-7.15%) ⬇️
storage/wdpost_sched.go                  77.45% <0.00%> (-5.89%) ⬇️
miner/miner.go                           56.72% <0.00%> (-3.28%) ⬇️
chain/stmgr/call.go                      65.94% <0.00%> (-3.25%) ⬇️
storage/wdpost_changehandler.go          98.58% <0.00%> (-0.95%) ⬇️
extern/sector-storage/sched.go           84.77% <0.00%> (-0.83%) ⬇️
chain/exchange/peer_tracker.go           66.66% <0.00%> (ø)
... and 10 more

@magik6k magik6k left a comment

This makes sense, thanks for the PR!
