[BUG] Deal stuck in StorageDealAwaitingPreCommit on miner, even though sector that includes the deal is Proving #6123

Closed
nonsense opened this issue Apr 28, 2021 · 1 comment · Fixed by #6355
Labels
kind/bug Kind: Bug

nonsense commented Apr 28, 2021

Describe the bug
The lotus-soup testground end-to-end test has the ability to send concurrent deals from multiple clients to the same miner. When that happens, some of the deals are sometimes blocked in the StorageDealAwaitingPreCommit state, even though the sectors in which the miner included those deals passed through all stages, up to the final Proving state.

The problem appears to be in the SectorCommittedManager.OnDealSectorPreCommitted function as well as the messageEvents type. The miner seems to process PreCommit messages for some of the concurrently submitted deals, but not for all of them, and the messages do not match the expected parameters. For example:

For every deal we inspect messages on chain, but somehow we do not see all of the messages at:

		// Check through the deal IDs associated with this message
		for _, did := range params.DealIDs {
			if did == res.DealID {

In https://github.com/filecoin-project/lotus/blob/master/markets/storageadapter/ondealsectorcommitted.go , the matched callback is invoked for every PreCommit message, but the called callback is not. As a result, if the DealID does not match any of the params.DealIDs, the deal gets stuck.
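
To illustrate the failure mode described above, here is a minimal, self-contained Go sketch. The types and helper names (DealID, PreCommitParams, containsDeal, unmatchedDeals) are simplified stand-ins invented for this illustration. They are not the actual Lotus types or API. The sketch only shows that a deal whose ID never appears in the DealIDs of any observed PreCommit message would never be matched, and so would presumably remain in StorageDealAwaitingPreCommit.

```go
package main

// DealID and PreCommitParams are hypothetical, simplified stand-ins for the
// real Lotus types; only the fields needed for the illustration are kept.
type DealID uint64

type PreCommitParams struct {
	SectorNumber uint64
	DealIDs      []DealID
}

// containsDeal reports whether a PreCommit message's params include the
// given deal, mirroring the loop over params.DealIDs quoted above.
func containsDeal(params PreCommitParams, want DealID) bool {
	for _, did := range params.DealIDs {
		if did == want {
			return true
		}
	}
	return false
}

// unmatchedDeals returns the deals from waiting that do not appear in any
// of the observed PreCommit messages. In the scenario from this issue,
// these are the deals that would never see their callback fire.
func unmatchedDeals(waiting []DealID, seen []PreCommitParams) []DealID {
	var stuck []DealID
	for _, d := range waiting {
		found := false
		for _, p := range seen {
			if containsDeal(p, d) {
				found = true
				break
			}
		}
		if !found {
			stuck = append(stuck, d)
		}
	}
	return stuck
}
```

For example, with three waiting deals (1, 2, 3) and observed PreCommit messages that only carry deals 1 and 2, unmatchedDeals reports deal 3 as the one left waiting.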


Version (run lotus version):
4688da5

To Reproduce
Steps to reproduce the behavior:

  1. Trigger the testground test plan (part of CI), with the randomized sleep for different clients removed: https://github.com/filecoin-project/lotus/blob/master/testplans/lotus-soup/deals_e2e.go#L80
  2. Note that some client deals are blocked in the StorageDealAwaitingPreCommit state, and never succeed, resulting in the test timing out.

Expected behavior
All client deals are expected to succeed within 4-5 minutes.

@nonsense

A related issue (or possibly a separate bug): when creating a devnet with 3 clients and a total of 3 deals (1 per client) to 1 miner, we sometimes end up with 4 sectors, namely:

ID  State      OnChain  Active  Expiration                   Deals  DealWeight
11  Proving    YES      NO      642417 (in 31 weeks 6 days)  1      7.974MiB
12  Proving    YES      NO      642417 (in 31 weeks 6 days)  1      7.974MiB
13  Proving    YES      NO      642417 (in 31 weeks 6 days)  1      7.974MiB
14  WaitDeals  NO       NO      n/a                          CC

It is not clear why the miner creates sector 14 when we sent only 3 deals to it.
