Reorg could cause deals to fail #3474

Closed
arajasek opened this issue Sep 2, 2020 · 1 comment

Comments

@arajasek
Contributor

arajasek commented Sep 2, 2020

Background

We were investigating why some space race deals, such as deal 55685 with miner t010078, might have failed. Based on the provided info, the dashboard showed the deal as failed even though it was in PreCommit1 on the miner's side.

Presumably this happened because the bot's Lotus node updated its status to failed, with the error:

ClientGetDealInfo: {"jsonrpc":"2.0","result":{"ProposalCid":{"/":"bafyreibc64zanfzw2ltdcee7fvpjh2ay5hwvznti5ilpotwrb3pjx2r4mq"},"State":26,"Message":"error in deal activation: failed to set up called handler: called check error (h: 22758): client: failed to look up deal on chain: deal 55685 not found","Provider":"t010078","DataRef":null,"PieceCID":{"/":"baga6ea4seaqnjtan44uopfnh7jmy7fafgogejexp75llew55m5hu72mo2mydahy"},"Size":133169152,"PricePerEpoch":"62500000","Duration":701069,"DealID":55685,"CreationTime":"2020-09-01T19:39:35.692171291Z"},"id":0}
--
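For anyone triaging similar reports, here is a minimal Go sketch that pulls the relevant fields out of a response like the one above. The struct and function names (`dealInfo`, `rpcResponse`, `parseDealInfo`) are made up for illustration and are not Lotus types; only the JSON field names come from the logged output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dealInfo mirrors only the fields of the logged result that matter here.
// It is an illustrative type, not the Lotus api.DealInfo struct.
type dealInfo struct {
	State    uint64 `json:"State"`
	Message  string `json:"Message"`
	Provider string `json:"Provider"`
	DealID   uint64 `json:"DealID"`
}

// rpcResponse is the JSON-RPC 2.0 envelope around the result.
type rpcResponse struct {
	Result dealInfo `json:"result"`
	ID     int      `json:"id"`
}

// parseDealInfo decodes a raw ClientGetDealInfo response body.
func parseDealInfo(raw []byte) (dealInfo, error) {
	var resp rpcResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return dealInfo{}, fmt.Errorf("decoding response: %w", err)
	}
	return resp.Result, nil
}

// logged is the response quoted in this issue, copied verbatim.
const logged = `{"jsonrpc":"2.0","result":{"ProposalCid":{"/":"bafyreibc64zanfzw2ltdcee7fvpjh2ay5hwvznti5ilpotwrb3pjx2r4mq"},"State":26,"Message":"error in deal activation: failed to set up called handler: called check error (h: 22758): client: failed to look up deal on chain: deal 55685 not found","Provider":"t010078","DataRef":null,"PieceCID":{"/":"baga6ea4seaqnjtan44uopfnh7jmy7fafgogejexp75llew55m5hu72mo2mydahy"},"Size":133169152,"PricePerEpoch":"62500000","Duration":701069,"DealID":55685,"CreationTime":"2020-09-01T19:39:35.692171291Z"},"id":0}`

func main() {
	info, err := parseDealInfo([]byte(logged))
	if err != nil {
		panic(err)
	}
	fmt.Printf("deal %d with %s: state=%d\n%s\n",
		info.DealID, info.Provider, info.State, info.Message)
}
```

Fields not listed in `dealInfo` are simply ignored by `json.Unmarshal`, so the sketch keeps working if the response carries extra keys.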

We also noticed that deals 55686 through 55690 and 55692 through 55694 failed in the same way. Deal 55691 did NOT fail this way, but it was made by client t0113, whereas all the failed deals were made by t0112.

Hypothesis

The conjecture is that client t0112 saw a reorg around height 22762. The failed deals were all published in block X, which triggered ClientEventDealPublished and moved the client FSM to StorageDealSealing, where it "waits" in OnDealSectorCommitted. If block X was then reorged out, the deals would no longer be on chain, and all of them would hit the "failed to look up deal on chain" error in the first checkFunc.
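To make the conjecture concrete, here is a toy Go model of it. Everything in it (`block`, `chain`, `hasDeal`, `reorg`, and the heights) is hypothetical and heavily simplified; it is not how Lotus represents tipsets. It only shows that a deal visible before a reorg can be missing from the canonical chain afterwards.

```go
package main

import "fmt"

// block is a toy block: a height plus the IDs of deals published in it.
type block struct {
	height int64
	deals  []uint64
}

// chain is a toy canonical chain, ordered by height.
type chain []block

// hasDeal reports whether any block on the chain published the deal.
func (c chain) hasDeal(id uint64) bool {
	for _, b := range c {
		for _, d := range b.deals {
			if d == id {
				return true
			}
		}
	}
	return false
}

// reorg returns a new chain in which every block at or above forkHeight
// is dropped and the replacement blocks are appended instead.
func (c chain) reorg(forkHeight int64, replacement ...block) chain {
	out := chain{}
	for _, b := range c {
		if b.height < forkHeight {
			out = append(out, b)
		}
	}
	return append(out, replacement...)
}

func main() {
	// Heights are illustrative only. "Block X" is the block at height 2.
	before := chain{{height: 1}, {height: 2, deals: []uint64{55685}}}

	// The replacement block at height 2 does not include the publish message.
	after := before.reorg(2, block{height: 2})

	fmt.Println(before.hasDeal(55685), after.hasDeal(55685))
}
```

A client that recorded the deal as published while `before` was canonical and then looks it up against `after` gets a "not found", which matches the logged error.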

Potential solution

We could avoid failing in checkFunc when the deal cannot be found on chain, provided the publish message cannot be found on chain either. This requires passing the publish message CID to OnDealSectorCommitted as well, which should be fine (we could just pass the entire deal object).
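A rough Go sketch of that decision, under stated assumptions: `chainView`, `checkDeal`, `memChain`, and the result constants are hypothetical names invented here, not the Lotus or go-fil-markets API, and the publish CID string is a placeholder. The real checkFunc has a different signature.

```go
package main

import "fmt"

// chainView is a hypothetical, minimal view of chain state.
// It is not the Lotus API.
type chainView interface {
	HasDeal(dealID uint64) bool
	HasMessage(msgCid string) bool
}

type checkResult int

const (
	dealFound   checkResult = iota // deal is on chain, proceed as before
	keepWaiting                    // deal and publish message both missing: likely reorged out
	dealFailed                     // publish message is on chain but the deal is not
)

// checkDeal only reports a failure when the publish message is on chain
// and the deal still cannot be found.
func checkDeal(cv chainView, dealID uint64, publishCid string) checkResult {
	switch {
	case cv.HasDeal(dealID):
		return dealFound
	case !cv.HasMessage(publishCid):
		return keepWaiting
	default:
		return dealFailed
	}
}

// memChain is an in-memory chainView used only for this example.
type memChain struct {
	deals map[uint64]bool
	msgs  map[string]bool
}

func (m memChain) HasDeal(id uint64) bool      { return m.deals[id] }
func (m memChain) HasMessage(cid string) bool { return m.msgs[cid] }

func main() {
	// After a reorg, neither the deal nor its publish message is on chain.
	reorged := memChain{}
	fmt.Println(checkDeal(reorged, 55685, "publish-cid-placeholder") == keepWaiting)
}
```

The sketch leaves open what happens if the publish message never lands again; some timeout or expiry would still be needed so the deal does not wait forever.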

#3472 demonstrates this (incompletely).
