Background
We were investigating why some space race deals, such as 55685 for this miner, might have failed. Based on the provided info, it seemed pretty clear that the dashboard showed the deal as failed, even though the deal was in PreCommit1 on the miner's side.
Presumably this happened because the bot's Lotus node updated its status to failed, with the error:
ClientGetDealInfo: {"jsonrpc":"2.0","result":{"ProposalCid":{"/":"bafyreibc64zanfzw2ltdcee7fvpjh2ay5hwvznti5ilpotwrb3pjx2r4mq"},"State":26,"Message":"error in deal activation: failed to set up called handler: called check error (h: 22758): client: failed to look up deal on chain: deal 55685 not found","Provider":"t010078","DataRef":null,"PieceCID":{"/":"baga6ea4seaqnjtan44uopfnh7jmy7fafgogejexp75llew55m5hu72mo2mydahy"},"Size":133169152,"PricePerEpoch":"62500000","Duration":701069,"DealID":55685,"CreationTime":"2020-09-01T19:39:35.692171291Z"},"id":0}
--
We also noticed that deals 55686, 87, 88, 89, 90, 92, 93, and 94 failed similarly. Deal 55691 did NOT fail this way, but it was made by client t0113, unlike all the failed ones which were made by t0112.
Hypothesis
The conjecture is that client t0112 saw a reorg around height 22762. The failed deals were all published in block X, triggering ClientEventDealPublished and bumping the client FSM state to StorageDealSealing, which "waits" in OnDealSectorCommitted. If block X was then reorged out, the deal lookup would no longer find these deals on chain, and they would all error in the first checkFunc over here.
Potential solution
We could avoid failing when checkFunc can't find the deal on-chain, provided we also can't find the publish message on chain. This requires also providing the publish message CID to OnDealSectorCommitted, which should be fine (we could just give it the entire deal object).
#3472 demonstrates this (incompletely).