Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1.11.3-rc1: Deal problem caused mine crashed and not able to get restarted #7345

Closed
11 of 18 tasks
WillTechMusing opened this issue Sep 17, 2021 · 5 comments · Fixed by #7346
Closed
11 of 18 tasks

1.11.3-rc1: Deal problem caused mine crashed and not able to get restarted #7345

WillTechMusing opened this issue Sep 17, 2021 · 5 comments · Fixed by #7346
Labels
kind/bug Kind: Bug

Comments

@WillTechMusing
Copy link

WillTechMusing commented Sep 17, 2021

Checklist

  • This is not a security-related bug/issue. If it is, please follow please follow the security policy.
  • This is not a question or a support request. If you have any lotus related questions, please ask in the lotus forum.
  • This is not a new feature request. If it is, please file a feature request instead.
  • This is not an enhancement request. If it is, please file a improvement suggestion instead.
  • I have searched on the issue tracker and the lotus forum, and there is no existing related issue or discussion.
  • I am running the Latest release, or the most recent RC(release canadiate) for the upcoming release or the dev branch(master), or have an issue updating to any of these.
  • I did not make any code changes to lotus.

Lotus component

  • lotus daemon - chain sync
  • lotus miner - mining and block production
  • lotus miner/worker - sealing
  • lotus miner - proving(WindowPoSt)
  • lotus miner/market - storage deal
  • lotus miner/market - retrieval deal
  • lotus miner/market - data transfer
  • lotus client
  • lotus JSON-RPC API
  • lotus message management (mpool)
  • Other

Lotus Version

Daemon:  1.11.3-rc1+mainnet+git.5db02ca05+api1.3.0
Local: lotus version 1.11.3-rc1+mainnet+git.5db02ca05

Describe the Bug

miner crashed after running hours. when tried to restarted, it crashed after printing the following errors. At this point, the miner is not able to start so missing all window post deadlines already.

from the miner log:
2021-09-17T04:12:41.988-0700 WARN sectors storage-sealing/states_failed.go:345 getting deal 2427710 for piece 0: deal 2427710 not found - deal may not have completed sealing before deal proposal start epoch, or deal may have been slashed

this is from journalctl:

Sep 17 04:12:41 g0lotus01 lotus-miner[46369]: panic: runtime error: invalid memory address or nil pointer dereference
Sep 17 04:12:41 g0lotus01 lotus-miner[46369]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x139937d]
Sep 17 04:12:41 g0lotus01 lotus-miner[46369]: goroutine 19603 [running]:
Sep 17 04:12:41 g0lotus01 lotus-miner[46369]: github.com/filecoin-project/lotus/extern/storage-sealing.(*Sealing).HandleRecoverDealIDs(0xc002582600, 0x3750ba0, 0xc02e806990, 0xc016626ba0, 0xe, 0x2304, 0x8, 0x613b0ae0, 0xc0231ed620, 0x1, ...)
Sep 17 04:12:41 g0lotus01 lotus-miner[46369]:         /home/filecoin/filecoin/lotus/extern/storage-sealing/states_failed.go:397 +0x65d
Sep 17 04:12:41 g0lotus01 lotus-miner[46369]: github.com/filecoin-project/lotus/extern/storage-sealing.(*Sealing).plan.func2.1(0x376ba90, 0xc0000528f8, 0xc0120e8030, 0xc016626ba0, 0xe, 0x2304, 0x8, 0x613b0ae0, 0xc0231ed620, 0x1, ...)
Sep 17 04:12:41 g0lotus01 lotus-miner[46369]:         /home/filecoin/filecoin/lotus/extern/storage-sealing/fsm.go:357 +0xd5
Sep 17 04:12:41 g0lotus01 lotus-miner[46369]: github.com/filecoin-project/lotus/extern/storage-sealing.(*Sealing).Plan.func1(0x376ba90, 0xc0000528f8, 0xc0120e8030, 0xc016626ba0, 0xe, 0x2304, 0x8, 0x613b0ae0, 0xc0231ed620, 0x1, ...)
Sep 17 04:12:41 g0lotus01 lotus-miner[46369]:         /home/filecoin/filecoin/lotus/extern/storage-sealing/fsm.go:26 +0x95
Sep 17 04:12:41 g0lotus01 lotus-miner[46369]: reflect.Value.call(0x2e657c0, 0xc0120e8020, 0x13, 0x324936e, 0x4, 0xc048d4af30, 0x2, 0x2, 0x52d57f, 0x2f9dc00, ...)
Sep 17 04:12:41 g0lotus01 lotus-miner[46369]:         /usr/local/go/src/reflect/value.go:476 +0x8e7
Sep 17 04:12:41 g0lotus01 lotus-miner[46369]: reflect.Value.Call(0x2e657c0, 0xc0120e8020, 0x13, 0xc032d5df30, 0x2, 0x2, 0x0, 0x78614d2e73656546, 0x696d6d6f43657250)
Sep 17 04:12:41 g0lotus01 lotus-miner[46369]:         /usr/local/go/src/reflect/value.go:337 +0xb9
Sep 17 04:12:41 g0lotus01 lotus-miner[46369]: github.com/filecoin-project/go-statemachine.(*StateMachine).run.func3(0xc02337e558, 0xc0231f18d0, 0x376ba90, 0xc0000528f8, 0xc0120e8030, 0xc0231f18e0, 0xc0158b2ba0)
Sep 17 04:12:41 g0lotus01 lotus-miner[46369]:         /root/go/pkg/mod/github.com/filecoin-project/go-statemachine@v1.0.1/machine.go:108 +0x3cc
Sep 17 04:12:41 g0lotus01 lotus-miner[46369]: created by github.com/filecoin-project/go-statemachine.(*StateMachine).run
Sep 17 04:12:41 g0lotus01 lotus-miner[46369]:         /root/go/pkg/mod/github.com/filecoin-project/go-statemachine@v1.0.1/machine.go:103 +0x451

Looks like it's related to a deal 2427710 not sealing properly.

$ grep -ai 2427710 /media/lotus/logs/miner.log

2021-09-10T00:36:00.834-0700    INFO    sectors storage-sealing/input.go:241    Adding piece for deal 2427710 (publish msg: bafy2bzaceahwrod4ebomefmr33uq4eucb46qbpahirq25wmm2lluup4fgne2k)
2021-09-10T00:44:02.497-0700    INFO    sectors storage-sealing/input.go:218    deal added to a sector  {"deal": 2427710, "sector": "8964", "piece": "baga6ea4seaqbi52q7pmeffxz7czj3r7m2ywdu3ovac522favrfnu6xqk3vrxifa"}
2021-09-17T00:12:13.022-0700    WARN    sectors storage-sealing/states_sealing.go:207   invalid deals in sector 8964: getting deal 2427710 for piece 0: deal 2427710 not found - deal may not have completed sealing before deal proposal start epoch, or deal may have been slashed
2021-09-17T00:12:13.036-0700    WARN    sectors storage-sealing/states_failed.go:345    getting deal 2427710 for piece 0: deal 2427710 not found - deal may not have completed sealing before deal proposal start epoch, or deal may have been slashed

I couldn't see any file like s-t08399-8964 in the storage folder. So I tried to create a fake file in unsealed and sealed folder but it didn't help. The miner still crashed, not able to start.

Logging Information

the log has been provided above. Here is a little more info showing the miner starting:


2021-09-17T04:57:20.572-0700    WARN    stores  stores/index.go:215     sector {8399 991} redeclared in 3264ded3-7cac-4ac7-abbb-55a6967bd5ee
2021-09-17T04:57:20.572-0700    WARN    stores  stores/index.go:215     sector {8399 992} redeclared in 3264ded3-7cac-4ac7-abbb-55a6967bd5ee
2021-09-17T04:57:20.572-0700    WARN    stores  stores/index.go:215     sector {8399 993} redeclared in 3264ded3-7cac-4ac7-abbb-55a6967bd5ee
2021-09-17T04:57:20.572-0700    WARN    stores  stores/index.go:215     sector {8399 994} redeclared in 3264ded3-7cac-4ac7-abbb-55a6967bd5ee
2021-09-17T04:57:20.572-0700    WARN    stores  stores/index.go:215     sector {8399 995} redeclared in 3264ded3-7cac-4ac7-abbb-55a6967bd5ee
2021-09-17T04:57:20.572-0700    WARN    stores  stores/index.go:215     sector {8399 996} redeclared in 3264ded3-7cac-4ac7-abbb-55a6967bd5ee
2021-09-17T04:57:20.572-0700    WARN    stores  stores/index.go:215     sector {8399 997} redeclared in 3264ded3-7cac-4ac7-abbb-55a6967bd5ee
2021-09-17T04:57:20.572-0700    WARN    stores  stores/index.go:215     sector {8399 998} redeclared in 3264ded3-7cac-4ac7-abbb-55a6967bd5ee
2021-09-17T04:57:20.572-0700    WARN    stores  stores/index.go:215     sector {8399 999} redeclared in 3264ded3-7cac-4ac7-abbb-55a6967bd5ee
2021-09-17T04:57:20.573-0700    INFO    stores  stores/index.go:142     New sector storage: 784df5bf-da81-4de8-8e5c-8a9253539888
2021-09-17T04:57:20.622-0700    INFO    stores  stores/index.go:142     New sector storage: 156d2ade-7b40-4894-a55c-2cb92429fda1
2021-09-17T04:57:20.820-0700    INFO    storageminer    storage/miner.go:241    starting up miner f08399, worker addr f3vu3ps7aiydvpe3advin5t4octczwlyq5wdefhu52rj34bx3ttnuwbtmjqpx2nliaei6oy5dlir3aj74bcyvq
2021-09-17T04:57:20.827-0700    INFO    main    lotus-miner/run.go:180  Remote version 1.11.3-rc1+mainnet+git.5db02ca05+api2.1.0
2021-09-17T04:57:20.867-0700    WARN    miner   miner/miner.go:476      completed mineOne       {"tookMilliseconds": 15, "forRound": 1119115, "baseEpoch": 1119114, "baseDeltaSeconds": 20, "nullRounds": 0, "lateStart": true, "beaconEpoch": 0, "lookbackEpochs": 900, "networkPowerAtLookback": "12631628988212510720", "minerPowerAtLookback": "492114270060544", "isEligible": false, "isWinner": false, "error": null}
2021-09-17T04:57:20.914-0700    INFO    miner   miner/warmup.go:49      skipping winning PoSt warmup, no sectors
2021-09-17T04:57:23.694-0700    DEBUG   advmgr  sector-storage/sched.go:354     SCHED 0 queued; 2 open windows
2021-09-17T04:57:34.351-0700    WARN    sectors storage-sealing/states_failed.go:345    getting deal 2427710 for piece 0: deal 2427710 not found - deal may not have completed sealing before deal proposal start epoch, or deal may have been slashed

Repo Steps

  1. Run '...'
  2. Do '...'
  3. See error '...'
    ...
@WillTechMusing
Copy link
Author

WillTechMusing commented Sep 17, 2021

miner log:
miner1.log

@WillTechMusing
Copy link
Author

WillTechMusing commented Sep 17, 2021

journal log:
miner2.log

@WillTechMusing
Copy link
Author

please note: I ran miner using systemd service so the logs are in two difference place.

@WillTechMusing WillTechMusing changed the title Deal problem caused mine crashed and not able to restarted 1.11.3-rc1: Deal problem caused mine crashed and not able to restarted Sep 17, 2021
@WillTechMusing WillTechMusing changed the title 1.11.3-rc1: Deal problem caused mine crashed and not able to restarted 1.11.3-rc1: Deal problem caused mine crashed and not able to get restarted Sep 17, 2021
@WillTechMusing
Copy link
Author

WillTechMusing commented Sep 17, 2021

I researched some more about this sector 8964. Here is what I have found:

$grep -ai 8964 /media/lotus/logs/miner.log

2021-09-16T22:59:30.250-0700    INFO    sectors storage-sealing/states_failed.go:26     ComputeProofFailed(8964), waiting 59.749299243s before retrying
2021-09-16T23:00:30.021-0700    INFO    sectors storage-sealing/states_sealing.go:570   KOMIT 8964 4200d81238e82304389a8133d886d15d540b70319579af00e722162852f544a4(1097548); 4784f72533337993f83429ad9eb9c1cdac5fbfe425f9e297e3f2cb959b43f434(1099068); [{34359738368 baga6ea4seaqbi52q7pmeffxz7czj3r7m2ywdu3ovac522favrfnu6xqk3vrxifa}]; r:bagboea4b5abca3znx3wmybynltjdzyat6k5dwvdgcvn5rntwpwyn7fhi4hnxhmib; d:baga6ea4seaqbi52q7pmeffxz7czj3r7m2ywdu3ovac522favrfnu6xqk3vrxifa
2021-09-16T23:00:30.060-0700    DEBUG   advmgr  sector-storage/sched.go:465     SCHED try assign sqi:0 sector 8964 to window 2
2021-09-16T23:00:30.060-0700    DEBUG   advmgr  sector-storage/sched.go:472     SCHED ASSIGNED sqi:0 sector 8964 task seal/v0/commit/1 to window 2
2021-09-16T23:00:30.060-0700    DEBUG   advmgr  sector-storage/sched_worker.go:368      assign worker sector 8964
2021-09-16T23:00:31.648-0700    DEBUG   advmgr  sector-storage/sched.go:465     SCHED try assign sqi:0 sector 8964 to window 2
2021-09-16T23:00:31.649-0700    DEBUG   advmgr  sector-storage/sched.go:472     SCHED ASSIGNED sqi:0 sector 8964 task seal/v0/commit/2 to window 2
2021-09-16T23:00:31.649-0700    DEBUG   advmgr  sector-storage/sched_worker.go:368      assign worker sector 8964
2021-09-16T23:49:06.020-0700    INFO    miner   miner/miner.go:478      completed mineOne       {"tookMilliseconds": 11, "forRound": 1118499, "baseEpoch": 1118498, "baseDeltaSeconds": 6, "nullRounds": 0, "lateStart": false, "beaconEpoch": 1214343, "lookbackEpochs": 900, "networkPowerAtLookback": "12615440896412647424", "minerPowerAtLookback": "492114270060544", "isEligible": true, "isWinner": false, "error": null}
2021-09-17T00:10:13.691-0700    INFO    sectors storage-sealing/states_failed.go:26     ComputeProofFailed(8964), waiting 59.308209051s before retrying
2021-09-17T00:11:13.001-0700    WARN    sectors storage-sealing/fsm.go:616      sector 8964 got error event sealing.SectorSealPreCommit1Failed: consecutive compute fails
2021-09-17T00:11:13.018-0700    INFO    sectors storage-sealing/states_failed.go:26     SealPreCommit1Failed(8964), waiting 59.981501444s before retrying
2021-09-17T00:12:13.022-0700    WARN    sectors storage-sealing/states_sealing.go:207   invalid deals in sector 8964: getting deal 2427710 for piece 0: deal 2427710 not found - deal may not have completed sealing before deal proposal start epoch, or deal may have been slashed

The problem is, how to delete this deal or sector since the miner won't start?

@William8Work
Copy link

with the PR #7346 and fixes, I was able to start the miner successfully.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Kind: Bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants