Seal-Worker does not download SnapDeal-proofs on startup #8549

Closed
8 of 18 tasks
RobQuistNL opened this issue Apr 25, 2022 · 4 comments

RobQuistNL commented Apr 25, 2022

Checklist

  • This is not a security-related bug/issue. If it is, please follow the security policy.
  • This is not a question or a support request. If you have any lotus related questions, please ask in the lotus forum.
  • This is not a new feature request. If it is, please file a feature request instead.
  • This is not an enhancement request. If it is, please file an improvement suggestion instead.
  • I have searched the issue tracker and the lotus forum, and there is no existing related issue or discussion.
  • I am running the latest release, the most recent RC (release candidate) for the upcoming release, or the dev branch (master), or have an issue updating to any of these.
  • I did not make any code changes to lotus.

Lotus component

  • lotus daemon - chain sync
  • lotus miner - mining and block production
  • lotus miner/worker - sealing
  • lotus miner - proving(WindowPoSt)
  • lotus miner/market - storage deal
  • lotus miner/market - retrieval deal
  • lotus miner/market - data transfer
  • lotus client
  • lotus JSON-RPC API
  • lotus message management (mpool)
  • Other

Lotus Version

v1.15.1

Describe the Bug

The PC1 worker does not automatically download the proof parameters on startup, but it seems to need them for Snap deals? That's what I'm getting from the logs here.

Logging Information

{"level":"debug","ts":"2022-04-25T03:54:05.817Z","logger":"advmgr","caller":"sector-storage/sched.go:480","msg":"SCHED try assign sqi:0 sector 160304 to window 376 (awi:0)"}
{"level":"debug","ts":"2022-04-25T03:54:05.817Z","logger":"advmgr","caller":"sector-storage/sched.go:480","msg":"SCHED try assign sqi:0 sector 160304 to window 377 (awi:1)"}
{"level":"debug","ts":"2022-04-25T03:54:05.817Z","logger":"advmgr","caller":"sector-storage/sched.go:526","msg":"SCHED ASSIGNED","sqi":0,"sector":"160304","task":"seal/v0/provereplicaupdate/1","window":376,"worker":"0e7f9098-0f5b-42f6-9043-1523e5dcbea4","utilization":0.018736502682663667}
{"level":"debug","ts":"2022-04-25T03:54:05.820Z","logger":"advmgr","caller":"sector-storage/sched_worker.go:375","msg":"assign worker sector 160304 to w4-main"}
{"level":"debug","ts":"2022-04-25T03:54:05.823Z","logger":"advmgr","caller":"sector-storage/sched_worker.go:280","msg":"task done","workerid":"0e7f9098-0f5b-42f6-9043-1523e5dcbea4"}
{"level":"info","ts":"2022-04-25T03:54:06.012Z","logger":"miner","caller":"miner/miner.go:479","msg":"completed mineOne","tookMilliseconds":8,"forRound":1751749,"baseEpoch":1751748,"baseDeltaSeconds":6,"nullRounds":0,"lateStart":false,"beaconEpoch":1847593,"lookbackEpochs":900,"networkPowerAtLookback":"18926775564710150144","minerPowerAtLookback":"16145932145557504","isEligible":true,"isWinner":false,"error":null}
{"level":"debug","ts":"2022-04-25T03:54:06.378Z","logger":"advmgr","caller":"sector-storage/sched.go:480","msg":"SCHED try assign sqi:0 sector 160304 to window 376 (awi:0)"}
{"level":"debug","ts":"2022-04-25T03:54:06.378Z","logger":"advmgr","caller":"sector-storage/sched.go:480","msg":"SCHED try assign sqi:0 sector 160304 to window 377 (awi:1)"}
{"level":"debug","ts":"2022-04-25T03:54:06.378Z","logger":"advmgr","caller":"sector-storage/sched.go:526","msg":"SCHED ASSIGNED","sqi":0,"sector":"160304","task":"seal/v0/provereplicaupdate/2","window":376,"worker":"0e7f9098-0f5b-42f6-9043-1523e5dcbea4","utilization":0.018736502682663667}
{"level":"debug","ts":"2022-04-25T03:54:06.378Z","logger":"advmgr","caller":"sector-storage/sched_worker.go:375","msg":"assign worker sector 160304 to w4-main"}
{"level":"debug","ts":"2022-04-25T03:54:06.379Z","logger":"advmgr","caller":"sector-storage/sched_worker.go:280","msg":"task done","workerid":"0e7f9098-0f5b-42f6-9043-1523e5dcbea4"}
{"level":"warn","ts":"2022-04-25T03:54:06.732Z","logger":"sectors","caller":"storage-sealing/fsm.go:763","msg":"sector 160304 got error event sealing.SectorProveReplicaUpdateFailed: prove replica update (2) failed: storage call error 0: %!w(No cached parameters found for empty-sector-update-merkletree-poseidon_hasher-8-8-0-3b7f44a9362e3985369454947bc94022e118211e49fd672d52bec1cbfd599d18 [failure finding /storage0/proofs/v28-empty-sector-update-merkletree-poseidon_hasher-8-8-0-3b7f44a9362e3985369454947bc94022e118211e49fd672d52bec1cbfd599d18.params]\n\nStack backtrace:\n   0: <unknown>\n   1: <unknown>\n   2: <unknown>\n   3: <unknown>\n   4: <unknown>\n   5: <unknown>\n   6: <unknown>\n   7: <unknown>\n   8: <unknown>\n   9: <unknown> [Hostname: w4])"}
{"level":"info","ts":"2022-04-25T03:54:06.733Z","logger":"sectors","caller":"storage-sealing/states_failed.go:28","msg":"ReplicaUpdateFailed(160304), waiting 59.266196062s before retrying"}

Repro Steps

  1. Run lotus-miner sectors pledge
  2. When it's done, import a deal
  3. See it fail

The documentation is missing the flags for the new job types:

   --replica-update              enable replica update (default: true)
   --prove-replica-update2       enable prove replica update 2 (default: true)
   --regen-sector-key            enable regen sector key (default: true)

https://lotus.filecoin.io/storage-providers/seal-workers/seal-workers/#run-the-worker

It also does not state that the worker needs the proof parameter files for these tasks.

The worker has these task types enabled by default, but it does not check whether these files exist. This causes these sectors to fail.

@RobQuistNL

I would also like to know whether these jobs are similar to the PC1 stage (needs SDR + 64GB RAM) or PC2 (needs GPU), so I know where to run them.

If it's a combination of both, it's interesting :+)

rjan90 self-assigned this Apr 25, 2022

rjan90 commented Apr 25, 2022

Hey @RobQuistNL!

Thanks for the report - I have triaged the issue, and will also try to repro it during the day. The issue seems to be that a PC1 / RU / PRU2 enabled worker does not download the needed params on startup.

I have added a PR for the documentation here: filecoin-project/lotus-docs#169.

To your question:

I would also like to know if these jobs are similar to the PC1 stage (need SDR + 64GB RAM) or PC2 (Need GPU) so I know where to run them.

The SnapDeals tasks are very similar to the C-jobs: they require approx. 192 GiB RAM (minimum: 128 GiB RAM + 64 GiB swap) and approx. 8.2 GB VRAM. A GPU is not required, but it will speed the process up significantly.
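A rough Go sketch turning the figures above into a placement check. Assumptions: the thresholds are taken directly from this comment (128 GiB RAM plus 64 GiB swap as the minimum, about 8.2 GB VRAM for GPU use), "8.2GB" is read as decimal gigabytes, and machine/snapFit are hypothetical names, not lotus APIs.

```go
package main

import "fmt"

const gib = uint64(1) << 30

// Figures quoted in this thread for SnapDeals tasks; treat them as approximate.
const (
	minRAM         = 128 * gib
	minRAMPlusSwap = 192 * gib    // 128 GiB RAM + 64 GiB swap
	gpuVRAMBytes   = 8_200_000_000 // "approx 8.2GB", assumed to be decimal GB
)

// machine is a hypothetical description of a worker host; sizes are in bytes.
type machine struct {
	RAM, Swap, VRAM uint64 // VRAM == 0 means no usable GPU
}

// snapFit (hypothetical) reports whether m meets the quoted memory minimum
// and, if so, whether its GPU has enough VRAM to be used for these tasks.
func snapFit(m machine) (canRun, useGPU bool) {
	canRun = m.RAM >= minRAM && m.RAM+m.Swap >= minRAMPlusSwap
	useGPU = canRun && m.VRAM >= gpuVRAMBytes
	return canRun, useGPU
}

func main() {
	hosts := map[string]machine{
		"pc1-host": {RAM: 128 * gib},                 // no swap, no GPU
		"pc2-host": {RAM: 256 * gib, VRAM: 11 * gib}, // large RAM plus a GPU
	}
	for name, m := range hosts {
		ok, gpu := snapFit(m)
		fmt.Printf("%s: canRun=%v useGPU=%v\n", name, ok, gpu)
	}
}
```

Swap is counted only on top of the 128 GiB RAM floor, mirroring the "128GiB RAM + 64GiB swap" minimum rather than letting swap stand in for RAM entirely.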

rjan90 changed the title from "Sealing new deal fails on PC1 worker (" to "Seal-Worker does not download SnapDeal-proofs on startup" Apr 25, 2022
@RobQuistNL

Okay, so we really want to disable it on the PC1 machines and add it to the PC2 machines. Thanks! Also, where are you? ;) @ Amsterdam EMEA
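A small Go sketch of that split, building lotus-worker run arguments per role. It assumes the three boolean flags quoted earlier accept the --flag=false form, and the role labels "pc1"/"pc2" and the workerArgs helper are hypothetical, for this example only.

```go
package main

import (
	"fmt"
	"strings"
)

// snapFlags are the SnapDeals task flags quoted earlier in this issue.
var snapFlags = []string{"--replica-update", "--prove-replica-update2", "--regen-sector-key"}

// workerArgs (hypothetical) builds "lotus-worker run" arguments: SnapDeals
// tasks are explicitly disabled on PC1 hosts and enabled on PC2 hosts.
// Assumption: boolean flags accept the --flag=true/false form.
func workerArgs(role string) []string {
	args := []string{"run"}
	enable := role == "pc2"
	for _, f := range snapFlags {
		args = append(args, fmt.Sprintf("%s=%t", f, enable))
	}
	return args
}

func main() {
	for _, role := range []string{"pc1", "pc2"} {
		fmt.Println("lotus-worker", strings.Join(workerArgs(role), " "))
	}
}
```

Passing =true on the PC2 hosts is redundant given the defaults, but it keeps the intended placement visible in the start command.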

rjan90 commented May 3, 2022

Closing this ticket now, as fixes for the docs and for downloading the proofs on the worker when PRU2 is enabled have been merged.

rjan90 closed this as completed May 3, 2022