
v.1.18.0 - Duplicate Storage Paths in storage.json lead to Miner Crash on lotus-miner run #9739

Open
8 of 18 tasks
TippyFlitsUK opened this issue Nov 28, 2022 · 2 comments

@TippyFlitsUK
Contributor

TippyFlitsUK commented Nov 28, 2022

Checklist

  • This is not a security-related bug/issue. If it is, please follow the security policy.
  • This is not a question or a support request. If you have any lotus related questions, please ask in the lotus forum.
  • This is not a new feature request. If it is, please file a feature request instead.
  • This is not an enhancement request. If it is, please file an improvement suggestion instead.
  • I have searched on the issue tracker and the lotus forum, and there is no existing related issue or discussion.
  • I am running the latest release, the most recent RC (release candidate) for the upcoming release, or the dev branch (master), or I have an issue updating to any of these.
  • I did not make any code changes to lotus.

Lotus component

  • lotus daemon - chain sync
  • lotus miner - mining and block production
  • lotus miner/worker - sealing
  • lotus miner - proving(WindowPoSt)
  • lotus miner/market - storage deal
  • lotus miner/market - retrieval deal
  • lotus miner/market - data transfer
  • lotus client
  • lotus JSON-RPC API
  • lotus message management (mpool)
  • Other

Lotus Version

1.18.0+mainnet+git.bd10bdf99

Describe the Bug

We are seeing several reports from SPs of miner crashes after upgrading from v1.16.x or v1.17.x to v1.18.0.

SPs who try to start their miner process after upgrading to v1.18.0 see the following output. The miner then crashes entirely and no further logs are written:

2022-11-19T13:44:39.818-0800    INFO    paramfetch      go-paramfetch@v0.0.4/paramfetch.go:233  parameter and key-fetching complete
2022-11-19T13:44:40.024-0800    INFO    stores  paths/index.go:181      New sector storage: 3264ded3-7cac-4ac7-abbb-55a6967bd5ee

The immediate issue can be swiftly resolved by removing duplicate storage paths in the miner/worker storage.json files.

It is not clear how the duplicates were introduced in the first place. Lotus does not edit these files directly, so the most likely cause is a simple manual configuration error by the user.

The duplicate-path check was added in #9032 because it is now possible to manipulate storage paths at runtime in many new ways.

The error reporting for this condition is not working as expected: the failure is not explained in the output, and the logs SPs are sharing appear to be truncated.
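The following hypothetical Go validation illustrates what such a check guards against and how it could report the problem clearly. The name checkNoDuplicatePaths is illustrative, and this is not the code from #9032.

The sketch makes one assumption: paths are compared after filepath.Clean. It returns an error that names both conflicting entries, so the cause is written out before startup stops.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// checkNoDuplicatePaths returns an error describing the first storage path
// that repeats an earlier one (after filepath.Clean), or nil if all are unique.
func checkNoDuplicatePaths(paths []string) error {
	firstIndex := make(map[string]int) // cleaned path -> index of first entry
	for i, p := range paths {
		key := filepath.Clean(p)
		if j, ok := firstIndex[key]; ok {
			return fmt.Errorf("storage.json: path %q (entry %d) duplicates %q (entry %d); remove one of them",
				p, i, paths[j], j)
		}
		firstIndex[key] = i
	}
	return nil
}

func main() {
	// Illustrative input only: the third entry repeats the first with a trailing slash.
	paths := []string{"/mnt/lotus/mainData", "/mnt/netdisk/mainData", "/mnt/lotus/mainData/"}
	if err := checkNoDuplicatePaths(paths); err != nil {
		// A real startup path would return this error; here it is just printed to stderr.
		fmt.Fprintln(os.Stderr, "startup check failed:", err)
		return
	}
	fmt.Println("no duplicate storage paths")
}
```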

Link to Slack thread

Logging Information

2022-11-19T13:44:39.818-0800    INFO    paramfetch      go-paramfetch@v0.0.4/paramfetch.go:233  parameter and key-fetching complete
2022-11-19T13:44:40.024-0800    INFO    stores  paths/index.go:181      New sector storage: 3264ded3-7cac-4ac7-abbb-55a6967bd5ee

Repro Steps

  1. Run '...'
  2. Do '...'
  3. See error '...'
    ...
@TippyFlitsUK TippyFlitsUK added need/triage kind/bug Kind: Bug kind/enhancement Kind: Enhancement need/team-input Hint: Needs Team Input area/mining Area: Mining and removed need/triage labels Nov 28, 2022
@lbj2004032

lbj2004032 commented Nov 30, 2022

I don't have any duplicate paths, but the miner still won't start.

What should I do?
I have upgraded to v1.18.1 and it still cannot start.

@lbj2004032

This issue has not been resolved yet.

Many nodes hang at the "New sector storage" log line.

When I edit storage.json, change the Path to a new value (/mnt/lotus/mainData/), and then attach that path again with lotus-miner storage attach --init --store /mnt/lotus/mainData/, the miner starts successfully.

Next, when I run lotus-miner storage attach /mnt/netdisk/mainData (the old path value), the process hangs.

I have to start the miner with 1.17.0 first. Once that succeeds, it has to keep running on 1.17.0; with 1.17.2 the program still cannot start.
