Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: sealing: storage redeclare/detach #9032

Merged
merged 17 commits into from
Aug 5, 2022
Merged

Conversation

magik6k
Copy link
Contributor

@magik6k magik6k commented Jul 13, 2022

Related Issues

Based on #9013
Closes #9019
Closes #3716

Proposed Changes

  1. Add new APIs on lotus-worker/lotus-miner
StorageDetachLocal(ctx context.Context, path string) error                           //perm:admin
StorageRedeclareLocal(ctx context.Context, id *storiface.ID, dropMissing bool) error //perm:admin
  1. Add an API on lotus-miner to tell the sector index that a path is being detached
StorageDetach(ctx context.Context, id storiface.ID, url string) error                                                              //perm:admin
  1. Add storage detach/redeclare commands to lotus-miner/lotus-worker
  • lotus-miner storage detach
NAME:
   lotus-miner storage detach - detach local storage path
USAGE:
   lotus-miner storage detach [command options] [path]
OPTIONS:
   --really-do-it  (default: false)
   
  • lotus-miner storage redeclare
NAME:
   lotus-miner storage redeclare - redeclare sectors in a local storage path
USAGE:
   lotus-miner storage redeclare [command options] [arguments...]
OPTIONS:
   --all           redeclare all storage paths (default: false)
   --drop-missing  Drop index entries with missing files (default: false)
   --id value      storage path ID
   

Checklist

Before you mark the PR ready for review, please make sure that:

  • All commits have a clear commit message.
  • The PR title is in the form of of <PR type>: <area>: <change being made>
    • example: fix: mempool: Introduce a cache for valid signatures
    • PR type: fix, feat, INTERFACE BREAKING CHANGE, CONSENSUS BREAKING, build, chore, ci, docs,perf, refactor, revert, style, test
    • area: api, chain, state, vm, data transfer, market, mempool, message, block production, multisig, networking, paychan, proving, sealing, wallet, deps
  • This PR has tests for new functionality or change in behaviour
  • If new user-facing features are introduced, clear usage guidelines and / or documentation updates should be included in https://lotus.filecoin.io or Discussion Tutorials.
  • CI is green

Base automatically changed from feat/path-type-filters to master July 15, 2022 11:08
@codecov
Copy link

codecov bot commented Jul 15, 2022

Codecov Report

Merging #9032 (beeab64) into master (ac2ea03) will decrease coverage by 0.12%.
The diff coverage is 2.69%.

❗ Current head beeab64 differs from pull request most recent head a1ece56. Consider uploading reports for the commit a1ece56 to get more accurate results

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9032      +/-   ##
==========================================
- Coverage   40.73%   40.61%   -0.13%     
==========================================
  Files         707      707              
  Lines       78875    79093     +218     
==========================================
- Hits        32132    32120      -12     
- Misses      41265    41478     +213     
- Partials     5478     5495      +17     
Impacted Files Coverage Δ
api/api_storage.go 0.00% <ø> (ø)
cmd/lotus-miner/storage.go 41.33% <0.00%> (-2.60%) ⬇️
cmd/lotus-worker/main.go 0.00% <0.00%> (ø)
cmd/lotus-worker/sealworker/rpc.go 23.17% <0.00%> (-14.09%) ⬇️
cmd/lotus-worker/storage.go 0.00% <0.00%> (ø)
node/impl/storminer.go 23.36% <0.00%> (-0.29%) ⬇️
storage/paths/index.go 65.37% <0.00%> (-10.27%) ⬇️
storage/paths/mocks/index.go 13.88% <0.00%> (-1.12%) ⬇️
storage/sealer/manager.go 59.20% <0.00%> (-3.07%) ⬇️
storage/paths/local.go 58.00% <17.14%> (-3.25%) ⬇️
... and 19 more

@magik6k magik6k force-pushed the feat/storage-redeclare branch from 6990226 to 459c5f0 Compare July 15, 2022 14:38
@magik6k magik6k changed the title (wip) feat: sealing: storage redeclare/detach feat: sealing: storage redeclare/detach Jul 15, 2022
@magik6k
Copy link
Contributor Author

magik6k commented Jul 15, 2022

Tests need fixing, likely due to changes in itests ensemble: import presealed sector metadata - a bunch of itests are likely calling SectorsList, and asserting things on those sectors, not expecting genesis sectors in that list.

Easiest fix for those is likely adding a helper method SectorsListNonGenesis on kit.TestMiner, and replacing uses of SectorsList in broken itests

cmd/lotus-miner/storage.go Outdated Show resolved Hide resolved
@rjan90
Copy link
Contributor

rjan90 commented Jul 27, 2022

Been testing this PR, and after fixing the typo in if !cctx.Bool("really-do-id"), things are looking good so far:

  • On a miner
    • Detach a storage path (One small UX-nit here would be to print a "Detached /path/to/storage/ successfully" when detaching.
    • Put sector files, and check that after redeclare, lotus-miner sectors find lists the sector in the path:

Before move:

lotus-miner storage find 2
In c3987105-1450-44fd-af25-ef15690c67a7 (Sealed, Cache)
        Sealing: false; Storage: true
        Local (/root/storage)
        URL: http://127.0.0.1:2345/remote/sealed/s-t01008-2

Moving sectors:

mv storage/sealed/s-t01008-2 storage45/sealed
mv storage/cache/s-t01008-2 storage45/cache

Redeclare & check sectors find:

lotus-miner storage redeclare --all /root/storage45
lotus-miner storage find 2
In c3987105-1450-44fd-af25-ef15690c67a7 (Sealed, Cache)
        Sealing: false; Storage: true
        Local (/root/storage)
        URL: http://127.0.0.1:2345/remote/sealed/s-t01008-2
In fb10368f-f33f-4882-a047-3e7f79412eaa (Sealed, Cache)
        Sealing: false; Storage: true
        Local (/root/storage45)
  • Remove sector file(s), make sure they aren't listed in lotus-miner sectors find after lotus-miner storage redeclare --drop-missing
lotus-miner storage redeclare --drop-missing --all /root/storage
lotus-miner storage find 2
In fb10368f-f33f-4882-a047-3e7f79412eaa (Sealed, Cache)
        Sealing: false; Storage: true
        Local (/root/storage45)
        URL: http://127.0.0.1:2345/remote/sealed/s-t01008-2
  • On a worker
    • Detach a storage path

Before detaching

eb7d2640-20e6-4b2f-81ef-c60ea50c32d0:
        Weight: 10; Use: Seal 
        Local: /root/.lotusworker
2462c12e-ba04-489d-bd3c-b0df21cfb890:
        Weight: 10; Use: Seal 
        Local: /root/worker

Detaching and checking paths:

lotus-worker storage detach --really-do-it /root/worker
lotus-worker info
eb7d2640-20e6-4b2f-81ef-c60ea50c32d0:
        Weight: 10; Use: Seal 
        Local: /root/.lotusworker
  • Put sector files, and check that after redeclare, lotus-miner sectors find lists the sector in the path

Before move

lotus-miner storage find 24
In eb7d2640-20e6-4b2f-81ef-c60ea50c32d0 (Unsealed, Sealed, Cache)
        Sealing: true; Storage: false
        Remote
        URL: http://127.0.0.1:3456/remote/unsealed/s-t01008-24
        URL: http://127.0.0.1:2345/remote/unsealed/s-t01008-24

Moving files:

mv  ~/.lotusworker/cache/s-t01008-24 /root/worker/cache
mv  ~/.lotusworker/unsealed/s-t01008-24 /root/worker/unsealed
mv  ~/.lotusworker/sealed/s-t01008-24 /root/worker/sealed

Redeclare and check paths:

lotus-worker storage redeclare --all /root/worker
lotus-miner storage find 24
In 2462c12e-ba04-489d-bd3c-b0df21cfb890 (Unsealed, Sealed, Cache)
        Sealing: true; Storage: false
        Remote
        URL: http://127.0.0.1:2345/remote/unsealed/s-t01008-24
        URL: http://127.0.0.1:3456/remote/unsealed/s-t01008-24
In eb7d2640-20e6-4b2f-81ef-c60ea50c32d0 (Unsealed, Sealed, Cache)
        Sealing: true; Storage: false
        Remote
        URL: http://127.0.0.1:3456/remote/unsealed/s-t01008-24
        URL: http://127.0.0.1:2345/remote/unsealed/s-t01008-24
  • Remove sector file(s), make sure they aren't listed in lotus-miner sectors find after lotus-worker storage redeclare --drop-missing
lotus-worker storage redeclare --drop-missing --all /root/.lotusworker
lotus-miner storage find 24
In 2462c12e-ba04-489d-bd3c-b0df21cfb890 (Unsealed, Sealed, Cache)
        Sealing: true; Storage: false
        Remote
        URL: http://127.0.0.1:2345/remote/unsealed/s-t01008-24
        URL: http://127.0.0.1:3456/remote/unsealed/s-t01008-24

I moved all the files and redeclaring+drop-missing on the lotus-worker while the sector was in WaitSeed, and the sector where able to finish the sealing 🥳

cmd/lotus-worker/storage.go Outdated Show resolved Hide resolved
@rjan90
Copy link
Contributor

rjan90 commented Jul 28, 2022

Okay, so one strange thing happened during my testing of the lotus-worker storage detach/attach command.

I first detached the /root/worker storage path to test that it functioned, and after that I had to attach it again to test the redeclare function. After I attached the /root/worker path, that path is shown properly in the lotus-worker info:

lotus-worker info
Worker version:  1.6.0
CLI version: lotus-worker version 1.17.1-dev+butterflynet+git.459c5f008.dirty

Session: 9d5b00c5-f330-4447-b87a-c4556470519a
Enabled: true
Hostname: Ubuntu-2004-focal-64-minimal
CPUs: 12; GPUs: []
RAM: 4.781 GiB/62.79 GiB; Swap: 71.25 MiB/128 GiB
Task types: FIN GET FRU UNS C1 C2 PC2 PC1 PR1 PR2 RU AP DC GSK 

eb7d2640-20e6-4b2f-81ef-c60ea50c32d0:
        Weight: 10; Use: Seal 
        Local: /root/.lotusworker
2462c12e-ba04-489d-bd3c-b0df21cfb890:
        Weight: 10; Use: Seal 
        Local: /root/worker

But when I check the lotus-miner storage list that path is shown as not found 🤔. It was able to successfully move the finished sealed sector from the /root/worker path to the long-term storage path though.

007bc396-cf31-4493-b75a-412e8fdccc78:
        [##########################################        ] 366.8 GiB/436.3 GiB 84%
        Unsealed: 0; Sealed: 0; Caches: 0; Reserved: 0 B
        Weight: 10; Use: Seal 
        Local: /root/sealing
        URL: http://127.0.0.1:2345/remote

[...............]

2462c12e-ba04-489d-bd3c-b0df21cfb890:
        Error: fsstat: path not found:

So repro steps to get some errors (This is also mostly caused when mixing attaching a path on both lotus-miner and lotus-worker)

  1. Create/Attach storage path with lotus-miner storage attach --init --seal /root/worker4
  2. Attach path on worker: lotus-worker storage attach /root/worker4
  3. Detach path on worker: lotus-worker storage detach --really-do-it /root/worker4
  4. Get error:
2022-07-29T09:51:42.320+0200    WARN    main    lotus-worker/main.go:113        dropping path (id='b7cb6aca-34c8-46e3-b322-b49a8d8c6636' url='http://127.0.0.1:3456/remote'):
not dropping path, requested and index urls don't match ('http://127.0.0.1:3456/remote' != 'http://127.0.0.1:2345/remote')
  1. Drop path on lotus-miner: lotus-miner storage detach --really-do-it /root/worker4
  2. Try to drop path on worker again: lotus-worker storage detach --really-do-it /root/worker4
    That gives this error, and inability to get rid of the storage path:
    2022-07-29T09:55:43.705+0200 WARN main lotus-worker/main.go:113 path not found in storage.json
    But the path is still in lotus-worker info

@magik6k magik6k force-pushed the feat/storage-redeclare branch from 459c5f0 to 1ee82dd Compare August 1, 2022 14:47
@magik6k
Copy link
Contributor Author

magik6k commented Aug 2, 2022

@rjan90 Managed to repro, and hopefully fix the issue in 6d29903

@magik6k magik6k force-pushed the feat/storage-redeclare branch 4 times, most recently from 65292dc to 82f9221 Compare August 2, 2022 15:24
@magik6k magik6k force-pushed the feat/storage-redeclare branch from 82f9221 to 8294e03 Compare August 2, 2022 15:35
@magik6k magik6k marked this pull request as ready for review August 2, 2022 18:31
@magik6k magik6k requested a review from a team as a code owner August 2, 2022 18:31
Copy link
Contributor

@geoff-vball geoff-vball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two quickly fixed comments.

If @rjan90 can confirm the fix you made, this should be good to go.

cmd/lotus-miner/storage.go Show resolved Hide resolved
cmd/lotus-worker/storage.go Show resolved Hide resolved
@magik6k
Copy link
Contributor Author

magik6k commented Aug 5, 2022

Dropped the args check in the redeclare command as it doesn't use those args.

@rjan90
Copy link
Contributor

rjan90 commented Aug 5, 2022

If @rjan90 can confirm the fix you made, this should be good to go.

I re-ran the detach shared paths issue, and where able to successfully detach the storage on both worker and miner now 👍 So I can confirm that it´s fixed.

@jennijuju
Copy link
Member

@magik6k conflicts on docs - then merge? we want this in v1.17.1!!

@lbj2004032
Copy link

This issue has not been resolved yet

Many nodes are stuck in the "New sector" log line

When I modify storage.json and change the Path to a new value (/mnt/lotus/mainData/), this Path is executed by re executing the lotus miner storage attach -- init -- store/mnt/lotus/mainData/, which can be started successfully

Next, I execute the lotus miner attach/mnt/netdisk/mainData command. The program is stuck, and the path/mnt/netdisk/mainData is the old value.

"I have to start using 1.17.0 first. If it succeeds, it must be started using 1.17.0. If it is 1.17.2, the program still cannot be started."

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

lotus-miner/worker storage redeclare Storage detach
5 participants