Bypass task scheduler for reading unsealed pieces #6280

Merged
magik6k merged 25 commits into master from feat/pieceread-outside-scheduler
Jun 7, 2021

Conversation

aarshkshah1992
Contributor

@aarshkshah1992 aarshkshah1992 commented May 18, 2021

Picks the minimal set of changes we need from @magik6k's PR at #6128.

TODO

  • Unit Tests.

    • Test changes to the http handler to check presence of unsealed pieces and to get sector files.
    • Test the Reader API on the Remote Store that tries to read an unsealed piece from an already unsealed file on any of the workers.
    • Unit tests for the newly introduced SectorUnseal API in the Manager.
  • Integration Tests.

    • ReadPiece for piece in unsealed file on local worker
    • ReadPiece for piece in sealed file on local worker
    • ReadPiece for piece in unsealed file on remote worker
    • ReadPiece for piece in sealed file on remote worker
  • Testing on a devnet/pond with 8MB sector size.

    • Test storage / retrieval
      • Create storage deal
      • Retrieve deal
    • Test storage / retrieval with unsealing
      • Create storage deal
      • Remove unsealed file
      • Retrieve deal (so it unseals)
      • Retrieve deal (so it's freshly unsealed)
    • Test workers
      • Disable unsealing on miner
      • Enable all tasks for worker (so unsealing must happen on worker)
      • Remove unsealed file
      • Retrieve deal (so it unseals)
      • Retrieve deal (so it's freshly unsealed)
  • Manual test against mainnet miner (running single worker)

@aarshkshah1992 aarshkshah1992 marked this pull request as draft May 18, 2021 07:33
@aarshkshah1992 aarshkshah1992 requested review from dirkmc and nonsense May 18, 2021 12:33
Contributor

@dirkmc dirkmc left a comment

Looking pretty good 👍 - Just a few nits

extern/sector-storage/manager.go Outdated
extern/sector-storage/piece_provider.go Outdated
extern/sector-storage/piece_provider.go
extern/sector-storage/piece_provider.go
extern/sector-storage/piece_provider.go
extern/sector-storage/stores/http_handler.go Outdated
extern/sector-storage/stores/remote.go Outdated
extern/sector-storage/stores/remote.go
extern/sector-storage/stores/remote.go Outdated
@dirkmc
Contributor

dirkmc commented May 18, 2021

I suggest in the description of the PR we add a checkbox list of things that have been done (implement change) and things that need to be done before this PR is ready to be reviewed (adding tests)

return nil, xerrors.Errorf("failed to read sector %v from remote(%d): %w", s, ft, storiface.ErrSectorNotFound)
}

// TODO Why are we sorting in ascending order here -> shouldn't we sort in descending order as higher weight means more likely to have the file ?
Contributor Author

@magik6k

This looks incorrect, right? We should sort in descending order of weight, because a higher weight means a higher probability that the storage has the unsealed file we are looking for here.

Contributor Author

@aarshkshah1992 aarshkshah1992 left a comment

@dirkmc Have addressed all your review comments. Unit tests are WIP.

extern/sector-storage/stores/remote.go Outdated
extern/sector-storage/piece_provider.go
extern/sector-storage/piece_provider.go
extern/sector-storage/piece_provider.go
return nil, xerrors.Errorf("failed to read sector %v from remote(%d): %w", s, ft, storiface.ErrSectorNotFound)
}

// TODO Why are we sorting in ascending order here -> shouldn't we sort in descending order as higher weight means more likely to have the file ?
Contributor

Yeah, that's probably a bug, also in acquireFromRemote

Contributor

It shouldn't have much impact, as in most cases sectors are only in one storage, which is likely why it wasn't noticed
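
The ordering discussed in this thread can be sketched as follows. This is a minimal illustration, not the code in remote.go: the `storageInfo` type and the `sortByWeightDesc` helper are hypothetical names, and the only assumption carried over from the thread is that a higher weight means the storage is more likely to hold the file.

```go
package main

import (
	"fmt"
	"sort"
)

// storageInfo is a hypothetical stand-in for the per-storage metadata
// that the remote store sorts; only the Weight field matters here.
type storageInfo struct {
	ID     string
	Weight uint64
}

// sortByWeightDesc orders candidates so that the highest-weight storage
// is tried first. SliceStable keeps the original order for equal weights.
func sortByWeightDesc(si []storageInfo) {
	sort.SliceStable(si, func(i, j int) bool {
		return si[i].Weight > si[j].Weight
	})
}

func main() {
	candidates := []storageInfo{
		{ID: "a", Weight: 1},
		{ID: "b", Weight: 10},
		{ID: "c", Weight: 5},
	}
	sortByWeightDesc(candidates)
	for _, c := range candidates {
		fmt.Println(c.ID, c.Weight)
	}
}
```

Flipping the comparison to `<` gives the ascending order that the TODO questions. With a single candidate, which the thread says is the common case, both orderings behave the same.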

Contributor

@magik6k magik6k left a comment

First review pass, looks good.

We should also bump the minor miner/worker API versions in api/version.go, given that the storage HTTP API now has the new endpoint.

Also, ideally we'd test this on a mainnet miner with some workers before merging, just to confirm it doesn't have issues in real-world setups.

Contributor

@dirkmc dirkmc left a comment

Looks good Aarsh 👍

I'll hold off on approving until we have the tests in place

@@ -17,6 +18,14 @@ func (i UnpaddedByteIndex) Padded() PaddedByteIndex {
return PaddedByteIndex(abi.UnpaddedPieceSize(i).Padded())
}

func (i UnpaddedByteIndex) Valid() error {
if i%127 != 0 {
return xerrors.Errorf("unpadded byte index must be a multiple of 127")
Contributor

Suggested change
- return xerrors.Errorf("unpadded byte index must be a multiple of 127")
+ return xerrors.Errorf("unpadded byte index is %d but must be a multiple of 127", i)
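
With the suggestion applied, the check could look roughly like the sketch below. It is self-contained for illustration only: the `uint64` underlying type is an assumption, and `fmt.Errorf` from the standard library stands in for `xerrors.Errorf` so that the block has no external dependencies.

```go
package main

import "fmt"

// UnpaddedByteIndex reuses the type name from the diff; the uint64
// underlying type is an assumption made for this sketch.
type UnpaddedByteIndex uint64

// Valid reports an error unless the index is a multiple of 127,
// and includes the offending value in the message.
func (i UnpaddedByteIndex) Valid() error {
	if i%127 != 0 {
		return fmt.Errorf("unpadded byte index is %d but must be a multiple of 127", i)
	}
	return nil
}

func main() {
	fmt.Println(UnpaddedByteIndex(254).Valid())
	fmt.Println(UnpaddedByteIndex(100).Valid())
}
```

Including the value in the message means a log line is enough to see which offset was rejected.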

@aarshkshah1992
Contributor Author

@magik6k

  1. We are working on some solid unit tests and integration tests for this PR, and you should see them here in a few days.
  2. Once the unit and integration tests are in place, we'll test it on a devnet with 8K sectors and on mainnet too to ensure things are good.

@dirkmc dirkmc force-pushed the feat/pieceread-outside-scheduler branch from 9c6462f to ed4d54b Compare May 19, 2021 20:39
@dirkmc dirkmc force-pushed the feat/pieceread-outside-scheduler branch 2 times, most recently from b606f8a to c17300d Compare May 20, 2021 21:25
@@ -87,7 +86,6 @@ type WorkerCalls interface {
ReleaseUnsealed(ctx context.Context, sector storage.SectorRef, safeToFree []storage.Range) (CallID, error)
MoveStorage(ctx context.Context, sector storage.SectorRef, types SectorFileType) (CallID, error)
UnsealPiece(context.Context, storage.SectorRef, UnpaddedByteIndex, abi.UnpaddedPieceSize, abi.SealRandomness, cid.Cid) (CallID, error)
ReadPiece(context.Context, io.Writer, storage.SectorRef, UnpaddedByteIndex, abi.UnpaddedPieceSize) (CallID, error)
Contributor

❤️

extern/sector-storage/stores/util_unix.go
Comment on lines +241 to +252
svc := &http.Server{
Addr: nl.Addr().String(),
Handler: mux,
}
Contributor

Could use https://golang.org/pkg/net/http/httptest if it would simplify things

extern/sector-storage/stores/http_handler.go Outdated
extern/sector-storage/stores/remote.go Outdated
@jacobheun jacobheun added the team/ignite Issues and PRs being tracked by Team Ignite at Protocol Labs label May 21, 2021

var uns bool

if r == nil {
Member

If r == nil, don't we have to call unlock? We have a few returns in this branch that don't call unlock.

Contributor

If r == nil, unlock is called within tryReadUnsealedPiece:
https://github.com/filecoin-project/lotus/pull/6280/files/40642b2cad0bb247dd789837c6465163e5b59742#diff-1fe5f948c5ef6d7d11919b998fedd74927b0cfac11ebb3ef1dbaf518fc026999R61

I found this very hard to follow as well. I wonder if there's a way we can refactor to simplify.

Contributor

@dirkmc dirkmc left a comment

Looking really good Aarsh, I love these comprehensive unit and integration tests 👍 👍

extern/sector-storage/stores/remote.go
extern/sector-storage/piece_provider.go
@aarshkshah1992
Contributor Author

@magik6k @dirkmc @nonsense Have addressed your review. Please take a look.

Contributor

@magik6k magik6k left a comment

Looks great. If testing on mainnet goes well, we can probably just merge.

@aarshkshah1992 aarshkshah1992 marked this pull request as ready for review June 7, 2021 10:15
@aarshkshah1992 aarshkshah1992 force-pushed the feat/pieceread-outside-scheduler branch from 50e023e to 21e6b50 Compare June 7, 2021 10:32
@aarshkshah1992 aarshkshah1992 changed the title [WIP Draft] Bypass task scheduler for reading unsealed pieces Bypass task scheduler for reading unsealed pieces Jun 7, 2021
@aarshkshah1992
Contributor Author

@magik6k The failing tests look flaky. Could you please merge the PR if you are happy?

@magik6k
Contributor

magik6k commented Jun 7, 2021

build-lotus-soup is failing (and not flaky)

@aarshkshah1992
Contributor Author

@magik6k The build-lotus-soup job has been fixed. Please let's ship this ! :) 🚢

@magik6k magik6k merged commit fadc79a into master Jun 7, 2021
@magik6k magik6k deleted the feat/pieceread-outside-scheduler branch June 7, 2021 16:11
Labels
team/ignite Issues and PRs being tracked by Team Ignite at Protocol Labs
6 participants