Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

integrate DAG store and CARv2 in deal-making #6671

Merged
merged 1 commit into from
Aug 16, 2021

Conversation

aarshkshah1992
Copy link
Contributor

@aarshkshah1992 aarshkshah1992 commented Jul 3, 2021

(Description added by @raulk)

Technical description

This commit removes badger from the deal-making processes, and moves to a new architecture with the dagstore as the cental component on the miner-side, and CARv2s on the client-side.

Every deal that has been handed off to the sealing subsystem becomes a shard in the dagstore. Shards are mounted via the LotusMount, which teaches the dagstore how to load the related piece when serving retrievals.

When the miner starts the Lotus for the first time with this patch, we will perform a one-time migration of all active deals into the dagstore. This is a lightweight process, and it consists simply of registering the shards in the dagstore.

Shards are backed by the unsealed copy of the piece. This is currently a CARv1. However, the dagstore keeps CARv2 indices for all pieces, so when it's time to acquire a shard to serve a retrieval, the unsealed CARv1 is joined with its index (safeguarded by the dagstore), to form a read-only blockstore, thus taking the place of the monolithic badger.

Data transfers have been adjusted to interface directly with CARv2 files. On inbound transfers (client retrievals, miner storage deals), we stream the received data into a CARv2 ReadWrite blockstore. On outbound transfers (client storage deals, miner retrievals), we serve the data off a CARv2 ReadOnly blockstore.

Client-side imports are managed by the refactored *imports.Manager component (when not using IPFS integration). Just like it before, we use the go-filestore library to avoid duplicating the data from the original file in the resulting UnixFS DAG (concretely the leaves). However, the target of those imports are what we call "ref-CARv2s": CARv2 files placed under the $LOTUS_PATH/imports directory, containing the intermediate nodes in full, and the leaves as positional references to the original file on disk.

Client-side retrievals are placed into CARv2 files in the location: $LOTUS_PATH/retrievals.

A new set of Dagstore* JSON-RPC operations and lotus-miner dagstore subcommands have been introduced on the miner-side to inspect and manage the dagstore.

Despite moving to a CARv2-backed system, the IPFS integration has been respected, and it continues to be possible to make storage deals with data held in an IPFS node, and to perform retrievals directly into an IPFS node.


Client imports

Because the "client" Badger blockstores are no longer used, existing imports on the client will be rendered useless.

On startup, Lotus will enumerate all imports and print WARN statements on the log for each import that needs to be reimported. These log lines contain these messages:

  • import lacks carv2 path; import will not work; please reimport
  • import has missing/broken carv2; please reimport

At the end, we will print a "sanity check completed" message indicating the count of imports found, and how many were deemed broken.


About this PR

We believe the test coverage to be pretty satisfactory, with a substantial portion of the LoC diff corresponding to tests.

This commit is a squashed version of 200+ commits developed by @aarshkshah1992, @dirkmc, and @raulk over the course of months.

This contribution was thoroughly tested in the M1 milestone with minerX.2: #6852.

Read more about the motivation here:

Co-authored-by: @aarshkshah1992 @dirkmc @raulk.


TODO

@aarshkshah1992 aarshkshah1992 marked this pull request as draft July 3, 2021 07:28
@aarshkshah1992 aarshkshah1992 changed the title Replace Multistore with CARv2 [WIP] Replace Multistore with CARv2 Jul 3, 2021
@aarshkshah1992 aarshkshah1992 force-pushed the feat/replace-multistore-carv2 branch from 996feea to de13b6e Compare July 3, 2021 08:56
@aarshkshah1992 aarshkshah1992 changed the title [WIP] Replace Multistore with CARv2 [WIP] Replace Badger from deal-making with CARv2 Jul 3, 2021
@aarshkshah1992 aarshkshah1992 changed the title [WIP] Replace Badger from deal-making with CARv2 [WIP] Replace Badger from deal-making with CARv2 and Sharded DAG Store Jul 3, 2021
@aarshkshah1992 aarshkshah1992 changed the title [WIP] Replace Badger from deal-making with CARv2 and Sharded DAG Store Replace Badger from deal-making with CARv2 and Sharded DAG Store Jul 3, 2021
@aarshkshah1992 aarshkshah1992 changed the title Replace Badger from deal-making with CARv2 and Sharded DAG Store Replace Badger from deal-making with CARv2 and the Sharded DAG Store Jul 3, 2021
Copy link
Contributor

@dirkmc dirkmc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking really good so far 👍

node/impl/client/client.go Outdated Show resolved Hide resolved
node/impl/client/client.go Outdated Show resolved Hide resolved
node/repo/importmgr/mgr.go Outdated Show resolved Hide resolved
node/repo/importmgr/mgr.go Outdated Show resolved Hide resolved
Copy link
Contributor

@magik6k magik6k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

((looks like itests/fstmp889313799 got committed by accident))

api/docgen/docgen.go Outdated Show resolved Hide resolved
api/api_full.go Outdated Show resolved Hide resolved
node/builder_chain.go Outdated Show resolved Hide resolved
node/builder_miner.go Outdated Show resolved Hide resolved
node/impl/client/import.go Outdated Show resolved Hide resolved
node/impl/client/import.go Outdated Show resolved Hide resolved
node/repo/importmgr/mgr.go Outdated Show resolved Hide resolved
node/impl/client/import.go Outdated Show resolved Hide resolved
Copy link
Contributor

@hannahhoward hannahhoward left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be writing CAR v2 files to the hard drive on the client side. I thought we'd discussed avoiding storing a second copy of the clients data on the HDD. @magik6k are you ok with every import for regular files on the client having a CarV2 copy on the hard drive (which will of course be 32 additional GB in many cases).

The current code jumped through a lot of hoops to avoid ever storing a CAR file on the client's machine, and always using OS-pipes when we did file i/o needed for CommP calculation. So I was wondering if the intent is in fact to no longer do that.

Copy link
Member

@raulk raulk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aarshkshah1992 @dirkmc I think there's a misunderstanding in the changes to client import; this part should be mostly unchanged, as @hannahhoward and @magik6k pointed out above.

IUC, the current state of the system is as follows:

  1. User provides a raw file.
  2. Lotus instantiates a new Filestore using the multistore. The multistore combines two things: an actual blockstore (Badger in this case) and a CID => file offset/length tracker. See https://github.com/filecoin-project/go-multistore/blob/master/store.go#L33. There's special-casing for a special type of block called PosInfo that encodes the offset of the block within the file, and its length.
    • What's not clear to me is how Lotus client import leverages that special casing, because we seem to use a normal DAG service that feeds normal IPLD blocks to the blockstore. @hannahhoward -- illuminate?
  3. When chunking, we do not import blocks into the Badger blockstore, but somehow the PosInfo route is used, so we record CID => (offset, length) in the datastore, thus avoiding creating a copy of the file.

Later, when performing the deal, we simulate forming the CARv1 and feed it to the commP writer as a stream, to calculate the commP. But the CARv1 is never actually written on disk.

The footprint of this operation is just the cost of the CID => (offset, length) positional tracking, which likely wouldn't exceed 3MiB for a 64GiB file chunked with the 1MiB UnixFS chunking policy.

The Graphsync transfer is conducted using the positional mapping (it does not use a Badger store), which is why the original file is needed.

Conversely, this PR introduces these changes:

  1. User provides a raw file.
  2. File is chunked into a CARv2 to calculate its root.
  3. Another CARv2 is generated with the right root.

For a 64GiB raw file, the footprint of the new code is in the order of 128GiB, which is a regression we can't afford.


I suggest to revert the changes in the client import process.

node/builder_miner.go Outdated Show resolved Hide resolved
node/builder_miner.go Outdated Show resolved Hide resolved
Base automatically changed from nonsense/split-market-miner-processes to master July 13, 2021 15:16
node/modules/storageminer.go Outdated Show resolved Hide resolved
@jennijuju jennijuju modified the milestone: 1.11.1 Jul 19, 2021
@jacobheun jacobheun added team/ignite Issues and PRs being tracked by Team Ignite at Protocol Labs epic/sharded-dag-store labels Jul 22, 2021
@raulk raulk force-pushed the feat/replace-multistore-carv2 branch from f648a72 to 72391cc Compare July 29, 2021 20:36
@raulk raulk force-pushed the feat/replace-multistore-carv2 branch 4 times, most recently from 8c90489 to 892e146 Compare August 9, 2021 19:03
Copy link
Contributor

@hannahhoward hannahhoward left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, awesome! Yay for this huge change. Most of my comments are suggestions.

However, I have three blocking comments:

  1. The use of RetrievalProviderNode as a dependency to get the IsUnsealed/UnsealSector methods is super awkward and the dependencies need untangling. I've put suggestions on how to do it.
  2. The IPFS blockstore integration is super awkward for retrieval, and I'm pretty sure it's broken completely for storage. We either need to fix it or work closely with partners before we ship with this feature removed. I know PowerGate is the main user, I don't know how important that is.
  3. Unless I am reading the code wrong, it appears that we are still writing second full copies of files out to the hard drive when we import them -- see comments on doImport in node/impl/client/import.go, despite the existence of ReadWriteFilestore which certainly appears like it could be used to avoid doing these writes. This isn't blocking if someone can explain to me why I'm reading the code wrong -- it's fine if I just am not seeing it.

I wonder: Blocking issues 2 & 3 have to do with the person on the client side of deals, not the miner. What is our testing plan for working with this code with people who are making deals? (cause I know we've been focused on beta testing with miners). I realize the dealbot is testing the dealmaking, but it also has space do deal with extra file copies AND it doesn't test the IPFS blockstore functionality.

api/api_storage.go Show resolved Hide resolved
blockstore/badger/blockstore.go Outdated Show resolved Hide resolved
cli/auth.go Outdated Show resolved Hide resolved
cli/client.go Show resolved Hide resolved
cli/util/api.go Outdated Show resolved Hide resolved
node/modules/client.go Outdated Show resolved Hide resolved
@@ -169,7 +169,7 @@ func ConfigFullNode(c interface{}) Option {
If(cfg.Client.UseIpfs,
Override(new(dtypes.ClientBlockstore), modules.IpfsClientBlockstore(ipfsMaddr, cfg.Client.IpfsOnlineMode)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still here but it appears that ClientBlockstore is being ignored by the StorageClient

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nod, we can't lose this, good catch!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.


carV2FilePath = resp.CarFilePath
// remove the temp CARv2 file when retrieval is complete
defer os.Remove(carV2FilePath) //nolint:errcheck
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put this in the markets PR but if this code is deleting CarV2 files, it should be managing creating them to.

node/impl/client/client.go Outdated Show resolved Hide resolved
markets/storageadapter/provider.go Outdated Show resolved Hide resolved
@jennijuju jennijuju added this to the v1.11.2 milestone Aug 10, 2021
@jennijuju jennijuju added the P2 P2: Should be resolved label Aug 10, 2021
Copy link
Contributor

@hannahhoward hannahhoward left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pre-emptive approval to unblock merge while I'm on vacation. Will review tomorrow to see if there are any changes before I leave.

@dirkmc
Copy link
Contributor

dirkmc commented Aug 12, 2021

@raulk need to audit if blockstore provider to loader and storer in node/modules/graphsync.go are actually being used

Comment on lines -537 to -538
if err := a.imgr().AddLabel(id, "source", "import-local"); err != nil {
return cid.Cid{}, err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aarshkshah1992 removing this label causes a functional regression.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

prefix, err := merkledag.PrefixForCidVersion(1)
func (a *API) ClientImportLocal(ctx context.Context, r io.Reader) (cid.Cid, error) {
// write payload to temp file
tmpPath, err := a.imgr().NewTempFile(imports.ID(rand.Uint64()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be using an actual ID generated by the import manager.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

nd, err := balanced.Layout(db)
if err != nil {
defer tmpF.Close() //nolint:errcheck
if _, err := io.Copy(tmpF, r); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a performance regression. We just replaced direct UnixFS chunking into a "heavy" CAR file (containing leaf data) with two files (the temp file where we wrote the upload, and the position CAR file). I don't think that's right.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should import straight into a heavy CAR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Comment on lines 1122 to 1123
id := imports.ID(rand.Uint64())
tmp, err := a.imgr().NewTempFile(id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this even on the ImportMgr, if it's not managed by it nor it uses an actual ID.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

This commit removes badger from the deal-making processes, and
moves to a new architecture with the dagstore as the cental
component on the miner-side, and CARv2s on the client-side.

Every deal that has been handed off to the sealing subsystem becomes
a shard in the dagstore. Shards are mounted via the LotusMount, which
teaches the dagstore how to load the related piece when serving
retrievals.

When the miner starts the Lotus for the first time with this patch,
we will perform a one-time migration of all active deals into the
dagstore. This is a lightweight process, and it consists simply
of registering the shards in the dagstore.

Shards are backed by the unsealed copy of the piece. This is currently
a CARv1. However, the dagstore keeps CARv2 indices for all pieces, so
when it's time to acquire a shard to serve a retrieval, the unsealed
CARv1 is joined with its index (safeguarded by the dagstore), to form
a read-only blockstore, thus taking the place of the monolithic
badger.

Data transfers have been adjusted to interface directly with CARv2 files.
On inbound transfers (client retrievals, miner storage deals), we stream
the received data into a CARv2 ReadWrite blockstore. On outbound transfers
(client storage deals, miner retrievals), we serve the data off a CARv2
ReadOnly blockstore.

Client-side imports are managed by the refactored *imports.Manager
component (when not using IPFS integration). Just like it before, we use
the go-filestore library to avoid duplicating the data from the original
file in the resulting UnixFS DAG (concretely the leaves). However, the
target of those imports are what we call "ref-CARv2s": CARv2 files placed
under the `$LOTUS_PATH/imports` directory, containing the intermediate
nodes in full, and the leaves as positional references to the original file
on disk.

Client-side retrievals are placed into CARv2 files in the location:
`$LOTUS_PATH/retrievals`.

A new set of `Dagstore*` JSON-RPC operations and `lotus-miner dagstore`
subcommands have been introduced on the miner-side to inspect and manage
the dagstore.

Despite moving to a CARv2-backed system, the IPFS integration has been
respected, and it continues to be possible to make storage deals with data
held in an IPFS node, and to perform retrievals directly into an IPFS node.

NOTE: because the "staging" and "client" Badger blockstores are no longer
used, existing imports on the client will be rendered useless. On startup,
Lotus will enumerate all imports and print WARN statements on the log for
each import that needs to be reimported. These log lines contain these
messages:

- import lacks carv2 path; import will not work; please reimport
- import has missing/broken carv2; please reimport

At the end, we will print a "sanity check completed" message indicating
the count of imports found, and how many were deemed broken.

Co-authored-by: Aarsh Shah <aarshkshah1992@gmail.com>
Co-authored-by: Dirk McCormick <dirkmdev@gmail.com>
@raulk raulk force-pushed the feat/replace-multistore-carv2 branch from e4c4699 to 9d9fd9d Compare August 16, 2021 22:20
@raulk raulk marked this pull request as ready for review August 16, 2021 22:30
@raulk raulk requested a review from a team as a code owner August 16, 2021 22:30
@raulk raulk merged commit d707677 into master Aug 16, 2021
@raulk raulk deleted the feat/replace-multistore-carv2 branch August 16, 2021 22:34
)

replace github.com/multiformats/go-multihash => github.com/multiformats/go-multihash v0.0.14
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not do this without a very good reason. If you do, leave a comment explaining why. As far as I can tell, there was no reason to do this and it makes it impossible to update other dependencies.

Copy link
Contributor

@magik6k magik6k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Post merge review pass - looks really good, only 3 nitpicks

// It returns a stream of events to report progress.
DagstoreInitializeAll(ctx context.Context, params DagstoreInitializeAllParams) (<-chan DagstoreInitializeAllEvent, error) //perm:write

// DagstoreGC runs garbage collection on the DAG store.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some basic description (or a docs link) on what this operation does would be great.

Comment on lines +65 to +70
colors := map[string]color.Attribute{
"ShardStateAvailable": color.FgGreen,
"ShardStateServing": color.FgBlue,
"ShardStateErrored": color.FgRed,
"ShardStateNew": color.FgYellow,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would drop the ShardState prefix, since it provides no information to the user, and makes it slightly harder to parse visually


var dagstoreGcCmd = &cli.Command{
Name: "gc",
Usage: "Garbage collect the dagstore",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like with API, some more detailed description would be really useful

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 P2: Should be resolved team/ignite Issues and PRs being tracked by Team Ignite at Protocol Labs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants