Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preloader for selector traversal #452

Merged
merged 14 commits into from
Mar 29, 2023
Merged

Conversation

hannahhoward
Copy link
Collaborator

This is oft asked for. Wanted to build a selector traversal that was somewhat parallel. (not completely -- just preloading at each level) Also wanted to make sure it would work with graphsync.

Still very WIP, but wanted to make sure this work was documented and available.

@hannahhoward hannahhoward force-pushed the feat/parallel-selector-buffered branch from 8d6c1ae to a5f2039 Compare August 8, 2022 02:44
@willscott
Copy link
Member

I like that we now have several extensions to the link system abstraction that can compose reasonably well. Kudos to @warpfork on that interface.

@rvagg
Copy link
Member

rvagg commented Aug 9, 2022

Neato, what's outstanding to be done here @hannahhoward?

@BigLep BigLep added the P3 Low: Not priority right now label Aug 9, 2022
@rvagg rvagg force-pushed the feat/parallel-selector-buffered branch 2 times, most recently from 9c1cf98 to 21e556c Compare March 8, 2023 00:45
Copy link
Member

@willscott willscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is very neat. looking forward to being able to use it!

linking/preload/buffered.go Outdated Show resolved Hide resolved
@rvagg rvagg force-pushed the feat/parallel-selector-buffered branch from 5d6b7cf to 78271a1 Compare March 11, 2023 03:10
@rvagg
Copy link
Member

rvagg commented Mar 11, 2023

I've done some major surgery on here, because the current implementation didn't quite work how we need.

What we want is to be able to preload all the links in a block before we start doing the traversal-proper across the block. The previous implementation relied on breaking at walkAdv() calls, but those breaks happen both at the block level and at the recursive-node level (lists and maps); so preloads are emitted at the top layer of a block and within each list or map inside that block. Which is unfortunate for dag-pb because every link is inside its own map, which are inside a large list inside the block, so that's 3 layers of walkAdv() deep before we're emitting preload links and we only get one at a time and we immediately try to traverse those links; so we don't even get a chance to preload before we want them.

This new implementation, which I don't think is ideal (it's got a smell, and it doesn't yet handle budgets or LinkVisitOnlyOnce, but it at least works), does two traversals from the block root, the first does a link-collection pass and doesn't load links, then we emit the preload list, then the second pass does the proper traversal.

See newNestedRootNode in the test case for how the old implementation breaks—we get two batches of 3 preload links for the first block but we want one batch of 6 links.

@hannahhoward
Copy link
Collaborator Author

Ah I think I see what you're saying -- basically the design as is doesn't get us really to "preloading all the links in the block before we cross boundaries" huh?

@rvagg
Copy link
Member

rvagg commented Mar 11, 2023

Yes, I realised this after using it in dag-pb with an explore-all and got no real preloading, every preload was singular and immediately followed by an actual load of the same block. Which makes sense when you notice these recursive calls are for doing map and list iterators as well as link loading. Very awkward to manage and I don't think it's going to be practical or sensible to preload the traversal structure like you had either, I think just running the traversal twice over each block is going to be easiest if you want preloading and compared to network costs it's pretty trivial.

@rvagg rvagg added P1 High: Likely tackled by core team if no one steps up and removed P3 Low: Not priority right now labels Mar 13, 2023
@rvagg rvagg self-assigned this Mar 13, 2023
@rvagg
Copy link
Member

rvagg commented Mar 21, 2023

Latest push adds support for Link & Node budgets and LinkVisitOnlyOnce while using the preloader. Which is a bit of a tricky proposition. Unfortunately this makes for more spaghetti, but it's at least a working starting point to attempt to refactor; if I can come up some ideas that don't make more spaghetti out of it!

@rvagg rvagg force-pushed the feat/parallel-selector-buffered branch from 87eabe4 to f7f7e29 Compare March 24, 2023 10:16
@rvagg rvagg changed the title WIP: Parallel selector traversal Preloader for selector traversal Mar 24, 2023
@rvagg rvagg force-pushed the feat/parallel-selector-buffered branch from f7f7e29 to 8759948 Compare March 24, 2023 10:21
@rvagg
Copy link
Member

rvagg commented Mar 24, 2023

@willscott @hannahhoward I think I've landed on a form that both works across the full required feature set and isn't awful. I took my two-pass approach and made a "phase" property that gets passed around. This can then be used to fork the actions for preload vs traversal. I managed to dedupe a bunch of existing things while pulling out more pieces of functionality into separate private methods on Progress.

In this new form, the Preloader function takes one link at a time rather than an array of links for the block - this both makes it easier to manage state (no need to collect them here) and also avoids the need to allocate a slice to collect them unless the end-user wants to do that themselves. Keeps it simple and pushes off the hard decisions to the user.

I've updated filecoin-project/lassie#149 to use the new preloader signature.

Copy link
Member

@willscott willscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really cool!

  • the way the optional preload stage is structured in walkBlock / the use of initialPhase was confusing to me as i read it.
    • might be clearer to either in-line the logic of initialPhase to a block e.g. if (has preloader) { ... do preload .. }?
    • maybe just a comment of 'if preloader we do preload and then walk again for traversal - otherwise this walk is the traversal one' is sufficient

@rvagg
Copy link
Member

rvagg commented Mar 25, 2023

Thanks @willscott, the initialPhase() overhead was a remnant of a previous approach but you're right that it became extraneous. I've inlined it now and it looks much clearer I think.

@willscott
Copy link
Member

A couple non-blocking questions:

  • should the preloader be able to return an error? if preloading of a link fails, that may have the actual underlying error, and the naive error from the subsequent attempt at the traversal would simply be something like "couldn't load block". (put another way, is it worth the complexity for that correlation of errors to live in traversal versus in the consumer?)
  • I wonder if the preload-traversal logic from lassie is sufficiently generic to live here rather than there? What is different in how any other user of this preload functionality would do besides change the parameters on the caching vs main link system?

@rvagg
Copy link
Member

rvagg commented Mar 27, 2023

should the preloader be able to return an error

I don't think so? It's probably safe to assume the preloader will be an async concern, but everything here is synchronous so blocking for an error is going to make things very complicated. The alternative is to build it into the logic of the preloader, which should be the case in Lassie right now (see pl.err in that code), so that when the real traversal loader encounters a block that errored on fetch, it gets that original error.

logic from lassie is sufficiently generic to live here rather than there

Yep! That's the plan. That code started life as @hannahhoward's BufferedLoader that lived in this PR until I removed it. We felt the approach needed perfecting outside of this code before pulling some core of it back in. I still don't know what that core is, but it'd be worth doing another pass later to figure it out and pull back in a utility that anyone can use to plug in their loading logic to. I think it mainly needs to be independent of questions of where/how the blocks are stored and how they are fetched. If we abstract that away then I think we have a neat utility to help run your "parallel"(ish) traversals in whatever form that is.

Copy link
Collaborator Author

@hannahhoward hannahhoward left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recommend considering comments on preloading and budgets. I think it's ok but it should be spelled out.

Otherwise, awesome work!


haveStartAtPath := prog.Cfg.StartAtPath.Len() > 0
var reachedStartAtPath bool
recurse := func(v datamodel.Node, ps datamodel.PathSegment) error {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worry somewhat about the closure memory cost here, though also I honestly don't think I fully understand the memory costs of closures like this one. would be nice to write some benchmark tests, if not now then later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the overhead of the function itself is fairly minimal, but it will obviously hold on to any variables closed over. In this case there's 2 booleans, a progress and the Node and PathSegment arguments.

But, I think it's only the closure reference and one of the booleans that are new in what would be held onto in the recursive call stack. Unless there's something special I'm not aware of with Go functions used in this way then I believe the overhead is negligible compared to what else we're holding on to through the recursion.

traversal/walk.go Show resolved Hide resolved
@rvagg
Copy link
Member

rvagg commented Mar 28, 2023

I've put ample docs in there abut Preloader and Budget but also took the opportunity to update and clean up the traversal docs while I was in there.

Last call before I merge and tag this.

Copy link
Collaborator Author

@hannahhoward hannahhoward left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't LGTM cause I originated this PR, but LGTM

@rvagg rvagg force-pushed the feat/parallel-selector-buffered branch from d013fc0 to f5f48aa Compare March 29, 2023 00:53
silence dependabot security, these are already replaced
@rvagg rvagg merged commit f0306af into master Mar 29, 2023
@rvagg rvagg deleted the feat/parallel-selector-buffered branch March 29, 2023 01:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P1 High: Likely tackled by core team if no one steps up
Projects
Status: 🎉 Done
Development

Successfully merging this pull request may close these issues.

4 participants