Preloader for selector traversal #452
8d6c1ae to a5f2039
I like that we now have several extensions to the link system abstraction that can compose reasonably well. Kudos to @warpfork on that interface.
Neato, what's outstanding to be done here @hannahhoward?
9c1cf98 to 21e556c
this is very neat. looking forward to being able to use it!
5d6b7cf to 78271a1
I've done some major surgery on here, because the current implementation didn't quite work how we need. What we want is to be able to preload all the links in a block before we start the traversal proper across the block. The previous implementation relied on breaking at […]. This new implementation does two traversals from the block root: the first is a link-collection pass that doesn't load links, then we emit the preload list, then the second pass does the proper traversal. I don't think it's ideal (it's got a smell, and it doesn't yet handle budgets or LinkVisitOnlyOnce), but it at least works. See […]
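To make the two-pass idea concrete, here is a minimal sketch over a toy node type. It is not the go-ipld-prime code from this PR: `Node`, `Loader`, `Preloader`, `collectLinks`, `walkBlock`, `traverse` and `runTwoPassDemo` are all hypothetical names, and budgets and LinkVisitOnlyOnce are deliberately left out.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a toy stand-in for an IPLD node (an assumption for this sketch):
// either a link to another block, or a container of children, which covers
// both maps and lists.
type Node struct {
	Link     string // non-empty means this node is a link
	Children []*Node
}

// Loader fetches the root node of the block that a link points at.
type Loader func(link string) (*Node, error)

// Preloader is told about every link in a block before any of them is loaded.
type Preloader func(links []string)

// collectLinks is the first pass: it gathers links within one block and
// never loads anything.
func collectLinks(n *Node, out []string) []string {
	if n.Link != "" {
		return append(out, n.Link)
	}
	for _, c := range n.Children {
		out = collectLinks(c, out)
	}
	return out
}

// walkBlock emits the preload list for the block, then runs the real traversal.
func walkBlock(root *Node, load Loader, preload Preloader) error {
	if preload != nil {
		if links := collectLinks(root, nil); len(links) > 0 {
			preload(links)
		}
	}
	return traverse(root, load, preload)
}

// traverse is the second pass: links are loaded and crossed into here.
func traverse(n *Node, load Loader, preload Preloader) error {
	if n.Link != "" {
		next, err := load(n.Link)
		if err != nil {
			return err
		}
		return walkBlock(next, load, preload)
	}
	for _, c := range n.Children {
		if err := traverse(c, load, preload); err != nil {
			return err
		}
	}
	return nil
}

// runTwoPassDemo walks a tiny DAG and returns the order of preload/load events.
func runTwoPassDemo() string {
	blocks := map[string]*Node{
		"a": {Children: []*Node{{Link: "c"}}},
		"b": {},
		"c": {},
	}
	root := &Node{Children: []*Node{{Link: "a"}, {Link: "b"}}}
	var events []string
	load := func(link string) (*Node, error) {
		n, ok := blocks[link]
		if !ok {
			return nil, fmt.Errorf("block %q not found", link)
		}
		events = append(events, "load "+link)
		return n, nil
	}
	preload := func(links []string) {
		events = append(events, "preload "+strings.Join(links, ","))
	}
	if err := walkBlock(root, load, preload); err != nil {
		return "error: " + err.Error()
	}
	return strings.Join(events, "; ")
}

func main() {
	fmt.Println(runTwoPassDemo())
}
```

In this toy, both links of the root block are announced to the preloader before either is loaded, which is the property described above.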
Ah I think I see what you're saying -- basically the design as is doesn't get us really to "preloading all the links in the block before we cross boundaries" huh?
Yes, I realised this after using it in dag-pb with an explore-all and getting no real preloading: every preload was singular and immediately followed by an actual load of the same block. That makes sense once you notice that these recursive calls handle map and list iteration as well as link loading. It's very awkward to manage, and I don't think it's going to be practical or sensible to preload the traversal structure like you had either. I think just running the traversal twice over each block is going to be easiest if you want preloading, and compared to network costs it's pretty trivial.
Latest push adds support for Link & Node budgets and LinkVisitOnlyOnce while using the preloader, which is a bit of a tricky proposition. Unfortunately this makes for more spaghetti, but it's at least a working starting point for a refactor, if I can come up with some ideas that don't make more spaghetti out of it!
87eabe4 to f7f7e29
f7f7e29 to 8759948
@willscott @hannahhoward I think I've landed on a form that both works across the full required feature set and isn't awful. I took my two-pass approach and made a "phase" property that gets passed around. This can then be used to fork the actions for preload vs traversal. I managed to dedupe a bunch of existing things while pulling out more pieces of functionality into separate private methods on […]. In this new form, the […]. I've updated filecoin-project/lassie#149 to use the new preloader signature.
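As a rough illustration of the "phase" idea, the sketch below uses a single walk function whose behaviour at links forks on a phase value. All names here (`phase`, `phasePreload`, `phaseTraversal`, `walker`, `walk`, `runPhaseDemo`) are hypothetical and the node type is a toy; the real change threads the phase through the existing traversal types, which this does not attempt to reproduce.

```go
package main

import (
	"fmt"
	"strings"
)

// phase says which of the two passes over a block is running.
type phase int

const (
	phasePreload   phase = iota // collect links only, load nothing
	phaseTraversal              // visit nodes and cross links
)

// node is a toy node (an assumption for this sketch): a link or a container.
type node struct {
	name     string
	link     string // non-empty means this node is a link
	children []*node
}

type walker struct {
	load    func(link string) (*node, error)
	preload func(links []string) // optional; nil disables the preload phase
	visited []string
}

// walkBlock runs the preload phase only when a preloader is configured,
// then always runs the traversal phase.
func (w *walker) walkBlock(root *node) error {
	if w.preload != nil {
		var links []string
		if err := w.walk(root, phasePreload, &links); err != nil {
			return err
		}
		if len(links) > 0 {
			w.preload(links)
		}
	}
	return w.walk(root, phaseTraversal, nil)
}

// walk shares the iteration logic between phases and forks only at the
// points where the phases differ: visiting nodes and handling links.
func (w *walker) walk(n *node, ph phase, links *[]string) error {
	if ph == phaseTraversal {
		w.visited = append(w.visited, n.name)
	}
	if n.link != "" {
		if ph == phasePreload {
			*links = append(*links, n.link)
			return nil
		}
		next, err := w.load(n.link)
		if err != nil {
			return err
		}
		return w.walkBlock(next)
	}
	for _, c := range n.children {
		if err := w.walk(c, ph, links); err != nil {
			return err
		}
	}
	return nil
}

// runPhaseDemo returns the visit order and the preload batches as strings.
func runPhaseDemo(usePreloader bool) (visited string, preloaded string) {
	blocks := map[string]*node{
		"a": {name: "a"},
		"b": {name: "b", children: []*node{{name: "b/0", link: "a"}}},
	}
	root := &node{name: "root", children: []*node{
		{name: "root/0", link: "a"},
		{name: "root/1", link: "b"},
	}}
	var batches []string
	w := &walker{load: func(link string) (*node, error) {
		n, ok := blocks[link]
		if !ok {
			return nil, fmt.Errorf("block %q not found", link)
		}
		return n, nil
	}}
	if usePreloader {
		w.preload = func(links []string) {
			batches = append(batches, strings.Join(links, ","))
		}
	}
	if err := w.walkBlock(root); err != nil {
		return "error: " + err.Error(), ""
	}
	return strings.Join(w.visited, " "), strings.Join(batches, " | ")
}

func main() {
	v, p := runPhaseDemo(true)
	fmt.Println("visited:", v)
	fmt.Println("preloaded:", p)
}
```

The visit order is the same with and without a preloader in this sketch; the preload phase only adds the batches of links announced ahead of time.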
This is really cool!
- the way the optional preload stage is structured in walkBlock / the use of initialPhase was confusing to me as i read it.
  - might be clearer to either in-line the logic of initialPhase to a block e.g. if (has preloader) { ... do preload .. } ?
  - maybe just a comment of 'if preloader we do preload and then walk again for traversal - otherwise this walk is the traversal one' is sufficient
Thanks @willscott, the […]
A couple non-blocking questions:
I don't think so? It's probably safe to assume the preloader will be an async concern, but everything here is synchronous, so blocking for an error is going to make things very complicated. The alternative is to build it into the logic of the preloader, which should be the case in Lassie right now (see […]
Yep! That's the plan. That code started life as @hannahhoward's […]
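One way to picture "build it into the logic of the preloader" is sketched below: the preload call returns nothing and never blocks, and any preload failure is absorbed and retried on the synchronous load path. This is an assumption-laden illustration, not Lassie's implementation; `preloadCache`, `newPreloadCache`, `Preload`, `Load` and `demoFetch` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// result carries the outcome of one background fetch.
type result struct {
	data []byte
	err  error
}

// preloadCache is a hypothetical async preloader sitting in front of a
// synchronous loader.
type preloadCache struct {
	fetch    func(link string) ([]byte, error)
	mu       sync.Mutex
	inflight map[string]chan result
}

func newPreloadCache(fetch func(link string) ([]byte, error)) *preloadCache {
	return &preloadCache{fetch: fetch, inflight: make(map[string]chan result)}
}

// Preload starts background fetches. It never blocks and never returns an
// error, so the synchronous traversal does not have to handle preload failures.
func (c *preloadCache) Preload(links []string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for _, l := range links {
		if _, ok := c.inflight[l]; ok {
			continue // already being fetched
		}
		// Buffered so the goroutine can finish even if Load is never called.
		ch := make(chan result, 1)
		c.inflight[l] = ch
		go func(link string) {
			data, err := c.fetch(link)
			ch <- result{data: data, err: err}
		}(l)
	}
}

// Load is the synchronous path used by the traversal. It waits for a pending
// preload if there is one; if that preload failed, it retries directly so the
// error only surfaces where the traversal can act on it.
func (c *preloadCache) Load(link string) ([]byte, error) {
	c.mu.Lock()
	ch, ok := c.inflight[link]
	delete(c.inflight, link)
	c.mu.Unlock()
	if ok {
		if r := <-ch; r.err == nil {
			return r.data, nil
		}
	}
	return c.fetch(link)
}

// demoFetch is a stand-in for a network fetch: "missing" always fails.
func demoFetch(link string) ([]byte, error) {
	if link == "missing" {
		return nil, errors.New("block not found: " + link)
	}
	return []byte("block " + link), nil
}

func main() {
	c := newPreloadCache(demoFetch)
	c.Preload([]string{"a", "missing"})
	data, err := c.Load("a")
	fmt.Println(string(data), err)
	_, err = c.Load("missing")
	fmt.Println(err)
}
```

The design choice here is that the traversal only ever sees errors from `Load`, which keeps the walk itself synchronous and free of any waiting on preload results.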
Recommend considering comments on preloading and budgets. I think it's ok but it should be spelled out.
Otherwise, awesome work!
haveStartAtPath := prog.Cfg.StartAtPath.Len() > 0
var reachedStartAtPath bool
recurse := func(v datamodel.Node, ps datamodel.PathSegment) error {
worry somewhat about the closure memory cost here, though also I honestly don't think I fully understand the memory costs of closures like this one. would be nice to write some benchmark tests, if not now then later.
I believe the overhead of the function itself is fairly minimal, but it will obviously hold on to any variables closed over. In this case there are 2 booleans, a progress, and the Node and PathSegment arguments.
But, I think it's only the closure reference and one of the booleans that are new in what would be held onto in the recursive call stack. Unless there's something special I'm not aware of with Go functions used in this way then I believe the overhead is negligible compared to what else we're holding on to through the recursion.
I've put ample docs in there about […]. Last call before I merge and tag this.
I can't LGTM cause I originated this PR, but LGTM
recursing in walk_* breaks preloading at lists and maps, so we have to do it higher
this may come back in some form but needs to be experimented with externally first
d013fc0 to f5f48aa
silence dependabot security, these are already replaced
This is oft asked for. Wanted to build a selector traversal that was somewhat parallel. (not completely -- just preloading at each level) Also wanted to make sure it would work with graphsync.
Still very WIP, but wanted to make sure this work was documented and available.