-
Notifications
You must be signed in to change notification settings - Fork 824
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Eager loading for DataObject relationships #8426
Comments
I really like this, but given the importance of GraphQL to the current but certainly the future performance of the CMS, I think we'd want to consider how any kind of N+1 avoidance also fits with GraphQL. From what I understand from |
This was a bit of a hasty brain dump, and I completely omitted one of the most important points, which is GraphQL. Have edited the OP to reflect that. My understanding of The deferred resolvers basically afford the developer the opportunity to provide different implementations based on what and how much is being queried. With each call to a nested field, the deferred resolver would respond by adding the appropriate |
OK, nice sounds promising! |
Some specific feedback on the PoC implementation: SyntaxThe nested fluent syntax is kind of complicated, and I'm not sure that it would be the most practical way of managing a set of eager-loading metadata that may change frequently and/or be dynamically generated. I'd prefer something that just accepted a payload of eager-loading metadata, eg; Team::get()->eagerLoad([
'Title',
'Schedule' => [
'Title',
'Matches' => [
'Score'
'Attendees' => [
'Name'
],
],
]); Or: Team::get()->eagerLoad([
'Title',
'Schedule.Title',
'Schedule.Matches.Score',
'Schedule.Matches.Attendees.Name',
]); Of course, we could have the nested fluent syntax as well — I just don't think it should be the only API for specifying configuration. ScopeIn my view, "eager loading" should be interpreted as "data-requirement hints". This means a few things:
Identity Map / Query CacheThe eager loader currently relies on calling setComponent and an as-yet-unspecified set[ComponentName] on DataObjects while they're being instantiated. It feels like it could become a bit brittle. On the other hand, SilverStripe's get_one / get_by_id cache is both absolutely critical for performance and half baked, and nothing similar exists for multi-record queries. An alternative approach that might be better, would be for data (both single records and lists) to be fetched from an intermediary service that allows some caching and batch pre-population. The eager loader can then trigger pre-populations of this cache. This might also provide facilities for caching / performance improvement in other areas. Tie-in implementationsWith that scope comment in place, GraphQL should pass every property request into the eager loading implementations, not just the relations. However, I think we could do something with the template engine as well — we should look to have some ability for a template to provide hints about the fields that are being requested on this template, and pass that to the eager loader. Probably something like |
I think Sam's syntax looks nicer. That's also how Laravel does eager loading: |
Yeah, the implementation I started last hackday uses I'm not so keen on adding individual fields as I think something like: $teams->with([
'Captain',
'Schedule' => [
'Matches' => [
'Referees'
]
]
]) Where |
I think it's important that we do so at some point, because excessive-triggering-of-lazy-loading is another source of querysplosion. It could be phase 2, but let's consider it in the API design.
I see where you're coming from but I think it's misrepresents the situation:
|
Yeah, I think all that makes sense as a good phase two. I think we're in agreement that explicit field selection is almost a separate story. The fact that I only want to render The low-hanging fruit for this is in the High-level query count maths:
|
Great to see progress on this - haven't really caught up with the discussion, but since there's been people asking for this to go into 4.4, I'd just raise that the internal feature freeze for this is 18th of Jan, in order to hit a stable release target of 1st of March. Yes, that's a long time, there's good reasons for all of this. In terms of Open Sourcerers involvement, I've moved this up to the top of the backlog to be discussed on 7th of Jan. It's cutting it pretty close overall, so any peer review and stabilisation effort from @silverstripe/core-team until then is appreciated. Next release window for 4.5 would be around 1st of June, so six months away. |
I have a POC working for basic has_one and has_many. #8687 Benchmarks
Baseline: (no query, just plain request): 340ms, 7 queries API: return Team::get()
->with([
'CurrentSchedule',
'Players',
]); |
I realise I'm a bit late to the party, but what were the considered options for naming this part of the API? I know what eager loading is because I've been following it a little, but I don't know if it communicates what its doing that clearly |
|
FYI Aaron has a lot on his plate at the moment, and we're under pressure to get 4.4.0 out soon. We've got an internal feature freeze for that release line next week. So I don't see this happening in 4.4.x, it would need to be 4.5.x (around June/July?) |
Ok, that seems realistic. @chillu you mentioned this morning "not sure how this relates to GraphQL" so just wanted to make sure we were all clear that this is a massive performance increase for larger GraphQL queries? I would expect that this would rapidly come a critical feature for a complex, high-traffic GraphQL-backed site. 4.5 is probably okay for that, but just wanted to make sure we're all on the same page. |
Yeah I'm aware of that connection, what I meant to say this morning is that I don't see it as a strong one. I want to see this in core as well, but unless somebody other than Aaron can take the lead, it'll have to wait a few weeks |
Just for reference, Aaron has raised a PR at #8687 |
Been searching to see if something like this was available. I've got a page with just shy of a thousand queries which would be solved with eager loading. My temp fix as been to stop using the ORM for parts of it and hand crafting some queries SQLSelect. Is this something possible for 4.7? |
Had to implement something myself at least for has_one/has_many with syntax like Can you, guys, tell me if this approach worth exploring further? Thank you. |
Came up with more or less working product with minimal required configuration: https://github.com/gurucomkz/silverstripe-eagerloading Looking forward for some feedback. |
@gurucomkz Nice work! That was the initial approach I used when solving this problem for a bespoke project, but we found that a more suitable long-term fix was to add caching to the DataQuery level, and allow eager loading to essentially be a cache warmer with a nice API. |
@unclecheese Thanks! You see, I've got a website with 500k+ accounts each having a 10-year history of orders and a shopping cart + prognosis & analytics tools inside. All the data has to be retrieved fresh and not cached, I can't see how a cache warmer may help here - for a live admin that deals with constantly changing structures the cache is pure evil, is it not? |
PRs have been merged. Close issue. |
DataList
does not allow caching or joining of relationships, particularlyhas_many
andmany_many
relationships, which leads to an N+1 problemin many contexts.
If there are 50 teams and each team has 10 matches, this results in 601 queries.
While DataObjects do a good job of caching
has_one
relationships viagetComponent
, plural relationships are not cached, and joining their respective tables on to the query doesn't help. Each time one of those methods is called, it results in a new query.[EDIT]
This has a massive impact on GraphQL, where nested queries are encouraged.
Proposal
Allow eager loading of relationships and stuff them into memory somewhere, and apply them to dataobjects when they're created.
Proof of concept
https://github.com/unclecheese/silverstripe-eager-loader
For discussion
How are other frameworks/ORMs solving this issue?
Acceptance criteria
@internal
where appropriate.Notes
PRs
The text was updated successfully, but these errors were encountered: