-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve cte scheduling #22205
Improve cte scheduling #22205
Conversation
Did not do this since it needs a lot of additional refactor in the scheduling layer to handle + this won't be that effective since the query will be blocked till the cte is written wasting resources which is not always ideal. |
a491dca
to
6115223
Compare
c0b5982
to
edd771f
Compare
I'm having trouble understanding why we need to keep track of the whole cte graph vs. just the direct dependencies of the sequence node. (also, for the commits - the commit that moves sequence node out of spi should be first since the other one depends on it. And also the other commit, take out the uber specific references and maybe add more information to the body). |
After logical planning, the sequence node stores the cte producers in a topologically sorted list.
We need to keep a track of the whole graph since in this example cte1 must be executed before cte2 since cte1 is a dependency of cte2 Also consider cases like Graph (cte3->cte4, cte2->cte1, cte5). Here the three subgraphs are independent and can be scheduled independently. same for (cte1, cte2, cte3) where all are independent. |
Moved Sequence node to presto main since a dependency of com.google.common.graph was needed which is not available in spi and previous disussions have suggested not adding that dependency to spi
09b13cc
to
1367d3c
Compare
Codenotify: Notifying subscribers in CODENOTIFY files for diff c5674d6...ca02943. No notifications. |
But does cte2 have its own sequence node where it depends on cte1? Something like below:
I'm basically trying to understand why the graph needs to be specified explicitly vs. derived from the plan structure. |
Ahh I see, nope currently there is only one sequence node per query and it is responsible for maintaining all these dependencies. Reasoning behind this was ease of other optimizations and cases where its difficult to represent in the plan structure since there is a common node |
// This approach creates subgraphs sequentially, enhancing control over the execution flow. However, there are optimization opportunities: | ||
// 1. Can consider blocking only the CTEConsumer stages that are in a reading state. | ||
// This approach sounds good on paper may not be ideal as it can block the entire query, leading to resource wastage since no progress can be made until the writing operations are complete. | ||
// 2. ToDo: Another improvement will be to schedule the execution of subgraphs based on their order in the overall execution plan instead of a topological sorting done here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
general question again, but still trying to understand the basics. can you draw out what a fragmented plan with dependent ctes would look like before and after this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, lets take an example case of Graph(a, b) or sql
WITH t as (SELECT * FROM orders), t1 as (Select * FROM customer) SELECT * FROM t JOIN t1 ON true
Lets say that the Plan would be the below for simplicity.
...
Exchange
Sequence:
cteProducer1
cteProducer2
PrimarySource:
Exchange
Join
....
Lets say the primary source creates subplan (subPrimary1->subPrimary2)
and creates a list of subplans for the cteProducers as (subCteProducerT1 and subCteProducerT2)
The code is a bit tricker here but in the visitSequence we add the the cteproducers to the context created by the parent exchange. (here is actually where the parent exchange creates new properties or context)
so before the subplans look like
SubParent -> subPrimary1 -> subPrimary2
-> subCteProducer1 -> subCteProducer2
code is where we were adding the subCteProducer2 as a child of subCteProducer1
Now we would add the subCteProducer2 to the same level as the subCteProducer1
SubParent -> subPrimary1 -> subPrimary2
-> subCteProducer1
-> subCteProducer2
We can look at the subCteProducer1 and subCteProducer2 as independent subgraphs for the general case of independent subgraphs
In both cases similar structure would be represented in the sectionedPlan
1367d3c
to
75b3799
Compare
Optional<String> parentCte = peekActiveCte(); | ||
parentCte.ifPresent(s -> graph.putEdge(currentCte, s)); | ||
// (current -> parentCte) this indicates that currentCte must be processed first |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed wrong comments about the dependency graph and added explanations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could reverse the graph to be according to the comments and then add a reverse in topological ordering, but then i would have to update some tests (since topological ordering is not unique). Its easier to update the comments
@@ -342,9 +349,29 @@ public boolean isComplexCte(String cteId) | |||
public List<PlanNode> getTopologicalOrdering() | |||
{ | |||
ImmutableList.Builder<PlanNode> topSortedCteProducerListBuilder = ImmutableList.builder(); | |||
Traverser.forGraph(graph).depthFirstPostOrder(graph.nodes()) | |||
Traverser.forGraph(cteDependencyGraph).depthFirstPostOrder(cteDependencyGraph.nodes()) | |||
.forEach(cteName -> topSortedCteProducerListBuilder.add(cteProducerMap.get(cteName))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we are not reversing the list and hence the topological ordering is correct, if we reverse the graph, we should reverse the list here and in SequenceNode
@@ -531,7 +532,7 @@ public void testSequence() | |||
Optional.empty(), | |||
new PlanNodeId("sequence"), | |||
ImmutableList.of(cteProducerNode1, cteProducerNode2), | |||
joinNode); | |||
joinNode, GraphBuilder.directed().build()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put this on its own line
This improves latency of cte materialized queries by scheduling subgraphs independently
75b3799
to
ca02943
Compare
Description
Fixes #21639
Previously each materialized cte was scheduled one by one for safe dependent cte handling. This PR reduces latency improves cte scheduling by scheduling multiple subgraphs of dependent ctes independently.
This PR has these major changes
Commit 1 - link
Commit 2 link
Motivation and Context
Impact
Test Plan
Tested with prod queries and confirmed from the splits page that the behavior was as expected for independent ctes
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.