-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Heuristic CTE Materialization Strategy #21720
Conversation
cfa8b25
to
b3d82f7
Compare
@feilong-liu @mlyublena Please review when you get time, thanks! |
@@ -357,7 +357,9 @@ public boolean isAdoptingMergedPreference() | |||
public enum CteMaterializationStrategy | |||
{ | |||
ALL, // Materialize all CTES | |||
NONE // Materialize no ctes | |||
NONE, // Materialize no CTES | |||
HEURISTIC, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add comments for the new values
NONE // Materialize no ctes | ||
NONE, // Materialize no CTES | ||
HEURISTIC, | ||
HEURISTIC_STRICT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if there is a better name than strict: perhaps HEURISTIC_COMPLEX_QUERIES_ONLY? we already have "complex query" reference in another session property (RESTRICT_HISTORY_BASED_OPTIMIZATION_TO_COMPLEX_QUERY)
HashMap<String, CTEInformation> cteInformationMap = session.getCteInformationCollector().getCteInformationMap(); | ||
CTEInformation cteInfo = cteInformationMap.get(node.getCteId()); | ||
switch (getCteMaterializationStrategy(session)) { | ||
case HEURISTIC_STRICT: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want to also disable materialization of VALUES nodes? so only proceed if the query is a proper join/aggregation on top of table scans
LGTM modulo some nits |
93ee4fb
to
693ad79
Compare
@mlyublena Thanks for your review, I have
|
7b1cc96
to
db3edae
Compare
Rebased |
@@ -1082,6 +1083,11 @@ public SystemSessionProperties( | |||
"Enable pushing of filters and projections inside common table expressions.", | |||
featuresConfig.getCteFilterAndProjectionPushdownEnabled(), | |||
false), | |||
integerProperty( | |||
CTE_HEURISTIC_REPLICATION_THRESHOLD, | |||
"Used with CTE Materialization Strategy = Heuristic. CTES are only materialized if they are used greater than or equal to this number", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: perhaps change comment to "if they are used more than this number of times"
...to-main/src/main/java/com/facebook/presto/sql/planner/optimizations/LogicalCteOptimizer.java
Outdated
Show resolved
Hide resolved
public CteConsumerTransformer(PlanNodeIdAllocator idAllocator, VariableAllocator variableAllocator) | ||
private final Session session; | ||
|
||
private static final List<Class<? extends PlanNode>> PRECOMPUTE_PLAN_NODES = ImmutableList.of(JoinNode.class, SemiJoinNode.class, AggregationNode.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we call these two COMPLEX_PLAN_NODES and COMPLEX_DATA_SOURCE_NODES or something like that to better correlate with the session property name?
Please make sure failing tests are not related |
f91f0b9
to
df68261
Compare
@mlyublena @pranjalssh Thank you for the review. However I found a test case added here wherein, the outer CTE would be materialized even if it wasn't complex if the inner CTE was complex,
|
@@ -364,7 +364,9 @@ public boolean isAdoptingMergedPreference() | |||
public enum CteMaterializationStrategy | |||
{ | |||
ALL, // Materialize all CTES | |||
NONE // Materialize no ctes | |||
NONE, // Materialize no CTES | |||
HEURISTIC, // Materialze CTES occuring > CTE_HEURISTIC_REPLICATION_THRESHOLD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HEURISTIC, // Materialze CTES occuring > CTE_HEURISTIC_REPLICATION_THRESHOLD | |
HEURISTIC, // Materialize CTES occuring > CTE_HEURISTIC_REPLICATION_THRESHOLD |
Summary: Add 2 materialization strategies Heuristic and Heuristic_Strict. In Heuristic Strategy the Ctes will only be materialized if they occured > x times. In Heuristic_strict strategy the CTE need to have a join or aggregate
Codenotify: Notifying subscribers in CODENOTIFY files for diff 9dfc461...09db46e. No notifications. |
@pranjalssh Need a review again due to the comment change |
This PR adds support Heuristic CTE materialization.
We have rolled this in one of our production cluster and confirmed that this is stable.
Our current testing has shown upto 50% cpu gain and > 50% less hdfs read for affected queries.
Its heuristic so there may be cases where non-ideal plans are generated.
Changed cteName member variable to cteId for cte plan nodes.
Also fixed CteInformation to include cteId so that correct references are incremented which crucial for cte materialization.
In heuristic strategy, the relational planner adds cteReferences everywhere and the LogicalCteOptimizer decides which references to implement
Fixes #21637
Motivation and Context
Impact
Test Plan
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.