Initial phase of support for constraint optimization to the PrestoDB optimizer in response to issue #16413 #16416
Conversation
Force-pushed a0a6c8f to 299c4ae
@simmend Just browsing the PR and noticed two small things before reading other parts:
@yingsu00: This change doesn't introduce any syntax changes to CREATE TABLE. It uses the already existing constraints defined for Hive tables via the Hive client interfaces.
This change touches 92 files. I know it's a bit annoying to change after the fact, but it would be much easier to review if you split it into multiple commits (they can all still be in this PR). Especially split out the parts that are refactoring or threading additional properties through, versus the core logic. If there are logically distinct pieces of the core changes, splitting those out would help too. I haven't gone through the design doc yet.
Force-pushed ddffe44 to 738e2f5
Force-pushed 738e2f5 to 4e0ea30
Initial high-level comments
Review threads on:
presto-main/src/main/java/com/facebook/presto/sql/planner/iterative/Key.java (outdated; resolved)
presto-main/src/main/java/com/facebook/presto/sql/planner/iterative/KeyProperty.java (outdated; resolved)
presto-main/src/main/java/com/facebook/presto/sql/planner/iterative/MaxCardProperty.java (outdated; resolved)
...in/java/com/facebook/presto/sql/planner/iterative/rule/RemoveRedundantAggregateDistinct.java (resolved)
Force-pushed 4e0ea30 to 4b09d29
@simmend Gentle reminder, this is a nice feature!
Tables with constraints must be defined directly via Hive 3.0; please see section 2 of the design doc. That is, this feature does not introduce CREATE/ALTER DDL changes to Presto. Those changes would require thinking about how to abstract constraint definitions across various federated sources and are deferred to future work; please see section 5.4 of the design document. I will look into the release notes guidelines. Thank you for pointing that out.
Yes it is. I cannot wait until it merges. There are some powerful extensions and optimizations that can build off of this work.
General comments: please read https://chris.beams.io/posts/git-commit/ for commit title and commit message guidelines.
Review threads (outdated; resolved) on:
...e-metastore/src/main/java/com/facebook/presto/hive/metastore/thrift/ThriftHiveMetastore.java (5 threads)
presto-spi/src/main/java/com/facebook/presto/spi/PrimaryKeyConstraint.java
presto-spi/src/main/java/com/facebook/presto/spi/UniqueConstraint.java (2 threads)
presto-spi/src/main/java/com/facebook/presto/spi/plan/TableScanNode.java (2 threads)
Force-pushed 6e3311b to 7443301
Force-pushed 9a1112b to 2e613d0
Retrieved table constraints are associated with TableConnectorMetadata and subsequently made available to the optimizer via a TableScanNode argument. Subsequent commits will take advantage of these constraints by mapping them into logical properties that optimization rules can exploit. Note that if the session variable exploit_constraints=false (currently the default), no attempt is made to read constraints from the HMS.
Logical properties are initially derived from constraints defined for base tables and from the properties of values nodes. These logical properties hold for the result table produced by a plan node. The base logical properties are then propagated through various query operations, including filters, projects, joins, and aggregations. Logical properties are only computed by iterative planners that receive a logical property provider as input; see the design doc linked from issue #16413 for further details. Such optimizers will be introduced by the next commit; however, there are test cases in this commit that trigger logical property propagation. Note that if the session variable exploit_constraints=false (currently the default), no attempt is made to compute logical properties, and hence optimization rules that seek them out will simply fail to fire.
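To make the propagation concrete, here is a minimal sketch of the idea using invented names (KeyPropertySketch is not a class in this PR): a key property records column sets known to be unique in a node's output, answers distinctness questions, and survives a project only when all of a key's columns survive.

```java
import java.util.HashSet;
import java.util.Set;

// Minimal sketch of key-property propagation. Hypothetical names, not
// the actual Presto classes from this PR.
final class KeyPropertySketch
{
    private final Set<Set<String>> keys = new HashSet<>();

    void addKey(Set<String> key)
    {
        keys.add(Set.copyOf(key));
    }

    // True if the given columns contain some known key: grouping or
    // applying DISTINCT over them cannot merge distinct rows.
    boolean isDistinct(Set<String> columns)
    {
        return keys.stream().anyMatch(columns::containsAll);
    }

    // Propagation through a project: a key survives only if every one
    // of its columns is still present in the project's output.
    KeyPropertySketch project(Set<String> outputColumns)
    {
        KeyPropertySketch result = new KeyPropertySketch();
        keys.stream().filter(outputColumns::containsAll).forEach(result::addKey);
        return result;
    }
}
```

Under this sketch, a scan of a table with primary key (orderkey) yields key {orderkey}; grouping on (orderkey, custkey) is then provably distinct, while a project that drops orderkey loses the key.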
Force-pushed d5b85bf to 081ab07
Implements iterative optimizers that exploit the logical properties propagated as per the previous commit. Note that if the session variable exploit_constraints=false (currently the default), no attempt is made to compute logical properties and the optimization rules committed here will not fire.
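As a hedged illustration of the kind of rule involved (invented names, not the committed code): in the spirit of the MaxCardProperty discussed in review, a LIMIT is redundant when the source already carries a row-count bound at or below the limit.

```java
import java.util.OptionalLong;

// Hedged sketch, not the PR's code: a plan node may carry an upper
// bound on the number of rows it can produce (cf. MaxCardProperty).
// A LIMIT n over a source already bounded by n or fewer rows does
// nothing and can be removed.
final class MaxCardSketch
{
    private MaxCardSketch() {}

    static boolean limitIsRedundant(OptionalLong sourceMaxCard, long limit)
    {
        return sourceMaxCard.isPresent() && sourceMaxCard.getAsLong() <= limit;
    }
}
```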
Force-pushed 081ab07 to 7bce0d4
For the 2nd commit.
return otherEquivalenceClassProperty.equivalenceClasses.entrySet()
        .stream()
        .allMatch(e -> {
            final Set<RowExpression> otherEqClass = new HashSet<>();
nits: the final keyword is not necessary. Also, why not use an immutable set here as well?
extractConjuncts(predicate).stream()
        .filter(CallExpression.class::isInstance)
        .map(CallExpression.class::cast)
        .filter(e -> isVariableEqualVariableOrConstant(e))
nit: use filter(EquivalenceClassProperty::isVariableEqualVariableOrConstant)
if (head1 instanceof ConstantExpression) {
    return head1;
}
else if (head2 instanceof ConstantExpression) {
nit: the else is not needed.
{
    requireNonNull(equivalenceClassProperty, "Equivalence class property must be provided.");
    Set<VariableReferenceExpression> unBoundVariables = new HashSet<>();
    variables.stream().forEach(v -> {
nit: use variables.forEach.
Also, it's not clear to me why you use forEach in some places but for (.. : ..) in others. I don't have strong opinions. Generally speaking, if the logic is simple, use forEach; otherwise use a for loop. So here you might want to use a for loop.
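A generic illustration of the style point, not code from the PR: forEach is fine for a one-line per-element action, while multi-step logic with branching usually reads better as a for loop.

```java
import java.util.ArrayList;
import java.util.List;

// Generic style illustration, unrelated to the PR's actual code.
final class LoopStyle
{
    // Simple per-element action: forEach is concise and readable.
    static List<String> upperCased(List<String> values)
    {
        List<String> result = new ArrayList<>();
        values.forEach(v -> result.add(v.toUpperCase()));
        return result;
    }

    // Branching plus transformation: a plain for loop is clearer.
    static List<Integer> keepPositiveDoubled(List<Integer> values)
    {
        List<Integer> result = new ArrayList<>();
        for (int v : values) {
            if (v > 0) {
                result.add(v * 2);
            }
        }
        return result;
    }
}
```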
 */
private void addNonRedundantKey(Key newKey)
{
    requireNonNull(newKey, "newKey is null");
This null check is unnecessary; it's a private method.
Review threads (resolved) on:
...rc/main/java/com/facebook/presto/sql/planner/iterative/properties/LogicalPropertiesImpl.java (2 threads)
private KeyProperty keyProperty = new KeyProperty();
private EquivalenceClassProperty equivalenceClassProperty;
final?
equivalenceClassProperty is updated in class methods and cannot be final.
 * This logical properties builder should be used by PlanNodes that propagate their
 * source properties and add a limit. For example, TopNNode and LimitNode.
 */
public static class PropagateAndLimitBuilder
None of these Builder classes are actual builders. They don't need to exist. Just have static creator methods like public static LogicalPropertiesImpl propagateAndLimitProperties(...), public static tableScanProperties(...), etc.
@Override
public LogicalProperties computeLogicalProperties(LogicalPropertiesProvider logicalPropertiesProvider)
{
    requireNonNull(logicalPropertiesProvider, "logicalPropertiesProvider cannot be null.");
nit: we typically don't check for nulls in methods.
node.getSource(),
node.getAggregations().entrySet().stream().collect(Collectors.toMap(e -> e.getKey(), e ->
        (e.getValue().isDistinct() &&
                ((GroupReference) node.getSource()).getLogicalProperties().get().isDistinct(
Since you have to run this same logic again in apply, you might as well use a simpler pattern and check for a match once here.
//already in same equivalence class, nothing to do
//note that we do not check head1.equal(head2) so that two different variable reference objects
//referencing the same reference are both added to the equivalence class
@simmend Can you explain why we want this to be true? (And it seems like when we use it, it's always as part of a set or inside a map, so the distinction isn't preserved.)
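For readers following the thread, a much-simplified union-find sketch of equivalence-class maintenance (hypothetical code, not the PR's EquivalenceClassProperty): equalities extracted from conjuncts such as a = b and b = 5 merge members into one class under a representative head.

```java
import java.util.HashMap;
import java.util.Map;

// Much-simplified sketch of equivalence-class maintenance, not the
// PR's EquivalenceClassProperty. Members are plain strings; a
// union-find structure tracks which members share a class.
final class EquivalenceSketch
{
    private final Map<String, String> parent = new HashMap<>();

    private String find(String member)
    {
        parent.putIfAbsent(member, member);
        String root = parent.get(member);
        if (!root.equals(member)) {
            root = find(root);          // path compression
            parent.put(member, root);
        }
        return root;
    }

    // Record an equality predicate, e.g. a = b or b = 5.
    void addEquality(String left, String right)
    {
        parent.put(find(left), find(right));
    }

    boolean inSameClass(String left, String right)
    {
        return find(left).equals(find(right));
    }
}
```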
Pasting for @rschlussel
This feature adds an initial phase of support for constraint optimization to the PrestoDB optimizer in response to issue #16413. In particular, it adds support for constraint optimizations involving primary and unique key constraints. This support manifests as a general-purpose optimization framework that associates logical properties with the result table produced by a plan node, which optimization rules then exploit to generate more efficient plans. These logical properties initially derive from unique key constraints defined on database tables. They are further augmented and refined by the grouping, limiting, predicate application, and other operations performed by plan nodes. The feature adds several new optimization rules that exploit logical properties to discover and remove redundant query operations. Future work will extend this framework with constraint optimizations involving logical properties derived from referential integrity constraints, functional dependencies, order dependencies, and other types of constraints. This work exploits Hive 3.1.2 catalog capabilities that allow for the definition of informational constraints on Hive tables; however, the feature can extend to any data source that provides enforced or informational table constraints. Please see the design and test strategy documents attached to issue #16413 for additional details.
Please find the detailed design document here:
https://docs.google.com/document/d/1h9C7dJck2PFPtvhksUCB74082zIn9sAZG1jNlQ7kqaU/edit
Test plan
Please see the test plan here: https://docs.google.com/document/d/19SpdkE6z4Q_hT6BIox9zR5DvBbwUW_RlSZty862emn4/edit
Please make sure your submission complies with our Development, Formatting, and Commit Message guidelines. Don't forget to follow our attribution guidelines for any code copied from other projects.
Fill in the release notes towards the bottom of the PR description.
See Release Notes Guidelines for details.