Initial phase of support for constraint optimization to the PrestoDB optimizer in response to issue #16413 #16416

simmend · 2021-07-13T18:29:28Z

This feature adds an initial phase of support for constraint optimization to the PrestoDB optimizer in response to issue #16413. In particular, it adds support for constraint optimizations involving primary and unique key constraints. This support takes the form of a general-purpose optimization framework that associates logical properties with the result table produced by a plan node; optimization rules then exploit these properties to generate more efficient plans.

These logical properties initially derive from unique key constraints defined on database tables. They are further augmented and refined by the grouping, limiting, predicate application, and other operations performed by plan nodes. The feature adds several new optimization rules that exploit logical properties to discover and remove redundant query operations.

Future work will extend this framework with constraint optimizations involving logical properties derived from referential integrity constraints, functional dependencies, order dependencies, and other types of constraints. This work exploits Hive 3.1.2 catalog capabilities that allow for the definition of informational constraints on Hive tables. However, this feature can extend to any data source that provides enforced or informational table constraints. Please see the design and test strategy documents attached to issue #16413 for additional details.

Please find the detailed design document here:
https://docs.google.com/document/d/1h9C7dJck2PFPtvhksUCB74082zIn9sAZG1jNlQ7kqaU/edit

Test plan - (Please fill in how you tested your changes)
Please see the test plan here: https://docs.google.com/document/d/19SpdkE6z4Q_hT6BIox9zR5DvBbwUW_RlSZty862emn4/edit

== RELEASE NOTES ==

General Changes
* Add support for constraint optimizations in Hive MetaStore. This feature can be turned on by setting the session property "exploit_constraints" and config property "optimizer.exploit-constraints" to true.

yingsu00 · 2021-07-15T01:58:27Z

@simmend Just browsing the PR and noticed two small things before reading other parts:

It seems the documentation change is missing. This feature introduces new language constructs, and users need to learn about them from the Presto documentation. For example, you may want to add an example of how to create a table with a primary key in https://prestodb.io/docs/current/sql/create-table.html
The release notes section also needs to be updated. The guidelines can be found at https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines

nmahadevuni · 2021-07-15T08:34:56Z

@yingsu00: This change doesn't introduce any syntax changes to CREATE TABLE. It uses the existing constraints defined for the Hive tables through the Hive client interfaces.

rschlussel · 2021-07-15T19:26:42Z

This change touches 92 files. I know it's a bit annoying to change after the fact, but it would make it much easier to review if you split it into multiple commits (can all still be in this PR). Especially splitting out parts that are refactoring/threading some additional properties through vs. core logic. If there are logically distinct pieces of the core changes, that can be helpful to split out too.

Haven't gone through the design doc yet.

kaikalur

Initial high level comments

presto-main/src/main/java/com/facebook/presto/sql/planner/iterative/Key.java

presto-main/src/main/java/com/facebook/presto/sql/planner/iterative/KeyProperty.java

presto-main/src/main/java/com/facebook/presto/sql/planner/iterative/MaxCardProperty.java

...in/java/com/facebook/presto/sql/planner/iterative/rule/RemoveRedundantAggregateDistinct.java

rubenssoto · 2021-11-12T23:23:04Z

@simmend Gentle reminder, this is a nice feature!

simmend · 2021-11-22T23:03:44Z

@yingsu00 Tables with constraints must be defined directly via Hive 3.0. Please see section 2 of the design doc. That is, this feature does not introduce CREATE/ALTER DDL changes to Presto. Those changes will require thinking about how to abstract constraint definitions across various federated sources and are deferred to future work. Please see section 5.4 of the design document. I will look into the release notes guidelines. Thank you for pointing that out.

simmend · 2021-11-22T23:11:15Z

@rubenssoto Yes it is. I cannot wait until it merges. There are some powerful extensions and optimizations that can build on this work.

rongrong

General comments: Please read https://chris.beams.io/posts/git-commit/ for commit title and commit message guidelines.

...e-metastore/src/main/java/com/facebook/presto/hive/metastore/thrift/ThriftHiveMetastore.java

presto-spi/src/main/java/com/facebook/presto/spi/PrimaryKeyConstraint.java

presto-spi/src/main/java/com/facebook/presto/spi/UniqueConstraint.java

presto-spi/src/main/java/com/facebook/presto/spi/plan/TableScanNode.java

Retrieved table constraints are associated with TableConnectorMetadata and subsequently made available to the optimizer via a TableScanNode argument. Subsequent commits will take advantage of these constraints by mapping them into logical properties that can be exploited by optimization rules. Note that if the session variable exploit_constraints=false (the default now), no attempt is even made to read constraints from HMS.

Logical properties are initially derived from constraints defined for base tables and from properties of values nodes. These logical properties hold for the result table produced by a plan node. These base logical properties are then propagated through various query operations including filters, projects, joins, and aggregations. Logical properties are only computed by iterative planners that pass a logical property provider as input. See the design doc linked from issue 16413 for further details. Such optimizers will be introduced in the next commit; however, there are test cases in this commit that trigger logical property propagation. Note that if the session variable exploit_constraints=false (the default now), no attempt is made to compute logical properties, and hence optimization rules that seek them out will simply fail to fire.
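As a rough illustration of the propagation described in this commit message, the sketch below models a known key as a set of column names and shows how a filter that binds a key column to a constant shrinks that key. The names `KeyPropagationSketch`, `applyEqualityToConstant`, and `isAtMostOneRow` are hypothetical and are not the classes in this PR; the actual implementation works on row expressions and equivalence classes rather than plain strings.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical, simplified model: a key is the set of column names known to identify a row uniquely.
final class KeyPropagationSketch
{
    private KeyPropagationSketch() {}

    // A filter "column = constant" binds that column, so it no longer needs to be part of the key.
    // The input set is left untouched; a new unmodifiable set is returned.
    static Set<String> applyEqualityToConstant(Set<String> key, String boundColumn)
    {
        Set<String> reduced = new LinkedHashSet<>(key);
        reduced.remove(boundColumn);
        return Collections.unmodifiableSet(reduced);
    }

    // Given a known key, binding every key column to a constant leaves at most one row.
    static boolean isAtMostOneRow(Set<String> knownKey)
    {
        return knownKey.isEmpty();
    }

    public static void main(String[] args)
    {
        Set<String> key = new LinkedHashSet<>();
        key.add("orderkey");
        key.add("linenumber");
        Set<String> afterFilter = applyEqualityToConstant(key, "orderkey");
        System.out.println(afterFilter + " atMostOneRow=" + isAtMostOneRow(afterFilter));
    }
}
```

In this simplified model an empty set means "every key column is bound", not "no key is known"; a fuller model would need to distinguish the two cases.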

Implements iterative optimizers that look to exploit logical properties propagated as per the previous commit. Note that if the session variable exploit_constraints=false (the default now), no attempt is made to compute logical properties and the optimization rules committed here will not fire.

rongrong

For 2nd commit.

rongrong · 2022-06-08T22:23:19Z

...main/java/com/facebook/presto/sql/planner/iterative/properties/EquivalenceClassProperty.java

+        return otherEquivalenceClassProperty.equivalenceClasses.entrySet()
+                .stream()
+                .allMatch(e -> {
+                    final Set<RowExpression> otherEqClass = new HashSet<>();


nits: the final keyword is not necessary. Also, why not use an immutable set here as well?
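A minimal sketch of the immutable-set suggestion, using only JDK collections so that it stays self-contained. The class and method names are hypothetical; the Presto code base would more likely reach for Guava's `ImmutableSet`, which is an assumption here and not something shown in the diff above.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the PR.
final class ImmutableSetSketch
{
    private ImmutableSetSketch() {}

    // Builds the set in a single expression and wraps it so that callers cannot mutate it.
    static Set<String> toUnmodifiableSet(List<String> members)
    {
        return members.stream()
                .collect(Collectors.collectingAndThen(Collectors.toSet(), Collections::unmodifiableSet));
    }
}
```

Any attempt to call `add` or `remove` on the returned set throws `UnsupportedOperationException`, which makes accidental mutation visible early.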

rongrong · 2022-06-08T22:24:40Z

...main/java/com/facebook/presto/sql/planner/iterative/properties/EquivalenceClassProperty.java

+        extractConjuncts(predicate).stream()
+                .filter(CallExpression.class::isInstance)
+                .map(CallExpression.class::cast)
+                .filter(e -> isVariableEqualVariableOrConstant(e))


nits: filter(EquivalenceClassProperty::isVariableEqualVariableOrConstant)

rongrong · 2022-06-08T22:28:25Z

...main/java/com/facebook/presto/sql/planner/iterative/properties/EquivalenceClassProperty.java

+        if (head1 instanceof ConstantExpression) {
+            return head1;
+        }
+        else if (head2 instanceof ConstantExpression) {


nits: else is not needed.

rongrong · 2022-06-08T22:33:02Z

presto-main/src/main/java/com/facebook/presto/sql/planner/iterative/properties/Key.java

+    {
+        requireNonNull(equivalenceClassProperty, "Equivalence class property must be provided.");
+        Set<VariableReferenceExpression> unBoundVariables = new HashSet<>();
+        variables.stream().forEach(v -> {


nits: variables.forEach.

Also, it's not clear to me why you use forEach in some places but for (.. : ..) in others. I don't have a strong opinion. Generally speaking, if the logic is simple, use forEach; otherwise use a for loop. So here you might want to use a for loop.
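The rule of thumb in this comment can be sketched as follows. The class, method, and parameter names are hypothetical and use plain strings in place of `VariableReferenceExpression`; the bodies only illustrate the two styles and do not reproduce the method under review.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the two iteration styles.
final class LoopStyleSketch
{
    private LoopStyleSketch() {}

    // Simple side effect: Iterable.forEach is enough, and no stream() call is needed.
    static Set<String> copyAll(List<String> variables)
    {
        Set<String> result = new HashSet<>();
        variables.forEach(result::add);
        return result;
    }

    // Branching logic: a plain for loop tends to read more clearly than a multi-line lambda.
    static Set<String> unboundOnly(List<String> variables, Set<String> boundToConstant)
    {
        Set<String> unbound = new HashSet<>();
        for (String variable : variables) {
            if (boundToConstant.contains(variable)) {
                continue;
            }
            unbound.add(variable);
        }
        return unbound;
    }
}
```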

rongrong · 2022-06-08T22:37:09Z

presto-main/src/main/java/com/facebook/presto/sql/planner/iterative/properties/KeyProperty.java

+     */
+    private void addNonRedundantKey(Key newKey)
+    {
+        requireNonNull(newKey, "newKey is null");


This null check is unnecessary; it's a private method.

...rc/main/java/com/facebook/presto/sql/planner/iterative/properties/LogicalPropertiesImpl.java

rongrong · 2022-06-08T22:58:56Z

...rc/main/java/com/facebook/presto/sql/planner/iterative/properties/LogicalPropertiesImpl.java

+        private KeyProperty keyProperty = new KeyProperty();
+        private EquivalenceClassProperty equivalenceClassProperty;


equivalenceClassProperty is updated in class methods and cannot be final.

rongrong · 2022-06-08T23:00:46Z

...rc/main/java/com/facebook/presto/sql/planner/iterative/properties/LogicalPropertiesImpl.java

+     * This logical properties builder should be used by PlanNode's that propagate their
+     * source properties and add a limit. For example, TopNNode and LimitNode.
+     */
+    public static class PropagateAndLimitBuilder


None of these Builder classes are actual builders. They don't need to exist. Just have static creator methods like public static LogicalPropertiesImpl propagateAndLimitProperties(...), public static LogicalPropertiesImpl tableScanProperties(...), etc.
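A minimal sketch of the static-creator shape suggested here, assuming a heavily simplified property model with one key and a maximum cardinality. `LogicalPropertiesSketch`, its fields, and its accessors are hypothetical; only the two method names come from the comment above, and the parameter lists are guesses.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for LogicalPropertiesImpl with static creator methods instead of builder classes.
final class LogicalPropertiesSketch
{
    private final Set<String> key;
    private final long maxCardinality; // Long.MAX_VALUE stands for "unknown" in this sketch

    private LogicalPropertiesSketch(Set<String> key, long maxCardinality)
    {
        this.key = Collections.unmodifiableSet(new LinkedHashSet<>(key));
        this.maxCardinality = maxCardinality;
    }

    // Base case: properties derived from a unique key constraint on a table.
    static LogicalPropertiesSketch tableScanProperties(Set<String> uniqueKey)
    {
        return new LogicalPropertiesSketch(uniqueKey, Long.MAX_VALUE);
    }

    // Propagates the source properties and tightens the cardinality bound, as a limit or top-N node would.
    static LogicalPropertiesSketch propagateAndLimitProperties(LogicalPropertiesSketch source, long limit)
    {
        return new LogicalPropertiesSketch(source.key, Math.min(source.maxCardinality, limit));
    }

    Set<String> getKey()
    {
        return key;
    }

    long getMaxCardinality()
    {
        return maxCardinality;
    }

    public static void main(String[] args)
    {
        LogicalPropertiesSketch scan = tableScanProperties(Collections.singleton("orderkey"));
        LogicalPropertiesSketch limited = propagateAndLimitProperties(scan, 10);
        System.out.println(limited.getKey() + " maxCardinality=" + limited.getMaxCardinality());
    }
}
```

Because each creator returns a fully constructed instance, no intermediate mutable builder state is exposed to callers.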

rongrong · 2022-06-08T23:03:05Z

presto-spi/src/main/java/com/facebook/presto/spi/plan/AggregationNode.java

+    @Override
+    public LogicalProperties computeLogicalProperties(LogicalPropertiesProvider logicalPropertiesProvider)
+    {
+        requireNonNull(logicalPropertiesProvider, "logicalPropertiesProvider cannot be null.");


nits: we typically don't check nulls in methods.

rongrong · 2022-06-09T01:57:03Z

...in/java/com/facebook/presto/sql/planner/iterative/rule/RemoveRedundantAggregateDistinct.java

+                node.getSource(),
+                node.getAggregations().entrySet().stream().collect(Collectors.toMap(e -> e.getKey(), e ->
+                        (e.getValue().isDistinct() &&
+                                ((GroupReference) node.getSource()).getLogicalProperties().get().isDistinct(


Since you have to run this same logic again in apply, might as well use a simpler pattern, and check for match once here.
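For readers unfamiliar with the rule under review, the underlying check can be sketched roughly as follows: a DISTINCT qualifier on an aggregate is redundant when the grouping columns together with the aggregate's arguments cover some key of the input, because rows within a group then cannot repeat the argument values. This is an interpretation of the diff excerpt above, not the rule's actual code; the class and method names are hypothetical, columns are modeled as strings, and null handling is ignored.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified version of the distinctness test.
final class RedundantDistinctSketch
{
    private RedundantDistinctSketch() {}

    // Returns true if some key of the source is fully contained in (grouping columns + aggregate arguments).
    static boolean isDistinctRedundant(List<Set<String>> sourceKeys, Set<String> groupingColumns, Set<String> aggregateArguments)
    {
        Set<String> columns = new HashSet<>(groupingColumns);
        columns.addAll(aggregateArguments);
        for (Set<String> key : sourceKeys) {
            if (columns.containsAll(key)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args)
    {
        Set<String> key = new HashSet<>(Arrays.asList("orderkey", "linenumber"));
        List<Set<String>> keys = Collections.singletonList(key);
        // Grouping by orderkey and aggregating DISTINCT linenumber covers the key.
        System.out.println(isDistinctRedundant(keys, Collections.singleton("orderkey"), Collections.singleton("linenumber")));
    }
}
```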

yingsu00 · 2022-06-13T21:51:10Z

@rongrong Thank you for reviewing again. I'll address your comments in a later PR. Issue created here: #17869
@kaikalur Sreeni, thank you for reviewing. Did Dave's explanation make sense to you? If yes would you please update your review?

rschlussel · 2022-08-09T15:53:24Z

...main/java/com/facebook/presto/sql/planner/iterative/properties/EquivalenceClassProperty.java

+
+        //already in same equivalence class, nothing to do
+        //note that we do not check head1.equal(head2) so that two different variable reference objects
+        //referencing the same reference are both added to the equivalence class


@simmend can you explain why we want this to be true? (and it seems like when we use it, it's always as part of a set, or inside a map, so the distinction isn't preserved).

yingsu00 · 2022-08-12T00:33:10Z

Pasting for @rschlussel
"I think the code also should be refactored to change all the property classes to be immutable. There isn’t really a good reason for them to be constructed and then updated. Most actually start with an empty property and then update once to some source properties. There are a few that combine property classes, but it would be better to have a way to combine that constructs a new combined instance rather than the mutable objects we currently have. They never need to be mutated after they are fully constructed by the logical properties builders, so we should make them immutable."
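One possible shape for the immutable style proposed in this comment is sketched below: the property object copies its input at construction, and combining two properties produces a new instance instead of updating either operand. `ImmutableKeyPropertySketch` and its methods are hypothetical and model keys as sets of column names; this is not the refactoring that was later made.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical immutable stand-in for a property class such as KeyProperty.
final class ImmutableKeyPropertySketch
{
    private final Set<Set<String>> keys;

    ImmutableKeyPropertySketch(Set<Set<String>> keys)
    {
        // Defensive deep copy so that later changes to the argument cannot leak in.
        Set<Set<String>> copy = new LinkedHashSet<>();
        for (Set<String> key : keys) {
            copy.add(Collections.unmodifiableSet(new LinkedHashSet<>(key)));
        }
        this.keys = Collections.unmodifiableSet(copy);
    }

    // Combining returns a new instance; neither operand is modified.
    ImmutableKeyPropertySketch combine(ImmutableKeyPropertySketch other)
    {
        Set<Set<String>> merged = new LinkedHashSet<>(keys);
        merged.addAll(other.keys);
        return new ImmutableKeyPropertySketch(merged);
    }

    Set<Set<String>> getKeys()
    {
        return keys;
    }
}
```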

yingsu00 changed the title ~~This feature adds an initial phase of support for constraint optimization to the PrestoDB optimizer in response to issue #16413~~ WIP - initial phase of support for constraint optimization to the PrestoDB optimizer in response to issue #16413 Jul 13, 2021

simmend force-pushed the constraintOptimization branch from a0a6c8f to 299c4ae Compare July 14, 2021 01:43

simmend requested review from kaikalur and rongrong July 14, 2021 02:50

simmend marked this pull request as ready for review July 14, 2021 02:50

simmend requested a review from rschlussel July 14, 2021 02:53

simmend changed the title ~~WIP - initial phase of support for constraint optimization to the PrestoDB optimizer in response to issue #16413~~ Initial phase of support for constraint optimization to the PrestoDB optimizer in response to issue #16413 Jul 14, 2021

yingsu00 mentioned this pull request Jul 14, 2021

Constraint Support and Optimizations #16413

Closed

yingsu00 linked an issue Jul 14, 2021 that may be closed by this pull request

Constraint Support and Optimizations #16413

Closed

yingsu00 self-requested a review July 15, 2021 01:53

simmend force-pushed the constraintOptimization branch 2 times, most recently from ddffe44 to 738e2f5 Compare July 26, 2021 22:28

simmend force-pushed the constraintOptimization branch from 738e2f5 to 4e0ea30 Compare August 13, 2021 20:00

kaikalur reviewed Aug 18, 2021

simmend force-pushed the constraintOptimization branch from 4e0ea30 to 4b09d29 Compare October 6, 2021 16:44

rongrong reviewed Nov 23, 2021

simmend force-pushed the constraintOptimization branch 5 times, most recently from 6e3311b to 7443301 Compare December 8, 2021 23:22

simmend force-pushed the constraintOptimization branch 10 times, most recently from 9a1112b to 2e613d0 Compare May 20, 2022 16:28

simmend added 2 commits June 6, 2022 17:34

simmend force-pushed the constraintOptimization branch 2 times, most recently from d5b85bf to 081ab07 Compare June 6, 2022 23:33

simmend force-pushed the constraintOptimization branch from 081ab07 to 7bce0d4 Compare June 7, 2022 12:11

rongrong reviewed Jun 8, 2022

rongrong reviewed Jun 9, 2022

yingsu00 mentioned this pull request Jun 13, 2022

Address additional comments for PR 16416 #17869

Closed

rongrong approved these changes Jun 13, 2022

rongrong requested a review from kaikalur June 13, 2022 22:25

kaikalur approved these changes Jun 13, 2022

rongrong merged commit 5c3ac4c into prestodb:master Jun 13, 2022

highker mentioned this pull request Jul 6, 2022

Add release notes for 0.274 #17987

Closed

7 tasks

rschlussel reviewed Aug 9, 2022

nmahadevuni mentioned this pull request Aug 30, 2022

Address review comments for constraints optimization #18169

Merged

simmend mentioned this pull request Oct 20, 2022

Remove redundant distinct over group by #18512

Merged

kaikalur mentioned this pull request Oct 21, 2022

Reconncile/integrate logicalproperties and planproperties #18547

Open
