Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a new plan canonicalization strategy for history based optimizer #21832

Merged
merged 1 commit into from
Feb 15, 2024

Conversation

feilong-liu
Copy link
Contributor

@feilong-liu feilong-liu commented Jan 31, 2024

Description

Rename remove_safe_constants to ignore_safe_constants.

Add a new plan canonicalization strategy ignore_scan_constants for history based optimization. This is further relaxed compared with the currently used ignore_safe_constants.
For ignore_safe_constants, it will canonicalize constants in project and constants in predicates in scan node when the column is a partitioned column. For ignore_scan_constants, it will canonicalize constants in predicates in scan node, regardless of the column is a partitioned column or not.

Also add a session property history_optimization_plan_canonicalize_strategy to specify the canonicalization strategy to be used for HBO. It can be used to specify multiple strategies, and match the strategy with more strict criteria first.

Motivation and Context

If we know that a query is from a recurring workload, we can relax the match criteria so as to increase the coverage. Also we guarantee that the more relaxed strategy will match only when the other strategy does not match.

Impact

Increase coverage for HBO, with potential loss of accuracy as a tradeoff

Test Plan

Add unit tests, also test locally end to end

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Add session property `history_optimization_plan_canonicalize_strategy` to specify the plan canonicalization strategies to use for HBO.
* Add a new plan canonicalization strategy `ignore_scan_constants` which canonicalize predicates for both partitioned and non-partitioned columns in scan node.

@feilong-liu feilong-liu requested review from shrinidhijoshi and a team as code owners January 31, 2024 20:49
@feilong-liu feilong-liu marked this pull request as draft January 31, 2024 20:49
@feilong-liu feilong-liu force-pushed the hbo_strategy branch 5 times, most recently from 6cfd80a to a78c497 Compare February 7, 2024 19:34
@@ -290,7 +290,7 @@ private TupleDomain<ColumnHandle> getConstraint(PlanCanonicalizationStrategy can
.transform(ColumnHandle.class::cast)
.intersect(constraint);

constraint = constraint.canonicalize(HiveTableLayoutHandle::isPartitionKey);
constraint = canonicalizationStrategy.equals(PlanCanonicalizationStrategy.REMOVE_SCAN_CONSTANTS) ? constraint.canonicalize(x -> true) : constraint.canonicalize(HiveTableLayoutHandle::isPartitionKey);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the strategy is REMOVE_SCAN_CONSTANTS, always canonicalize even when not a partitioned key

List<PlanCanonicalizationStrategy> strategyList;
try {
strategyList = Splitter.on(",").trimResults().splitToList(session.getSystemProperty(HISTORY_OPTIMIZATION_PLAN_CANONICALIZE_STRATEGY, String.class)).stream()
.map(x -> PlanCanonicalizationStrategy.valueOf(x)).sorted().collect(toImmutableList());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call sorted here, so that the returned strategies are in order from more strict to less strict but more coverage.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit hard to manage.

I think we can remove CONNECTOR strategy(remove safe constants is pretty safe). We can then just use the session property to specify an integer(danger level) - which uses all strategies below that level. Is that sufficient?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a second look at it, I think it is easier to manage, but less flexible. For example, now we have ignore_safe_constant as danger level 1, and ignore_scan_constant as danger level 2, if people want to add a new strategy which is between ignore_safe_constant and ignore_scan_constant, we may need to re-assign the level of ignore_scan_constant to 3. This can be problematic as users are now setting the danger level in session property, and it will change behavior of existing settings.
Also it does not allow user to specify the specific strategies to use, but only able to specify a range of strategies.

In order to make it more explicit about the order of the strategies, I added a new field errorLevel in the PlanCanonicalizationStrategy enum.

@@ -24,11 +26,13 @@ public class PlanNodeWithHash
private final PlanNode planNode;
// An optional canonical hash of the corresponding plan node
private final Optional<String> hash;
private final PlanCanonicalizationStrategy planCanonicalizationStrategy;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add strategy to hash, since different hash strategies result with different hash

Copy link
Contributor

@pranjalssh pranjalssh Feb 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CanonicalPlan already contains strategy - so hash contains the strategy. So we don't need it here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that the Map used in HistoryBasedPlanStatisticsProvider is using PlanNodeWithHash as key, should we keep the strategy? https://github.com/prestodb/presto/blob/master/presto-spi/src/main/java/com/facebook/presto/spi/statistics/HistoryBasedPlanStatisticsProvider.java#L37

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline, this field is not needed and removed it, also add comment that hash is already taking the strategy into account.

@feilong-liu feilong-liu marked this pull request as ready for review February 9, 2024 18:35
@feilong-liu feilong-liu force-pushed the hbo_strategy branch 2 times, most recently from e126050 to 76cb625 Compare February 9, 2024 21:08
@@ -1528,6 +1532,14 @@ public SystemSessionProperties(
"When the size difference between current table and history table exceed this threshold, do not match history statistics",
0.0,
true),
stringProperty(
HISTORY_OPTIMIZATION_PLAN_CANONICALIZE_STRATEGY,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we just call this history_optimization_strategy? the current name is a bit long and has many internal details (that we canonicalize plans) + incorrect grammar

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename it to history_based_optimization_plan_canonicalization_strategy. I think it's OK to have details on the name, after all people who use it should know what they are doing.

@@ -61,15 +56,24 @@ public enum PlanCanonicalizationStrategy
*
* This is used in context of history based optimizations.
*/
REMOVE_SAFE_CONSTANTS;
REMOVE_SAFE_CONSTANTS,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps a better name is ignore_safe_constants/ignore_scan_constants

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree this is better. Just this will include more renaming changes in the PR, which I want to avoid to keep this PR simple.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this is a user-facing parameter value, we might not be able to rename it later when it's already in use

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, renamed them.

@steveburnett
Copy link
Contributor

Please consider documenting any new session properties.

@feilong-liu feilong-liu force-pushed the hbo_strategy branch 4 times, most recently from a84f19c to 5d8e28d Compare February 13, 2024 07:13
@@ -48,7 +43,7 @@ public enum PlanCanonicalizationStrategy
*/
CONNECTOR,
/**
* REMOVE_SAFE_CONSTANTS strategy is used to canonicalize plan with
* IGNORE_SAFE_CONSTANTS strategy is used to canonicalize plan with
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GG

List<PlanCanonicalizationStrategy> strategyList;
try {
strategyList = Splitter.on(",").trimResults().splitToList(session.getSystemProperty(HISTORY_OPTIMIZATION_PLAN_CANONICALIZE_STRATEGY, String.class)).stream()
.map(x -> PlanCanonicalizationStrategy.valueOf(x)).sorted().collect(toImmutableList());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit hard to manage.

I think we can remove CONNECTOR strategy(remove safe constants is pretty safe). We can then just use the session property to specify an integer(danger level) - which uses all strategies below that level. Is that sufficient?

@feilong-liu
Copy link
Contributor Author

Please consider documenting any new session properties.

Added

Copy link

Codenotify: Notifying subscribers in CODENOTIFY files for diff 56a0910...de56a33.

Notify File(s)
@steveburnett presto-docs/src/main/sphinx/admin/properties.rst

@feilong-liu feilong-liu merged commit 32628cd into prestodb:master Feb 15, 2024
57 checks passed
@feilong-liu feilong-liu deleted the hbo_strategy branch February 15, 2024 20:37
Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

@wanglinsong wanglinsong mentioned this pull request May 1, 2024
48 tasks
@steveburnett
Copy link
Contributor

@feilong-liu, reviewing this while working on #22647, I noticed that the new in this PR session properties ignore_scan_constants and history_optimization_plan_canonicalize_strategy were not documented in https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/admin/properties.rst.

Would you be willing to open a new PR and add documentation for these two properties?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants