-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a new plan canonicalization strategy for history based optimizer #21832
Conversation
6cfd80a
to
a78c497
Compare
a78c497
to
33e71ee
Compare
@@ -290,7 +290,7 @@ private TupleDomain<ColumnHandle> getConstraint(PlanCanonicalizationStrategy can | |||
.transform(ColumnHandle.class::cast) | |||
.intersect(constraint); | |||
|
|||
constraint = constraint.canonicalize(HiveTableLayoutHandle::isPartitionKey); | |||
constraint = canonicalizationStrategy.equals(PlanCanonicalizationStrategy.REMOVE_SCAN_CONSTANTS) ? constraint.canonicalize(x -> true) : constraint.canonicalize(HiveTableLayoutHandle::isPartitionKey); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the strategy is REMOVE_SCAN_CONSTANTS, always canonicalize even when not a partitioned key
List<PlanCanonicalizationStrategy> strategyList; | ||
try { | ||
strategyList = Splitter.on(",").trimResults().splitToList(session.getSystemProperty(HISTORY_OPTIMIZATION_PLAN_CANONICALIZE_STRATEGY, String.class)).stream() | ||
.map(x -> PlanCanonicalizationStrategy.valueOf(x)).sorted().collect(toImmutableList()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Call sorted here, so that the returned strategies are in order from more strict to less strict but more coverage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit hard to manage.
I think we can remove CONNECTOR strategy(remove safe constants is pretty safe). We can then just use the session property to specify an integer(danger level) - which uses all strategies below that level. Is that sufficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a second look at it, I think it is easier to manage, but less flexible. For example, now we have ignore_safe_constant
as danger level 1, and ignore_scan_constant
as danger level 2, if people want to add a new strategy which is between ignore_safe_constant
and ignore_scan_constant
, we may need to re-assign the level of ignore_scan_constant
to 3. This can be problematic as users are now setting the danger level in session property, and it will change behavior of existing settings.
Also it does not allow user to specify the specific strategies to use, but only able to specify a range of strategies.
In order to make it more explicit about the order of the strategies, I added a new field errorLevel
in the PlanCanonicalizationStrategy
enum.
@@ -24,11 +26,13 @@ public class PlanNodeWithHash | |||
private final PlanNode planNode; | |||
// An optional canonical hash of the corresponding plan node | |||
private final Optional<String> hash; | |||
private final PlanCanonicalizationStrategy planCanonicalizationStrategy; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add strategy to hash, since different hash strategies result with different hash
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CanonicalPlan already contains strategy - so hash contains the strategy. So we don't need it here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that the Map used in HistoryBasedPlanStatisticsProvider
is using PlanNodeWithHash as key, should we keep the strategy? https://github.com/prestodb/presto/blob/master/presto-spi/src/main/java/com/facebook/presto/spi/statistics/HistoryBasedPlanStatisticsProvider.java#L37
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline, this field is not needed and removed it, also add comment that hash is already taking the strategy into account.
e126050
to
76cb625
Compare
@@ -1528,6 +1532,14 @@ public SystemSessionProperties( | |||
"When the size difference between current table and history table exceed this threshold, do not match history statistics", | |||
0.0, | |||
true), | |||
stringProperty( | |||
HISTORY_OPTIMIZATION_PLAN_CANONICALIZE_STRATEGY, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we just call this history_optimization_strategy? the current name is a bit long and has many internal details (that we canonicalize plans) + incorrect grammar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename it to history_based_optimization_plan_canonicalization_strategy
. I think it's OK to have details on the name, after all people who use it should know what they are doing.
@@ -61,15 +56,24 @@ public enum PlanCanonicalizationStrategy | |||
* | |||
* This is used in context of history based optimizations. | |||
*/ | |||
REMOVE_SAFE_CONSTANTS; | |||
REMOVE_SAFE_CONSTANTS, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps a better name is ignore_safe_constants/ignore_scan_constants
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree this is better. Just this will include more renaming changes in the PR, which I want to avoid to keep this PR simple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since this is a user-facing parameter value, we might not be able to rename it later when it's already in use
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, renamed them.
Please consider documenting any new session properties. |
a84f19c
to
5d8e28d
Compare
@@ -48,7 +43,7 @@ public enum PlanCanonicalizationStrategy | |||
*/ | |||
CONNECTOR, | |||
/** | |||
* REMOVE_SAFE_CONSTANTS strategy is used to canonicalize plan with | |||
* IGNORE_SAFE_CONSTANTS strategy is used to canonicalize plan with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GG
List<PlanCanonicalizationStrategy> strategyList; | ||
try { | ||
strategyList = Splitter.on(",").trimResults().splitToList(session.getSystemProperty(HISTORY_OPTIMIZATION_PLAN_CANONICALIZE_STRATEGY, String.class)).stream() | ||
.map(x -> PlanCanonicalizationStrategy.valueOf(x)).sorted().collect(toImmutableList()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit hard to manage.
I think we can remove CONNECTOR strategy(remove safe constants is pretty safe). We can then just use the session property to specify an integer(danger level) - which uses all strategies below that level. Is that sufficient?
5d8e28d
to
f6419b1
Compare
f6419b1
to
c96ca28
Compare
c96ca28
to
f4e9eee
Compare
f4e9eee
to
de56a33
Compare
Added |
Codenotify: Notifying subscribers in CODENOTIFY files for diff 56a0910...de56a33.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! (docs)
@feilong-liu, reviewing this while working on #22647, I noticed that the new in this PR session properties ignore_scan_constants and history_optimization_plan_canonicalize_strategy were not documented in https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/admin/properties.rst. Would you be willing to open a new PR and add documentation for these two properties? |
Description
Rename
remove_safe_constants
toignore_safe_constants
.Add a new plan canonicalization strategy
ignore_scan_constants
for history based optimization. This is further relaxed compared with the currently usedignore_safe_constants
.For
ignore_safe_constants
, it will canonicalize constants in project and constants in predicates in scan node when the column is a partitioned column. Forignore_scan_constants
, it will canonicalize constants in predicates in scan node, regardless of the column is a partitioned column or not.Also add a session property
history_optimization_plan_canonicalize_strategy
to specify the canonicalization strategy to be used for HBO. It can be used to specify multiple strategies, and match the strategy with more strict criteria first.Motivation and Context
If we know that a query is from a recurring workload, we can relax the match criteria so as to increase the coverage. Also we guarantee that the more relaxed strategy will match only when the other strategy does not match.
Impact
Increase coverage for HBO, with potential loss of accuracy as a tradeoff
Test Plan
Add unit tests, also test locally end to end
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.