-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add auto-tuning of max executor and hash partition #18716
Add auto-tuning of max executor and hash partition #18716
Conversation
@bot kick off tests |
1 similar comment
@bot kick off tests |
...park-base/src/main/java/com/facebook/presto/spark/PrestoSparkPhysicalResourceCalculator.java
Outdated
Show resolved
Hide resolved
...park-base/src/main/java/com/facebook/presto/spark/PrestoSparkPhysicalResourceCalculator.java
Outdated
Show resolved
Hide resolved
...-base/src/main/java/com/facebook/presto/spark/execution/PrestoSparkStaticQueryExecution.java
Outdated
Show resolved
Hide resolved
8c076fd
to
d88f57f
Compare
@vermapratyush Can you please sign the CLA |
@vermapratyush Please squash the commits as well. |
15587ef
to
3197e4a
Compare
@ajaygeorge Already signed the CLA, but somehow it still keeps failing. |
Can you try signing the individual contributor CLA as well if not already. |
1ca1e32
to
db890d3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some comments
presto-spark-base/src/main/java/com/facebook/presto/spark/PhysicalResourceSettingsHolder.java
Outdated
Show resolved
Hide resolved
presto-spark-base/src/main/java/com/facebook/presto/spark/PhysicalResourceSettingsHolder.java
Outdated
Show resolved
Hide resolved
presto-spark-base/src/main/java/com/facebook/presto/spark/PhysicalResourceSettingsHolder.java
Outdated
Show resolved
Hide resolved
presto-spark-base/src/main/java/com/facebook/presto/spark/PrestoSparkConfig.java
Outdated
Show resolved
Hide resolved
...park-base/src/main/java/com/facebook/presto/spark/PrestoSparkPhysicalResourceCalculator.java
Outdated
Show resolved
Hide resolved
...park-base/src/main/java/com/facebook/presto/spark/PrestoSparkPhysicalResourceCalculator.java
Outdated
Show resolved
Hide resolved
e410779
to
eb5f715
Compare
Unrelated failure in module |
...park-base/src/main/java/com/facebook/presto/spark/PrestoSparkPhysicalResourceCalculator.java
Outdated
Show resolved
Hide resolved
eb5f715
to
45c214d
Compare
presto-spark-base/src/main/java/com/facebook/presto/spark/PhysicalResourceSettings.java
Outdated
Show resolved
Hide resolved
presto-spark-base/src/main/java/com/facebook/presto/spark/PrestoSparkConfig.java
Outdated
Show resolved
Hide resolved
|
||
double inputDataInBytes = new PrestoSparkSourceStatsCollector(metaData, session).collectSourceStats(plan); | ||
if (!anyAllocationStrategyEnabled(session)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do the check for the main property here: isSparkResourceAllocationStrategyEnabled(session)
and if its disabled, we return back. This is basically a single switch to disable allocation strategy completely.
Basically, my thinking is that spark_resource_allocation_strategy_enabled
control this whole logic. If this is disabled, we skip this logic completely. If spark_resource_allocation_strategy_enabled
is true, then only we check for individual strategy properties and trigger the one that is enabled.
I know this is bit weird. Ideally, we should just remove spark_resource_allocation_strategy_enabled
session property to make this logic cleaner
@highker : What is the right approach of deprecating/removing a property?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the right approach of deprecating/removing a property?
- We use
@DefunctConfig
annotation in a config class. Usually, we also rename the property todeprecated.XXX
. CheckFeatureConfig
for examples. We don't usually delete the property directly. - We have release note to tell users what config has the deprecated one migrated to
return defaultResourceSettings; | ||
} | ||
// update hashPartitionCount only if resource allocation or hash partition allocation is enabled | ||
if (isSparkResourceAllocationStrategyEnabled(session) || isSparkHashPartitionCountAllocationStrategyEnabled(session)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check for isSparkResourceAllocationStrategyEnabled(session)
can be removed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will deprecate this config in subsequent PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make it an AND condition? That will give us the ability to control individual optimizations separately. Else if isSparkResourceAllocationStrategyEnabled(session)
is true, both optimizations will run regardless.
log.warn(String.format("Failed to retrieve correct size, data read=%.2f, skipping automatic resource tuning.", inputDataInBytes)); | ||
return DISABLED_PHYSICAL_RESOURCE_SETTING; | ||
// update maxExecutorCount only if resource allocation or executor allocation is enabled | ||
if (isSparkResourceAllocationStrategyEnabled(session) || isSparkExecutorAllocationStrategyEnabled(session)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will deprecate this config in subsequent PR.
presto-spark-base/src/main/java/com/facebook/presto/spark/planner/PrestoSparkQueryPlanner.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM % a few nits.
presto-spark-base/src/main/java/com/facebook/presto/spark/PhysicalResourceSettings.java
Outdated
Show resolved
Hide resolved
presto-spark-base/src/main/java/com/facebook/presto/spark/PhysicalResourceSettings.java
Outdated
Show resolved
Hide resolved
presto-spark-base/src/main/java/com/facebook/presto/spark/PrestoSparkConfig.java
Outdated
Show resolved
Hide resolved
8fa0604
to
d1de7d5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just 1 final comment.
return defaultResourceSettings; | ||
} | ||
// update hashPartitionCount only if resource allocation or hash partition allocation is enabled | ||
if (isSparkResourceAllocationStrategyEnabled(session) || isSparkHashPartitionCountAllocationStrategyEnabled(session)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make it an AND condition? That will give us the ability to control individual optimizations separately. Else if isSparkResourceAllocationStrategyEnabled(session)
is true, both optimizations will run regardless.
@pgupta2 I don't want to couple isSparkResourceAllocationStrategyEnabled and isSparkHashPartitionCountAllocationStrategyEnabled together since isSparkResourceAllocationStrategyEnabled will be going away in subsequent PR anyway. |
d1de7d5
to
fd34c8c
Compare
presto-spark-base/src/main/java/com/facebook/presto/spark/PhysicalResourceSettings.java
Outdated
Show resolved
Hide resolved
...ava/com/facebook/presto/spark/planner/TestPrestoSparkPhysicalResourceAllocationStrategy.java
Outdated
Show resolved
Hide resolved
...ava/com/facebook/presto/spark/planner/TestPrestoSparkPhysicalResourceAllocationStrategy.java
Outdated
Show resolved
Hide resolved
4df8b74
to
6a7b385
Compare
6a7b385
to
3ce7856
Compare
Test Plan
Added unit test to the classes modified.
Ran presto server on local and submitted queries on "customers" table to confirm correct hash_partition_count is used.
Summary
Added 2 configuration options (
spark_executor_allocation_strategy_enabled
,spark_hash_partition_count_allocation_strategy_enabled
) to auto-tune,spark.dynamicAllocation.maxExecutors
andhash_partition_count
based on the input table stats, respectively.The auto tune configuration option take precedence over explicitly provided values for
spark.dynamicAllocation.maxExecutors
andhash_partition_count
.If
spark_resource_allocation_strategy_enabled
is enabled, both executor and hash partition are considered to be auto-tune enabled. This configuration option could be removed in subsequent revision, as it is redundant.