Generate a separate file to list bootstrap properties #1517

Merged
8 commits merged into NVIDIA:dev on Jan 31, 2025

Conversation

@amahussein (Collaborator) commented on Jan 28, 2025

Signed-off-by: Ahmed Hussein (amahussein) <a@ahussein.me>

Fixes #1509

  • Add a new report to generate appId-bootstrap.conf
  • Add more context to the AutoTuner results
  • Add a yaml file to initialize the list of the bootstrap entries (resources/bootstrap/tuningTable.yaml)
  • Allow a defaultSpark value to initialize a property when the eventlog does not set it; this is common for Spark properties that have built-in defaults
  • Bug Fixes:
    • spark.locality.wait was not processed by the AutoTuner
    • spark.rapids.sql.enabled was not added to the recommended entries
  • Behavior Change:
    • If the AutoTuner cannot set a specific value, it injects a placeholder so that the user can override it, for example: --conf spark.sql.adaptive.autoBroadcastJoinThreshold=[FILL_IN_VALUE]

This pull request includes several changes to improve the tuning configuration and profiling tools in the project. The most important changes are the addition of a new tuning table, modifications to the AutoTuner class and related classes to use a new TuningEntryTrait, and the introduction of bootstrap report generation.

Sample Output:

Example of the bootstrap file generated at rapids_4_spark_qualification_output/tuning/APPID-bootstrap.conf:

--conf spark.executor.memory=32768m
--conf spark.locality.wait=0
--conf spark.rapids.filecache.enabled=true
--conf spark.rapids.memory.pinnedPool.size=4096m
--conf spark.rapids.shuffle.multiThreaded.maxBytesInFlight=4g
--conf spark.rapids.shuffle.multiThreaded.reader.threads=28
--conf spark.rapids.shuffle.multiThreaded.writer.threads=28
--conf spark.rapids.sql.batchSizeBytes=1073741824
--conf spark.rapids.sql.concurrentGpuTasks=3
--conf spark.rapids.sql.enabled=true
--conf spark.rapids.sql.format.parquet.multithreaded.combine.waitTime=1000
--conf spark.rapids.sql.multiThreadedRead.numThreads=80
--conf spark.rapids.sql.reader.multithreaded.combine.sizeBytes=10485760
--conf spark.shuffle.manager=com.nvidia.spark.rapids.spark332db.RapidsShuffleManager
--conf spark.sql.adaptive.autoBroadcastJoinThreshold=[FILL_IN_VALUE]
--conf spark.sql.adaptive.coalescePartitions.minPartitionSize=4m
--conf spark.sql.adaptive.coalescePartitions.parallelismFirst=false
--conf spark.sql.shuffle.partitions=200
--conf spark.task.resource.gpu.amount=0.0625
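
Since each line already uses the spark-submit --conf syntax, the entries can be pasted directly into a launch command. Below is a minimal, hypothetical invocation; the master, main class, and application jar are placeholders, the ellipsis stands for the remaining entries of the bootstrap file, and placeholders such as [FILL_IN_VALUE] must be replaced before launching:

  spark-submit \
    --master yarn \
    --conf spark.executor.memory=32768m \
    --conf spark.rapids.sql.enabled=true \
    ... \
    --class com.example.MyApp my-app.jar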

Updates to Tuning Configuration:

  • Added a new tuning table in tuningTable.yaml with various Spark properties for tuning and functionality.
  • Fields defined in the yaml file (an illustrative entry is sketched after this list):
     * label: the property name
     * description: explains the importance of that property and how it is used
     * enabled: global flag to enable/disable the tuning entry; used to turn off a tuning entry
     * level: used to group the tuning entries (job/cluster)
     * category: indicates the purpose of that property for RAPIDS; "functionality" means it is required to enable RAPIDS, "tuning" means it is required to tune the runtime
     * bootstrapEntry: when true, the property should be added to the bootstrap configuration (default: true)
     * defaultSpark: the default value of the property in Spark; used to set the originalValue of the property when it is not set by the eventlog
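
A hypothetical entry illustrating these fields; the actual keys and values in tuningTable.yaml may differ, and the level and description below are assumptions for illustration only:

  - label: spark.locality.wait
    description: Wait time before scheduling a task on a less data-local node; the AutoTuner typically recommends 0.
    enabled: true
    level: cluster
    category: tuning
    bootstrapEntry: true
    defaultSpark: "3s"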
    

Refactoring for Tuning Entry (formerly: RecommendationEntry):

  • Updated imports in Profiler.scala to include TuningEntryTrait.
  • Changed the return type of runAutoTuner and related methods in Profiler.scala to use TuningEntryTrait instead of RecommendedPropertyResult.
  • Removed the RecommendationEntry class and replaced its usage with TuningEntryTrait in AutoTuner.scala; a hypothetical sketch of such a trait follows below.
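
For context, here is a hypothetical sketch of such a trait; the member names (name, originalValue, recommendedValue, isBootstrap) are illustrative only, and the actual TuningEntryTrait may define different members:

  // Hypothetical sketch; not the actual definition in AutoTuner.scala.
  trait TuningEntryTrait {
    def name: String                     // property key, e.g. spark.executor.memory
    def originalValue: Option[String]    // value from the eventlog, or defaultSpark when unset
    def recommendedValue: Option[String] // value produced by the AutoTuner, if any
    def isBootstrap: Boolean             // whether the entry belongs in the bootstrap config

    // Render the entry in the --conf format; unresolved entries fall back to a placeholder.
    def toConfLine: String =
      "--conf " + name + "=" + recommendedValue.getOrElse("[FILL_IN_VALUE]")
  }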

New Feature for Bootstrap Report:

  • Introduced the BootstrapReport class to generate reports containing required and tuned configurations.
  • Added logic to QualificationAutoTunerRunner to generate a bootstrap report after tuning; a minimal sketch of this step follows below.
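
A minimal sketch of that step, assuming the trait sketched above and a hypothetical output directory; the real BootstrapReport may filter, sort, and format the entries differently:

  import java.io.PrintWriter
  import java.nio.file.Paths

  // Hypothetical writer: keep only bootstrap entries and emit them in the --conf format above.
  object BootstrapReportSketch {
    def write(appId: String, entries: Seq[TuningEntryTrait], outputDir: String): Unit = {
      val target = Paths.get(outputDir, s"$appId-bootstrap.conf").toFile
      val writer = new PrintWriter(target)
      try {
        entries.filter(_.isBootstrap).sortBy(_.name).foreach(e => writer.println(e.toConfLine))
      } finally {
        writer.close()
      }
    }
  }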

Signed-off-by: Ahmed Hussein (amahussein) <a@ahussein.me>

Fixes NVIDIA#1509

- Add a new report to generate appId-bootstrap.conf
- Add more context to the AutoTuner results
- Add a yaml file to initialize the list of the bootstrap entries
Signed-off-by: Ahmed Hussein (amahussein) <a@ahussein.me>
@amahussein added the "feature request" and "core_tools" labels on Jan 28, 2025
@amahussein self-assigned this on Jan 28, 2025
@amahussein (Collaborator, Author) commented:

This is submitted as a draft until we decide whether we want to address some of the issues in follow-ups:

  • bootstrap.conf may include entries that were not resolved by the AutoTuner.
  • Improve the initialization so that properties that are not applicable get disabled (based on platform, etc.)
  • Apply the same improvements to the Profiler AutoTuner
  • Use the OpTuningType to generate a usable tuner.log. For example, we can now track how each individual entry was modified.
  • Use default values, or initialization in the yaml file, instead of hardcoding them.

Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
@parthosa marked this pull request as ready for review on January 29, 2025 01:15
@cindyyuanjiang (Collaborator) left a comment


thanks @amahussein! Some minor nits and questions.
General question: for FILL_IN_VALUE recommendations, I don't see explanation in comments for why Autotuner cannot set it. Do we expect users to know a good starting value or they can omit/research it? I am just wondering what is the thought behind this change. Thanks!

@amahussein (Collaborator, Author) left a comment

> thanks @amahussein! Some minor nits and questions. General question: for FILL_IN_VALUE recommendations, I don't see explanation in comments for why Autotuner cannot set it. Do we expect users to know a good starting value or they can omit/research it? I am just wondering what is the thought behind this change. Thanks!

Thanks @cindyyuanjiang!
Yes, the explanation of why each entry is not set is yet to be implemented.
This PR provides the skeleton to allow such improvements. Depending on bandwidth and priorities, each tuning strategy needs to be revisited to add such an explanation for the user.

I discussed some future work with @parthosa offline.
Since there is a large backlog of AutoTuner requests, I will leave it to him to file follow-ups as he sees fit based on the priorities.
I do not think that filing those ideas as issues now will benefit us, because it would create a swamp that makes it harder to navigate AutoTuner issues.

Signed-off-by: Ahmed Hussein (amahussein) <a@ahussein.me>
Signed-off-by: Ahmed Hussein (amahussein) <a@ahussein.me>
@amahussein (Collaborator, Author) left a comment

Thanks @mattahrens
I addressed the comments and populated the descriptions.

@cindyyuanjiang (Collaborator) left a comment

thanks @amahussein! LGTM.

@amahussein merged commit 46ec5f0 into NVIDIA:dev on Jan 31, 2025
13 checks passed
@amahussein deleted the rapids-tools-1509-b branch on January 31, 2025 03:11
Successfully merging this pull request may close these issues: [BUG] Generate a separate config file with only tuning recommendations from the AutoTuner (#1509)