-
Notifications
You must be signed in to change notification settings - Fork 6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Data] Enable execution plan optimizer for supported Ray Data APIs #36294
Conversation
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Failing CI and release tests look unrelated to this PR to me. |
python/ray/data/_internal/plan.py
Outdated
@@ -136,6 +136,8 @@ def __init__( | |||
# determined by the config at the time it was created. | |||
self._context = copy.deepcopy(DataContext.get_current()) | |||
|
|||
self._skip_optimizer_pipeline = not self._context.optimizer_enabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, what about naming this variable "_generated_from_pipeline"? This will be easier to reason about. Because it only indicates whether this plan is generated from a pipeline, and doesn't nothing to do with the optimizer here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, updated.
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG
python/ray/data/_internal/plan.py
Outdated
# Whether the corresponding dataset is generated from a pipeline. | ||
# Currently, when this is True, this skips the new execution plan optimizer. | ||
# TODO(scottjlee): remove this once we remove DatasetPipeline. | ||
self._generated_from_pipeline = not self._context.optimizer_enabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be true regardless of the optimizer flag. Because in _get_execution_dag
, we will check the optimizer flag.
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
Signed-off-by: Scott Lee <sjl@anyscale.com>
…ay-project#36294) This PR further expands support for general use of the execution plan optimizer, including all currently existing Ray Data APIs, with the major exception of `DatasetPipeline`. In the case where a DatasetPipeline is used, the `ExecutionPlan` will include a flag to skip the new optimizer path and fall back to the legacy plan optimizer. In addition, this PR implements several small patches to fully enable the new execution plan optimizer on existing Ray Data APIs, such as data lineage serialization. Signed-off-by: Scott Lee <sjl@anyscale.com> Signed-off-by: e428265 <arvind.chandramouli@lmco.com>
Why are these changes needed?
This PR further expands support for general use of the execution plan optimizer, including all currently existing Ray Data APIs, with the major exception of
DatasetPipeline
. In the case where a DatasetPipeline is used, theExecutionPlan
will include a flag to skip the new optimizer path and fall back to the legacy plan optimizer.In addition, this PR implements several small patches to fully enable the new execution plan optimizer on existing Ray Data APIs, such as data lineage serialization.
Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.