-
Notifications
You must be signed in to change notification settings - Fork 6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Data][3/N] Enable optimizer: fix stats and RandomizeBlocks #35952
Conversation
# callable object. | ||
return fn.__class__.__name__ | ||
except AttributeError as e: | ||
logging.error("Failed to get name of UDF %s: %s", fn, e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we use DatasetLogger
here for consistency? https://github.com/ray-project/ray/blob/master/python/ray/data/_internal/dataset_logger.py
@@ -374,7 +374,7 @@ def test_map_batches_basic(ray_start_regular_shared, tmp_path, restore_data_cont | |||
|
|||
def test_map_batches_extra_args(shutdown_only, tmp_path): | |||
ray.shutdown() | |||
ray.init(num_cpus=2) | |||
ray.init(num_cpus=3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this needed as a result of code changes in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because we are no longer fusing actors.
…ect#35952) ## Why are these changes needed? This is the last PR for enabling optimizer by default. It contains the following changes: - Fix DatasetStats related issues, including: - Map op names not including function names. - Read op names not including data source names. - RandomizeBlocks's name. - `generate_randomize_blocks_fn` returning empty stats, making it being skipped in summary. - Dropping support for fusing 2 actor-based map ops. - In the old backend, we fuse 2 actors only if they are the same class and have the same constructor args. This is not useful in practice. Test changes about `num_cpus` are related to this. - Fix the issue that `ReorderRandomizeBlocksRule` will modify the operator's input_dependencies in place. This is a bug because the operator instance might be shared by multiple Datasets. ## Related issue number Closes ray-project#32596
…ect#35952) ## Why are these changes needed? This is the last PR for enabling optimizer by default. It contains the following changes: - Fix DatasetStats related issues, including: - Map op names not including function names. - Read op names not including data source names. - RandomizeBlocks's name. - `generate_randomize_blocks_fn` returning empty stats, making it being skipped in summary. - Dropping support for fusing 2 actor-based map ops. - In the old backend, we fuse 2 actors only if they are the same class and have the same constructor args. This is not useful in practice. Test changes about `num_cpus` are related to this. - Fix the issue that `ReorderRandomizeBlocksRule` will modify the operator's input_dependencies in place. This is a bug because the operator instance might be shared by multiple Datasets. ## Related issue number Closes ray-project#32596 Signed-off-by: e428265 <arvind.chandramouli@lmco.com>
Why are these changes needed?
This is the last PR for enabling optimizer by default. It contains the following changes:
generate_randomize_blocks_fn
returning empty stats, making it being skipped in summary.num_cpus
are related to this.ReorderRandomizeBlocksRule
will modify the operator's input_dependencies in place. This is a bug because the operator instance might be shared by multiple Datasets.Related issue number
Closes #32596
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.