
[Data] Enable optimizer by default #34937

Closed (51 commits, not merged)

Conversation

scottjlee
Contributor

@scottjlee scottjlee commented May 2, 2023

Why are these changes needed?

This PR enables the execution plan optimizer in Ray Data, and fixes some bugs discovered via unit tests. We will ensure that Data CI and release tests are healthy before merging.

Related issue number

Closes #32596

Checks

  • I've signed off every commit (using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Scott Lee and others added 24 commits May 1, 2023 21:21
Signed-off-by: Scott Lee <sjl@anyscale.com> (×20 commits, plus one "wip" commit)
Signed-off-by: Hao Chen <chenh1024@gmail.com> (×3 commits)
@raulchen raulchen marked this pull request as ready for review May 15, 2023 21:46
@raulchen raulchen requested review from ericl and scv119 as code owners May 15, 2023 21:46
# Used for caching user-defined callable classes.
# Key is the class; value is the instantiated object.
# See make_callable_class_concurrent in python/ray/data/_internal/execution/util.py.
# This is a dict (rather than a single object) because multiple map operators may be fused into one.
Contributor

@ericl ericl May 16, 2023
Do we actually fuse multiple actors into one? I don't think we do that / should do that.

Using a dict is a bit concerning since we could leak closures over time, compared to a singleton that is overwritten.

input_files=[],
exec_stats=None,
),
read_task.get_metadata(),
Contributor

Can you make sure this calls cleaned_metadata(read_task) in legacy_compat.py to implement the same logic?

_assert_has_stages(ds._plan._last_optimized_stages, ["ReadRange->Map"])


def test_optimize_reorder(ray_start_regular_shared):
# The ReorderRandomizeBlocksRule optimizer rule collapses RandomizeBlocks operators,
Contributor

Shouldn't it still show up in the stage names?
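As a toy illustration of the collapsing step a rule like ReorderRandomizeBlocksRule performs, consider a linear plan represented as a list of operator names. The function and plan representation here are assumptions for illustration only, not Ray Data's implementation:

```python
# Simplified, hypothetical sketch: collapse runs of consecutive
# "RandomizeBlocks" operators in a linear plan, since adjacent
# randomize ops are redundant (one shuffle subsumes the other).

def collapse_randomize(plan):
    """Return a copy of `plan` with consecutive RandomizeBlocks ops merged."""
    result = []
    for op in plan:
        if op == "RandomizeBlocks" and result and result[-1] == "RandomizeBlocks":
            continue  # drop the redundant duplicate, keep the first
        result.append(op)
    return result


plan = ["ReadRange", "RandomizeBlocks", "RandomizeBlocks", "Map"]
assert collapse_randomize(plan) == ["ReadRange", "RandomizeBlocks", "Map"]
```

Note that under this sketch a single RandomizeBlocks op survives, which is consistent with the reviewer's expectation that it should still appear in the stage names.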

@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label May 16, 2023
raulchen added 16 commits May 16, 2023 15:48
Signed-off-by: Hao Chen <chenh1024@gmail.com>
Signed-off-by: Hao Chen <chenh1024@gmail.com>
Signed-off-by: Hao Chen <chenh1024@gmail.com>
Signed-off-by: Hao Chen <chenh1024@gmail.com>
Signed-off-by: Hao Chen <chenh1024@gmail.com>
Signed-off-by: Hao Chen <chenh1024@gmail.com>
Signed-off-by: Hao Chen <chenh1024@gmail.com>
Signed-off-by: Hao Chen <chenh1024@gmail.com>
Signed-off-by: Hao Chen <chenh1024@gmail.com>
Signed-off-by: Hao Chen <chenh1024@gmail.com>
Signed-off-by: Hao Chen <chenh1024@gmail.com>
Signed-off-by: Hao Chen <chenh1024@gmail.com>
Signed-off-by: Hao Chen <chenh1024@gmail.com>
Signed-off-by: Hao Chen <chenh1024@gmail.com>
@raulchen
Contributor

raulchen commented May 23, 2023

As this PR is getting huge, I'll split it into a few small PRs: the first two are #35621 and #35648.

Other known issues not fixed in these 2 PRs include:

  • Actors with the same class & args are fused when the optimizer is disabled, but not when it is enabled (we need to determine whether this fusion is actually necessary).
  • __repr__() and stats() output differ between the two code paths.
  • Some config flags (e.g., context.optimize_fuse_read_stages) are not respected.

@raulchen raulchen closed this May 23, 2023
raulchen added a commit that referenced this pull request May 31, 2023
## Why are these changes needed?

This PR is the 1st part of enabling optimizer by default (split from #34937).

- Fix inconsistent behaviors for the Read op by reusing the `ReadTask`s from `read_api.py` in `plan_read_op.py`.
- Support cache in `materialize`.

Note, some changes in this PR may not be covered by this PR's CI, since the optimizer must be enabled to exercise them; they are already verified in #34937's CI.

## Related issue number

#32596
raulchen added a commit that referenced this pull request May 31, 2023
## Why are these changes needed?

This PR is the 2nd part of enabling optimizer by default (split from #34937). 
It fixes the following issues:
- `ray_remote_args` not correctly set for a fused operator.
- `init_fn` not correctly set for a fused operator.
- Allowed cases for fusion (see `operator_fusion.py`).
- `ray_remote_args` compatibility check for fusion.
- Limit operator not handled when converting logical operator to physical.
- Other small fixes.
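As a rough sketch of the kind of `ray_remote_args` compatibility check mentioned above, fusion can be gated on the two operators' remote args matching after ignoring keys that don't affect scheduling. The ignored-key set and function names here are assumptions for illustration, not Ray Data's actual logic:

```python
# Hedged sketch of a ray_remote_args compatibility check for operator fusion:
# two map operators are fusion-compatible only if their remote args agree,
# ignoring keys assumed not to affect placement (this key set is hypothetical).

_FUSION_IGNORE_KEYS = {"num_returns", "name"}


def can_fuse(up_args, down_args):
    """Return True if two operators' ray_remote_args are fusion-compatible."""
    def strip(args):
        return {k: v for k, v in args.items() if k not in _FUSION_IGNORE_KEYS}
    return strip(up_args) == strip(down_args)


assert can_fuse({"num_cpus": 1}, {"num_cpus": 1})
assert not can_fuse({"num_cpus": 1}, {"num_gpus": 1})
assert can_fuse({"num_cpus": 1, "name": "a"}, {"num_cpus": 1, "name": "b"})
```

The design intuition is that fusing operators with conflicting resource requests would silently change where (and with what resources) user code runs, so a conservative equality check is the safe default.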

Note, some changes in this PR may not be covered by this PR's CI, since the optimizer must be enabled to exercise them; they are already verified in #34937's CI.

## Related issue number

#32596
scv119 pushed a commit to scv119/ray that referenced this pull request Jun 16, 2023
scv119 pushed a commit to scv119/ray that referenced this pull request Jun 16, 2023
arvind-chandra pushed a commit to lmco/ray that referenced this pull request Aug 31, 2023
arvind-chandra pushed a commit to lmco/ray that referenced this pull request Aug 31, 2023
Labels
@author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer.
Successfully merging this pull request may close these issues.

[Datasets] Enabling execution optimizer after passing all CI and nightly tests
5 participants