
[Data] [2/N] Enable optimizer: fix fusion #35621

Merged
12 commits merged into ray-project:master on May 31, 2023

Conversation

raulchen
Contributor

@raulchen raulchen commented May 22, 2023

Why are these changes needed?

This PR is the second part of enabling the optimizer by default (split from #34937).
It fixes the following issues:

  • ray_remote_args was not correctly set for fused operators.
  • init_fn was not correctly set for fused operators.
  • The allowed cases for fusion needed updating (see operator_fusion.py).
  • Fusion lacked a ray_remote_args compatibility check.
  • The Limit operator was not handled when converting logical operators to physical operators.
  • Other small fixes.

Note: some changes in this PR may not be covered by this PR's CI, since the optimizer must be enabled to exercise them. They have already been verified in #34937's CI.

Related issue number

#32596

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@raulchen raulchen changed the title from "[Data] [1/N] Enable optimizer by default" to "[Data] [1/N] Enable optimizer: fix fusion" on May 23, 2023
@raulchen raulchen force-pushed the enable-optimizer-1 branch from 1c5ea1b to 4203e78 on May 23, 2023 03:56
Contributor

@amogkam amogkam left a comment


thanks!

- if DataContext.get_current().optimizer_enabled:
+ if (
+     DataContext.get_current().optimizer_enabled
+     # TODO(hchen): Remove this when all operators support local plan.
Contributor

Suggested change
- # TODO(hchen): Remove this when all operators support local plan.
+ # TODO(hchen): Remove this when all operators support logical plan.

@@ -330,18 +356,29 @@ def fused_all_to_all_transform_fn(
return op


- def _are_remote_args_compatible(up_args, down_args):
+ def _are_remote_args_compatible(prev_args, next_args):
Contributor

would it be possible to add unit tests for this function?

Contributor Author

Good idea. Added in test_read_map_batches_operator_fusion_compatible_remote_args and test_read_map_batches_operator_fusion_incompatible_remote_args
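
For reference, a minimal sketch of the kind of check those tests exercise. The rules and helper below are illustrative only, not the actual `_are_remote_args_compatible` implementation in operator_fusion.py:

```python
from typing import Any, Dict, Optional


def are_remote_args_compatible(
    prev_args: Optional[Dict[str, Any]], next_args: Optional[Dict[str, Any]]
) -> bool:
    """Illustrative check: only fuse two operators if their Ray remote args
    would not conflict on keys that affect where the fused tasks run."""
    prev_args = dict(prev_args or {})
    next_args = dict(next_args or {})
    for key in ("num_cpus", "num_gpus", "resources", "scheduling_strategy"):
        if key in prev_args and key in next_args and prev_args[key] != next_args[key]:
            return False
    return True


assert are_remote_args_compatible({"num_cpus": 1}, {"num_cpus": 1})
assert not are_remote_args_compatible({"num_gpus": 1}, {"num_gpus": 0})
```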

@@ -22,7 +22,7 @@ def get_input_data() -> List[RefBundle]:
get_metadata = cached_remote_fn(get_table_block_metadata)
metadata = ray.get([get_metadata.remote(t) for t in op._tables])
ref_bundles: List[RefBundle] = [
- RefBundle([(table_ref, block_metadata)], owns_blocks=True)
+ RefBundle([(table_ref, block_metadata)], owns_blocks=False)
Contributor

what are all the owns_blocks changes for?

Contributor Author

The blocks are put into the object store inside the FromArrowRefs op, so this RefBundle shouldn't own the blocks. This was a bug. This function is only used in the optimizer code path.
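
For context, a rough sketch of the ownership rule behind these owns_blocks changes. The helper below is hypothetical, not Ray Data code: a RefBundle should only claim ownership of refs that its creator produced itself and may therefore free.

```python
import ray

# Hypothetical helper, not Ray Data code: owns_blocks=True is only safe when
# the bundle's creator produced the block refs itself (e.g. via its own
# ray.put), because owned blocks may be eagerly freed after consumption.
def make_bundle_entry(existing_ref=None, block=None):
    if existing_ref is not None:
        # Refs handed to us by the caller: we must not free them.
        return existing_ref, False  # owns_blocks=False
    # Blocks we materialize ourselves: we own the resulting refs.
    return ray.put(block), True  # owns_blocks=True


ray.init(ignore_reinit_error=True)
ref, owns = make_bundle_entry(block=[1, 2, 3])
assert owns is True
ref2, owns2 = make_bundle_entry(existing_ref=ref)
assert owns2 is False
```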

if map_transform_fn:
upstream_map_fn = lambda block: map_transform_fn(block, ctx) # noqa: E731
# If there is a fused upstream operator,
# also use the ray_remote_args from the fused upstream operator.
ray_remote_args = ctx.upstream_map_ray_remote_args
Contributor

is there a test for this in the later PR?

Contributor Author

yes, test_map_batches_extra_args if I remember correctly.

@@ -1730,7 +1730,7 @@ def test_random_shuffle_check_random(shutdown_only):
prev = x


- def test_random_shuffle_with_custom_resource(ray_start_cluster):
+ def test_random_shuffle_with_custom_resource(ray_start_cluster, use_push_based_shuffle):
Contributor

doesn't seem like use_push_based_shuffle is actually being used by the test?

Contributor Author

use_push_based_shuffle is actually a fixture that sets ctx.use_push_based_shuffle to True/False and runs the test twice. The first time I saw this, I was confused as well, but we already have many such usages.
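
A hedged sketch of the pattern, for anyone else who hits this. The real fixture lives in Ray's test conftest and flips DataContext.use_push_based_shuffle; the bodies below are illustrative only:

```python
import pytest


@pytest.fixture(params=[True, False])
def use_push_based_shuffle(request):
    # Each value of `params` produces a separate test run; the real fixture
    # also sets DataContext.use_push_based_shuffle to this value.
    yield request.param


def test_random_shuffle_with_custom_resource(use_push_based_shuffle):
    # Simply requesting the fixture makes pytest run this test twice,
    # once with push-based shuffle on and once with it off.
    assert use_push_based_shuffle in (True, False)
```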

Contributor

@amogkam amogkam left a comment


thanks!

@ollie-iterators

The failing tests should be fixed by merging in recent changes from the main branch.

raulchen added 7 commits May 24, 2023 11:46
Signed-off-by: Hao Chen <chenh1024@gmail.com>
Signed-off-by: Hao Chen <chenh1024@gmail.com>
Signed-off-by: Hao Chen <chenh1024@gmail.com>
Signed-off-by: Hao Chen <chenh1024@gmail.com>
Signed-off-by: Hao Chen <chenh1024@gmail.com>
Signed-off-by: Hao Chen <chenh1024@gmail.com>
Signed-off-by: Hao Chen <chenh1024@gmail.com>
up_op, MapOperator
if not (
(
isinstance(up_op, TaskPoolMapOperator)
Contributor

nit: should we combine the two cases into a single isinstance check on down_op?

Also, I recall discussing that we will potentially not support this for the Actor case?

Contributor Author

Having 2 separate cases looks clearer to me, but I don't have a strong preference.
We are dropping support for the actor->actor case; task->actor is still supported.
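
A small illustrative sketch of that rule (the classes below are stand-ins, not the actual Ray Data operators, and the check is a simplification of what operator_fusion.py does):

```python
# Illustrative only: task->task and task->actor can fuse; actor->actor cannot.
class MapOperator: ...
class TaskPoolMapOperator(MapOperator): ...
class ActorPoolMapOperator(MapOperator): ...


def can_fuse(up_op: MapOperator, down_op: MapOperator) -> bool:
    # The upstream operator must be task-based; the downstream operator may be
    # either task-based or actor-based.
    return isinstance(up_op, TaskPoolMapOperator) and isinstance(
        down_op, (TaskPoolMapOperator, ActorPoolMapOperator)
    )


assert can_fuse(TaskPoolMapOperator(), TaskPoolMapOperator())        # task -> task
assert can_fuse(TaskPoolMapOperator(), ActorPoolMapOperator())       # task -> actor
assert not can_fuse(ActorPoolMapOperator(), ActorPoolMapOperator())  # actor -> actor
```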

Signed-off-by: Hao Chen <chenh1024@gmail.com>
@@ -27,7 +27,7 @@ def get_input_data() -> List[RefBundle]:
blocks, metadata = map(list, zip(*res))
metadata = ray.get(metadata)
ref_bundles: List[RefBundle] = [
- RefBundle([(block, block_metadata)], owns_blocks=True)
+ RefBundle([(block, block_metadata)], owns_blocks=False)
Contributor

Shouldn't this still be True since ndarray_to_block will create a copy?

Contributor Author

Right. Fixed.

@@ -48,7 +48,7 @@ def get_input_data() -> List[RefBundle]:
)
block_ref_bundle = RefBundle(
[(ray.put(block), block_metadata)],
- owns_blocks=True,
+ owns_blocks=False,
Contributor

This one should stay True, since the ref is created by the ray.put().

@@ -237,6 +237,10 @@ class TaskContext:
# an AllToAllOperator with an upstream MapOperator.
upstream_map_transform_fn: Optional["MapTransformFn"] = None

# The Ray remote arguments of the fused upstream MapOperator.
# This should be set if upstream_map_transform_fn is set.
upstream_map_ray_remote_args: Dict[str, Any] = None
Contributor

Hmm, it's not ideal to pass this at runtime. Ideally, the optimizer would rewrite the downstream op's ray remote args to this value, instead of having each operator decide which of the two args to use by looking at the context.

Contributor Author

@raulchen raulchen May 31, 2023


I agree with you, but currently it's hard to avoid this. For most operators, we are already doing it the way you mentioned.
upstream_map_ray_remote_args, along with upstream_map_transform_fn, is used only for RandomShuffle, because the corresponding AllToAllOperator physical op doesn't do the shuffle directly; instead, it uses ExchangeTaskScheduler to launch new tasks to do the shuffle. That's why we need this ad-hoc handling here. I'll add a TODO.

Update: see generate_random_shuffle_fn for more details.
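
A rough sketch of why this information has to travel through the task context for the shuffle case (all names below are hypothetical, not the actual Ray Data classes):

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional

# Illustrative only: the shuffle's AllToAllOperator does not run the fused
# upstream map itself, so the scheduler that launches the shuffle-map tasks
# needs both the fused map fn and its remote args at task-launch time.


@dataclass
class ShuffleTaskContext:
    upstream_map_fn: Optional[Callable[[Any], Any]] = None
    upstream_map_ray_remote_args: Dict[str, Any] = field(default_factory=dict)


def run_shuffle_map_task(block, ctx: ShuffleTaskContext):
    # In the real code path this would be submitted as a Ray task with
    # **ctx.upstream_map_ray_remote_args; here it runs inline for illustration.
    fn = ctx.upstream_map_fn or (lambda b: b)
    return fn(block)


ctx = ShuffleTaskContext(
    upstream_map_fn=lambda b: [x * 2 for x in b],
    upstream_map_ray_remote_args={"num_cpus": 2},
)
print(run_shuffle_map_task([1, 2, 3], ctx))  # -> [2, 4, 6]
```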

@ericl ericl added the @author-action-required label on May 30, 2023
raulchen added 2 commits May 30, 2023 17:28
Signed-off-by: Hao Chen <chenh1024@gmail.com>
Signed-off-by: Hao Chen <chenh1024@gmail.com>
@raulchen raulchen removed the @author-action-required label on May 31, 2023
@ericl
Contributor

ericl commented May 31, 2023

LGTM pending tests.

@ericl ericl added the @author-action-required label on May 31, 2023
@raulchen raulchen changed the title from "[Data] [1/N] Enable optimizer: fix fusion" to "[Data] [2/N] Enable optimizer: fix fusion" on May 31, 2023
@raulchen raulchen merged commit 6d18218 into ray-project:master May 31, 2023
@raulchen raulchen deleted the enable-optimizer-1 branch May 31, 2023 20:44
scv119 pushed a commit to scv119/ray that referenced this pull request Jun 16, 2023
arvind-chandra pushed a commit to lmco/ray that referenced this pull request Aug 31, 2023