-
Notifications
You must be signed in to change notification settings - Fork 6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[data] Use default scheduling strategy to fix failing dataset_shuffle_push_based_sort_1tb test #36722
Merged
stephanie-wang
merged 4 commits into
ray-project:master
from
stephanie-wang:dataset-sort-test
Jun 28, 2023
Merged
[data] Use default scheduling strategy to fix failing dataset_shuffle_push_based_sort_1tb test #36722
stephanie-wang
merged 4 commits into
ray-project:master
from
stephanie-wang:dataset-sort-test
Jun 28, 2023
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Running the test here. |
ericl
approved these changes
Jun 22, 2023
scottjlee
approved these changes
Jun 22, 2023
feaf4ae
to
c0944ca
Compare
e2ebeca
to
a20b673
Compare
Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>
a20b673
to
6d1badd
Compare
FYI, branch cut is Friday so it would be easier if we can merge it by then. Thankkks |
8 tasks
8 tasks
arvind-chandra
pushed a commit
to lmco/ray
that referenced
this pull request
Aug 31, 2023
…_push_based_sort_1tb test (ray-project#36722) Fixes a bug introduced in ray-project#36290 where SPREAD scheduling was getting used in many Datasets tasks. This led to poor locality, which we can see in the sort test failure described in ray-project#36449. This PR removes the scheduling strategy arg, so the behavior is now to use Ray's default scheduling strategy by default, unless the scheduling strategy is explicitly overridden by an operator implementation. Unfortunately there is not a good way to add a smaller regression test right now; long-term we should collect the metrics from Ray core about what ray.remote args were passed to tasks, and check that these are correct. Related issue number Closes ray-project#36449. --------- Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu> Signed-off-by: e428265 <arvind.chandramouli@lmco.com>
8 tasks
raulchen
pushed a commit
that referenced
this pull request
Oct 2, 2023
As #39871 indicated, The [current Data Internals page](https://docs.ray.io/en/latest/data/data-internals.html#scheduling) has a section on Scheduling, which confusingly states that both SPREAD and DEFAULT are the default scheduling strategies used. This PR summarized the scheduling strategy used by Ray Data as follows: 1. By default, the scheduling strategy is set to Default Hybrid Policy([code](https://github.com/ray-project/ray/blob/9c143f63233d5cbde8a6943db31b91fb3b05f017/python/ray/data/_internal/remote_fn.py#L26), [related PR](#36722)). 2. Read operation overrides the scheduling strategy to Spread Policy if the file is not located locally; otherwise, it is scheduled to the current node([code](https://github.com/Yicheng-Lu-llll/ray/blob/9c143f63233d5cbde8a6943db31b91fb3b05f017/python/ray/data/read_api.py#L338)). 3. Map operation overrides the scheduling strategy to Spread Policy if total argument size <50MB([code](https://github.com/ray-project/ray/blob/9c143f63233d5cbde8a6943db31b91fb3b05f017/python/ray/data/_internal/execution/operators/map_operator.py#L213), [related PR](#36290)). Slack discussion: https://ray-distributed.slack.com/archives/C02PHB3SQHH/p1695756535614819 --------- Signed-off-by: Yicheng-Lu-llll <luyc58576@gmail.com>
vymao
pushed a commit
to vymao/ray
that referenced
this pull request
Oct 11, 2023
As ray-project#39871 indicated, The [current Data Internals page](https://docs.ray.io/en/latest/data/data-internals.html#scheduling) has a section on Scheduling, which confusingly states that both SPREAD and DEFAULT are the default scheduling strategies used. This PR summarized the scheduling strategy used by Ray Data as follows: 1. By default, the scheduling strategy is set to Default Hybrid Policy([code](https://github.com/ray-project/ray/blob/9c143f63233d5cbde8a6943db31b91fb3b05f017/python/ray/data/_internal/remote_fn.py#L26), [related PR](ray-project#36722)). 2. Read operation overrides the scheduling strategy to Spread Policy if the file is not located locally; otherwise, it is scheduled to the current node([code](https://github.com/Yicheng-Lu-llll/ray/blob/9c143f63233d5cbde8a6943db31b91fb3b05f017/python/ray/data/read_api.py#L338)). 3. Map operation overrides the scheduling strategy to Spread Policy if total argument size <50MB([code](https://github.com/ray-project/ray/blob/9c143f63233d5cbde8a6943db31b91fb3b05f017/python/ray/data/_internal/execution/operators/map_operator.py#L213), [related PR](ray-project#36290)). Slack discussion: https://ray-distributed.slack.com/archives/C02PHB3SQHH/p1695756535614819 --------- Signed-off-by: Yicheng-Lu-llll <luyc58576@gmail.com> Signed-off-by: Victor <vctr.y.m@example.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
@author-action-required
The PR author is responsible for the next step. Remove tag to send back to the reviewer.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Why are these changes needed?
Fixes a bug introduced in #36290 where SPREAD scheduling was getting used in many Datasets tasks. This led to poor locality, which we can see in the sort test failure described in #36449.
This PR removes the scheduling strategy arg, so the behavior is now to use Ray's default scheduling strategy by default, unless the scheduling strategy is explicitly overridden by an operator implementation.
Unfortunately there is not a good way to add a smaller regression test right now; long-term we should collect the metrics from Ray core about what
ray.remote
args were passed to tasks, and check that these are correct.Related issue number
Closes #36449.
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.