[Core] Introduce spill_on_unavailable option for soft NodeAffinitySchedulingStrategy #34224

Merged
merged 5 commits into from
Apr 11, 2023

Collaborator

@jjyao jjyao commented Apr 10, 2023

Why are these changes needed?

Introduce a private _spill_on_unavailable semantic for soft NodeAffinitySchedulingStrategy. In 2.4 this will only be used by Dataset and we will figure out how to properly expose this as a public API in 2.5.
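The intended semantics can be illustrated with a small toy model. This is only a sketch of one plausible reading, not Ray's scheduler: `Node` and `pick_node` are hypothetical names, resources are reduced to CPU counts, and the sketch assumes that without `_spill_on_unavailable` a soft-affinity task waits for a target node that is alive and feasible but currently busy, while with it the task may run on another node.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a cluster node; not a Ray type."""
    alive: bool
    total_cpus: int
    free_cpus: int


def pick_node(
    nodes: Dict[str, Node],
    target: str,
    soft: bool,
    spill_on_unavailable: bool,
    cpus: int = 1,
) -> Optional[str]:
    """Return the chosen node id, or None if the task should keep waiting."""
    node = nodes.get(target)
    # Feasible: the target exists, is alive, and could ever fit the task.
    feasible = node is not None and node.alive and node.total_cpus >= cpus

    if feasible and node.free_cpus >= cpus:
        return target  # The target can run the task right now.
    if feasible and not (soft and spill_on_unavailable):
        return None  # The target is only busy, so wait for it (assumed behavior).
    if not feasible and not soft:
        raise RuntimeError(f"node {target} is dead or infeasible")

    # Soft affinity fallback: any other live node with enough free CPUs.
    for node_id, other in nodes.items():
        if node_id != target and other.alive and other.free_cpus >= cpus:
            return node_id
    return None  # Nothing is available right now.


nodes = {"a": Node(True, 4, 0), "b": Node(True, 4, 4)}
print(pick_node(nodes, "a", soft=True, spill_on_unavailable=False))  # None
print(pick_node(nodes, "a", soft=True, spill_on_unavailable=True))   # b
```

In this model the flag only changes the "alive but busy" case; a dead or infeasible target already falls back to other nodes under plain soft affinity.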

Related issue number

Closes #34170

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

jjyao added 4 commits April 10, 2023 13:47
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
@jjyao jjyao marked this pull request as ready for review April 11, 2023 04:23
"""

-def __init__(self, node_id: str, soft: bool):
+def __init__(self, node_id: str, soft: bool, _spill_on_unavailable: bool = False):
# This will be removed once we standardize on node id being hex string.
Contributor

Add a TODO to promote this to public API?

Collaborator Author

Created #34283 to track this.

Contributor

@clarng clarng left a comment

looks good, let's also verify this fixes the release test

Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
@@ -166,6 +166,7 @@ def __call__(self, args):
args["scheduling_strategy"] = NodeAffinitySchedulingStrategy(
self.locs[self.i],
soft=True,
+_spill_on_unavailable=True,
Contributor

Should we disallow spilling to another node when options.locality_with_output==True? locality_with_output is not the default behavior, and users who opt in to it probably don't want the surprise of a task running on an arbitrary node. cc @ericl for opinion.

Contributor

Locality is always best effort, so this is fine. Accepting less locality is preferable so that scheduling stays work-preserving in all cases.
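For readers skimming the thread, the call site in the diff above can be read roughly as follows. This is a hedged sketch rather than the Dataset code itself: `RoundRobinLocality` is a hypothetical name, the round-robin advance of `self.i` is an assumption (the diff does not show it), and the `NodeAffinitySchedulingStrategy` dataclass here is a stand-in that mirrors the constructor from this PR so the example runs without Ray installed.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class NodeAffinitySchedulingStrategy:
    """Stand-in mirroring the constructor in this PR; not the Ray class."""
    node_id: str
    soft: bool
    _spill_on_unavailable: bool = False


class RoundRobinLocality:
    """Hypothetical helper: gives each call a soft affinity to the next location."""

    def __init__(self, locs: List[str]):
        self.locs = locs
        self.i = 0

    def __call__(self, args: Dict[str, Any]) -> Dict[str, Any]:
        args = dict(args)  # Copy so the caller's dict is not mutated.
        args["scheduling_strategy"] = NodeAffinitySchedulingStrategy(
            self.locs[self.i],
            soft=True,
            _spill_on_unavailable=True,
        )
        self.i = (self.i + 1) % len(self.locs)  # Assumed round-robin advance.
        return args


fn = RoundRobinLocality(["node-1", "node-2"])
first = fn({"num_cpus": 1})
second = fn({"num_cpus": 1})
```

Because every strategy is built with `soft=True` and `_spill_on_unavailable=True`, the preferred node is a hint only, which matches the "best effort" framing in the comment above.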

Collaborator

@zhe-thoughts zhe-thoughts left a comment

Approved for picking into 2.4. cc @clarng

@jianoaix
Contributor

Can you test the 100tb shuffle loadtest?

@jjyao
Collaborator Author

jjyao commented Apr 11, 2023

dataset_shuffle_push_based_random_shuffle_100tb succeeded: https://buildkite.com/ray-project/release-tests-pr/builds/34544#018770ad-310d-4c5f-ba36-1ec2433c0497

@jjyao
Collaborator Author

jjyao commented Apr 11, 2023

Failed tests are unrelated.

@jjyao jjyao merged commit fd6b99a into ray-project:master Apr 11, 2023
@jjyao jjyao deleted the jjyao/soft branch April 11, 2023 20:28
jjyao added a commit that referenced this pull request Apr 11, 2023
…edulingStrategy (#34224)

Introduce a private _spill_on_unavailable semantic for soft NodeAffinitySchedulingStrategy.

Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
jjyao added a commit that referenced this pull request Apr 12, 2023
…edulingStrategy (#34224) (#34285)

Introduce a private _spill_on_unavailable semantic for soft NodeAffinitySchedulingStrategy.

Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
elliottower pushed a commit to elliottower/ray that referenced this pull request Apr 22, 2023
…edulingStrategy (ray-project#34224)

Introduce a private _spill_on_unavailable semantic for soft NodeAffinitySchedulingStrategy.

Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Signed-off-by: elliottower <elliot@elliottower.com>
ProjectsByJackHe pushed a commit to ProjectsByJackHe/ray that referenced this pull request May 4, 2023
…edulingStrategy (ray-project#34224)

Introduce a private _spill_on_unavailable semantic for soft NodeAffinitySchedulingStrategy.

Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Signed-off-by: Jack He <jackhe2345@gmail.com>
Development

Successfully merging this pull request may close these issues.

[data] release test failure: dataset_shuffle_push_based_random_shuffle_100tb