[Autoscaler] Make AUTOSCALER_MAX_RESOURCE_DEMAND_VECTOR_SIZE configurable (#50176)

## Why are these changes needed?

This change makes `AUTOSCALER_MAX_RESOURCE_DEMAND_VECTOR_SIZE` configurable.

Power users may wish to submit more than 1000 tasks at once and have the autoscaler respond by immediately scaling up the requisite number of nodes. To make this happen, `AUTOSCALER_MAX_RESOURCE_DEMAND_VECTOR_SIZE` must be increased beyond the 1000 cap; otherwise, the demand from most tasks is ignored and upscaling is slow.

## Related issue number

Limited `AUTOSCALER_MAX_RESOURCE_DEMAND_VECTOR_SIZE` causes the issue experienced in #45373. This PR provides a workaround.

After merging this PR, if a user wants, say, 10k tasks to trigger quick upscaling, then the user can increase `AUTOSCALER_MAX_RESOURCE_DEMAND_VECTOR_SIZE` past 10k.

## Checks

- [ ] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

I tested it experimentally by increasing `AUTOSCALER_MAX_RESOURCE_DEMAND_VECTOR_SIZE` to 100k and submitting 10k tasks; upscaling happened smoothly.

---------

Signed-off-by: Dmitri Gekhtman <dmitri.gekhtman@getcruise.com>
Co-authored-by: Dmitri Gekhtman <dmitri.gekhtman@getcruise.com>
Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
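
To illustrate why the cap matters, here is a minimal Python sketch of a configurable demand-vector limit. The commit message does not say how the value is set; the sketch assumes it is read from an environment variable of the same name, with the 1000 default mentioned above. `read_int_from_env` and `truncate_demand` are hypothetical helpers written for this example, not Ray APIs.

```python
import os

ENV_NAME = "AUTOSCALER_MAX_RESOURCE_DEMAND_VECTOR_SIZE"
DEFAULT_CAP = 1000  # the cap named in the commit message


def read_int_from_env(name: str, default: int) -> int:
    """Hypothetical helper: return the integer stored in env var `name`,
    or `default` when the variable is unset."""
    raw = os.environ.get(name)
    return int(raw) if raw is not None else default


def truncate_demand(demands: list, cap: int) -> list:
    """Hypothetical helper: keep at most `cap` resource demands.
    Entries past the cap are never seen by the scaling logic."""
    return demands[:cap]


if __name__ == "__main__":
    # 10k pending tasks, each asking for one CPU.
    demands = [{"CPU": 1}] * 10_000

    # Default cap: only the first 1000 demands are visible.
    os.environ.pop(ENV_NAME, None)
    cap = read_int_from_env(ENV_NAME, DEFAULT_CAP)
    print(cap, len(truncate_demand(demands, cap)))  # 1000 1000

    # Raised cap (assumed to be set via the environment): all 10k are visible.
    os.environ[ENV_NAME] = "100000"
    cap = read_int_from_env(ENV_NAME, DEFAULT_CAP)
    print(cap, len(truncate_demand(demands, cap)))  # 100000 10000
```

With the default cap, nine tenths of the 10k demands in this sketch are dropped before any scaling decision is made, which matches the slow upscaling described above; raising the cap past the number of pending tasks keeps the full demand visible.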