[serve] Don't change deployment status when autoscaling #36520

zcin · 2023-06-16T22:34:17Z

Why are these changes needed?

The deployment status UPDATING should only be used during redeployment. Right now the status also changes to UPDATING during autoscaling, which can be confusing for users. This PR makes it so that the status doesn't change during autoscaling. This usually means that a deployment's status will remain HEALTHY while it's autoscaling.

Related issue number

Closes #35948

Checks

- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- I've included any doc changes needed for https://docs.ray.io/en/master/.
  - I've added any new APIs to the API Reference. For example, if I added a
    method in Tune, I've added it in doc/source/tune/api/ under the
    corresponding .rst file.
- I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - Unit tests
  - Release tests
  - This PR is not tested :(

Signed-off-by: Cindy Zhang <cindyzyx9@gmail.com>

edoakes · 2023-06-20T19:31:57Z

python/ray/serve/_private/deployment_state.py

@@ -1289,6 +1289,19 @@ def _set_target_state(self, target_info: DeploymentInfo) -> None:

        logger.info(f"Deploying new version of deployment {self._name}.")

+    def _set_target_state_autoscaling(self, num_replicas: int) -> None:
+        """Update the target number of replicas based on an autoscaling decision."""


please update this comment to indicate why we need a separate codepath and what is different from _set_target_state

Added comments!
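
For readers following this thread, the difference between the two code paths can be sketched with a toy model. This is a minimal sketch, not the Ray Serve implementation: `ToyDeploymentState` and the `Status` enum are hypothetical stand-ins, and the real `_set_target_state` takes a `DeploymentInfo` rather than an integer. Only the two method names and the UPDATING/HEALTHY statuses come from this PR.

```python
from enum import Enum


class Status(Enum):
    # Hypothetical stand-in for the deployment statuses named in this PR.
    UPDATING = "UPDATING"
    HEALTHY = "HEALTHY"


class ToyDeploymentState:
    """Toy model of the two target-state code paths (not Ray's class)."""

    def __init__(self, num_replicas: int) -> None:
        self.target_num_replicas = num_replicas
        self.status = Status.HEALTHY

    def _set_target_state(self, num_replicas: int) -> None:
        # Redeployment path: the user deployed a new target,
        # so the status moves to UPDATING.
        self.target_num_replicas = num_replicas
        self.status = Status.UPDATING

    def _set_target_state_autoscaling(self, num_replicas: int) -> None:
        # Autoscaling path: only the replica target changes;
        # the status is deliberately left untouched.
        self.target_num_replicas = num_replicas


state = ToyDeploymentState(num_replicas=1)
state._set_target_state_autoscaling(3)
status_after_autoscale = state.status
state._set_target_state(2)
status_after_redeploy = state.status
```

In this sketch the status stays HEALTHY after the autoscaling call and only becomes UPDATING after the redeployment call, which is the user-visible behavior the PR description asks for.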

edoakes · 2023-06-20T19:33:00Z

python/ray/serve/tests/test_autoscaling_policy.py

+def wait_for_condition_raise(
+    condition_predictor, timeout=10, retry_interval_ms=100, **kwargs: Any
+):
+    """Wait until a condition is met. If exception occurs, raise it."""


can you just make this behavior a flag passed to existing wait_for_condition? would reduce the likelihood of someone else rewriting this themselves in the future

Good point, I've added this as a parameter to wait_for_condition
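
A minimal sketch of what such a flag could look like, written as a standalone helper rather than the actual Ray test utility. The parameter name `raise_exceptions` is an assumption (the thread does not name it), as is the exact timeout message; the other parameters match the signature quoted above.

```python
import time
from typing import Any, Callable


def wait_for_condition(
    condition_predictor: Callable[..., bool],
    timeout: float = 10,
    retry_interval_ms: float = 100,
    raise_exceptions: bool = False,  # assumed name; not given in the thread
    **kwargs: Any,
) -> None:
    """Poll until the predicate returns True or the timeout expires."""
    start = time.monotonic()
    last_exc = None
    while time.monotonic() - start <= timeout:
        try:
            if condition_predictor(**kwargs):
                return
        except Exception as exc:
            if raise_exceptions:
                # Surface the failure immediately instead of retrying.
                raise
            # Default behavior: remember the error and keep polling.
            last_exc = exc
        time.sleep(retry_interval_ms / 1000.0)
    message = "The condition wasn't met before the timeout expired."
    if last_exc is not None:
        message += f" Last exception: {last_exc!r}"
    raise RuntimeError(message)
```

With the flag off, an exception inside the predicate is treated as "not yet" and polling continues until the timeout; with it on, a failing assertion in a test predicate propagates right away, which is the behavior `wait_for_condition_raise` provided.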

Signed-off-by: Cindy Zhang <cindyzyx9@gmail.com>

sihanwang41

Synced offline. One change needed: always set the version (new_info.version = self._target_state.version.code_version). Approving to unblock. LGTM!
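
The requested change can be illustrated with a small sketch. Everything here is hypothetical except the assignment quoted in the comment: `ToyVersion`, `ToyInfo`, `ToyTargetState`, and `autoscale_info` are simplified stand-ins, not Ray Serve classes. One plausible reading of the request is that the info built for an autoscaling decision should carry over the current code version, so that a replica-count change is not mistaken for a new version of the deployment.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ToyVersion:
    # Hypothetical stand-in for the target state's version object.
    code_version: str


@dataclass(frozen=True)
class ToyInfo:
    # Hypothetical stand-in for the deployment info.
    num_replicas: int
    version: Optional[str] = None


@dataclass(frozen=True)
class ToyTargetState:
    info: ToyInfo
    version: ToyVersion


def autoscale_info(target: ToyTargetState, num_replicas: int) -> ToyInfo:
    # Copy the current info with the new replica count and always carry
    # over the existing code version, mirroring the quoted assignment:
    # new_info.version = self._target_state.version.code_version
    return replace(
        target.info,
        num_replicas=num_replicas,
        version=target.version.code_version,
    )


target = ToyTargetState(info=ToyInfo(num_replicas=1), version=ToyVersion("v1"))
new_info = autoscale_info(target, num_replicas=4)
```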

zcin · 2023-06-21T23:32:24Z

Thanks @sihanwang41, I've addressed your comment!

@edoakes Tests are passing, should be ready to merge!

…36520) The deployment state UPDATING should only be used during redeployment. Right now the state is updating during autoscaling, which can be confusing for users. This PR makes it so that the state doesn't change during autoscaling. This usually means that a deployment's status will remain HEALTHY while it's autoscaling. Signed-off-by: e428265 <arvind.chandramouli@lmco.com>

zcin added 2 commits June 16, 2023 15:33

don't change status when autoscaling

b1f95e0

Signed-off-by: Cindy Zhang <cindyzyx9@gmail.com>

improve tests

93d13de

Signed-off-by: Cindy Zhang <cindyzyx9@gmail.com>

zcin marked this pull request as ready for review June 20, 2023 06:20

zcin requested a review from a team June 20, 2023 14:28

zcin self-assigned this Jun 20, 2023

edoakes approved these changes Jun 20, 2023

zcin added 4 commits June 20, 2023 16:17

add parameter to wait_for_condition and add comments

1e51adb

Signed-off-by: Cindy Zhang <cindyzyx9@gmail.com>

Merge branch 'master' into autoscale-status

2e007f6

Signed-off-by: Cindy Zhang <cindyzyx9@gmail.com>

Merge branch 'master' into autoscale-status

c96cfef

Signed-off-by: Cindy Zhang <cindyzyx9@gmail.com>

try

8d3f27b

Signed-off-by: Cindy Zhang <cindyzyx9@gmail.com>

sihanwang41 approved these changes Jun 21, 2023

edoakes merged commit 648c4aa into ray-project:master Jun 22, 2023

akshay-anyscale mentioned this pull request Jul 21, 2023

Add service deployment instructions to stable diffusion template #37645

Closed

8 tasks

zcin deleted the autoscale-status branch August 25, 2023 17:35
