Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[serve] add early return when there are still in progress http health check #36461

Merged
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions python/ray/serve/_private/http_state.py
Original file line number Diff line number Diff line change
Expand Up @@ -191,6 +191,14 @@ def update(self):
f"{self._node_id} after {DEFAULT_HEALTH_CHECK_TIMEOUT_S}s"
)
self.try_update_status(HTTPProxyStatus.UNHEALTHY)
else:
# This return is important to not trigger a new health check when
# there is an in progress health check. When the health check object
# is still in progress and before the timeout is triggered, we will
# do an early return here to signal the completion of this update call
# and to prevent another health check object from recreated in the
# code below.
return

# If there's no active in-progress health check and it has been more than 10
# seconds since the last health check, perform another health check.
Expand Down
49 changes: 47 additions & 2 deletions python/ray/serve/tests/test_http_state.py
Original file line number Diff line number Diff line change
Expand Up @@ -418,9 +418,54 @@ async def check_health(self):

@patch("ray.serve._private.http_state.DEFAULT_HEALTH_CHECK_TIMEOUT_S", 0.1)
@patch("ray.serve._private.http_state.PROXY_HEALTH_CHECK_PERIOD_S", 0.1)
def test_http_proxy_state_update_healthy_check_health_always_timeout():
def test_http_proxy_state_check_health_always_timeout_timtout_eq_period():
"""Test calling update method on HTTPProxyState when the proxy state is HEALTHY and
when the ready call always timed out.
when the ready call always timed out and health check timeout and period equals.

The proxy state started with STARTING. After update is called and ready call
succeeded, the status will change to HEALTHY. After the next few check_health calls
takes very long time to finished, the status will eventually change to UNHEALTHY
after all retries have exhausted.
"""

@ray.remote(num_cpus=0)
class NewMockHTTPProxyActor:
async def ready(self):
return json.dumps(["mock_worker_id", "mock_log_file_path"])

async def check_health(self):
await asyncio.sleep(100)

proxy_state = _create_http_proxy_state(proxy_actor_class=NewMockHTTPProxyActor)

# Continuously trigger update. The status should change from STARTING to HEALTHY
# when ready.
wait_for_condition(
condition_predictor=_update_and_check_proxy_status,
state=proxy_state,
status=HTTPProxyStatus.HEALTHY,
)
first_check_time = proxy_state._last_health_check_time

# Continuously trigger update and status should change to UNHEALTHY.
wait_for_condition(
condition_predictor=_update_and_check_proxy_status,
state=proxy_state,
status=HTTPProxyStatus.UNHEALTHY,
)

# Ensure the check time have changed since the last update
assert first_check_time != proxy_state._last_health_check_time

# Ensure _consecutive_health_check_failures is correct
assert proxy_state._consecutive_health_check_failures == 3


@patch("ray.serve._private.http_state.DEFAULT_HEALTH_CHECK_TIMEOUT_S", 1)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the name of previous test case and added new test case testing the same exact behavior. The only difference is the new one has a longer DEFAULT_HEALTH_CHECK_TIMEOUT_S than PROXY_HEALTH_CHECK_PERIOD_S in the setup

@patch("ray.serve._private.http_state.PROXY_HEALTH_CHECK_PERIOD_S", 0.1)
def test_http_proxy_state_check_health_always_timeout_timtout_greater_than_period():
"""Test calling update method on HTTPProxyState when the proxy state is HEALTHY and
when the ready call always timed out and health check timeout greater than period.

The proxy state started with STARTING. After update is called and ready call
succeeded, the status will change to HEALTHY. After the next few check_health calls
Expand Down