[serve] Fix Router race condition (#46864)
Fix a race condition in Router that occurs when `update_deployment_config` is called before the `_metrics_manager` instance is ready. This resulted in the error message: `AttributeError: 'Router' object has no attribute '_metrics_manager'`

## Why are these changes needed?

While using Ray Serve, we occasionally encountered this error, which caused the Ray cluster to fail to start. It turns out that the ordering of initialization inside `Router.__init__` might cause this race condition in distributed environments.

```
(ProxyActor pid=368) Exception in callback <function LongPollClient._process_update.<locals>.chained at 0x7f6b11644a60>
(ProxyActor pid=368) handle: <Handle LongPollClient._process_update.<locals>.chained>
(ProxyActor pid=368) Traceback (most recent call last):
(ProxyActor pid=368)   File "uvloop/cbhandles.pyx", line 61, in uvloop.loop.Handle._run
(ProxyActor pid=368)   File "/usr/local/lib/python3.10/site-packages/ray/serve/_private/long_poll.py", line 171, in chained
(ProxyActor pid=368)     callback(arg)
(ProxyActor pid=368)   File "/usr/local/lib/python3.10/site-packages/ray/serve/_private/router.py", line 416, in update_deployment_config
(ProxyActor pid=368)     self._metrics_manager.update_deployment_config(
(ProxyActor pid=368) AttributeError: 'Router' object has no attribute '_metrics_manager'
```

## Related issue number

N/A

## Checks

- [x] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [x] Integration test: this fixed the issue in our environment

---------

Signed-off-by: tungh2 <105205092+tungh2@users.noreply.github.com>
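
The initialization-order race described in this message can be illustrated with a minimal, self-contained sketch. It does not use Ray and is not the actual `Router` code: `MetricsManager`, `EagerLongPollClient`, `RacyRouter`, `SafeRouter`, and `try_build` are all hypothetical names. The fake client delivers an update synchronously on construction, which is an assumption made here to reproduce the worst-case timing deterministically.

```python
class MetricsManager:
    """Hypothetical stand-in for the router's metrics manager."""

    def __init__(self):
        self.configs = []

    def update_deployment_config(self, config):
        self.configs.append(config)


class EagerLongPollClient:
    """Hypothetical client that delivers an update as soon as it is created.

    This models the worst case of the race deterministically: the update
    arrives before the owner's __init__ has finished.
    """

    def __init__(self, callback):
        callback({"num_replicas": 1})


class RacyRouter:
    def __init__(self):
        # The callback is registered first, so it can fire before the
        # attribute assigned on the next line exists.
        self._long_poll_client = EagerLongPollClient(self.update_deployment_config)
        self._metrics_manager = MetricsManager()

    def update_deployment_config(self, config):
        self._metrics_manager.update_deployment_config(config)


class SafeRouter:
    def __init__(self):
        # Every attribute the callback touches is assigned before the
        # callback is registered.
        self._metrics_manager = MetricsManager()
        self._long_poll_client = EagerLongPollClient(self.update_deployment_config)

    def update_deployment_config(self, config):
        self._metrics_manager.update_deployment_config(config)


def try_build(router_cls):
    """Return the constructed router, or the AttributeError message."""
    try:
        return router_cls()
    except AttributeError as exc:
        return str(exc)


if __name__ == "__main__":
    print(try_build(RacyRouter))
    print(type(try_build(SafeRouter)).__name__)
```

In this sketch, constructing `RacyRouter` fails with the same kind of `AttributeError` as in the traceback, while `SafeRouter` receives the update normally because the attribute exists before any callback can run.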