[Serve] Return error string from deploy_serve_application task #36744

Merged
merged 10 commits into from
Jun 28, 2023

Conversation

@shrekris-anyscale shrekris-anyscale commented Jun 23, 2023

Why are these changes needed?

The controller runs the deploy_serve_application task to build and run the user's Serve app. If the task raises an error, the controller tries to deserialize it when it calls ray.get() on the task's object reference. If the error depends on a custom dependency, deserialization fails, and the controller logs the deserialization failure instead of the actual error.

This change catches any error in the deploy_serve_application task itself and returns it as a string to the controller. The controller then simply logs the string.
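
The pattern described above can be sketched in plain Python, without Ray. This is a minimal illustration and not the PR's actual code: `deploy_task_sketch`, `handle_task_result`, and `broken_app` are hypothetical names, and the use of `traceback.format_exc()` to build the string is an assumption about how the error text might be produced.

```python
import logging
import traceback
from typing import Callable, Optional

logger = logging.getLogger("controller_sketch")


def deploy_task_sketch(build_and_run: Callable[[], None]) -> Optional[str]:
    """Stand-in for the task body: None on success, else the error as text."""
    try:
        build_and_run()
        return None
    except Exception:
        # Format the traceback inside the task (assumed to be where the
        # user's dependencies are available), so that only a plain str
        # travels back to the caller.
        return traceback.format_exc()


def handle_task_result(error_message: Optional[str]) -> bool:
    """Stand-in for the controller side: log the string, report success."""
    if error_message is not None:
        logger.error("Deploying app failed:\n%s", error_message)
        return False
    return True


def broken_app() -> None:
    # Hypothetical user code that fails while the app is being built.
    raise ValueError("bad app config")


failure = deploy_task_sketch(broken_app)
success = deploy_task_sketch(lambda: None)
deployed_ok = handle_task_result(success)
deployed_failed = handle_task_result(failure)
```

Because the return value is either `None` or a `str`, the caller never has to unpickle a user-defined exception class.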

Related issue number

Closes #35677 and #35678.

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
      • This change adds a new test to test_controller.py.

Signed-off-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
@shrekris-anyscale shrekris-anyscale changed the title from "[WIP] [Serve] Return error string from deploy_serve_application task" to "[Serve] Return error string from deploy_serve_application task" on Jun 26, 2023
@@ -846,6 +846,8 @@ def deploy_serve_application(
        name: application name. If specified, application will be deployed
            without removing existing applications.
        route_prefix: route_prefix. Define the route path for the application.
    Returns:
        Returns None if no error is raised. Otherwise, returns error message.
    """
    try:
        from ray import serve
Contributor

why are these delayed imports here btw?

Contributor Author

We can probably move call_app_builder_with_args_if_necessary out of the task, but moving the others would cause the controller to depend on api.py (to access serve.run() and serve.build()) which seems circular.
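
The deferred-import pattern under discussion can be sketched with the standard library alone. In this illustration `json` merely stands in for a module (such as api.py) that would create an import cycle if imported at the top of the file, and `render_app_config` is a hypothetical name, not a Ray function.

```python
from typing import Any, Dict


def render_app_config(config: Dict[str, Any]) -> str:
    # Deferred import: resolved when the function runs, not when this
    # module is loaded, so loading this module does not require the
    # imported module to be importable yet.
    import json

    return json.dumps(config, sort_keys=True)


result = render_app_config({"route_prefix": "/", "name": "app1"})
```

The cost is that an import error surfaces only when the function is first called rather than at module load time.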

Comment on lines 68 to 71
"working_dir": (
"https://github.com/ray-project/test_dag/"
"archive/e552be913ffb7fb5e36b1e63d97f5c354d45e219.zip"
),
Contributor

let's use a local file instead please to avoid an external dependency

you can import from this file as from ray.serve.tests.test_controller import my_app

you can check it isn't serialized by raising an exception in __reduce__ or by including another non-serializable object as an attribute (and also may as well verify it isn't serializable using ray.cloudpickle in the test case)
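
A minimal sketch of the `__reduce__` idea suggested above. It uses the standard-library `pickle` in place of `ray.cloudpickle` so that it runs without Ray; cloudpickle is expected to consult `__reduce__` in the same way, but that is an assumption not checked here. `NonSerializableError` and `can_pickle` are hypothetical names.

```python
import pickle


class NonSerializableError(Exception):
    """Exception that refuses to be pickled."""

    def __reduce__(self):
        # pickle calls __reduce__ (via __reduce_ex__) to decide how to
        # serialize the object; raising here makes any attempt fail.
        raise RuntimeError("NonSerializableError must not be pickled")


def can_pickle(obj: object) -> bool:
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


err = NonSerializableError("custom error")
error_is_picklable = can_pickle(err)
string_is_picklable = can_pickle(str(err))
```

The string form of the error still pickles, which is why returning a string from the task sidesteps the problem.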

Contributor Author

Sounds good, I'll move it to the Ray repo. I can't put it in test_controller.py though, since I need to raise the exception in the file itself (outside the deployment) for the error to propagate back to the user.

I'll create a file in test_config_files and import it from there.

Signed-off-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
Contributor

@edoakes edoakes left a comment

lgtm, lint failing

Signed-off-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
@shrekris-anyscale
Contributor Author

The test failures are unrelated:

[Screenshot, 2023-06-27 4:09:21 PM]
  • Serve tests (test_standalone3) is flaky on master. I retried the test and it succeeded.
[Screenshot, 2023-06-27 4:05:53 PM]

@shrekris-anyscale
Contributor Author

@edoakes This PR is almost ready to merge. The test_object_store_metrics failure looks a bit suspicious to me since that test isn't usually flaky. I'm rerunning it. We can merge the PR afterwards.

@shrekris-anyscale
Contributor Author

The retry succeeded. @edoakes @architkulkarni This change is ready to merge.

@architkulkarni architkulkarni merged commit 0fe1149 into ray-project:master Jun 28, 2023
shrekris-anyscale added a commit that referenced this pull request Jun 28, 2023
shrekris-anyscale added a commit to shrekris-anyscale/ray that referenced this pull request Jun 28, 2023
…ask (ray-project#36744)"

This reverts commit 0fe1149.

Signed-off-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
arvind-chandra pushed a commit to lmco/ray that referenced this pull request Aug 31, 2023
…-project#36744)

Signed-off-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
Signed-off-by: e428265 <arvind.chandramouli@lmco.com>
Successfully merging this pull request may close these issues.

[Serve] Controller errors when attempting to deserialize error from deployment graph task