[Serve] Controller errors when attempting to deserialize error from deployment graph task #35677

Closed
shrekris-anyscale opened this issue May 23, 2023 · 0 comments · Fixed by #36744
Assignees: shrekris-anyscale
Labels: bug (Something that is supposed to be working; but isn't), P1 (Issue that should be fixed within a few weeks), serve (Ray Serve Related Issue)

Comments

@shrekris-anyscale
Contributor

What happened + What you expected to happen

When a Serve application raises an error while constructing the graph, the Serve controller attempts to deserialize the error so it can include it in the serve status output. However, if the error class itself depends on a third-party package, the controller raises an error of its own because it cannot import that dependency during deserialization. The serve status output then shows the deserialization failure rather than the original error. See the reproduction script.

The controller should never deserialize the error. Instead, it should simply store the error string.
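The failure mode can be sketched without Ray. This is a minimal illustration, not Ray's actual code: `hypothetical_plotting_dep` is a made-up module name standing in for matplotlib, and `rebuild` is a hypothetical helper that performs by hand the step unpickling performs for an exception (calling the class again with its saved args). Rebuilding the exception re-runs `__init__`, which re-runs the import, while a plain string needs no import at all.

```python
import importlib
import sys
import types

# Made-up module name standing in for matplotlib; assumed not to be installed.
DEP = "hypothetical_plotting_dep"


class CustomException(Exception):
    def __init__(self, *args):
        # Same shape as the repro: the constructor imports a third-party module.
        module = importlib.import_module(DEP)
        self.version = module.__version__


def rebuild(exc):
    """Hypothetical helper: redo what unpickling does, i.e. call cls(*args)."""
    cls, args = exc.__reduce__()[:2]
    return cls(*args)


# "Task" side: the dependency is available, so the exception can be created.
fake = types.ModuleType(DEP)
fake.__version__ = "1.0"
sys.modules[DEP] = fake
original = CustomException("This is a custom exception!")
error_string = repr(original)  # a plain str, safe to hand to any process

# "Controller" side: the dependency is missing.
del sys.modules[DEP]
try:
    rebuild(original)
    rebuild_error = None
except ModuleNotFoundError as e:
    rebuild_error = e

print("rebuild failed with:", rebuild_error)
print("string survived:", error_string)
```

The string form loses the exception's type identity, but for a status message that is all the controller needs.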

Versions / Dependencies

Ray on the latest master.

Reproduction script

A repro can be found in the test_dag repo.

# broken_dag.py

from ray import serve

class CustomException(Exception):

    def __init__(self, *args):
        import matplotlib
        self.version = matplotlib.__version__

raise CustomException("This is a custom exception!")

@serve.deployment
def f(*args):
    return "Hi there!"

app = f.bind()
  • Serve config yaml
# config.yaml

import_path: "broken_dag:app"
runtime_env:
  working_dir: "https://github.com/ray-project/test_dag/archive/81c1912273e512a3756e359e08900f0e5b2e1811.zip"
  pip: ["matplotlib"]
  • serve status output:
% serve status

name: default
app_status:
  status: DEPLOY_FAILED
  message: |+
    Unexpected error occured while deploying application 'default':
    Traceback (most recent call last):
      File "/Users/shrekris/Desktop/ray/python/ray/serve/_private/application_state.py", line 202, in update
        ray.get(finished[0])
      File "/Users/shrekris/Desktop/ray/python/ray/_private/auto_init_hook.py", line 18, in auto_init_wrapper
        return fn(*args, **kwargs)
      File "/Users/shrekris/Desktop/ray/python/ray/_private/client_mode_hook.py", line 103, in wrapper
        return func(*args, **kwargs)
      File "/Users/shrekris/Desktop/ray/python/ray/_private/worker.py", line 2532, in get
        raise value
    ray.exceptions.RaySystemError: System error: Failed to unpickle serialized exception
    traceback: Traceback (most recent call last):
      File "/Users/shrekris/Desktop/ray/python/ray/exceptions.py", line 46, in from_ray_exception
        return pickle.loads(ray_exception.serialized_exception)
      File "/tmp/ray/session_2023-05-23_13-16-57_538656_15390/runtime_resources/working_dir_files/https_github_com_ray-project_test_dag_archive_81c1912273e512a3756e359e08900f0e5b2e1811/broken_dag.py", line 6, in __init__
        import matplotlib
    ModuleNotFoundError: No module named 'matplotlib'

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "/Users/shrekris/Desktop/ray/python/ray/_private/serialization.py", line 385, in deserialize_objects
        obj = self._deserialize_object(data, metadata, object_ref)
      File "/Users/shrekris/Desktop/ray/python/ray/_private/serialization.py", line 291, in _deserialize_object
        return RayError.from_bytes(obj)
      File "/Users/shrekris/Desktop/ray/python/ray/exceptions.py", line 40, in from_bytes
        return RayError.from_ray_exception(ray_exception)
      File "/Users/shrekris/Desktop/ray/python/ray/exceptions.py", line 49, in from_ray_exception
        raise RuntimeError(msg) from e
    RuntimeError: Failed to unpickle serialized exception

  deployment_timestamp: 1684873025.7622342
deployment_statuses: []

Issue Severity

High: It blocks me from completing my task.

shrekris-anyscale added the bug, P1, and serve labels May 23, 2023
shrekris-anyscale self-assigned this May 23, 2023
architkulkarni pushed a commit that referenced this issue Jun 28, 2023
)

The controller runs the deploy_serve_application task to build and run the user's Serve app. If the task raises an error, the controller will try to deserialize it when it calls ray.get() on the task's reference. If the error contains a custom dependency, the deserialization will fail, and the controller will log an error about the deserialization failing instead of the actual error itself.

This change catches any error in the deploy_serve_application task itself and returns it as a string to the controller. The controller then simply logs the string.

Related issue number
Closes #35677 and #35678.

---------

Signed-off-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
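The approach described in the commit message above (catch the error inside the task and return it as a string) can be sketched in plain Python. This is an assumption-laden illustration, not the code from the pull request: `deploy_task` and `controller_update` are hypothetical names, and the `RUNNING` status is used only as a placeholder for the success case.

```python
import traceback
from typing import Callable, Optional


def deploy_task(build_app: Callable[[], object]) -> Optional[str]:
    """Hypothetical task body: return None on success, else the traceback text."""
    try:
        build_app()
        return None
    except Exception:
        # Format the error here, in the environment that has the app's
        # dependencies, so only a plain string crosses the process boundary.
        return traceback.format_exc()


def controller_update(task_result: Optional[str]) -> dict:
    """Hypothetical controller step: record the string, never rebuild the error."""
    if task_result is None:
        return {"status": "RUNNING", "message": ""}
    return {"status": "DEPLOY_FAILED", "message": task_result}


def broken_app():
    raise ValueError("bad graph")


failed = controller_update(deploy_task(broken_app))
succeeded = controller_update(deploy_task(lambda: None))
print(failed["status"])
print(failed["message"])
```

Because the task never lets the exception propagate, the caller only ever receives `None` or a `str`, neither of which requires importing the application's dependencies.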
arvind-chandra pushed a commit to lmco/ray that referenced this issue Aug 31, 2023
…-project#36744)

Signed-off-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
Signed-off-by: e428265 <arvind.chandramouli@lmco.com>