When creating and running an async workflow, I'd expect it to run in the background, since the docs state:
The lifetime of the workflow is not coupled with the driver. If the driver exits, the workflow will continue running in the background of the cluster.
However, when the Python script spawns a workflow and exits, the workflow fails. The logs contain something like this:
53The object's owner has exited. This is the Python worker that first created the ObjectRef via .remote() or ray.put(). Check cluster logs (/tmp/ray/session_latest/logs/*05000000ffffffffffffffffffffffffffffffffffffffffffffffff* at IP address 127.0.0.1) for more information about the Python worker failure.
The workflow manager is started as detached, so as I understand it, it should be kept alive.
Even more confusing, the "Jobs" view of the Ray Dashboard shows the job as successful.
I'm attaching a reproduction script. Calling it twice within 30 seconds shows that the first submitted workflow failed (the create_workflow call can be commented out to avoid spawning new workflows).
Before calling the script for the first time, I started a local cluster with:
ray start --head --temp-dir="$HOME/ray" --storage="$HOME/ray_storage"
It doesn't matter whether I call ray.init with the proper local address (commented out) or with 'auto'.
Versions / Dependencies
Plain Ray 2.0 on a Mac M1.
Reproduction script
Hint: if I remove time.sleep(20) from the multiply function, the workflow completes successfully.
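The script itself is not included in this copy of the issue. For readers who want to try this, here is a minimal sketch of what such a reproduction might look like. It is not the original script. Only the names multiply and create_workflow, the time.sleep(20) call, and ray.init with 'auto' come from the report; the arguments, the delay_s parameter, the workflow id, the print_statuses helper, and the submit/status subcommands are hypothetical. It assumes the Ray 2.0 workflow API (bind, workflow.run_async, workflow.list_all) and a cluster started with --storage as described above.

```python
import sys
import time


def multiply(a, b, delay_s=20):
    """Slow task body. The sleep keeps the task running after the driver exits."""
    time.sleep(delay_s)
    return a * b


def create_workflow(workflow_id):
    """Submit the workflow asynchronously and return without waiting for the result."""
    # Imported lazily so this file can be loaded where Ray is not installed.
    import ray
    from ray import workflow

    # Attach to the already running local cluster.
    ray.init(address="auto")
    # Wrap the plain function as a Ray remote function and build a one-node DAG.
    dag = ray.remote(multiply).bind(2, 3)
    # Assumption: Ray 2.0 API. run_async returns immediately; the driver then exits.
    workflow.run_async(dag, workflow_id=workflow_id)


def print_statuses():
    """Print the id and status of every workflow known to the cluster storage."""
    import ray
    from ray import workflow

    ray.init(address="auto")
    for workflow_id, status in workflow.list_all():
        print(workflow_id, status)


# Hypothetical command line: "submit" starts a new workflow, "status" lists all.
if __name__ == "__main__" and sys.argv[1:2] == ["submit"]:
    create_workflow(workflow_id=f"repro-{int(time.time())}")
elif __name__ == "__main__" and sys.argv[1:2] == ["status"]:
    print_statuses()
```

Saved under a hypothetical name such as repro.py, running it with the submit argument and then with the status argument less than 30 seconds later should show whether the first workflow is still running or has failed after its driver exited.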
SebastianMorawiec added the labels bug ("Something that is supposed to be working; but isn't") and triage ("Needs triage (eg: priority, bug/not-bug, and owning component)") on Oct 12, 2022.
SebastianMorawiec changed the title from "[Workflow] Workflow fails if python script calling it exists" to "[Workflow] Workflow fails if python script calling it exits" on Oct 12, 2022.
hora-anyscale added the label P1 ("Issue that should be fixed within a few weeks") and removed the label triage on Nov 14, 2022.
Issue Severity
High: It blocks me from completing my task.